We are working on creating a single 22b from this model
Currently a friend and I are attempting to remove one 22B expert from this Mixtral model, to hopefully create its own standalone Mistral 22B-parameter model. If we succeed, would you like us to upload the weights to this HF account?
We already have code to do something similar, but we just need to adjust it slightly
https://github.com/MeNicefellow/Mixtral-Expert-Trimmer
Do you think it's possible to create a 6x22B or 4x22B model so it fits better into 2x24GB cards?
@CyberTimon Unfortunately this is not possible without severely degrading performance. The resulting model would basically be useless without fully retraining the router, and possibly the entire model. So we are hoping that by removing just one expert and using it by itself, it will work well as a standalone model without MoE.
Ah, that's unfortunate. But as far as I understand MegaBlocks / MoE, your experiment won't work either. One "expert" learns, for example, sentence positions, or has more activations when you ask history-related facts, etc. So how are you planning to extract a "working" 22b model?
"how are you planning to extract a "working" 22b model?"
With a lot of hope and prayer
Hi
A noob question: Mistral have released the 8x22B model, which is 260 GB (on torrent). So how can this be used for inference? Does it require the entire model to be loaded into memory, and therefore >260 GB of RAM? Or is this model supposed to be used to create smaller models that can then be run on normal desktops with a decent GPU/RAM?
You can use the BnB 4bit quantized version:
https://huggingface.co./mistral-community/Mixtral-8x22B-v0.1-4bit
If you manage to grab one expert, why not grab all eight? It's possible some kind of merge would make them more useful from there (or less useful!).
great idea
"Currently a friend and I are attempting to remove one 22B expert from this Mixtral model, to hopefully create its own standalone Mistral 22B-parameter model. If we succeed, would you like us to upload the weights to this HF account?"
Just FYI, the author of MergeKit did something similar with Mixtral 8x7B and each expert didn't generate comprehensible text (see DeMixtral); merging experts together didn't work either. So you might need to fine-tune quite a bit to fix it.
@mrfakename I was able to find DeMixtral, but couldn't find any reports on merging all the experts together. Can you help me find the source on the failure to merge experts? Thanks in advance.
Unfortunately also uninterpretable garbage. :( Maybe there's a merge technique that would make something work, but I haven't found one yet.
Thank you! (for future reference that was said by cg in this GH issue thread)
Looks like someone did it, but the model seems to lack knowledge
https://huggingface.co./Vezora/Mistral-22B-v0.1
The model generated incomprehensible text, so they QLoRA'd it and it became a usable model.
Hi, I am currently playing with the 1x22b version of Vezora-Mistral-22B-v0.2 and dolphin-2.9.1-mixtral-1x22b.
I ran evaluations against Vezora-Mistral-22B-v0.2 and dolphin-2.9.1-mixtral-1x22b to get PL_Alpha_Hill,
as described in AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality.
AlphaLoRA measures layer training quality based on the heavy-tailed (HT) characteristics of each layer's empirical spectral densities (ESDs), quantified by the HT metric PL_Alpha_Hill.
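For context, PL_Alpha_Hill is essentially the Hill estimator of the power-law exponent fitted to the tail of a layer's eigenvalue spectrum. Below is a minimal sketch of how it can be computed, assuming a Hugging Face checkpoint (the repo id is assumed) and averaging over each layer's linear projections; the exact conventions in AlphaLoRA/WeightWatcher (tail size, per-matrix reporting) may differ, and the SVDs are slow on a model this size:

```python
# Sketch only, not the AlphaLoRA authors' code: estimate PL_Alpha_Hill per layer by
# applying the Hill estimator to the eigenvalue spectrum (ESD) of its weight matrices.
import torch
from transformers import AutoModelForCausalLM

def hill_alpha(weight: torch.Tensor, tail_frac: float = 0.5) -> float:
    """Hill estimate of the power-law exponent of the ESD of W^T W."""
    W = weight.detach().float()
    evals = torch.linalg.svdvals(W) ** 2          # eigenvalues of W^T W
    evals, _ = torch.sort(evals)                  # ascending
    n = evals.numel()
    k = max(int(tail_frac * n), 2)                # number of upper-tail eigenvalues
    tail = evals[-k:]
    x_min = evals[-k - 1] if n > k else evals[0]
    # alpha = 1 + k / sum(log(lambda_i / x_min)) over the tail
    return (1.0 + k / torch.log(tail / x_min).sum()).item()

model = AutoModelForCausalLM.from_pretrained(
    "cognitivecomputations/dolphin-2.9.1-mixtral-1x22b",   # assumed repo id
    torch_dtype=torch.float16, device_map="cpu")

for i, layer in enumerate(model.model.layers, start=1):
    # Average the metric over the layer's linear projections (one possible convention).
    alphas = [hill_alpha(m.weight) for m in layer.modules()
              if isinstance(m, torch.nn.Linear)]
    print(f"PL_Alpha_Hill for layer {i}: {sum(alphas) / len(alphas):.4f}")
```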
I am actually curious: is 8x22B massively undertrained? Does that mean it has a lot of unused potential? I was thinking of bringing the 1x22b back to MoE, but with QLoRA, adaptive rank per layer, and uneven expert spreading, using Parameter-Efficient-MoE and AlphaLoRA.
A 22B q4_k_m + LoRA-experts MoE could become the holy grail for a 3090 or an APU? A mini DeepSeek V3?
In the specific case of 8x22b, it would be possible to have a minimum of 8 experts per layer, corresponding to the actual Mixtral extraction, with some layers having a lower LoRA rank like r64 and some with much larger LoRA ranks. Also, layers with PL_Alpha_Hill larger than 3.0 could have double the experts, using some way to trick the router into choosing one of the clones during retraining. Since the clone LoRAs would start identical to their originals, it would still work, theoretically, if the router asks the clone instead of the original. On layers with low PL_Alpha_Hill, we could collapse 2 LoRAs into 1, e.g. 8 -> 4 experts, by merging them and keeping virtual experts.
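A rough sketch of the "clone the experts and widen the router" part, assuming the Hugging Face Mixtral module layout (`block_sparse_moe.experts` plus a linear `gate`); attribute names can vary across transformers versions, and a real implementation would share the base expert weights and only add LoRAs rather than deep-copying everything:

```python
# Hypothetical sketch: double the experts of every Mixtral MoE block by cloning them
# and tiling the router weights, so the gate can pick either an original or its clone.
# Not tested; in practice you would avoid deepcopy and attach LoRAs to shared weights.
import copy
import torch
from transformers import MixtralForCausalLM

model = MixtralForCausalLM.from_pretrained("mistralai/Mixtral-8x22B-v0.1")  # illustrative

for layer in model.model.layers:
    moe = layer.block_sparse_moe
    n = len(moe.experts)
    # A clone initially behaves exactly like its original, so routing to it is a no-op
    # until its LoRA diverges during retraining.
    for i in range(n):
        moe.experts.append(copy.deepcopy(moe.experts[i]))
    with torch.no_grad():
        # Tile the gate so the logit for clone i equals the logit for original i.
        moe.gate.weight = torch.nn.Parameter(moe.gate.weight.repeat(2, 1))
    moe.gate.out_features = 2 * n
    moe.num_experts = 2 * n            # attribute name assumed from the HF implementation

model.config.num_local_experts *= 2
```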
Or would this basically require a rerun of the Hermes or Dolphin training?
These are the results (note I had to hack at the code a bit to run it on a 3090, so check my merge request).
Thanks to all the great contributors!
Paging: @ehartford @Vezora
PL_Alpha_Hill
dolphin-2.9.1-mixtral-1x22b
PL_Alpha_Hill for layer 1: 1.7081
PL_Alpha_Hill for layer 2: 2.4446
PL_Alpha_Hill for layer 3: 2.5381
PL_Alpha_Hill for layer 4: 2.4501
PL_Alpha_Hill for layer 5: 3.0330
PL_Alpha_Hill for layer 6: 2.7296
PL_Alpha_Hill for layer 7: 3.6432
PL_Alpha_Hill for layer 8: 2.7240
PL_Alpha_Hill for layer 9: 2.7166
PL_Alpha_Hill for layer 10: 3.2729
PL_Alpha_Hill for layer 11: 2.3617
PL_Alpha_Hill for layer 12: 2.9306
PL_Alpha_Hill for layer 13: 2.5345
PL_Alpha_Hill for layer 14: 2.2117
PL_Alpha_Hill for layer 15: 2.7240
PL_Alpha_Hill for layer 16: 3.5451
PL_Alpha_Hill for layer 17: 2.4778
PL_Alpha_Hill for layer 18: 2.8558
PL_Alpha_Hill for layer 19: 2.3630
PL_Alpha_Hill for layer 20: 2.3130
PL_Alpha_Hill for layer 21: 3.5228
PL_Alpha_Hill for layer 22: 3.3735
PL_Alpha_Hill for layer 23: 3.4345
PL_Alpha_Hill for layer 24: 4.1321
PL_Alpha_Hill for layer 25: 2.6299
PL_Alpha_Hill for layer 26: 3.2742
PL_Alpha_Hill for layer 27: 1.0000
PL_Alpha_Hill for layer 28: 1.0000
PL_Alpha_Hill for layer 29: 1.0000
PL_Alpha_Hill for layer 30: 1.0000
PL_Alpha_Hill for layer 31: 1.0000
PL_Alpha_Hill for layer 32: 1.0000
PL_Alpha_Hill for layer 33: 1.0000
PL_Alpha_Hill for layer 34: 1.0000
PL_Alpha_Hill for layer 35: 1.0000
PL_Alpha_Hill for layer 36: 1.0000
PL_Alpha_Hill for layer 37: 1.0000
PL_Alpha_Hill for layer 38: 1.0000
PL_Alpha_Hill for layer 39: 1.0000
PL_Alpha_Hill for layer 40: 1.0000
PL_Alpha_Hill for layer 41: 1.0000
PL_Alpha_Hill for layer 42: 1.0000
PL_Alpha_Hill for layer 43: 1.0000
PL_Alpha_Hill for layer 44: 1.0000
PL_Alpha_Hill for layer 45: 1.0000
PL_Alpha_Hill for layer 46: 1.0000
PL_Alpha_Hill for layer 47: 1.0000
PL_Alpha_Hill for layer 48: 1.0000
PL_Alpha_Hill for layer 49: 1.0000
PL_Alpha_Hill for layer 50: 1.0000
PL_Alpha_Hill for layer 51: 1.0000
PL_Alpha_Hill for layer 52: 1.0000
PL_Alpha_Hill for layer 53: 1.0000
PL_Alpha_Hill for layer 54: 1.0000
PL_Alpha_Hill for layer 55: 1.0000
PL_Alpha_Hill for layer 56: 1.0000
Vezora-Mistral-22B-v0.2
PL_Alpha_Hill for layer 1: 1.7365
PL_Alpha_Hill for layer 2: 2.3165
PL_Alpha_Hill for layer 3: 2.3017
PL_Alpha_Hill for layer 4: 2.6464
PL_Alpha_Hill for layer 5: 2.7786
PL_Alpha_Hill for layer 6: 2.7357
PL_Alpha_Hill for layer 7: 3.7902
PL_Alpha_Hill for layer 8: 2.8875
PL_Alpha_Hill for layer 9: 2.6768
PL_Alpha_Hill for layer 10: 3.2108
PL_Alpha_Hill for layer 11: 2.3427
PL_Alpha_Hill for layer 12: 3.0512
PL_Alpha_Hill for layer 13: 2.6569
PL_Alpha_Hill for layer 14: 2.2624
PL_Alpha_Hill for layer 15: 3.1096
PL_Alpha_Hill for layer 16: 2.5973
PL_Alpha_Hill for layer 17: 2.4316
PL_Alpha_Hill for layer 18: 3.0323
PL_Alpha_Hill for layer 19: 2.4080
PL_Alpha_Hill for layer 20: 2.4066
PL_Alpha_Hill for layer 21: 3.7185
PL_Alpha_Hill for layer 22: 3.4868
PL_Alpha_Hill for layer 23: 3.6200
PL_Alpha_Hill for layer 24: 4.1489
PL_Alpha_Hill for layer 25: 3.0064
PL_Alpha_Hill for layer 26: 3.4390
PL_Alpha_Hill for layer 27: 3.3214
PL_Alpha_Hill for layer 28: 1.0000
PL_Alpha_Hill for layer 29: 1.0000
PL_Alpha_Hill for layer 30: 1.0000
PL_Alpha_Hill for layer 31: 1.0000
PL_Alpha_Hill for layer 32: 1.0000
PL_Alpha_Hill for layer 33: 1.0000
PL_Alpha_Hill for layer 34: 1.0000
PL_Alpha_Hill for layer 35: 1.0000
PL_Alpha_Hill for layer 36: 1.0000
PL_Alpha_Hill for layer 37: 1.0000
PL_Alpha_Hill for layer 38: 1.0000
PL_Alpha_Hill for layer 39: 1.0000
PL_Alpha_Hill for layer 40: 1.0000
PL_Alpha_Hill for layer 41: 1.0000
PL_Alpha_Hill for layer 42: 1.0000
PL_Alpha_Hill for layer 43: 1.0000
PL_Alpha_Hill for layer 44: 1.0000
PL_Alpha_Hill for layer 45: 1.0000
PL_Alpha_Hill for layer 46: 1.0000
PL_Alpha_Hill for layer 47: 1.0000
PL_Alpha_Hill for layer 48: 1.0000
PL_Alpha_Hill for layer 49: 1.0000
PL_Alpha_Hill for layer 50: 1.0000
PL_Alpha_Hill for layer 51: 1.0000
PL_Alpha_Hill for layer 52: 1.0000
PL_Alpha_Hill for layer 53: 1.0000
PL_Alpha_Hill for layer 54: 1.0000
PL_Alpha_Hill for layer 55: 1.0000
PL_Alpha_Hill for layer 56: 1.0000
Mistral NeMo 12b instruct
PL_Alpha_Hill for layer 1: 2.8483
PL_Alpha_Hill for layer 2: 3.9687
PL_Alpha_Hill for layer 3: 3.6483
PL_Alpha_Hill for layer 4: 4.6750
PL_Alpha_Hill for layer 5: 3.3442
PL_Alpha_Hill for layer 6: 3.6857
PL_Alpha_Hill for layer 7: 3.8457
PL_Alpha_Hill for layer 8: 3.5505
PL_Alpha_Hill for layer 9: 3.4881
PL_Alpha_Hill for layer 10: 2.7972
PL_Alpha_Hill for layer 11: 4.1843
PL_Alpha_Hill for layer 12: 3.5826
PL_Alpha_Hill for layer 13: 3.2662
PL_Alpha_Hill for layer 14: 3.3232
PL_Alpha_Hill for layer 15: 3.9827
PL_Alpha_Hill for layer 16: 2.9114
PL_Alpha_Hill for layer 17: 3.0528
PL_Alpha_Hill for layer 18: 4.3605
PL_Alpha_Hill for layer 19: 3.4614
PL_Alpha_Hill for layer 20: 3.3892
PL_Alpha_Hill for layer 21: 4.2361
PL_Alpha_Hill for layer 22: 4.4134
PL_Alpha_Hill for layer 23: 4.8992
PL_Alpha_Hill for layer 24: 4.1821
PL_Alpha_Hill for layer 25: 5.0604
PL_Alpha_Hill for layer 26: 6.5571
PL_Alpha_Hill for layer 27: 4.4651
PL_Alpha_Hill for layer 28: 5.2947
PL_Alpha_Hill for layer 29: 4.7900
PL_Alpha_Hill for layer 30: 4.4452
PL_Alpha_Hill for layer 31: 4.7342
PL_Alpha_Hill for layer 32: 4.7901
PL_Alpha_Hill for layer 33: 4.4934
PL_Alpha_Hill for layer 34: 4.8650
PL_Alpha_Hill for layer 35: 3.8529
PL_Alpha_Hill for layer 36: 4.2417
PL_Alpha_Hill for layer 37: 1.0000
PL_Alpha_Hill for layer 38: 1.0000
PL_Alpha_Hill for layer 39: 1.0000
PL_Alpha_Hill for layer 40: 1.0000
It's possible
8x22b was pretty much a flop; 72b was stronger and smaller.
As to why, I could only guess.
In fact we had pretty much given up on MoE until Deepseek proved it could be good
I think MoE should be an answer to dense models; it's just that we haven't found the right configuration yet.
I don't think there is any functional example of a mixed-size, intelligently routed, contracting and expanding QLoRA MoE. I mean: run something like a classifier, e.g. ModernBERT, on the input and get languages, task descriptions, task complexity, knowledge domains, whether it contains code, etc., like NVIDIA's classifiers, for example.
Then pass that to a route planner that would tell the routers in the MoE which LoRAs are available for a given token at a given layer. The planner could disable whole layers if the task is easy and pick the best top_k = 8 experts at each layer, but leave it to the router to choose among those 8.
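A purely hypothetical sketch of that planner idea; every name here (PromptTags, plan_routes, expert_index) is made up for illustration, and the classifier itself is assumed to exist elsewhere:

```python
# Hypothetical route planner: turn classifier tags for a prompt into a per-layer mask of
# allowed LoRA experts; the in-model routers are then restricted to that mask per token.
from dataclasses import dataclass, field

import torch

@dataclass
class PromptTags:                    # assumed output of an external classifier head
    languages: list[str]
    domains: list[str]
    complexity: float                # 0.0 = trivial, 1.0 = very hard
    has_code: bool = False

def plan_routes(tags: PromptTags,
                expert_index: dict[int, dict[str, list[int]]],
                num_layers: int, experts_per_layer: int, top_k: int = 8) -> torch.Tensor:
    """Return a [num_layers, experts_per_layer] bool mask of experts the routers may use.

    expert_index maps layer -> domain -> expert ids, built offline (e.g. from activation stats).
    """
    mask = torch.zeros(num_layers, experts_per_layer, dtype=torch.bool)
    for layer in range(num_layers):
        # Easy prompts: skip (disable) a slice of the middle layers entirely.
        if tags.complexity < 0.3 and num_layers // 3 < layer < 2 * num_layers // 3:
            continue
        candidates: set[int] = set()
        domains = tags.domains + tags.languages + (["code"] if tags.has_code else [])
        for domain in domains:
            candidates.update(expert_index.get(layer, {}).get(domain, []))
        # Keep at most top_k candidates; the router still picks among them per token.
        for expert_id in sorted(candidates)[:top_k]:
            mask[layer, expert_id] = True
        if not mask[layer].any():            # fall back to generalist experts 0..top_k-1
            mask[layer, :top_k] = True
    return mask
```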
I read that it can be as fast as 2 ms to load a LoRA from CPU RAM to VRAM. Combine that with thousands of LoRAs per layer. The model could also shrink by disabling certain layers and become 1/5 of its size (like how MergeKit can drop layers, with LoRAs patching the gaps between the remaining layers). So a very "large, dense-like" model like a 72B could fit in VRAM if only ~22B of it were activated most of the time, and it would only be really slow when asked a very hard task that requires all layers to be activated. But then it could also be as fast as a 1B/3B/8B model if the task is easy, like tool calling.
An expanding or contracting model is also something not yet available. Something like: take an 8B model, expand its middle by duplicating 4-8 layers a few times, apply a LoRA on those virtual layers, and fine-tune it to be more intelligent; you get a 22B model with slightly(?) less performance than a real 22B but that fits in considerably less VRAM. We could also do the reverse and compress the middle layers by extracting a LoRA from one layer onto a later layer that stays. So: compressing a 22B -> 8B + a bunch of LoRAs and virtual layers. For now, there seems to be some problem with the KV cache. There is some talk about [FrankenModels](https://github.com/turboderp-org/exllamav2/pull/275) that could lead to model expansion/contraction.
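A rough sketch of the "expand by duplicating middle layers" surgery (depth up-scaling), assuming a Llama/Mistral-style decoder exposed as `model.model.layers`; the layer range and repeat count are arbitrary illustrative choices, and the duplicated layers would then be frozen and adapted with LoRAs:

```python
# Sketch: duplicate a block of middle decoder layers to "expand" a small dense model.
# Assumes the HF Mistral/Llama layout; not a complete recipe, just the layer surgery.
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1",
                                             torch_dtype=torch.bfloat16)

layers = model.model.layers
start, end, repeats = 12, 20, 2          # duplicate layers 12..19 twice (illustrative)

expanded = list(layers[:start])
for _ in range(repeats):
    # deepcopy so each "virtual" layer gets its own parameters (and later its own LoRA)
    expanded.extend(copy.deepcopy(layer) for layer in layers[start:end])
expanded.extend(layers[end:])

model.model.layers = torch.nn.ModuleList(expanded)
model.config.num_hidden_layers = len(expanded)
# Duplicated layers carry duplicated layer_idx values, which is exactly where the
# KV-cache trouble mentioned above comes from; reindex them to keep the cache consistent.
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i
```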
P.S. Retraining the routers in Mixtral could be done like MergeKit's MoE "random" gate mode: freeze the layers and train only the routers. The best data would be a small but very diverse, multilingual dataset whose answers were generated by the original Mixtral 8x22B. I think that would help the gates reinitialize themselves quickly before training on other datasets with the layers unfrozen.
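A minimal sketch of that router-only setup, assuming the Hugging Face Mixtral implementation where the per-layer gating network lives at `block_sparse_moe.gate`; the random reinitialization and the learning rate are assumptions, not a tested recipe:

```python
# Freeze every parameter except the per-layer MoE gates, optionally reinitializing them,
# then train the gates alone (e.g. by distilling answers from the original 8x22B).
import torch
from transformers import MixtralForCausalLM

model = MixtralForCausalLM.from_pretrained("mistralai/Mixtral-8x22B-v0.1",
                                           torch_dtype=torch.bfloat16)

for name, param in model.named_parameters():
    param.requires_grad = "block_sparse_moe.gate" in name      # routers only

# Optionally start from random gates, similar to MergeKit's "random" gate mode.
for layer in model.model.layers:
    torch.nn.init.normal_(layer.block_sparse_moe.gate.weight, std=0.02)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
# ...then run a short training pass on diverse multilingual prompts whose targets were
# generated by the original Mixtral 8x22B, with all other weights frozen.
```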
But then again, I might be mistaken.
Thanks for your great work, @ehartford!
By the way, with all this talk of MergeKit and MoE:
I'm working on a self-merge of DeepSeek-V3. The hope is to merge every other layer together and make a model half the size that is hopefully still pretty good.
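A sketch of what "merge every other layer" could look like; it uses a small dense Mistral checkpoint for clarity (DeepSeek-V3's MoE/MLA modules would need the same per-tensor treatment, which this does not show), and a simple linear average between adjacent layers:

```python
# Sketch: halve the depth of a dense decoder by averaging each pair of adjacent layers.
# Assumptions: HF Mistral/Llama layout, linear (0.5/0.5) merge; SLERP is another option.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1",
                                             torch_dtype=torch.bfloat16)
layers = model.model.layers

merged = []
for i in range(0, len(layers) - 1, 2):
    a, b = layers[i], layers[i + 1]
    with torch.no_grad():
        for pa, pb in zip(a.parameters(), b.parameters()):
            pa.copy_(0.5 * pa + 0.5 * pb)       # merge layer i+1 into layer i
    merged.append(a)
if len(layers) % 2:                              # odd depth: keep the last layer as-is
    merged.append(layers[-1])

model.model.layers = torch.nn.ModuleList(merged)
model.config.num_hidden_layers = len(merged)
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i                # keep KV-cache bookkeeping consistent
```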
If you do that, consider pruning different layers for different experts.
Check this post that I just made about model compression.
Also, just don't merge all layers the same way in pairs. I really think you should look at PL_Alpha_Hill and skip some layers from the merge, while merging 3 or more of some other layers. Could merging layers keep the model working by removing the routers between the merged layers?
You could identify experts with lower perplexity and merge many of them (if they are in the same layer), but keep the high-perplexity experts almost untouched, and use virtual experts to route to them.
Is it for size more than for speed? If it's for size, re-read my proposal for virtual experts: try merging experts, then add a virtual route from each missing expert that points to the merged one.
DeepSeek-V3 seems to have 256 experts and 61 layers. What if you convert all the experts to LoRAs for a start, and quantize the model at q2? That would be a first step. If it works, lower the LoRA ranks until the model starts breaking. Then look at virtual layers and virtual experts to shrink it further. Or just go with layer pruning/merging, but fine-tuning will be needed.
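A hedged sketch of the "convert an expert to a LoRA" step: take the difference between one expert's projection weight and a shared/base projection for the same layer, then keep a rank-r SVD truncation of that difference as the LoRA pair. The tensors below are random stand-ins; how the base weight is chosen (a merged expert, a shared expert, etc.) is left open:

```python
# Approximate (w_expert - w_base) with a rank-r LoRA pair via truncated SVD: B @ A ~ delta.
import torch

def delta_to_lora(w_expert: torch.Tensor, w_base: torch.Tensor, rank: int = 64):
    """Return (A, B) such that w_base + B @ A approximates w_expert at the given rank."""
    delta = (w_expert - w_base).float()
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    A = torch.diag(S[:rank].sqrt()) @ Vh[:rank]        # [rank, in_features]
    B = U[:, :rank] @ torch.diag(S[:rank].sqrt())      # [out_features, rank]
    return A, B

# Toy usage with random tensors standing in for one expert's projection and a merged base:
w_base = torch.randn(1024, 512)
w_expert = w_base + 0.01 * torch.randn(1024, 512)
A, B = delta_to_lora(w_expert, w_base, rank=64)
print(torch.norm(w_expert - (w_base + B @ A)) / torch.norm(w_expert))  # relative error
```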
Good luck!