Differences in model architecture
#3
by
dfrank
- opened
Hi, I have tried to quantize the model to 4 bits without success, following the same procedure I use for the original model tiiuae/falcon-7b
. When I checked the model architecture they look quite similar but for some reason the ehartford/WizardLM-Uncensored-Falcon-7b
have one extra dimension (I believe is related with one extra token). Do you know why?
As a reference, I leave you the architectures, look at the embedding layer.
tiiuae/falcon-7b:
RWForCausalLM(
(transformer): RWModel(
(word_embeddings): Embedding(65024, 4544)
(h): ModuleList(
(0-31): 32 x DecoderLayer(
(input_layernorm): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
(self_attention): Attention(
(maybe_rotary): RotaryEmbedding()
(query_key_value): Linear4bit(in_features=4544, out_features=4672, bias=False)
(dense): Linear4bit(in_features=4544, out_features=4544, bias=False)
(attention_dropout): Dropout(p=0.0, inplace=False)
)
(mlp): MLP(
(dense_h_to_4h): Linear4bit(in_features=4544, out_features=18176, bias=False)
(act): GELU(approximate='none')
(dense_4h_to_h): Linear4bit(in_features=18176, out_features=4544, bias=False)
)
)
)
(ln_f): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
)
(lm_head): Linear(in_features=4544, out_features=65024, bias=False)
)
ehartford/WizardLM-Uncensored-Falcon-7b:
RWForCausalLM(
(transformer): RWModel(
(word_embeddings): Embedding(65025, 4544)
(h): ModuleList(
(0-31): 32 x DecoderLayer(
(input_layernorm): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
(self_attention): Attention(
(maybe_rotary): RotaryEmbedding()
(query_key_value): Linear4bit(in_features=4544, out_features=4672, bias=False)
(dense): Linear4bit(in_features=4544, out_features=4544, bias=False)
(attention_dropout): Dropout(p=0.0, inplace=False)
)
(mlp): MLP(
(dense_h_to_4h): Linear4bit(in_features=4544, out_features=18176, bias=False)
(act): GELU(approximate='none')
(dense_4h_to_h): Linear4bit(in_features=18176, out_features=4544, bias=False)
)
)
)
(ln_f): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
)
(lm_head): Linear(in_features=4544, out_features=65025, bias=False)
I don't really know the reason why it is that way