https://huggingface.co./leafspark/IridiumLlama-72B-v0.1

#218
by leafspark - opened

It's a merge of Qwen2 72B Instruct, magnum 72B, and calme2.1 72B, converted to the Llama architecture. Quants please, thanks! Also a quick question: do you requantize from Q8_0 after the safetensors conversion, or from f16/bf16?

It's queued. You can watch its progress at http://hf.tst.eu/status.html, if you can guess how to interpret that.

I always quantize from the source precision (defined by whatever llama.cpp thinks that is, usually f32, f16, or bf16, depending on the tensor and the version of llama.cpp). I don't think anybody would first quantize to Q8_0 and then quantize further, as that doesn't seem to offer any advantage (in fact it would probably be slower), but some people do first convert to f16 or bf16 (in my experience, usually because they mistakenly think they have to).
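
For reference, a minimal sketch of that flow with the stock llama.cpp tools (paths and the Q4_K_M target are illustrative assumptions, and the converter script name varies between llama.cpp versions):

```python
import subprocess

# Sketch of the usual llama.cpp pipeline: convert the HF checkpoint to GGUF
# at source precision, then quantize directly from that file (no Q8_0 detour).
# Paths and the Q4_K_M target are illustrative, not the exact setup used here.

model_dir = "IridiumLlama-72B-v0.1"           # local HF checkout (hypothetical path)
gguf_src = "IridiumLlama-72B-v0.1-src.gguf"   # intermediate at source precision

# 1. Convert safetensors -> GGUF. "--outtype auto" lets llama.cpp pick the
#    16-bit float type from the tensors instead of forcing a downcast.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", model_dir,
     "--outfile", gguf_src, "--outtype", "auto"],
    check=True,
)

# 2. Quantize straight from the full-precision GGUF.
subprocess.run(
    ["./llama-quantize", gguf_src,
     "IridiumLlama-72B-v0.1-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```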

mradermacher changed discussion status to closed

The model lacks the config.json file.
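
For anyone hitting the same thing, a trivial pre-upload sanity check; the file list here is a rough assumption about what llama.cpp's converter expects next to the shards, not an exhaustive one:

```python
from pathlib import Path

# Check that the converter's minimum inputs exist alongside the safetensors
# shards. The list is an assumption (converters also want tokenizer files).
repo = Path("IridiumLlama-72B-v0.1")  # hypothetical local checkout
for name in ["config.json", "tokenizer_config.json"]:
    if not (repo / name).exists():
        print(f"missing: {name}")
```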

mradermacher changed discussion status to open

Restarted it, cheers!

mradermacher changed discussion status to closed

Out of curiosity, why did you make a llama conversion and not quant the original model?

I uploaded this one first (the conversion also resharded the 936 individual tensor files into 31 safetensors, which was another reason) and decided to just use it, since the Llama architecture is fairly similar to Qwen2 afaik, at the cost of the context length (128k -> 32k). It may also be better supported in some tools, for example exllamav2.
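
The trade-off is visible right in config.json. A quick way to compare the two repos (the field names are standard HF config keys; the paths are hypothetical local checkouts, and the printed values are whatever the repos actually contain):

```python
import json

# Compare the bits that change in a Qwen2 -> Llama re-badging: the declared
# architecture and the context window (max_position_embeddings).
for repo in ["Iridium-72B-v0.1", "IridiumLlama-72B-v0.1"]:
    with open(f"{repo}/config.json") as f:
        cfg = json.load(f)
    print(repo, cfg["architectures"], cfg.get("max_position_embeddings"))
```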

On that note, I just finished uploading the Qwen2 model (leafspark/Iridium-72B-v0.1). I noticed your queue was large, so I held off on requesting quants.

Ah, I see - well, at least llama.cpp should handle Qwen2 directly without losing context, and it should in theory be able to handle the safetensors file, uh, mess :) Don't worry about the queue length; the queue is there so I can make better scheduling decisions. I'll try to quant Iridium-72B-v0.1 and see what happens.
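
For what it's worth, sharded HF checkpoints ship a model.safetensors.index.json whose "weight_map" maps each tensor name to the shard holding it, which is how loaders cope with a 31-file (or 936-file) layout. A sketch, assuming a local checkout at a hypothetical path:

```python
import json
from collections import Counter

# Walk the shard index: "weight_map" is {tensor_name: shard_filename}.
with open("Iridium-72B-v0.1/model.safetensors.index.json") as f:
    index = json.load(f)

shards = Counter(index["weight_map"].values())
print(f"{len(index['weight_map'])} tensors across {len(shards)} shards")
```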
