Happy New Year, Hugging Face community! In 2025, I'll continue my quantization (and some fine-tuning) efforts to support open-source AI and make knowledge free for everyone.
The deepseek-ai/DeepSeek-V3-Base model was featured today on CNBC tech news. The whale made a splash by using FP8 and shrinking the cost of training significantly!
I've built a small utility to split safetensors files, file by file. The need came up when I tried to convert the new DeepSeek V3 model from FP8 to BF16. The only Ada-architecture GPU I have is an RTX 4080, and its 16GB of VRAM just wasn't enough for the conversion.
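For reference, here is a minimal sketch of what such an FP8-to-BF16 conversion can look like. It assumes (my reading of the DeepSeek V3 checkpoint layout, so verify before relying on it) that FP8 weights are stored as float8_e4m3fn with companion "<name>_scale_inv" tensors quantized in 128x128 blocks; the shard filename is a placeholder.

```python
# Hedged sketch: per-block FP8 -> BF16 dequantization of one safetensors shard.
# Assumes float8_e4m3fn weights plus "<name>_scale_inv" tensors with 128x128 blocks.
import torch
from safetensors.torch import load_file, save_file

BLOCK = 128  # assumed quantization block size

def dequant_fp8(weight: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    w = weight.to(torch.float32)
    # Expand the per-block scales to the full weight shape, then multiply.
    s = scale_inv.float().repeat_interleave(BLOCK, dim=0)[: w.shape[0]]
    s = s.repeat_interleave(BLOCK, dim=1)[:, : w.shape[1]]
    return (w * s).to(torch.bfloat16)

tensors = load_file("model-00001-of-000163.safetensors")  # hypothetical shard name
out = {}
for name, t in tensors.items():
    if t.dtype == torch.float8_e4m3fn and f"{name}_scale_inv" in tensors:
        out[name] = dequant_fp8(t, tensors[f"{name}_scale_inv"])
    elif not name.endswith("_scale_inv"):
        out[name] = t  # keep non-FP8 tensors, drop the now-unneeded scales
save_file(out, "model-00001-of-000163-bf16.safetensors", metadata={"format": "pt"})
```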
BTW: I'll upload the BF16 version here: DevQuasar/deepseek-ai.DeepSeek-V3-Base-bf16 (it will take a while - days with my upload speed). If anyone has access to the resources to test it, I'd appreciate feedback on whether it's working or not.
The tool is available here: https://github.com/csabakecskemeti/ai_utils/blob/main/safetensor_splitter.py It splits every file into n pieces by layer where possible and creates a new "model.safetensors.index.json" file. I've tested it with Llama 3.1 8B at multiple split sizes and validated the result with an inference pipeline. Use --help for usage. Please note the current version expects the model to already be sharded into multiple files and to have a "model.safetensors.index.json" layer-to-safetensors mapping file.
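This is not the script itself, just a minimal sketch of the idea: split each existing shard into smaller pieces along tensor boundaries and rebuild the weight_map for a new index file. It uses a simplified grouping by tensor count rather than the script's layer-aware logic, and the shard filenames are placeholders.

```python
# Hedged sketch: split existing safetensors shards and rebuild the index.
import json, os
from safetensors.torch import load_file, save_file

def split_shard(path: str, n: int, out_dir: str) -> dict:
    """Split one safetensors file into n pieces; return tensor-name -> new-file map."""
    os.makedirs(out_dir, exist_ok=True)
    tensors = load_file(path)
    names = list(tensors.keys())
    chunk = max(1, (len(names) + n - 1) // n)
    weight_map = {}
    base = os.path.splitext(os.path.basename(path))[0]
    for i in range(0, len(names), chunk):
        part = {k: tensors[k] for k in names[i : i + chunk]}
        fname = f"{base}-split{i // chunk:05d}.safetensors"
        save_file(part, os.path.join(out_dir, fname), metadata={"format": "pt"})
        weight_map.update({k: fname for k in part})
    return weight_map

# Rebuild a (simplified) index from all split shards; shard names are hypothetical.
index = {"metadata": {}, "weight_map": {}}
for shard in ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]:
    index["weight_map"].update(split_shard(shard, n=4, out_dir="split_model"))
with open("split_model/model.safetensors.index.json", "w") as f:
    json.dump(index, f, indent=2)
```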
I have this small utility: no_more_typo. It runs in the background and can call an LLM to update the text on the clipboard. I think it's ideal for fixing typos and syntax. I've just added the option to use custom prompt templates to perform different tasks.
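This is not the actual no_more_typo code, just a minimal Python sketch of the clipboard-to-LLM-to-clipboard idea with a customizable prompt template; the model name and template are placeholders, and it assumes an OpenAI-compatible endpoint configured via environment variables.

```python
# Hedged sketch: read clipboard, run it through an LLM prompt template, write it back.
import pyperclip
from openai import OpenAI

PROMPT_TEMPLATE = (
    "Fix typos and grammar in the following text; return only the corrected text:\n\n{text}"
)

client = OpenAI()  # assumes API key / base URL are set in the environment

def fix_clipboard(template: str = PROMPT_TEMPLATE) -> str:
    text = pyperclip.paste()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": template.format(text=text)}],
    )
    corrected = resp.choices[0].message.content
    pyperclip.copy(corrected)
    return corrected

if __name__ == "__main__":
    fix_clipboard()
```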
Repurposed my older AI workstation into a homelab server; it has received 2x V100 + 1x P40. I can reach a huge 210k-token context with MegaBeam-Mistral-7B-512k-GGUF at ~70+ tok/s, or run Llama-3.1-Nemotron-70B-Instruct-HF-GGUF with a 50k context at ~10 tok/s (V100s only: 40k ctx and 15 tok/s). I'm also able to LoRA fine-tune with performance similar to an RTX 3090. It moved to the garage, so no complaints about the noise from the family. Will move it to a rack soon :D
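For anyone curious how a large-context GGUF run can be driven from Python, here is a rough sketch using llama-cpp-python; the model filename, context size, and offload settings are illustrative placeholders, not my exact configuration.

```python
# Hedged sketch: load a GGUF model with a large context and full GPU offload.
from llama_cpp import Llama

llm = Llama(
    model_path="MegaBeam-Mistral-7B-512k.Q8_0.gguf",  # placeholder filename
    n_ctx=210_000,     # large context, assuming enough combined VRAM
    n_gpu_layers=-1,   # offload all layers to the GPUs
)
out = llm("Summarize the following notes:\n...", max_tokens=256)
print(out["choices"][0]["text"])
```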
Some time ago, I built a predictive LLM router that routes chat requests between small and large LLMs based on prompt classification. It dynamically selects the most suitable model depending on the complexity of the user input, ensuring optimal performance while maintaining conversation context. I also fine-tuned a RoBERTa model to use with the package, but you can plug and play any classifier of your choice.
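A hedged sketch of the routing idea (not the package's actual API): classify the prompt with a fine-tuned RoBERTa model, then send it to a small or large backend accordingly. The classifier repo id, label convention, and backend identifiers below are placeholders.

```python
# Hedged sketch: route prompts between a small and a large model via a classifier.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="DevQuasar/roberta-prompt-classifier",  # hypothetical repo id
)

SMALL_MODEL = "small-llm"  # placeholder backend identifiers
LARGE_MODEL = "large-llm"

def route(prompt: str) -> str:
    label = classifier(prompt)[0]["label"]
    # Assumed label convention: "simple" prompts go to the small model.
    return SMALL_MODEL if label.lower() == "simple" else LARGE_MODEL

print(route("What's the capital of France?"))
```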