csabakecskemeti (Csaba Kecskemeti)

replied to their post 4 days ago

No success so far, the training data contains some larger contexts and it fails just before complete the first epoch.
(dataset: DevQuasar/brainstorm-v3.1_vicnua_1k)

If anyone has further suggestion to the bnb config (with ROCm on MI100)?
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
bnb_4bit_compute_dtype=torch.bfloat16
)

Now testing with my other dataset that is smaller seems I have a lower memory need
DevQuasar/brainstorm_vicuna_1k

replied to their post 5 days ago

It's failed by the morning, need to find more room to decrease the memory

replied to their post 5 days ago

The machine itself is also funny. This my my GPU test bench.
Now also testing the PWM fan control and jetkvm

posted an update 5 days ago

Post

779

Fine tuning on the edge. Pushing the MI100 to it's limits.
QWQ-32B 4bit QLORA fine tuning
VRAM usage 31.498G/31.984G :D

3 replies

·

replied to their post 9 days ago

QLORA model loaded in 4bits

replied to their post 9 days ago

Updated the post with GGUF (Q4,Q8) performance metrics

replied to their post 9 days ago

Good callout will add this evening
Llama 3 8b q8 was around 80t/s generation

posted an update 11 days ago

Post

1930

-UPDATED-
4bit inference is working! The blogpost is updated with code snippet and requirements.txt
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
-UPDATED-
I've played around with an MI100 and ROCm and collected my experience in a blogpost:
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
Unfortunately I've could not make inference or training work with model loaded in 8bit or use BnB, but did everything else and documented my findings.

4 replies

·

replied to their post 15 days ago

@sometimesanotion you might have more experience with AMD than me :)

replied to their post 15 days ago

So far I'm managed to have a working bnb up:

(bnbtest) kecso@gpu-testbench2:~/bitsandbytes/examples$ python -m bitsandbytes
g++ (Ubuntu 14.2.0-4ubuntu2) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='63', rocm_version_tuple=(6, 3)
PyTorch settings found: ROCM_VERSION=63
The directory listed in your path is found to be non-existent: local/gpu-testbench2
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/2803,unix/gpu-testbench2
The directory listed in your path is found to be non-existent: /etc/xdg/xdg-ubuntu
The directory listed in your path is found to be non-existent: /org/gnome/Terminal/screen/6bd83ab2_fd9f_4990_876a_527ef8117ef6
The directory listed in your path is found to be non-existent: //debuginfod.ubuntu.com
WARNING! ROCm runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
SUCCESS!
Installation was successful!

It's able to load the model to vram, but inference fails:
Exception: cublasLt ran into an error!

This is the main problem with anything not NVIDIA. The software is painful!
Keep trying...

reacted to stefan-it's post with 👍 16 days ago

Post

5069

She arrived 😍

[Expect more models soon...]

2 replies

·

replied to their post 16 days ago

All agreed.
I think I have decent amount of NVIDIA GPUs at home (a server with dual v100+p40, and a workstation with 4080 + 2x3090) so I was primarily just curious how much suffering is to use AMD.

Obviously the biggest issue is software. You have to "hack" together the ROCm versions of the things. Every simple step on NVIDIA echosystem trurns to a mini investigation on AMD :)

That's the main reason I'll put together a blogpost with all instructions to make it easier for others.

So far I've made the testbench post (there were some bios settings needed for the card to work - only on MI100, not on MI50 - ), setup ROCm drivers and pytorch for ROCm, managed to run inference with Llama.cpp, and run a 20 epoch LORA on the f32 Llama3.2 3B, and producing a model .

More details later this week in a blogpost

replied to their post 16 days ago

All correct!
What I've called out: a used MI100 is in the same price range as used V100 PCIe that's why I'm comparing with that.

And yes you're right FP16 performance would be more useful to mention (AFAIK V100 112TFLOPS, MI100 184TFLOPS) but regarding comparison it shows the same 164% (claimed) performance for MI100.

Please NOTE I'm building my hobby AI infra at home for myself, so mainly constrained by TOPS/$ :D

posted an update 17 days ago

Post

2750

Testing Training on AMD/ROCm the first time!

I've got my hands on an AMD Instinct MI100. It's about the same price used as a V100 but on paper has more TOPS (V100 14TOPS vs MI100 23TOPS) also the HBM has faster clock so the memory bandwidth is 1.2TB/s.
For quantized inference it's a beast (MI50 was also surprisingly fast)

For LORA training with this quick test I could not make the bnb config works so I'm running the FT on the fill size model.

Will share all the install, setup and setting I've learned in a blog post, together with the cooling shroud 3D design.

8 replies

·

posted an update 26 days ago

Post

1621

I found if we apply the reasoning system prompt (that has been published on the NousResearch/DeepHermes-3-Llama-3-8B-Preview model card) other models are also react to it and start mimicking reasoning. Some better some worse. I've seen internal monologue and self questioning.

Here's a blogpost about it:
http://devquasar.com/ai/reasoning-system-prompt/

reacted to fdaudens's post with 😎 26 days ago

Post

2127

🔊 Meet Kokoro Web - Free, ML speech synthesis on your computer, that'll make you ditch paid services!

28 natural voices, unlimited generations, and WebGPU acceleration. Perfect for journalists and content creators.

Test it with full articles—sounds amazingly human! 🎯🎙️

Xenova/kokoro-web

replied to their post about 1 month ago

Here you go:
https://devquasar.com/guided-browsing/
this is the guided browsing with chrome (canary) built-in Gemini Nano

replied to their post about 1 month ago

Yes I did an tried it with chrome canary.
(I even have a demo page that utilizes it but now can’t recall the name will share later)

It’s working fine but :

still not available just in experimental chrome
How about different browsers
you’ve locked in with one model
what is you hosting your local AI on another local machine

All in all the chrome built it AI provides less flexibility on my view.

Appreciate the comment though

replied to their post about 1 month ago

This is obviously a prototype.
Security is a big concern here, but is believe it’s possible to put together a proxy that is safe and does not allow anything else than forward generate requests between browser and local llm.

posted an update about 1 month ago

Post

1863

Check out my idea:
LLmaaS - Local LLM as a Service

With LLmaaS, I propose leveraging locally running LLMs as a service, providing a standardized way for websites to access and utilize them for LLM-powered operations directly on the user’s device.

Demo, code, more detailed description.
https://devquasar.com/llmaas/
https://github.com/csabakecskemeti/LLmaaS
https://youtu.be/OOWGr8jcP5Q

Call for contributors
Join me a develop the LLmaaS proxy to make this a generic purpose tool to leverage local LLMs on web. Build in security measures.
I'm looking for help to make the proxy more generic support multiple local LLM services without any change on the HTML side.
Also looking for ideas how to make the HTML par more modular and easy to use.

4 replies

·

Csaba Kecskemeti PRO

AI & ML interests

Recent Activity

Organizations

csabakecskemeti's activity