
Alex Chen PRO

alexchen4ai

https://alexchen4ai.github.io/blog/

AI & ML interests

NLP

Recent Activity

liked a model about 1 month ago

NexaAIDev/OmniAudio-2.6B

alexchen4ai's activity

liked a model 17 days ago

deepseek-ai/DeepSeek-V3-Base

Updated 14 days ago • 12.4k • 1.23k

upvoted a paper 26 days ago

No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published 28 days ago • 41

liked 4 models about 1 month ago

New activity in NexaAIDev/OmniVLM-968M about 1 month ago

Regarding Model Weights

#12 opened about 1 month ago by

BimsaraRad

liked a Space about 1 month ago

Running on CPU Upgrade

6.71k

👕

Kolors Virtual Try-On

New activity in NexaAIDev/OmniVLM-968M about 2 months ago

9x token reduction

#10 opened about 2 months ago by

Sijuade

liked 3 models about 2 months ago

NexaAIDev/Qwen2-Audio-7B-GGUF

Audio-Text-to-Text • Updated Nov 25, 2024 • 7.58k • 130

google/siglip-so400m-patch16-256-i18n

Zero-Shot Image Classification • Updated Nov 18, 2024 • 6.16k • 28

bookbot/distil-ast-audioset

Audio Classification • Updated Sep 12, 2023 • 544 • 18

New activity in NexaAIDev/OmniVLM-968M about 2 months ago

Error loading model

#9 opened about 2 months ago by

iojvsuynv

updated a model about 2 months ago

NexaAIDev/OmniVLM-968M

Updated 27 days ago • 1.69k • 494

New activity in NexaAIDev/OmniVLM-968M about 2 months ago

How do you encode an image in only 81 tokens?

#2 opened about 2 months ago by

ChristineLai

Video or multiple frames.

#6 opened about 2 months ago by

monamie

reacted to thomwolf's post with 👍 about 2 months ago

Post

5108

A Little guide to building Large Language Models in 2024

This is a post-recording of a 75-minute lecture I gave two weeks ago on how to train an LLM from scratch in 2024. I tried to keep it short and comprehensive, focusing on concepts that are crucial for training a good LLM but are often hidden in tech reports.

In the lecture, I introduce the students to all the important concepts, tools, and techniques for training a high-performing LLM:
* finding, preparing and evaluating web scale data
* understanding model parallelism and efficient training
* fine-tuning/aligning models
* fast inference

There are of course many things and details missing that I should have added. Don't hesitate to tell me your most frustrating omission and I'll add it in a future part. In particular, I think I'll add more focus on how to filter topics well and extensively, and maybe more practical anecdotes and details.

Now that I've recorded it, I've been thinking this could be part 1 of a two-part series, with a second, fully hands-on video on how to run all these steps with some libraries and recipes we've released recently at HF around LLM training (which could easily be adapted to other frameworks anyway):
* datatrove for all things web-scale data preparation: https://github.com/huggingface/datatrove
* nanotron for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotron
* lighteval for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval

Here is the link to watch the lecture on YouTube: https://www.youtube.com/watch?v=2-SPH9hIKT8
And here is the link to the Google Slides: https://docs.google.com/presentation/d/1IkzESdOwdmwvPxIELYJi8--K3EZ98_cL6c5ZcLKSyVg/edit#slide=id.p

Enjoy, and I'm happy to hear feedback on it and on what to add, correct, or extend in a second part.

2 replies

liked a Space about 2 months ago

Running

👁️

Omnivlm Dpo Demo

New activity in NexaAIDev/OmniVLM-968M about 2 months ago

about ocr

#1 opened about 2 months ago by

MiaHawthorne
