Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Jaward's activity

posted an update about 11 hours ago

Post

143

Lightweight (nanoGPT) implementation of hybrid norm - an intuitive normalization method that combines the strengths of both pre-norm (i.e., QKV-norm in MHA) and post-norm in the feed-forward network.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/hybrid_normalization.ipynb
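
To make the idea concrete, here is a minimal NumPy sketch of one transformer block in the spirit of the post: Q, K and V are each normalized inside a (single-head, causal) attention sublayer, and the feed-forward sublayer is followed by a post-norm. It is not the code from the linked notebook. The function names, the single-head simplification, the ReLU feed-forward, the parameter-free layer norm and the exact placement of the post-norm are all assumptions made for illustration.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean / unit variance (no learned scale or shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def hybrid_norm_block(x, w_q, w_k, w_v, w_o, w_1, w_2):
    """Hypothetical single-head block: QKV-norm in attention, post-norm after the FFN.

    x has shape (tokens, d_model); all weights are plain matrices.
    """
    # Attention sublayer: normalize Q, K and V individually (QKV-norm).
    q = layer_norm(x @ w_q)
    k = layer_norm(x @ w_k)
    v = layer_norm(x @ w_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Causal mask: each token may only attend to itself and earlier tokens.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    y = x + softmax(scores) @ v @ w_o  # residual connection
    # Feed-forward sublayer: residual first, then normalize (assumed post-norm form).
    hidden = np.maximum(y @ w_1, 0.0)  # ReLU
    return layer_norm(y + hidden @ w_2)
```

Because the last operation is a layer norm, every output token has roughly zero mean and unit variance, which is an easy property to check when experimenting with a sketch like this.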

upvoted a paper about 15 hours ago

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Paper • 2503.04598 • Published 3 days ago • 16

upvoted a paper 6 days ago

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published 6 days ago • 65

updated a model 6 days ago

Jaward/smollm2_360m_grpo_gsm8k_reasoner

Text Generation • Updated 6 days ago • 29 • 1

liked a model 9 days ago

Jaward/smollm2_360m_grpo_gsm8k_reasoner

Text Generation • Updated 6 days ago • 29 • 1

posted an update 9 days ago

Post

4916

Made a few improvements to the custom GRPO trainer:
- added a sequence similarity reward (seems to work)
- improved vLLM support (5x inference speed)
- adjusted reward scores (this helped with format/accuracy)
- can now push to the HF Hub (already pushed mine lol: Jaward/smollm2_360m_grpo_gsm8k_reasoner)

Code: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb
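
As a rough illustration of what a sequence similarity reward could look like, the sketch below scores a completion by its character-level overlap with a reference answer using Python's standard `difflib`. The post does not say how the reward is computed in the notebook, so the function name, the `weight` parameter and the choice of `SequenceMatcher` are assumptions, not the author's implementation.

```python
from difflib import SequenceMatcher


def sequence_similarity_reward(completion: str, reference: str, weight: float = 1.0) -> float:
    """Hypothetical reward: similarity ratio in [0, 1] between completion and reference, scaled by weight."""
    ratio = SequenceMatcher(None, completion.strip(), reference.strip()).ratio()
    return weight * ratio
```

A reward of this kind gives partial credit to answers that are close to the reference, which can provide a smoother signal than an exact-match check alone.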

published a model 9 days ago

Jaward/smollm2_360m_grpo_gsm8k_reasoner

Text Generation • Updated 6 days ago • 29 • 1

liked a dataset 11 days ago

facebook/natural_reasoning

Viewer • Updated 17 days ago • 1.15M • 8.56k • 354

replied to their post 19 days ago

Bro, if you had read the repo you would see that this implementation is for educational purposes; it's not done because it's easy. Not to mention that unsloth uses trl's GRPO trainer, which is super slow on CPU and does not scale for models under 500M params. I tried it on both CPU and GPU. This custom implementation cuts most of the heavy lifting, allowing you to train and scale faster even on CPU, and it comes with a bunch of custom configs and a simplified GRPO trainer in under 500 lines of code. There's a lot one can learn from it.

posted an update 21 days ago

Post

3862

Finally, here it is: a faster, custom, scalable GRPO trainer for smaller models with < 500M params. It can train on a CPU with 8 GB of RAM and also supports GPU for sanity's sake (includes support for vLLM + flash attention). It uses SmolLM2-135M/360M-Instruct as ref & base models. Experience your own “aha” moment 🐳 on 8 GB of RAM.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb

2 replies

liked a model 26 days ago

HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated 4 days ago • 77.9k • 406

liked a model 30 days ago

HuggingFaceTB/SmolLM2-135M-Instruct

Text Generation • Updated Feb 6 • 181k • 149

posted an update about 1 month ago

Post

3450

ByteDance drops OmniHuman🔥
This is peak SOTA performance: flawless natural gestures with perfect lip sync and facial expressions. This is the second time they've released SOTA-level talking heads, only this time with hands and body motion.
Project: https://omnihuman-lab.github.io/

3 replies

upvoted 2 papers about 1 month ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 55

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published Feb 3 • 186

posted an update about 1 month ago

Post

1505

The beauty of GRPO is that it doesn’t care whether the rewards are rule-based or learned. The hack: let the data self-normalize. Trajectories in a batch compete against their mean: no value model, no extra params, just clean, efficient RL that cuts memory usage by 50% while maintaining SOTA performance. Btw, it was introduced 9 months prior to R1: arxiv.org/pdf/2402.03300
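
The self-normalization described here can be sketched in a few lines: for a group of completions sampled for the same prompt, each reward is centered on the group mean and scaled by the group standard deviation, and the result is used as the advantage in place of a learned value model's estimate. The function name, the `eps` term and the use of the population standard deviation are assumptions for illustration; real trainers may differ in those details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Hypothetical helper: normalize one group's rewards to zero mean and (roughly) unit spread."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, some trainers use the sample std
    # eps avoids division by zero when every completion in the group gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, a group with rewards `[1.0, 0.0, 0.0, 1.0]` yields advantages close to `[1, -1, -1, 1]`: completions above the group mean are reinforced and those below it are discouraged, regardless of whether the rewards came from a rule or from a learned reward model.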