sometimesanotion PRO

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Organizations

None yet

sometimesanotion's activity

reacted to hba123's post with 🚀 1 day ago
Blindly applying algorithms without understanding the math behind them is not a good idea, from my point of view. So, I am on a quest to fix this!

I wrote my first Hugging Face article on how to derive closed-form solutions for KL-regularised reinforcement learning problems, the setting that DPO builds on.


Check it out: https://huggingface.co./blog/hba123/derivingdpo
New activity in CultriX/SeQwence-14Bv3 7 days ago
New activity in Aashraf995/QwenStock-14B 8 days ago