Shyam Sunder Kumar

theainerd

AI & ML interests

Natural Language Processing

Organizations

Neuropark · Speech Recognition Community Event Version 2 · Open-Source AI Meetup · Social Post Explorers · Hugging Face Discord Community

theainerd's activity

reacted to onekq's post with 👍 about 3 hours ago
QwQ-32B is amazing!

It ranks below o1-preview, but beats DeepSeek v3 and all Gemini models.
onekq-ai/WebApp1K-models-leaderboard

Now that we have such a powerful model that fits on a single GPU, can someone finetune a web app model to push the SOTA on my leaderboard? 🤗
reacted to clem's post with 🔥 about 20 hours ago
I was chatting with @peakji, one of the cofounders of Manus AI, who told me he was on Hugging Face (very cool!).

He shared an interesting insight which is that agentic capabilities might be more of an alignment problem rather than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question' - after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.
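To make "post-training on agentic trajectories" concrete, here is a toy sketch of what one such supervised fine-tuning record might look like next to a plain chatbot record. The schema, tool call, and contents are illustrative assumptions, not taken from Manus or this post:

```python
# Hypothetical agentic-trajectory SFT record: the assistant's contribution is a
# multi-step think/act/observe sequence rather than one final reply.
trajectory = {
    "messages": [
        {"role": "user", "content": "Find the project's latest demo and summarize it."},
        {"role": "assistant", "content": "Thought: I should search first.\n"
                                         "Action: search('project demo')"},
        {"role": "tool", "content": "Top hit: the project's demo Space."},
        {"role": "assistant", "content": "Thought: that is enough context.\n"
                                         "Final Answer: The demo Space shows ..."},
    ]
}

# A chatbot-style record collapses everything into a single response, which is
# the 'answer everything in one response' preference described above.
chatbot_record = {
    "messages": [
        {"role": "user", "content": "Find the project's latest demo and summarize it."},
        {"role": "assistant", "content": "The latest demo shows ..."},
    ]
}
```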

As a thank you to the community, he shared 100 invite codes, first-come first-served; just use "HUGGINGFACE" to get access!
reacted to Kseniase's post with 🔥 6 days ago
9 types of "Chain-of-..." approaches:

Chain-of-Thought (CoT) prompting enhances reasoning in AI models by breaking complex problems down into step-by-step logical sequences. It continues to prove its effectiveness, especially in top-performing reasoning models. However, there are other similar methods that expand on CoT and can be used for different purposes. Here are 9 of them (with a small prompting sketch after the list):

1. Chain-of-Action-Thought (COAT) -> Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search (2502.02508)
Helps the model decide when to keep thinking, double-check its work, or try a different approach, using special guiding tokens.

2. Chain of Draft (CoD) -> Chain of Draft: Thinking Faster by Writing Less (2502.18600)
Helps the model generate short but meaningful reasoning steps, cutting costs and making processing faster

3. Chain-of-Agents -> Chain of Agents: Large Language Models Collaborating on Long-Context Tasks (2406.02818)
Uses multi-agent collaboration: worker agents process parts of the text in a structured chain, and a manager agent summarizes the results

4. Chain-of-RAG -> https://huggingface.co./papers/2501.14342
Creates retrieval chains instead of retrieving all the info at once. It can dynamically adjust its search process and parameters such as the number of steps

5. Chain-of-Shot Prompting (CoS) -> CoS: Chain-of-Shot Prompting for Long Video Understanding (2502.06428)
Helps models pick the frames crucial for understanding a video, using a binary video summary and a video co-reasoning module.

6. Chain of Hindsight (CoH) -> Chain of Hindsight Aligns Language Models with Feedback (2302.02676)
Converts all feedback into sequences to fine-tune the model and refine outputs

7. Chain-of-Note (CoN) -> Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models (2311.09210)
Generates sequential reading notes for each retrieved document to assess relevance before integrating info into the final answer

8. Chain of Diagnosis (CoD) -> CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis (2407.13301)
Transforms the diagnostic process into a diagnostic chain

9. Chain(s)-of-Knowledge -> https://www.turingpost.com/p/cok
Enhances LLMs by dynamically pulling in external knowledge to improve accuracy and reduce errors
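Since several of these variants differ mainly in what the prompt asks the model to emit, here is a minimal sketch contrasting classic CoT with Chain of Draft (item 2). The instruction wording paraphrases the idea rather than quoting the papers' templates, and `query_llm` is a hypothetical stand-in for any chat-completion client:

```python
# Minimal sketch: CoT vs. Chain of Draft (CoD) at the prompt level.
# query_llm is a hypothetical stand-in for any chat-completion client;
# the instructions paraphrase the papers' idea, not their exact templates.

COT_PROMPT = (
    "Think step by step, writing out your full reasoning, "
    "then give the final answer after 'Answer:'."
)

COD_PROMPT = (
    "Think step by step, but keep each reasoning step to a short draft "
    "of at most five words, then give the final answer after 'Answer:'."
)

def solve(question: str, style: str, query_llm):
    """Send the question with verbose (CoT) or terse (CoD) instructions."""
    system = COT_PROMPT if style == "cot" else COD_PROMPT
    return query_llm(system=system, user=question)
```

CoD keeps the same step-by-step structure but caps the length of each step, which is where the token and latency savings come from.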
upvoted an article 10 days ago
SigLIP 2: A better multilingual vision language encoder

reacted to AdinaY's post with 🔥 10 days ago
Wan2.1 🔥📹 new OPEN video model from the Alibaba Wan team!

Model: Wan-AI/Wan2.1-T2V-14B
Demo: Wan-AI/Wan2.1

✨Apache 2.0
✨8.19GB VRAM, runs on most GPUs
✨Multi-Tasking: T2V, I2V, Video Editing, T2I, V2A
✨Text Generation: Supports Chinese & English
✨Powerful Video VAE: Encode/decode 1080P w/ temporal precision
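A rough usage sketch, assuming the repo ships a diffusers-compatible pipeline; the pipeline class, dtype choice, and call arguments below are guesses, and the model card has the authoritative loading code:

```python
# Assumption-laden sketch of loading Wan2.1 for text-to-video via diffusers;
# check the Wan-AI/Wan2.1-T2V-14B model card for the actual recipe.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B",
    torch_dtype=torch.bfloat16,  # reduced precision to stay within the VRAM budget
)
pipe.to("cuda")

# Argument names here are illustrative, not confirmed against the release.
video = pipe(prompt="a red panda skiing down a snowy slope").frames
```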
reacted to burtenshaw's post with 🔥 11 days ago
Now the Hugging Face agent course is getting real! With frameworks like smolagents, LlamaIndex, and LangChain.

🔗 Follow the org for updates https://huggingface.co./agents-course

This week we are releasing the first framework unit in the course and it’s on smolagents. This is what the unit covers:

- why should you use smolagents vs another library?
- how to build agents that use code (see the minimal sketch after this post)
- build multi-agent systems
- use vision language models for browser use

The team has been working flat out on this for a few weeks, led by @sergiopaniego and supported by smolagents author @m-ric.
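For a taste of the smolagents unit, a minimal code agent looks roughly like this (class names follow early smolagents releases; the course unit has the current API):

```python
# Minimal smolagents code agent: it writes and executes Python snippets,
# calling the provided tools, until it can return an answer.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # callable from the agent's generated code
    model=HfApiModel(),              # a hosted model reached via the HF API
)

agent.run("How many seconds would a leopard at full speed take to cross Pont des Arts?")
```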
reacted to stefan-it's post with 👍 11 days ago
She arrived 😍

[Expect more models soon...]
reacted to cogwheelhead's post with 👍 16 days ago
My team and I performed an in-depth investigation comparing o1 to R1 (and other reasoning models).

Link: https://toloka.ai/blog/r1-is-not-on-par-with-o1-and-the-difference-is-qualitative-not-quantitative

It started with us evaluating them on our own university-math benchmarks: U-MATH for problem-solving and μ-MATH for judging solution correctness (see the HF leaderboard: toloka/u-math-leaderboard)

tl;dr: R1 sure is amazing, but we find that it lags behind in novelty adaptation and reliability:
* performance drops when benchmarks are updated with fresh, unseen tasks (e.g. AIME 2024 -> 2025)
* the R1-o1 gap widens on niche subdomains (e.g. university-specific math instead of the more common Olympiad-style contests)
* the same holds for altogether unconventional domains (e.g. chess) or skills (e.g. judgment instead of problem-solving)
* R1 also runs into failure modes far more often, e.g. making illegal chess moves (see the sketch below) or falling into endless generation loops
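To show how mechanical the illegal-move check is, here is a hypothetical sketch with python-chess; it illustrates the kind of validation an eval harness can run, not Toloka's actual code:

```python
# Replay a model's SAN move list and flag the first move the rules reject.
import chess

def first_illegal_move(san_moves):
    board = chess.Board()
    for san in san_moves:
        try:
            board.push_san(san)  # raises a ValueError subclass on illegal or ambiguous moves
        except ValueError:
            return san
    return None

print(first_illegal_move(["e4", "e5", "Ke3"]))  # -> "Ke3" (the king can't jump two squares)
```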

Our point here is not to bash on DeepSeek — they've done exceptional work, R1 is a game-changer, and we have no intention of downplaying that. R1's release is a perfect opportunity to study where all these models differ and to gain an understanding of how to move forward from here.