Motoki Wu

tokestermw

https://motoki.co

tokestermw's activity

liked a model 1 day ago

Qwen/Qwen2.5-1.5B-Instruct

Text Generation • Updated Sep 25, 2024 • 1.11M • 353

liked 2 models 3 days ago

ai21labs/AI21-Jamba-Mini-1.6

Text Generation • Updated 4 days ago • 830 • 30

ai21labs/AI21-Jamba-Large-1.6

Text Generation • Updated 4 days ago • 195 • 46

upvoted a collection 4 days ago

Light-R1

Collection

Surpassing R1-Distill from Scratch* with 70k Math Data through Curriculum SFT & DPO • 3 items • Updated 6 days ago • 9

liked 3 models 4 days ago

liked a Space 5 days ago

infini-gram

📖

Search and analyze language model datasets

liked a model 5 days ago

KRLabsOrg/lettucedect-base-modernbert-en-v1

Token Classification • Updated 11 days ago • 3.41k • 14

upvoted a collection 5 days ago

Hallucination detection

Collection

Trained ModernBERT (base and large) for detecting hallucinations in LLM responses. The models are trained as token classifiers. • 4 items • Updated 5 days ago • 14

liked a dataset 6 days ago

zeroshot/twitter-financial-news-sentiment

Viewer • Updated Feb 23, 2024 • 11.9k • 4.66k • 135

upvoted a paper 9 days ago

Rank1: Test-Time Compute for Reranking in Information Retrieval

Paper • 2502.18418 • Published 12 days ago • 25

liked a model 11 days ago

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • Updated 2 days ago • 231k • 1.04k

upvoted a paper 11 days ago

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Paper • 2502.16894 • Published 14 days ago • 26

upvoted a paper 12 days ago

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published 12 days ago • 67

liked a model 13 days ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Text Generation • Updated 14 days ago • 1.49M • 1.01k

upvoted 2 papers 13 days ago

Expect the Unexpected: FailSafe Long Context QA for Finance

Paper • 2502.06329 • Published 28 days ago • 126

InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback

Paper • 2502.15027 • Published 17 days ago • 7

upvoted a paper 14 days ago

SIFT: Grounding LLM Reasoning in Contexts via Stickers

Paper • 2502.14922 • Published 18 days ago • 29

upvoted a collection 14 days ago

Sky-T1-7B

Collection

A series of 7B models trained with different recipes and the corresponding training data. • 8 items • Updated 24 days ago • 6