YangWang92

yangwang92

AI & ML interests

None yet

Recent Activity

new activity 19 minutes ago

VPTQ-community/Qwen2.5-32B-Instruct-v8-k65536-65536-woft:Use aphrodite-engine to infer this model error

liked a model 22 minutes ago

VPTQ-community/deepseek-r1_v_8_k_65536_mixed_mp4

new activity 1 day ago

VPTQ-community/deepseek-r1_v_8_k_65536_mixed_mp4:Create README.md

yangwang92's activity

upvoted a paper 6 days ago

Process-based Self-Rewarding Language Models

Paper • 2503.03746 • Published 8 days ago • 35

upvoted a collection 10 days ago

Qwen2.5-1M

Collection

The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated 15 days ago • 106

upvoted a paper 20 days ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 22 days ago • 161

upvoted a paper 23 days ago

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

Paper • 2502.10248 • Published 27 days ago • 51

upvoted a collection 29 days ago

CodeI/O

Collection

Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated 28 days ago • 6

upvoted a paper 29 days ago

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published about 1 month ago • 47

upvoted 2 articles 29 days ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 801

Open R1: Update #2

Article • and 6 others • about 1 month ago • 201

upvoted a paper 29 days ago

Matryoshka Quantization

Paper • 2502.06786 • Published about 1 month ago • 29

upvoted a paper about 1 month ago

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

Paper • 2502.05003 • Published Feb 7 • 43

upvoted a collection about 1 month ago

Reasoning Datasets

Collection

Distilled synthetic Reasoning datasets • 7 items • Updated Feb 2 • 56

upvoted 2 papers about 1 month ago

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 8

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 25

upvoted an article about 2 months ago

Process Reinforcement through Implicit Rewards

Article • and 1 other • Jan 3 • 25

upvoted 6 papers about 2 months ago

Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

Paper • 2501.13629 • Published Jan 23 • 44

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 103

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 345

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published Jan 21 • 63

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Paper • 2501.12202 • Published Jan 21 • 35

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16 • 70