Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16, 2025 • 142
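For intuition, NSA's "selected attention" branch scores contiguous KV blocks and attends only inside the top-k per query. A toy NumPy sketch of that selection step (block means stand in for NSA's learned block compression; `select_blocks` and its parameters are illustrative, not the authors' API):

```python
import numpy as np

def select_blocks(q, K, block_size=64, top_k=4):
    """Blockwise top-k selection in the spirit of NSA's selection branch:
    score each KV block against the query and keep the k best blocks.
    Block means stand in for the paper's learned compression."""
    T, d = K.shape
    n_blocks = T // block_size
    block_means = K[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(axis=1)
    scores = block_means @ q                 # (n_blocks,) importance scores
    keep = np.argsort(scores)[-top_k:]       # indices of the top-k blocks
    return np.sort(keep)                     # restore temporal order
```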
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 97
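The core of BitNet's BitLinear layer is weight binarization: center the weights, keep only their signs, and rescale by the mean absolute value. A minimal NumPy sketch of that step (my paraphrase of the paper's formula, omitting the 8-bit activation quantization):

```python
import numpy as np

def binarize_weights(W: np.ndarray) -> np.ndarray:
    """BitLinear-style 1-bit weights: sign(W - alpha) scaled by beta,
    where alpha centers the tensor and beta = mean(|W|) preserves the
    layer's output scale."""
    alpha = W.mean()
    beta = np.abs(W).mean()
    return np.sign(W - alpha) * beta

# Example: quantize a 256x256 linear layer's weights at forward time.
W = np.random.randn(256, 256).astype(np.float32)
W_1bit = binarize_weights(W)   # entries are +/-beta: 1 bit plus one scale
```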
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10, 2025 • 142
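One of the test-time scaling strategies the paper compares is verifier-weighted best-of-N: sample N answers, score each with a process reward model, and return the answer with the highest total score. A minimal sketch (`scores` would come from a PRM; the function name is mine):

```python
from collections import Counter

def weighted_best_of_n(answers, scores):
    """Sum verifier scores over duplicate answers, return the winner."""
    totals = Counter()
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)

# e.g. weighted_best_of_n(["42", "41", "42"], [0.9, 0.3, 0.8]) -> "42"
```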
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published Jan 30 • 56
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated Nov 27, 2024 • 111
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 146
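The paper's L-Mul algorithm approximates a floating-point multiply by adding exponents and mantissas instead of multiplying mantissas, plus a small fixed correction term. A rough Python sketch, under my reading of the formula (`l` controls the correction offset):

```python
import math

def l_mul(x: float, y: float, l: int = 4) -> float:
    """Approximate x * y without multiplying mantissas. Writing
    |x| = (1 + mx) * 2**ex and |y| = (1 + my) * 2**ey, L-Mul replaces
    the exact (1 + mx)(1 + my) with (1 + mx + my + 2**-l)."""
    if x == 0.0 or y == 0.0:
        return 0.0
    fx, ex = math.frexp(abs(x))          # |x| = fx * 2**ex, fx in [0.5, 1)
    fy, ey = math.frexp(abs(y))
    mx, my = 2 * fx - 1, 2 * fy - 1      # fractional mantissas in [0, 1)
    approx = (1 + mx + my + 2.0 ** -l) * 2.0 ** (ex + ey - 2)
    sign = 1.0 if (x > 0) == (y > 0) else -1.0
    return sign * approx

# e.g. l_mul(3.0, 5.0) == 14.5, versus the exact 15.0
```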
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 227
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated 15 days ago • 558
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5, 2024 • 63
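Rectified flow, which this paper scales up, trains on a straight-line path between data and noise and regresses the constant velocity along that line. A sketch of how the training target is formed (not the SD3 code; names are mine):

```python
import numpy as np

def rectified_flow_pair(x0: np.ndarray, t: float, rng=np.random):
    """Interpolate data x0 and Gaussian noise linearly at time t, and
    return the velocity target the network should predict at z_t."""
    noise = rng.standard_normal(x0.shape)
    z_t = (1.0 - t) * x0 + t * noise     # straight-line interpolant
    target_v = noise - x0                # constant velocity along the line
    return z_t, target_v
```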
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 66
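The paper's central identity is that a scalar-decay SSM recurrence can be written as masked attention: Y = (L ∘ (C Bᵀ)) X, where L is lower-triangular with cumulative decay products. A naive O(T²) NumPy sketch of that dual form (assuming scalar-identity A with positive decays):

```python
import numpy as np

def ssd_quadratic(X, A, B, C):
    """Quadratic 'attention mode' of state-space duality:
    y_t = sum_{s <= t} (C_t . B_s) * (a_{s+1} * ... * a_t) * x_s.
    X: (T, d) inputs, A: (T,) decays in (0, 1], B, C: (T, n)."""
    cs = np.cumsum(np.log(A))                       # prefix sums of log-decays
    L = np.tril(np.exp(cs[:, None] - cs[None, :]))  # L[t, s] = prod a_{s+1..t}
    return ((C @ B.T) * L) @ X                      # mask scores, mix inputs
```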
You Only Cache Once: Decoder-Decoder Architectures for Language Models Paper • 2405.05254 • Published May 8, 2024 • 10
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21, 2024 • 32
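The idea, as I read it, is that adjacent layers can share one K/V projection: with a sharing factor of 2 (CLA2), only every other layer writes to the KV cache, halving its size. A single-head, mask-free NumPy sketch (square (d, d) projections for simplicity; `cla_forward` and its layout are illustrative, not the authors' code):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cla_forward(h, Wq, Wk, Wv):
    """CLA2: even-indexed layers project fresh K/V and cache them; odd
    layers reuse the cached K/V, so only half the layers add KV state.
    Wq holds one (d, d) matrix per layer; Wk/Wv one per *pair* of layers."""
    kv_cache = None
    for layer in range(len(Wq)):
        q = h @ Wq[layer]
        if layer % 2 == 0:                     # KV-producing layer
            kv_cache = (h @ Wk[layer // 2], h @ Wv[layer // 2])
        k, v = kv_cache                        # consumer layers reuse cache
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        h = attn @ v                           # residual/MLP omitted
    return h
```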