Yatharth Sharma

YaTharThShaRma999

AI & ML interests

None yet

Recent Activity

liked a model 1 day ago

baichuan-inc/Baichuan-Omni-1d5

liked a model 1 day ago

baichuan-inc/Baichuan-Omni-1d5-Base

Organizations

None yet

YaTharThShaRma999's activity

upvoted a paper 9 minutes ago

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published 4 days ago • 3

upvoted a paper 11 days ago

RepVideo: Rethinking Cross-Layer Representation for Video Generation

Paper • 2501.08994 • Published 12 days ago • 15

upvoted a paper 13 days ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published 17 days ago • 40

upvoted a paper 18 days ago

Multi-task retriever fine-tuning for domain-specific and efficient RAG

Paper • 2501.04652 • Published 19 days ago • 10

upvoted a paper 19 days ago

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published 21 days ago • 66

upvoted a paper 24 days ago

LTX-Video: Realtime Video Latent Diffusion

Paper • 2501.00103 • Published 28 days ago • 41

upvoted 2 papers 25 days ago

HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving

Paper • 2412.20735 • Published 29 days ago • 11

Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 25

upvoted 3 papers 27 days ago

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published 28 days ago • 36

Slow Perception: Let's Perceive Geometric Figures Step-by-step

Paper • 2412.20631 • Published 29 days ago • 14

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Paper • 2412.21037 • Published 28 days ago • 23

upvoted 9 papers about 1 month ago

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

Paper • 2412.14963 • Published Dec 19, 2024 • 6

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published Dec 19, 2024 • 26

Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published Dec 18, 2024 • 14

FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published Dec 17, 2024 • 13

VidTok: A Versatile and Open-Source Video Tokenizer

Paper • 2412.13061 • Published Dec 17, 2024 • 8

ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

Paper • 2412.12571 • Published Dec 17, 2024 • 8

Learning from Massive Human Videos for Universal Humanoid Pose Control

Paper • 2412.14172 • Published Dec 18, 2024 • 10

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Paper • 2412.13871 • Published Dec 18, 2024 • 18

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Paper • 2412.11100 • Published Dec 15, 2024 • 6