DateLogicQA: Benchmarking Temporal Biases in Large Language Models Paper • 2412.13377 • Published 8 days ago • 2
Post 2381 Google drops Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more. Now available in anychat, try it out: akhaliq/anychat
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation Paper • 2406.02347 • Published Jun 4 • 2
FlexEvent: Event Camera Object Detection at Arbitrary Frequencies Paper • 2412.06708 • Published 16 days ago
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 16 days ago • 68
Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective Paper • 2208.07365 • Published Aug 15, 2022
Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier Paper • 2412.04261 • Published 20 days ago • 1
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding Paper • 2412.04383 • Published 20 days ago • 4
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge Paper • 2411.19799 • Published 26 days ago • 10
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation Paper • 2412.03304 • Published 21 days ago • 17
Post 3807 QwQ-32B-Preview is now available in anychat. A reasoning model that is competitive with OpenAI o1-mini and o1-preview. Try it out: akhaliq/anychat
Post 3676 New model drop in anychat: allenai/Llama-3.1-Tulu-3-8B is now available. Try it here: akhaliq/anychat
Post 2667 anychat supports chatgpt, gemini, perplexity, claude, meta llama, grok, all in one app. Try it out there: akhaliq/anychat
Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks Paper • 2411.01192 • Published Nov 2 • 3
M-RewardBench: Evaluating Reward Models in Multilingual Settings Paper • 2410.15522 • Published Oct 20 • 11
DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes Paper • 2410.18084 • Published Oct 23 • 13
Aligning Large Language Models via Self-Steering Optimization Paper • 2410.17131 • Published Oct 22 • 21