What a beginning to this year in open ML
Let's unwrap! merve/jan-10-releases-677fe34177759de0edfc9714
Multimodal 🖼️
> ByteDance released SA2VA: a family of vision LMs that can take image, video, text and visual prompts
> moondream2 is out with new capabilities like outputting structured data and gaze detection!
> Dataset: Alibaba DAMO lab released multimodal textbook, with 22k hours worth of samples from instructional videos 🤯
> Dataset: SciCap, a benchmark dataset for captioning scientific documents, is released along with the challenge!
LLMs 💬
> Microsoft released Phi-4, sota open-source 14B language model 🔥
> Dolphin is back with Dolphin 3.0 Llama 3.1 8B 🐬🐬
> Prime-RL released Eurus-2-7B-PRIME, a new language model trained using PRIME alignment
> SmallThinker-3B is a new small reasoning LM based on Qwen2.5-3B-Instruct
> Dataset: QWQ-LONGCOT-500K is the dataset used to train SmallThinker, generated using QwQ-32B-preview
> Dataset: @cfahlgren1 released React Code Instructions: a dataset of code instruction-code pairs
> Dataset: Qwen team is on a roll, they just released CodeElo, a dataset of code preferences 👩🏻‍💻
Embeddings
> @MoritzLaurer released a zero-shot version of ModernBERT large
> KaLM is a new family of performant multilingual embedding models with MIT license built using Qwen2-0.5B
Image/Video Generation ⏯️
> NVIDIA released Cosmos, a new family of diffusion/autoregressive World Foundation Models generating worlds from images, videos and texts 🔥
> Adobe released TransPixar: a new text-to-video model that can generate assets with transparent backgrounds (a first!)
> Dataset: fal released cosmos-openvid-1m: Cosmos-tokenized samples from OpenVid-1M
Others
> Prior Labs released TabPFNv2: the best tabular transformer for classification and regression is out
> Metagene-1 is a new RNA language model that can be used for pathogen detection, zero-shot embedding and genome understanding