deepseek-ai/Janus-Pro-7B
Any-to-Any • Updated • 975
Note A unified model for dense grounded understanding of images & videos.
Note 660B reasoning models released under the MIT license
Note A non-transformer-based VLM (ViT-MLP-LLM framework)
Note 456B LLM with a 1M-token training context
Note Math model
Note On-device multimodal LLM that supports real-time conversation and video understanding.
Note RNN + Transformers
Note TTS
Note Medical LLM
Note Dataset designed specifically for natural language processing (NLP) tasks in the education sector.
Note A multimodal dataset for vision-language pretraining that includes 6.5M images and 0.8B text drawn from 22k hours of instructional videos.
Text-to-3D and Image-to-3D Generation
A unified multimodal understanding and generation model.