JustinLin610 (Junyang Lin)

posted an update 11 months ago

Post

3753

Finally, Qwen1.5-110B is out! With weights and demo!

Blog: https://qwenlm.github.io/blog/qwen1.5-110b/
Demo: Qwen/Qwen1.5-110B-Chat-demo
Base: Qwen/Qwen1.5-110B
Chat: Qwen/Qwen1.5-110B-Chat

This model has some specific features:
* GQA
* 32K token context length
* Multilingual support

We feel good about its performance on benchmarks, including those for base models and chat models, but we still need more of your testing and feedback to help us know its capabilities and limitations!

Additionally, the base model has not learned chatml tokens. Yeah if you use chatml format, you need to be careful about it!

Enjoy and stay tuned for Qwen2!

1 reply

·

reacted to osanseviero's post with 👍🔥❤️ 11 months ago

Post

3556

Diaries of Open Source. Part 12 🤗

🚀Alibaba releases Qwen1.5-MoE-A2.7B, an interesting MoE with 2.7B activated parameters and 64 experts
Blog https://qwenlm.github.io/blog/qwen-moe/
Demo: Qwen/qwen1.5-MoE-A2.7B-Chat-demo
Models: https://hf.co/Qwen
GitHub: https://github.com/QwenLM/Qwen1.5

🎵VoiceCraft, SOTA speech editing and text to speech
GitHub: https://github.com/jasonppy/VoiceCraft
Model: pyp1/VoiceCraft

🐍 AI21Labs release Jamba, an SSM-Transformer, pretrained MoE which allows a large context window (256K) and high throughput
Blog https://www.ai21.com/blog/announcing-jamba
Model ai21labs/Jamba-v0.1

✨ Berkeley releases Starling-LM-7B, an RLHF-ed model, and -RM-34B, a Yi-based reward model very good for its size
Starling Beta: Nexusflow/Starling-LM-7B-beta
Starling RM: Nexusflow/Starling-RM-34B

🖥️Stability releases Stable Code Instruct 3B, an instruct model for code generation
Blog: https://stability.ai/news/introducing-stable-code-instruct-3b
Demo: stabilityai/stable-code-instruct-3b
Report: https://stability.ai/s/Stable_Code_TechReport_release.pdf

📚Common Corpus: the largest public domain dataset for training LLMs
Blog: https://hf.co/blog/Pclanglais/common-corpus
Dataset: https://hf.co/collections/PleIAs/common-corpus-65d46e3ea3980fdcd66a5613

Misc:
⚡GaLore: a very memory-efficient technique that allows pretraining models in consumer GPUs https://hf.co/blog/galore
Moirai
📈Moirai, foundation models for time series forecasting https://hf.co/collections/Salesforce/moirai-10-r-models-65c8d3a94c51428c300e0742
🔥 Mistral-ORPO-Capybara-7K, a high-quality Mistral fine-tune using ORPO, a new alignment technique kaist-ai/mistral-orpo-capybara-7k
🤯APISR, an anime super-resolution upscaling model HikariDawn/APISR

4 replies

·

replied to their post 11 months ago

gguf quantization?

posted an update 12 months ago

Post

4032

Just now, we release a small MoE model, Qwen1.5-MoE-A2.7B, a 14B model with 2.7B activated parameters. Leaving the hype, I would love to share more things here in HF. But if you don't know much about this, check our blog for more info: https://qwenlm.github.io/blog/qwen-moe/

At the beginning, it was trying with the MoE stuff, making Megatron work well with MegaBlocks. As always, we worked with small ones first. However, we have been struggling with a lot of details.

With megablocks and so many tricks that make training MoE models work, it is almost impossible to fail. The challenge is actually how good your model is. Then things became more complex than I had expected. Finegrained experts actually pissed me off but damn it works for the model at this scale. However, it brings complexity to the model, and this is somehow why at this moment our codes are not merged into llama.cpp cuz it really brings problems. Shared experts might be good, but we need more engineering efforts to really unleash its benefits in inference acceleration.

For the community, this is actually our first time releasing an MoE model. We don't know what will happen to us, but we are prepared for complaints. I just hope that we can really make things clear, and provide a good recipe to play with our MoE model just like people playing with Mixtral.

1 reply

·

posted an update about 1 year ago

Post

https://qwen.readthedocs.io/ 🔥 The official doc of Qwen1.5 is coming! This is a bilingual doc (English and Chinese, and it will be multilingual if I have time for them). The doc includes instructions for simple inference, running locally with GGUF, ollama, etc., quantization, finetuning, deployment, etc. We will continue adding more stuff to the doc. Stay tuned!

2 replies

·

replied to their post about 1 year ago

No I did not say it is SOTA. It is impossible for such a small model to be very powerful but it might be useful in some cases I guess.

reacted to yuchenlin's post with ❤️ about 1 year ago

Post

Introducing Vision Arena (beta)! Based on the lmsys's ChatbotArena, we create a simple demo for testing different Vision LMs (VLMs). We now support GPT-4V, Gemini-Pro-Vision, and Llava. More updates and models will come soon! We are still in the development stage and for now and we'd love to hear your feedback and suggestions! Please help us vote for better VLMs in your own use cases here! :D Kudos to Yujie Lu (UCSB)!
WildVision/vision-arena

3 replies

·

posted an update about 1 year ago

Post

Yesterday we just released Qwen1.5. Maybe someday I can tell more about the experience. But this is is at least a good release even if it is not yet SOTA. There is not so many SOTA by the way. This time, we actually fixed a lot of problems.

1. Context lengths are finally unified for all sizes. Previously, a lot of users kept telling us that 14B only supports 2K (Yeah even dynamic NTK does not work that well and it can only be extended to around 4-5K. Let alone those know nothing about how to use dynamic NTK).

2. If you carefully use our base language models, you will find that they understand special tokens of ChatML, which means that you can directly use LoRA to train on data with ChatML format. Why you can't do this before? This is because if the base language model does not understand the special tokens, you need to make them trained, which means that you should turn on the training of embedding. This is disgusting and it often leads to problems when you use ZeRO3.

3. We did strengthen our base language models except for 72. You should find better base language models, especially for 7 and 14. Why not 72? Nah, hard to say, but will make it better.

4. About the multilingual capabilities. Yes we finally build up our multilingual evaluation system and find out that our new base language models have nice performance in multilingual evaluation for base language models. This tells us that we should pay more attention to the post-training with multilingual data. And we did that too. This is why this time we tell you something about multilingual performance. It is for sure much much better than our models before this release.

5. Chat models are the most promising stuff. Before this release, we gave you the SFT models. But this time, we had very nice SFT+DPO models. Yeah not only annotators like them but also users like them. I am sure you developers will feel that way too.

5 replies

·

Junyang Lin

AI & ML interests

Recent Activity

Organizations

JustinLin610's activity