openmmlab (OpenMMLab)

vansin

posted an update 3 months ago

Post

869

Try InternThinker~

https://internlm-chat.intern-ai.org.cn/internthinker

vansin

posted an update 3 months ago

Post

1270

Amazing !!!! test Post

vansin

authored a paper 6 months ago

TRR360D: A dataset for 360 degree rotated rectangular box table detection

Paper • 2303.01894 • Published Mar 3, 2023

xianbao

posted an update 6 months ago

Post

2029

With the open-weight release of CogVideoX-5B from THUDM, i.e. GLM team, the Video Generation Model (how about calling it VGM) field has officially became the next booming "LLM"

What does the landscape look like? What are other video generation models? This collection below is all your need.

xianbao/video-generation-models-66c350163c74f60f5c412af6

The above video is generated by @a-r-r-o-w with CogVideoX-5B, taken from a nice lookout for the field!

ly015

authored a paper 8 months ago

GTA: A Benchmark for General Tool Agents

Paper • 2407.08713 • Published Jul 11, 2024 • 17

xianbao

authored a paper 9 months ago

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Paper • 2406.13923 • Published Jun 20, 2024 • 23

xianbao

posted an update 10 months ago

Post

1975

Why Apache 2.0 Matters for LLMs 🤔

@01AI_Yi recently switched from a permissive & commercially friendly license, to Apache 2.0. And the community loved it! 🚀

@JustinLin610 also had a poll on model license and the majority votes for Apache 2.0.

Why it is a Big Deal? ⬇️

📚 Legal Simplicity: Custom licenses need costly & time-consuming legal review. Apache 2.0 is well-known & easier for legal teams to handle.

👩‍💻 Developer-Friendly: Legal docs are a pain for devs! Apache 2.0 is well-known and tech-friendly, making it easier for non-native developers to understand the implications too.

🔗 Easier Integration: Apache 2.0 is compatible with many other licenses, simplifying tasks like model merging with models of different licensing requirements.

🚫 No Permission Needed: Custom licenses often require explicit permission and additional documentation work of filling forms, creating barriers. Apache 2.0 removes this hurdle, letting devs focus on innovation.

There are a lot interesting discussions from
@JustinLin610 's poll: https://x.com/JustinLin610/status/1793559737482764375 which inspired this thread.

Any other thoughts? Let me know ^^

1 reply

·

xianbao

posted an update 10 months ago

Post

1267

DeepSeekV2 is a big deal. Not only because its significant improvements to both key components of Transformer: the Attention layer and FFN layer.

It has also completed disrupted the Chines LLM market and forcing the competitors to drop the price to 1% of the original price.

---

There are two key components in Transformer architecture: the self-attention layer, which captures relationships between tokens in context, and the Feed-Forward Network (FFN) layer, which stores knowledge.

DeepSeek V2 introduces optimizations to both:

Attention layer normally uses KV Cache to reduce repetitive compute, but it consumes significant GPU RAM, limiting concurrent requests. DeepSeek V2 introduces Multi-head Latent Attention (MLA), which stores only a small latent representation, resulting in substantial RAM savings.

DeepSeek V2 utilizes 162 experts instead of the usual 8 as in Mixtral. This approach segments experts into finer granularity for higher specialization and more accurate knowledge acquisition. Activating only a small subset of experts for each token, leads to efficient processing.

It disrupted the market by dropping API prices to $0.14 per 1M tokens. This dramatic reduction forced competitors like GLM, Ernie, and QWen to follow suit, lowering their prices to 1% of their original offerings. Now, users can access these APIs at 1/35th the cost of ChatGPT-4o.

xianbao

posted an update 11 months ago

Post

1885

So hard to keep up with pace!!! Lots of new Chinese fine-tunes are being released on HF

So I asked my agent to create a collection
xianbao/llama3-zh-662ba8503bdfe51948a28403

code: https://colab.research.google.com/drive/1ap6fP-VytZE367Nqk26DeQqgQkYaf-cD#scrollTo=eljRbYb4c92M

Would be nice to run then regularly. Any thoughts / suggestions on where to host this cron job?

1 reply

·

xianbao

posted an update about 1 year ago

Post

Welcome Bunny! A family of lightweight but powerful multimodal models from BAAI

With detailed work on dataset curation, the Bunny-3B model built upon SigLIP and Phi-2 achieves performance on par with 13B models.

Model: BAAI/bunny-phi-2-siglip-lora

2 replies

·

xianbao

posted an update about 1 year ago

Post

There appears to be a huge misunderstanding regarding the licensing requirements for open sourced Chinese speaking speaking LLMs on
@huggingface

I initially shared this misconception too, but after conducting some research, I came up with the list below.

Veryimpressive!

xianbao

posted an update about 1 year ago

Post

Vision LLM for #edgecomputing ?

@openbmb , who OS'ed the UltraFeedback dataset before, released a series of strong eco-friendly yet powerful LLMs

- MiniCPM: 2B model that competes with Mistral-7B

- MiniCPM-V: 3B vision LLM on edge!
- MiniCPM-V: 3B vision LLM on edge!

ly015

authored 2 papers about 1 year ago

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Paper • 2401.16420 • Published Jan 29, 2024 • 55

Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

Paper • 2401.02955 • Published Jan 5, 2024 • 22

nielsr

updated 2 models over 1 year ago

openmmlab/upernet-convnext-base

Image Segmentation • Updated Jun 23, 2023 • 269 • 1

openmmlab/upernet-swin-small

Image Segmentation • Updated Jun 23, 2023 • 627 • 5

nielsr

updated 4 models almost 2 years ago

OpenMMLab

AI & ML interests

openmmlab's activity

TRR360D: A dataset for 360 degree rotated rectangular box table detection

GTA: A Benchmark for General Tool Agents

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

openmmlab/upernet-convnext-base

openmmlab/upernet-swin-small

openmmlab/upernet-convnext-xlarge

openmmlab/upernet-swin-base

openmmlab/upernet-swin-large

openmmlab/upernet-convnext-tiny

AI & ML interests

Team members 5

openmmlab's activity