Xiaotian Han

xiaotianhan

AI & ML interests

Multimodal LLM

Organizations

InfiMM · Social Post Explorers

xiaotianhan's activity

reacted to their post with 🚀 3 months ago
posted an update 3 months ago
🚀 Excited to announce the release of InfiMM-WebMath-40B, the largest open-source multimodal pretraining dataset designed to advance mathematical reasoning in AI! 🧮✨

With 40 billion tokens, this dataset aims to enhance the reasoning capabilities of multimodal large language models in the domain of mathematics.

If you're interested in MLLMs, AI, and math reasoning, check out our work and dataset:

🤗 HF: InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning (2409.12568)
📂 Dataset: Infi-MM/InfiMM-WebMath-40B
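
For anyone who wants to poke at the data, here is a minimal sketch that streams a few records with the Hugging Face datasets library. The split name and record fields are assumptions; check the dataset card for the exact schema.

from datasets import load_dataset

# Minimal sketch: stream a few InfiMM-WebMath-40B records without downloading
# the full 40B-token corpus. The "train" split name and the record schema are
# assumptions here; consult the dataset card for the authoritative usage.
ds = load_dataset("Infi-MM/InfiMM-WebMath-40B", split="train", streaming=True)

for i, sample in enumerate(ds):
    print(sample.keys())  # inspect field names before relying on them
    if i >= 2:
        break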
reacted to victor's post with 🚀 8 months ago
The hype is real: a mysterious gpt2-chatbot model has appeared on the LLM Arena Leaderboard 👀.
It seems to be at least on par with the top performing models (closed and open).

To try it out: https://chat.lmsys.org/ -> then click on the Direct Chat tab and select gpt2-chatbot.

Take your bet, what do you think it is?
replied to their post 8 months ago

Thanks for your interest! Yes, we will open-source our code and pretrained weights soon.

posted an update 9 months ago
🎉 🎉 🎉 Happy to share our recent work. We noticed that image resolution plays an important role, both in improving multimodal large language model (MLLM) performance and in Sora-style any-resolution encoder-decoders. We hope this work helps lift the 224x224 resolution restriction in ViT.

ViTAR: Vision Transformer with Any Resolution (2403.18361)
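
As a point of reference for the 224x224 restriction mentioned above, here is the common baseline workaround (not ViTAR's method): bilinearly interpolating a ViT's learned position embeddings to a larger patch grid. The shapes below (patch size 16, 768-dim embeddings) are illustrative assumptions.

import torch
import torch.nn.functional as F

# Common baseline, NOT ViTAR's approach: resize a ViT's learned position
# embeddings from the 224x224 grid (14x14 patches at patch size 16) to the
# grid of a larger input, e.g. 512x512 (32x32 patches). Shapes are illustrative.
pos_embed = torch.randn(1, 14 * 14, 768)                       # [1, patches, dim]
grid = pos_embed.reshape(1, 14, 14, 768).permute(0, 3, 1, 2)   # [1, dim, 14, 14]
resized = F.interpolate(grid, size=(32, 32), mode="bilinear", align_corners=False)
new_pos_embed = resized.permute(0, 2, 3, 1).reshape(1, 32 * 32, 768)
print(new_pos_embed.shape)                                     # [1, 1024, 768]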
reacted to akhaliq's post with 👍 10 months ago
LongRoPE

Extending LLM Context Window Beyond 2 Million Tokens

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (2402.13753)

Large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. This paper introduces LongRoPE that, for the first time, extends the context window of pre-trained LLMs to an impressive 2048k tokens, with up to only 1k fine-tuning steps at within 256k training lengths, while maintaining performance at the original short context window. This is achieved by three key innovations: (i) we identify and exploit two forms of non-uniformities in positional interpolation through an efficient search, providing a better initialization for fine-tuning and enabling an 8x extension in non-fine-tuning scenarios; (ii) we introduce a progressive extension strategy that first fine-tunes a 256k length LLM and then conducts a second positional interpolation on the fine-tuned extended LLM to achieve a 2048k context window; (iii) we readjust LongRoPE on 8k length to recover the short context window performance. Extensive experiments on LLaMA2 and Mistral across various tasks demonstrate the effectiveness of our method. Models extended via LongRoPE retain the original architecture with minor modifications to the positional embedding, and can reuse most pre-existing optimizations.
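
For readers new to the topic, here is a toy NumPy sketch of plain, uniform positional interpolation for RoPE, the baseline idea that LongRoPE's searched, non-uniform rescaling builds on (this is not the LongRoPE algorithm itself; the head dimension, base, and lengths are illustrative).

import numpy as np

# Toy illustration of uniform positional interpolation for RoPE, the baseline
# that LongRoPE refines with non-uniform, per-dimension rescaling found by
# search. Head dimension, base, and lengths below are illustrative assumptions.
head_dim = 64
base = 10000.0
orig_len = 4096      # original context window
target_len = 16384   # desired extended window
scale = target_len / orig_len

inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))

def rope_angles(position, interpolate=False):
    # With interpolation, position t is treated as t / scale, so extended
    # positions map back into the range the model saw during pre-training.
    p = position / scale if interpolate else position
    return p * inv_freq

print(rope_angles(target_len - 1, interpolate=True)[:4])
print(rope_angles(orig_len - 1)[:4])  # comparable angle range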
reacted to DmitryRyumin's post with ❤️ 10 months ago
🌟✨ Exciting Announcement: NVIDIA AI Foundation Models ✨🌟

🚀 Interact effortlessly with the latest SOTA AI model APIs, all optimized on the powerful NVIDIA accelerated computing stack, right from your browser! 💻⚡

🔗 Web Page: https://catalog.ngc.nvidia.com/ai-foundation-models

🌟🎯 Favorites:

🔹 Code Generation:
1️⃣ Code Llama 70B 📝🔥: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/codellama-70b
Model 🤖: codellama/CodeLlama-70b-hf

🔹 Text and Code Generation:
1️⃣ Gemma 7B 💬💻: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-7b
Model 🤖: google/gemma-7b
2️⃣ Yi-34B 📚💡: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/yi-34b
Model 🤖: 01-ai/Yi-34B

🔹 Text Generation:
1️⃣ Mamba-Chat 💬🐍: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/mamba-chat
Model 🤖: havenhq/mamba-chat
2️⃣ Llama 2 70B 📝🦙: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/llama2-70b
Model 🤖: meta-llama/Llama-2-70b

🔹 Text-To-Text Translation:
1️⃣ SeamlessM4T V2 🌐🔄: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/seamless-m4t2-t2tt
Model 🤖: facebook/seamless-m4t-v2-large

🔹 Image Generation:
1️⃣ Stable Diffusion XL 🎨🔍: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/sdxl

🔹 Image Conversation:
1️⃣ NeVA-22B 🗨️📸: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/neva-22b

🔹 Image Classification and Object Detection:
1️⃣ CLIP 🖼️🔍: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/clip

🔹 Voice Conversion:
1️⃣ Maxine Voice Font 🗣️🎶: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/voice-font

🔹 Multimodal LLM (MLLM):
1️⃣ Kosmos-2 🌐👁️: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/kosmos-2
reacted to dvilasuero's post with ❤️ 10 months ago
🚀🧙🏼‍♂️ Introducing OpenHermesPreferences: the largest open AI feedback dataset for RLHF & DPO

> Using LLMs to improve other LLMs, at scale!

Built in collaboration with the Hugging Face H4 team, it's a 1M-preference dataset on top of the amazing @teknium's dataset.

Dataset:
argilla/OpenHermesPreferences

The dataset is another example of open collaboration:

> The H4 team created responses with Mixtral using llm-swarm

> Argilla created responses with NousResearch Hermes-2-Yi-34B using distilabel

> The H4 team ranked these responses + the original response with PairRM from AllenAI, University of Southern California, and Zhejiang University (@yuchenlin, @DongfuTingle, and colleagues)

We hope this dataset will help the community's research efforts towards understanding the role of AI feedback for LLM alignment.

We're particularly excited about the ability to filter specific subsets to improve LLM skills like math or reasoning.

Here's how easy it is to filter by subset:

from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/OpenHermesPreferences", split="train")

# Get the categories of the source dataset
# ['airoboros2.2', 'CamelAI', 'caseus_custom', ...]
sources = ds.unique("source")

# Filter for a subset
ds_filtered = ds.filter(lambda x: x["source"] in ["metamath", "EvolInstruct_70k"], num_proc=6)


As usual, all the scripts to reproduce this work are available and open to the community!

argilla/OpenHermesPreferences

So fun collab between @vwxyzjn, @plaguss, @kashif, @philschmid & @lewtun!

Open Source AI FTW!
reacted to their post with ❤️👍🤗 11 months ago
posted an update 11 months ago
Thrilled to share some of our recent work in the field of Multimodal Large Language Models (MLLMs).

1๏ธโƒฃ A Survey on Multimodal Reasoning ๐Ÿ“š
Are you curious about the reasoning abilities of MLLMs? In our latest survey, we delve into the world of multimodal reasoning. We comprehensively review existing evaluation protocols, categorize the frontiers of MLLMs, explore recent trends in their applications for reasoning-intensive tasks, and discuss current practices and future directions. For an in-depth exploration, check out our paper: Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning (2401.06805)

2๏ธโƒฃ Advancing Flamingo with InfiMM ๐Ÿ”ฅ
Building upon the foundation of Flamingo, we introduce the InfiMM model series. InfiMM is a reproduction of Flamingo, enhanced with stronger Large Language Models (LLMs) such as LLaMA2-13B, Vicuna-13B, and Zephyr7B. We've meticulously filtered pre-training data and fine-tuned instructions, resulting in superior performance on recent benchmarks like MMMU, InfiMM-Eval, MM-Vet, and more. Explore the power of InfiMM on Huggingface: Infi-MM/infimm-zephyr

3๏ธโƒฃ Exploring Multimodal Instruction Fine-tuning ๐Ÿ–ผ๏ธ
Visual Instruction Fine-tuning (IFT) is crucial for aligning MLLMs' output with user intentions. Our research identified challenges with models trained on the LLaVA-mix-665k dataset, particularly in multi-round dialog settings. To address this, we've created a new IFT dataset with high-quality, diverse instruction annotations and images sourced exclusively from the COCO dataset. Our experiments demonstrate that when fine-tuned with this dataset, MLLMs excel in open-ended evaluation benchmarks for both single-round and multi-round dialog settings. Dive into the details in our paper: COCO is "ALL'' You Need for Visual Instruction Fine-tuning (2401.08968)

Stay tuned for more exciting developments.
Special thanks to all our collaborators: @Ye27 @wwyssh @Yongfei @Yi-Qi638 @xudonglin @KhalilMrini @lllliuhhhhggg @Borise @Hongxia
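
As referenced in item 2️⃣ above, here is a hypothetical loading sketch for Infi-MM/infimm-zephyr. It assumes the repo ships transformers-compatible custom code (hence trust_remote_code) and an AutoProcessor for interleaved image-text input; the model card remains the authoritative usage example.

from transformers import AutoModelForCausalLM, AutoProcessor

# Hypothetical sketch, not the official usage: assumes Infi-MM/infimm-zephyr
# exposes a transformers interface through custom remote code and a processor
# for interleaved image-text prompts. See the model card for the real example.
model_id = "Infi-MM/infimm-zephyr"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)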