NYU VisionX

university

https://www.sainingxie.com/

Recent Activity

jihanyang authored a paper 5 days ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

rilynhan authored a paper 5 days ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

sainx authored a paper 5 days ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

nyu-visionx's activity

sayakpaul

posted an update 1 day ago

Post

2198

Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0

jihanyang

authored a paper 5 days ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published 7 days ago • 22

rilynhan

authored a paper 5 days ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published 7 days ago • 22

sainx

authored a paper 5 days ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published 7 days ago • 22

ShushengYang

authored 5 papers 6 days ago

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

Paper • 2204.02964 • Published Apr 6, 2022

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published 7 days ago • 22

ShushengYang

updated a dataset 6 days ago

nyu-visionx/VSI-Bench

Viewer • Updated 6 days ago • 5.13k • 213 • 17

sayakpaul

posted an update 7 days ago

Post

1547

In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.

1 reply

rilynhan

updated a dataset 7 days ago

nyu-visionx/VSI-Bench

Viewer • Updated 6 days ago • 5.13k • 213 • 17

jihanyang

updated a dataset 7 days ago

nyu-visionx/VSI-Bench

Viewer • Updated 6 days ago • 5.13k • 213 • 17

sayakpaul

posted an update 15 days ago

Post

2040

Introducing a high-quality open-preference dataset to further this line of research for image generation.

Despite being such an inseparable component of modern image generation, open preference datasets are a rarity!

So, we decided to work on one with the community!

Check it out here:
https://huggingface.co/blog/image-preferences

7 replies

sayakpaul

posted an update 16 days ago

Post

2096

The Control family of Flux from @black-forest-labs should be discussed more!

It enables structural controls like ControlNets while being significantly less expensive to run!

So, we're working on a Control LoRA training script 🤗

It's still WIP, so go easy:
https://github.com/huggingface/diffusers/pull/10130

sayakpaul

authored a paper 19 days ago

A Noise is Worth Diffusion Guidance

Paper • 2412.03895 • Published 20 days ago • 27

sayakpaul

posted an update 26 days ago

Post

1466

Let 2024 be the year of video model fine-tunes!

Check it out here:
https://github.com/a-r-r-o-w/cogvideox-factory/tree/main/training/mochi-1

sayakpaul

posted an update about 1 month ago

Post

2599

It's been a while since we shipped native quantization support in diffusers 🧨

We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.

This post is just a reminder of what's possible:

1. Loading a model with a quantization config
2. Saving a model with quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints

Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes
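
A minimal sketch of items 1 to 3 from the list above, written against the bitsandbytes integration described in the linked docs. The model ID, the NF4 settings, the output directory name, and the helper names (`nf4_kwargs`, `load_quantized_transformer`, `save_and_reload`) are illustrative assumptions, not taken from the post. Running the loading functions needs a CUDA GPU, the `bitsandbytes` package, and access to the checkpoint.

```python
def nf4_kwargs():
    """Keyword arguments for a 4-bit NF4 config (an illustrative choice)."""
    return {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}


def load_quantized_transformer(model_id="black-forest-labs/FLUX.1-dev"):
    """Item 1: load a model with a quantization config.

    The model ID is an assumption for illustration; any transformer
    checkpoint supported by diffusers could be substituted.
    """
    # Imported lazily so this file can be read and imported without
    # torch or diffusers installed.
    import torch
    from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

    quant_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16, **nf4_kwargs()
    )
    return FluxTransformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )


def save_and_reload(transformer, out_dir="flux-transformer-nf4"):
    """Items 2 and 3: save the quantized model, then load it back."""
    from diffusers import FluxTransformer2DModel

    # The quantization config is stored alongside the weights, so the
    # reload does not need it passed again.
    transformer.save_pretrained(out_dir)
    return FluxTransformer2DModel.from_pretrained(out_dir)
```

Items 4 and 5 (CPU offload and LoRA training or loading) operate on a full pipeline and are left out of this sketch.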

1 reply

sayakpaul

posted an update 3 months ago

Post

2752

Did a little experimentation with resizing pre-trained LoRAs on Flux. I explored two themes:

* Decrease the rank of a LoRA
* Increase the rank of a LoRA

The first one is helpful in reducing memory requirements if the LoRA is of a high rank, while the second one is merely an experiment. Another implication of this study is in the unification of LoRA ranks when you would like to torch.compile() them.

Check it out here:
sayakpaul/flux-lora-resizing
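
To make the two themes concrete, here is a small NumPy sketch of one common way to resize a LoRA pair. It is not necessarily the method used in the linked repository: rank reduction is done with a truncated SVD of the merged update, and rank increase is done by zero-padding, which leaves the update unchanged. The function names and the `(rank, in)` / `(out, rank)` shape convention are assumptions for illustration.

```python
import numpy as np


def reduce_lora_rank(A, B, new_rank):
    """Return a rank-`new_rank` pair approximating the update B @ A.

    A has shape (rank, in_features), B has shape (out_features, rank).
    Truncated SVD gives the best low-rank approximation in the
    Frobenius norm.
    """
    delta = B @ A  # merged weight update, shape (out, in)
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    U, S, Vt = U[:, :new_rank], S[:new_rank], Vt[:new_rank]
    # Split the singular values evenly between the two factors.
    sqrt_s = np.sqrt(S)
    B_new = U * sqrt_s            # (out, new_rank)
    A_new = sqrt_s[:, None] * Vt  # (new_rank, in)
    return A_new, B_new


def pad_lora_rank(A, B, new_rank):
    """Increase the rank by zero-padding; B @ A is left unchanged."""
    extra = new_rank - A.shape[0]
    if extra < 0:
        raise ValueError("new_rank must be >= the current rank")
    A_new = np.vstack([A, np.zeros((extra, A.shape[1]), dtype=A.dtype)])
    B_new = np.hstack([B, np.zeros((B.shape[0], extra), dtype=B.dtype)])
    return A_new, B_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 16))
    B = rng.standard_normal((12, 8))
    A4, B4 = reduce_lora_rank(A, B, 4)
    print(A4.shape, B4.shape)
```

Padding several LoRAs to a shared rank is one way to get the uniform shapes that the torch.compile() remark in the post calls for. For real layers, forming the full `B @ A` matrix can be memory-hungry, so a production version would likely work on the factors directly.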

1 reply

sayakpaul

authored a paper 4 months ago

LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

Paper • 2408.13467 • Published Aug 24 • 24

Team members 13
