Nicolay Rusnachenko

nicolay-r

https://nicolay-r.github.io/

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Recent Activity

reacted to merve's post with 👀 about 12 hours ago

QwQ can see 🔥 Qwen team released QvQ, a large vision LM with reasoning 😱 it outperforms proprietary VLMs on several benchmarks, comes with open weights and a demo! Check them out ⬇️ Demo https://huggingface.co./spaces/Qwen/QVQ-72B-preview Model https://huggingface.co./Qwen/QVQ-72B-Preview Read more https://qwenlm.github.io/blog/qvq-72b-preview/ Congratulations @JustinLin610 and team!

reacted to merve's post with 👍 about 12 hours ago

reacted to AdinaY's post with 👀 about 12 hours ago

QvQ-72B-Preview🎄 an open weight model for visual reasoning just released by Alibaba_Qwen team https://huggingface.co./collections/Qwen/qvq-676448c820912236342b9888 ✨ Combines visual understanding & language reasoning. ✨ Scores 70.3 on MMMU ✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving


Organizations

None yet

nicolay-r's activity

reacted to merve's post with 👀👍 about 12 hours ago

Post

1583

QwQ can see 🔥
Qwen team released QvQ, a large vision LM with reasoning 😱

it outperforms proprietary VLMs on several benchmarks, comes with open weights and a demo!
Check them out ⬇️
Demo Qwen/QVQ-72B-preview
Model Qwen/QVQ-72B-Preview
Read more https://qwenlm.github.io/blog/qvq-72b-preview/
Congratulations @JustinLin610 and team!

reacted to AdinaY's post with 👀🔥 about 12 hours ago

Post

1426

QvQ-72B-Preview🎄 an open weight model for visual reasoning just released by Alibaba_Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning.
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving

reacted to hba123's post with 🚀 about 12 hours ago

Post

1188

Blindly applying algorithms without understanding the math behind them is not a good idea, from my point of view. So, I am on a quest to fix this!

I wrote my first Hugging Face article on how to derive closed-form solutions for KL-regularised reinforcement learning problems, which is what DPO relies on.

Check it out: https://huggingface.co./blog/hba123/derivingdpo
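
For readers who want the result at a glance, the standard closed-form solution of the KL-regularised objective can be written as below. This is a sketch in the usual DPO notation, which is an assumption here and may differ from the notation of the linked article: \pi_{\mathrm{ref}} is the reference policy, r the reward, \beta the regularisation strength and Z(x) the normalising constant.

```latex
\max_{\pi}\; \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big]
  - \beta\, \mathrm{KL}\big( \pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)

\pi^{*}(y \mid x) = \frac{1}{Z(x)}\, \pi_{\mathrm{ref}}(y \mid x)\, \exp\!\Big( \tfrac{1}{\beta}\, r(x, y) \Big),
\qquad
Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x)\, \exp\!\Big( \tfrac{1}{\beta}\, r(x, y) \Big)

r(x, y) = \beta \log \frac{\pi^{*}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} + \beta \log Z(x)
```

The last line is the rearrangement that lets DPO express the reward through the policy itself.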

posted an update about 16 hours ago

Post

412

📢 If you want to quickly experiment with LLMs using a known Chain-of-Thought (CoT) / prompt schema and no-string dependencies, then I have something relevant to share 💎

I have released the updated version 📦 bulk-chain-0.25.0 📦, which aims to provide an accessible API for instantly applying LLMs to massive data iterators via a predefined prompt schema 🎊

📦: https://pypi.org/project/bulk-chain/0.25.0/
🌟: https://github.com/nicolay-r/bulk-chain
📘: https://github.com/nicolay-r/bulk-chain/issues/26

The key updates of the most recent release are:
✅ 🪶 No-string (empty dependencies): you can use any framework / API for LLM.
✅ 🐍 Python API support (see first screenshot 📸).
✅ 💥 Native try-catch wrapping to guard against data loss when using remote providers, especially OpenAI, ReplicateIO, OpenRouter, etc.
✅ 🔥 Batching mode support: you may wrap your model to handle batches and significantly boost performance 🚀 (see screenshot below for enabling batching 📸)
✅ 🔧 Fixed a lot of minor bugs
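
The batching and try-catch ideas from the list above can be sketched in plain Python. This is not the bulk-chain API: `batched`, `safe_infer` and `flaky_provider` are hypothetical names made up for illustration, and the real package may organise this differently.

```python
def batched(items, size):
    """Group any iterable into lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def safe_infer(items, infer_batch, batch_size=2):
    """Call `infer_batch` per batch; a failing batch yields None outputs
    instead of aborting the whole iteration."""
    for batch in batched(items, batch_size):
        try:
            outputs = infer_batch(batch)
        except Exception:
            outputs = [None] * len(batch)
        yield from zip(batch, outputs)


def flaky_provider(batch):
    # Hypothetical stand-in for a remote LLM provider that sometimes fails.
    if "bad" in batch:
        raise RuntimeError("simulated remote failure")
    return [text.upper() for text in batch]


results = list(safe_infer(["a", "b", "bad", "c", "d"], flaky_provider))
```

Only the batch that contains the failing item loses its outputs; the records before and after it are still processed.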

Quick start on GoogleColab:
📙: https://colab.research.google.com/github/nicolay-r/bulk-chain/blob/master/bulk_chain_tutorial.ipynb

📘 The wiki of the project is available here:
https://github.com/nicolay-r/bulk-chain/wiki/Project-Documentation

reacted to as-cle-bert's post with ❤️ about 17 hours ago

Post

861

Hi HuggingFacers!🤶🏼

As my last 2024 project, I've dropped a Discord Bot that knows a lot about Pokemons🦋

GitHub 👉 https://github.com/AstraBert/Pokemon-Bot
Demo Space 👉 as-cle-bert/pokemon-bot

The bot integrates:
- Chat features (Cohere's Command-R) with RAG functionalities (hybrid search and reranking with Qdrant) and chat memory (managed through PostgreSQL) to produce information about Pokemons
- Image-based search to identify Pokemons from their images (via Qdrant)
- Card package random extraction and description

HuggingFace🤗, as usual, plays the most important role in the application stack, with the following models:

- sentence-transformers/LaBSE
- prithivida/Splade_PP_en_v1
- facebook/dinov2-large

And datasets:

- Karbo31881/Pokemon_images
- wanghaofan/pokemon-wiki-captions
- TheFusion21/PokemonCards

Have fun!🍕

reacted to MonsterMMORPG's post with 👍 about 17 hours ago

Post

1238

Best open source Image to Video model, CogVideoX1.5-5B-I2V, is pretty decent and optimized for low VRAM machines at high resolution - native resolution is 1360px and up to 10 seconds (161 frames) - audio generated with a new open source audio model

Full YouTube tutorial for CogVideoX1.5-5B-I2V : https://youtu.be/5UCkMzP2VLE

1-Click Windows, RunPod and Massed Compute installers : https://www.patreon.com/posts/112848192

https://www.patreon.com/posts/112848192 - installs into Python 3.11 VENV

Official Hugging Face repo of CogVideoX1.5-5B-I2V : THUDM/CogVideoX1.5-5B-I2V

Official github repo : https://github.com/THUDM/CogVideo

Used prompts to generate videos txt file : https://gist.github.com/FurkanGozukara/471db7b987ab8d9877790358c126ac05

Demo images shared in : https://www.patreon.com/posts/112848192

I used 1360x768px images at 16 FPS and 81 frames = 5 seconds

+1 frame coming from initial image
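
The frame arithmetic above can be checked with a short sketch. The helper name `clip_seconds` is hypothetical, and applying 16 FPS to the 161-frame case is an assumption based on the numbers quoted in this post.

```python
def clip_seconds(total_frames, fps=16):
    """Duration of the generated clip; the first frame is the input image."""
    generated_frames = total_frames - 1
    return generated_frames / fps


five_second_clip = clip_seconds(81)    # 80 generated frames at 16 FPS
ten_second_clip = clip_seconds(161)    # 160 generated frames at 16 FPS
```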

Also I have enabled all the optimizations shared on Hugging Face

pipe.enable_sequential_cpu_offload()

pipe.vae.enable_slicing()

pipe.vae.enable_tiling()

quantization = int8_weight_only - you need TorchAO and DeepSpeed works great on Windows with Python 3.11 VENV
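
Put together, the three optimization calls above might be applied as in the sketch below. It assumes a recent diffusers release that ships `CogVideoXImageToVideoPipeline`; the `torch.bfloat16` dtype and the function name `build_low_vram_pipeline` are assumptions, and the int8 quantization step is left out.

```python
def build_low_vram_pipeline(model_id="THUDM/CogVideoX1.5-5B-I2V"):
    """Load the image-to-video pipeline with the memory optimizations enabled."""
    # Imports live inside the function so defining it does not require the
    # heavy dependencies; calling it downloads the model weights.
    import torch
    from diffusers import CogVideoXImageToVideoPipeline

    pipe = CogVideoXImageToVideoPipeline.from_pretrained(
        model_id, torch_dtype=torch.bfloat16  # dtype is an assumption
    )
    pipe.enable_sequential_cpu_offload()  # move submodules to the GPU only when needed
    pipe.vae.enable_slicing()             # decode the batch one slice at a time
    pipe.vae.enable_tiling()              # decode large frames tile by tile
    return pipe
```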

Used audio model : https://github.com/hkchengrex/MMAudio

1-Click Windows, RunPod and Massed Compute Installers for MMAudio : https://www.patreon.com/posts/117990364

https://www.patreon.com/posts/117990364 - Installs into Python 3.10 VENV

Used very simple prompts - it fails when there is a human in the input video, so use text to audio in such cases

I also tested some VRAM usages for CogVideoX1.5-5B-I2V

Resolutions and their VRAM requirements - may work on lower VRAM GPUs too but slower

512x288 - 41 frames : 7700 MB
576x320 - 41 frames : 7900 MB
576x320 - 81 frames : 8850 MB
704x384 - 81 frames : 8950 MB
768x432 - 81 frames : 10600 MB
896x496 - 81 frames : 12050 MB
960x528 - 81 frames : 12850 MB

1 reply

reacted to suayptalha's post with 🔥 about 17 hours ago

Post

822

🚀 Introducing Substitution Cipher Solvers!

As @suayptalha and @Synd209 we are thrilled to share an update!

🔑 This project contains a text-to-text model designed to decrypt English and Turkish text encoded using a substitution cipher. In a substitution cipher, each letter in the plaintext is replaced by a corresponding, unique letter to form the ciphertext. The model leverages statistical and linguistic properties of English to make educated guesses about the letter substitutions, aiming to recover the original plaintext message.

These models were fine-tuned from T5-base. The models are for monoalphabetic English and Turkish substitution ciphers, and they output decoded text and the alphabet with an accuracy that has never been achieved before!

Example:

Encoded text: Z hztwgx tstcsf qf z ulooqfe osfuqb tzx uezx awej z ozewsbe vlfwby fsmqisfx.

Decoded text: A family member or a support person may stay with a patient during recovery.
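
To make the cipher itself concrete, here is a minimal sketch that applies a monoalphabetic substitution key. It is not the model: the key below was simply read off the encoded/decoded pair above, and `KEY` and `substitute` are names introduced for illustration.

```python
# Partial key (ciphertext letter -> plaintext letter), read off the example pair.
KEY = {
    "z": "a", "h": "f", "t": "m", "w": "i", "g": "l", "x": "y", "s": "e",
    "c": "b", "f": "r", "q": "o", "u": "s", "l": "u", "o": "p", "e": "t",
    "b": "n", "a": "w", "j": "h", "v": "d", "y": "g", "m": "c", "i": "v",
}


def substitute(text, key):
    """Replace each letter through `key`, keeping case and other characters."""
    out = []
    for ch in text:
        mapped = key.get(ch.lower())
        if mapped is None:
            out.append(ch)  # spaces, punctuation, letters missing from the key
        else:
            out.append(mapped.upper() if ch.isupper() else mapped)
    return "".join(out)


encoded = "Z hztwgx tstcsf qf z ulooqfe osfuqb tzx uezx awej z ozewsbe vlfwby fsmqisfx."
decoded = substitute(encoded, KEY)

# Each ciphertext letter maps to a unique plaintext letter, so the key inverts.
inverse_key = {plain: cipher for cipher, plain in KEY.items()}
re_encoded = substitute(decoded, inverse_key)
```

The hard part, which the models address, is recovering the key when only the ciphertext is known.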

Model Collection Link: Cipher-AI/substitution-cipher-solvers-6731ebd22f0f0d8e0e2e2e00

Organization Link: https://huggingface.co./Cipher-AI

Stay tuned for the paper 🤗!

2 replies

reacted to DawnC's post with ❤️ 1 day ago

Post

1212

🌟 PawMatchAI: Making Breed Selection More Intuitive! 🐕
Excited to share the latest update to this AI-powered companion for finding your perfect furry friend! The breed recommendation system just got a visual upgrade to help you make better decisions.

✨ What's New?
Enhanced breed recognition accuracy through strategic model improvements:
- Upgraded to a fine-tuned ConvNeXt architecture for superior feature extraction
- Implemented progressive layer unfreezing during training
- Optimized data augmentation pipeline for better generalization
- Achieved 8% improvement in breed classification accuracy

🎯 Key Features:
- Smart breed recognition powered by AI
- Visual matching scores with intuitive color indicators
- Detailed breed comparisons with interactive tooltips
- Lifestyle-based recommendations tailored to your needs

💭 Project Vision
Combining my passion for AI and pets, this project represents another step toward my goal of creating meaningful AI applications. Each update aims to make the breed selection process more accessible while improving the underlying technology.

👉 Try it now: DawnC/PawMatchAI

Your likes ❤️ on this space fuel this project's growth!

#AI #MachineLearning #DeepLearning #Pytorch #ComputerVision

reacted to sayakpaul's post with 🚀 1 day ago

Post

2863

Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0

reacted to singhsidhukuldeep's post with 🧠 1 day ago

Post

1898

Exciting News in AI: JinaAI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage

⚡️ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
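
As a rough illustration of "InfoNCE loss in both directions", the sketch below averages a text-to-image and an image-to-text cross-entropy over a similarity matrix whose diagonal holds the matching pairs. The function name and the temperature of 0.07 are assumptions, not values taken from the Jina AI release.

```python
import math


def symmetric_info_nce(sim, temperature=0.07):
    """sim[i][j] = similarity of text i and image j; pair i matches image i."""
    n = len(sim)

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            peak = max(logits)  # subtract the max for numerical stability
            log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
            total += log_z - logits[i]
        return total / n

    columns = [list(col) for col in zip(*sim)]  # transpose for the other direction
    return 0.5 * (cross_entropy(sim) + cross_entropy(columns))


aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]])
misaligned = symmetric_info_nce([[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each text is most similar to its own image and large when the pairs are swapped.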

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)
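
The 75% figure follows from keeping 256 of 1024 dimensions. A common way to use Matryoshka-style embeddings is to keep the leading dimensions and re-normalise, sketched below with a random vector standing in for a real embedding; `truncate_embedding` is a hypothetical helper, not part of the model's API.

```python
import math
import random


def truncate_embedding(vec, dim):
    """Keep the first `dim` values and rescale to unit L2 norm."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


random.seed(0)
full = [random.gauss(0.0, 1.0) for _ in range(1024)]  # placeholder embedding
small = truncate_embedding(full, 256)
reduction = 1 - len(small) / len(full)  # fraction of dimensions dropped
```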

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!

reacted to ginipick's post with 🔥 1 day ago

Post

2512

🎨 GiniGen Canvas-o3: Intelligent AI-Powered Image Editing Platform
Transform your images with precision using our next-generation tool that lets you extract anything from text to objects with simple natural language commands! 🚀
📌 Key Differentiators:

Intelligent Object Recognition & Extraction
• Freedom to select any target (text, logos, objects)
• Simple extraction via natural language commands ("dog", "signboard", "text")
• Ultra-precise segmentation powered by GroundingDINO + SAM
Advanced Background Processing
• AI-generated custom backgrounds for extracted objects
• Intuitive object size/position adjustment
• Multiple aspect ratio support (1:1, 16:9, 9:16, 4:3)
Progressive Text Integration
• Dual text placement: over or behind images
• Multi-language font support
• Real-time font style/size/color/opacity adjustment

🎯 Use Cases:

Extract logos from product images
Isolate text from signboards
Select specific objects from scenes
Combine extracted objects with new backgrounds
Layer text in front of or behind images

💫 Technical Features:

Natural language-based object detection
Real-time image processing
GPU acceleration & memory optimization
User-friendly interface

🎉 Key Benefits:

User Simplicity: Natural language commands for object extraction
High Precision: AI-powered accurate object recognition
Versatility: From basic editing to advanced content creation
Real-Time Processing: Instant result visualization

Experience the new paradigm of image editing with GiniGen Canvas-o3:

Seamless integration of multiple editing functions
Professional-grade results with consumer-grade ease
Perfect for social media, e-commerce, and design professionals

Whether you're extracting text from complex backgrounds or creating sophisticated visual content, GiniGen Canvas-o3 provides the precision and flexibility you need for modern image editing!

GO! ginigen/CANVAS-o3

2 replies

reacted to InferenceIllusionist's post with 🔥 3 days ago

Post

1861

MilkDropLM-32b-v0.3: Unlocking Next-Gen Visuals ✨

Stoked to release the latest iteration of our MilkDropLM project! This new release is based on the powerful Qwen2.5-Coder-32B-Instruct model using the same great dataset that powered our 7b model.

What's new?

- Genome Unlocked: Deeper understanding of preset relationships for more accurate and creative generations.

- Preset Revival: Breathe new life into old presets with our upgraded model!

- Loop-B-Gone: Say goodbye to pesky loops and hello to smooth generation.

- Natural Chats: Engage in more natural sounding conversations with our LLM than ever before.

Released under Apache 2.0, because sharing is caring!

Try it out: InferenceIllusionist/MilkDropLM-32b-v0.3

Shoutout to @superwatermelon for his invaluable insights and collab, and to all those courageous members in the community that have tested and provided feedback before!

reacted to ehristoforu's post with 🤗 3 days ago

Post

2687

✒️ Ultraset - all-in-one dataset for SFT training in Alpaca format.
fluently-sets/ultraset

❓ Ultraset is a comprehensive dataset for training Large Language Models (LLMs) using the SFT (Supervised Fine-Tuning) method. This dataset consists of over 785 thousand entries in eight languages, including English, Russian, French, Italian, Spanish, German, Chinese, and Korean.

🤯 Ultraset solves the problem faced by users when selecting an appropriate dataset for LLM training. It combines various types of data required to enhance the model's skills in areas such as text writing and editing, mathematics, coding, biology, medicine, finance, and multilingualism.

🤗 For effective use of the dataset, it is recommended to utilize only the "instruction," "input," and "output" columns and train the model for 1-3 epochs. The dataset does not include DPO or Instruct data, making it suitable for training various types of LLM models.
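
The column recommendation above could look like this in practice. It is a sketch only: the prompt layout follows the common Alpaca-style section headers, which is an assumption about how one might format the rows, and `to_sft_example` and the sample row are hypothetical.

```python
ALPACA_COLUMNS = ("instruction", "input", "output")


def to_sft_example(row):
    """Keep only the three recommended columns and build a prompt/completion pair."""
    kept = {name: row.get(name, "") for name in ALPACA_COLUMNS}
    prompt = f"### Instruction:\n{kept['instruction']}\n\n"
    if kept["input"]:  # the input field is optional in Alpaca-style data
        prompt += f"### Input:\n{kept['input']}\n\n"
    prompt += "### Response:\n"
    return {"prompt": prompt, "completion": kept["output"]}


row = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
    "extra_column": "ignored",  # any other column is dropped
}
example = to_sft_example(row)
```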

❇️ Ultraset is an excellent tool to improve your language model's skills in diverse knowledge areas.

reacted to aaditya's post with 🔥 3 days ago

Post

3045

Last Week in Medical AI: Top Research Papers/Models 🔥
🏅 (December 15 – December 21, 2024)

Medical LLM & Other Models
- MedMax: Mixed-Modal Biomedical Assistant
  - Advanced multimodal instruction tuning
  - Enhanced biomedical knowledge integration
  - Comprehensive assistant capabilities
- MGH Radiology Llama 70B
  - Specialized radiology focus
  - State-of-the-art performance
  - Enhanced report generation capabilities
- HC-LLM: Historical Radiology Reports
  - Context-aware report generation
  - Historical data integration
  - Improved accuracy in diagnostics

Frameworks & Methods
- ReflecTool: Reflection-Aware Clinical Agents
- Process-Supervised Clinical Notes
- Federated Learning with RAG
- Query Pipeline Optimization

Benchmarks & Evaluations
- Multi-OphthaLingua
  - Multilingual ophthalmology benchmark
  - Focus on LMICs healthcare
  - Bias assessment framework
- ACE-M3 Evaluation Framework
  - Multimodal medical model testing
  - Comprehensive capability assessment
  - Standardized evaluation metrics

LLM Applications
- Patient-Friendly Video Reports
- Medical Video QA Systems
- Gene Ontology Annotation
- Healthcare Recommendations

Special Focus: Medical Ethics & AI
- Clinical Trust Impact Study
- Mental Health AI Challenges
- Hospital Monitoring Ethics
- Radiology AI Integration

Now you can watch and listen to the latest Medical AI papers daily on our YouTube and Spotify channels as well!

- Full thread in detail:
https://x.com/OpenlifesciAI/status/1870504774162063760
- Youtube Link: youtu.be/SbFp4fnuxjo
- Spotify: https://t.co/QPmdrXuWP9

reacted to luigi12345's post with 👀 3 days ago

Post

2522

PERFECT FINAL PROMPT for Coding and Debugging.

Step 1: Generate the prompt that if sent to you will make you adjust the script so it meets each and every of the criteria it needs to meet to be 100% bug free and perfect.

Step 2: adjust the script following the steps and instructions in the prompt created in Step 1.
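
The two steps can be chained programmatically, as in the sketch below. `llm` is a hypothetical callable that takes a prompt string and returns the model's reply; no particular provider or SDK is implied, and the prompt wording is a paraphrase of the steps above.

```python
def two_step_fix(script, llm):
    """Step 1: ask the model to write the fixing prompt. Step 2: apply it."""
    step1 = (
        "Generate the prompt that, if sent to you, will make you adjust the "
        "script below so it meets every criterion it needs to meet to be "
        "bug free.\n\n" + script
    )
    generated_prompt = llm(step1)
    step2 = (
        generated_prompt
        + "\n\nAdjust this script following the instructions above:\n\n"
        + script
    )
    return llm(step2)
```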

1 reply

reacted to prithivMLmods's post with 🤗 3 days ago

Post

4917

Sketchify 😉🎨

+ strangerzonehf/Flux-Sketch-Smudge-LoRA
+ strangerzonehf/Flux-Sketch-Sized-LoRA
+ strangerzonehf/Sketch-Paint

- strangerzonehf/sketch-fav-675ba869c7ceaec7e652ee1c

reacted to wenhuach's post with 👍 3 days ago

Post

2156

Are we the only providers of INT4 quantized models for Llama 3.2 VL?
OPEA/Llama-3.2-90B-Vision-Instruct-int4-sym-inc
OPEA/Llama-3.2-11B-Vision-Instruct-int4-sym-inc

posted an update 3 days ago

Post

2165

📢 So far I have noticed that 🧠 reasoning with LLMs 🤖 tends to be more accurate in English than in other languages.
However, apart from GoogleTrans and other open, transparent translators, I could not find an easy-to-use solution that avoids:
1.🔴 Third-party framework installation
2.🔴 Text chunking
3.🔴 Missing support for meta-annotations such as spans / objects / etc.

💎 To cope with the problem of IR from non-English texts, I am happy to share bulk-translate 0.25.0. 🎊

⭐ https://github.com/nicolay-r/bulk-translate

bulk-translate is a tiny Python 🐍 no-string framework that lets you translate a series of texts with pre-annotated fixed spans that stay invariant for the translator.

It provides a 👨‍💻 Python 🐍 API for quick translation of data with (optionally) annotated objects in texts (see figure below).
I have made it as accessible as possible for RAG and / or LLM-powered downstream apps:
📘 https://github.com/nicolay-r/bulk-translate/wiki

All you have to do is provide an iterator of texts, where each text is either:
1. ✅ A string object
2. ✅ A list of strings and nested lists that represent spans (value + any ID data).
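
The two input shapes can be illustrated with a small sketch. This is not the bulk-translate API: `translate_with_spans` is a hypothetical helper, and `str.upper` stands in for a real translator so the example runs offline.

```python
def translate_with_spans(text, translate):
    """Translate plain strings; leave nested span lists (value + ID data) untouched."""
    if isinstance(text, str):
        return translate(text)
    return [translate(part) if isinstance(part, str) else part for part in text]


texts = iter([
    "hello world",                                  # 1. a plain string
    ["hello ", ["Anna", "PER-1"], " from Paris"],   # 2. strings mixed with one span
])

# str.upper is a placeholder for an actual translation function.
translated = [translate_with_spans(text, str.upper) for text in texts]
```

The span `["Anna", "PER-1"]` passes through unchanged, which is the invariance the post describes.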

🤖 By default I provide a wrapper over googletrans which you can override with your own 🔥
https://github.com/nicolay-r/bulk-translate/blob/master/models/googletrans_310a.py