
Clem 🤗 PRO

clem

AI & ML interests

multi-modal, time-series, biology and chemistry

Recent Activity

Organizations

Hugging Face, Pied Piper, Objective Function, Society & Ethics, Organization, Text Generation Inference, testifly, HugGAN Community, Hugging Face Fellows, Gradio-Blocks-Party, HuggingFaceM4, Open-Source AI Meetup, Hugging Face OSS Metrics, Hugging Face Smol Cluster, huggingPartyParis, Unofficial Mistral Community, Journalists on Hugging Face, Major TOM, MLX Community, Miami AI Hub, Social Post Explorers, Paris AI Running Club, Hugging Face for Legal, Hugging Face Party @ PyTorch Conference, Nerdy Face, open/ acc, Bluesky Community

clem's activity

reacted to etemiz's post with ❤️ about 18 hours ago
Should I create an organization tackling the AI–human alignment problem? The idea: find the humans who care most about other humans and basically pretrain on their stuff. I already did some experiments and it seems to work well.

Want to know about my experiments?

Who would be interested to join?
reacted to wenhuach's post with 🚀 about 18 hours ago
reacted to nyuuzyou's post with 👍 about 18 hours ago
🎮 GoodGame.ru Clips Dataset - nyuuzyou/goodgame

A collection of metadata for 39,280 video clips from the GoodGame.ru streaming platform, featuring:

- Complete clip information including direct video URLs and thumbnails
- Streamer details like usernames and avatars
- Engagement metrics such as view counts
- Game categories and content classifications
- Released under Creative Commons Zero (CC0) license

This extensive clips collection provides a valuable resource for developing and evaluating video-based AI applications, especially in Russian gaming and streaming contexts.
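
For anyone who wants to poke at the clips metadata, a minimal sketch using the datasets library is below; the split name and the exact column names are assumptions, so check the dataset card for the real schema.

```python
# Minimal sketch (assumptions: split name and column names may differ from the
# actual nyuuzyou/goodgame schema; see the dataset card on the Hub).
from datasets import load_dataset

clips = load_dataset("nyuuzyou/goodgame", split="train")
print(clips)      # number of rows and the available columns

# Inspect one record to see which metadata fields (video URL, streamer,
# view count, game category, ...) are actually present.
print(clips[0])
```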
reacted to singhsidhukuldeep's post with 🔥 about 18 hours ago
Exciting News in AI: JinaAI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage

โšก๏ธ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224โ†’384โ†’512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!
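
As a rough illustration of that cross-lingual use case (not an official Jina AI snippet), here is a minimal sketch that loads the model via transformers' trust_remote_code path; the encode_text / encode_image methods and the truncate_dim argument for Matryoshka truncation are assumptions based on this post and the v1 interface, and the image URLs are placeholders.

```python
# Minimal sketch, not an official example. Assumptions: the remote code exposes
# encode_text / encode_image and a truncate_dim argument for Matryoshka
# truncation; the image URLs below are placeholders.
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

queries = ["a cat sleeping on a sofa",          # English
           "Eine Katze schläft auf dem Sofa"]   # German, same meaning
images = ["https://example.com/cat.jpg", "https://example.com/dog.jpg"]

# Encode both modalities, truncating the 1024-d vectors down to 256-d (MRL).
txt = np.asarray(model.encode_text(queries, truncate_dim=256))
img = np.asarray(model.encode_image(images, truncate_dim=256))

# Re-normalize after truncation, then rank images by cosine similarity.
txt /= np.linalg.norm(txt, axis=1, keepdims=True)
img /= np.linalg.norm(img, axis=1, keepdims=True)
print(txt @ img.T)  # row i = query i, column j = image j
```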

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
reacted to nicolay-r's post with 👀🧠 about 18 hours ago
📢 So far I have noticed that 🧠 reasoning with LLMs 🤖 in English tends to be more accurate than in other languages.
However, besides GoogleTrans and other open, transparent translators, I could not find an easy-to-use solution that avoids:
1. 🔴 Third-party framework installation
2. 🔴 Text chunking
3. 🔴 Lack of support for meta-annotations such as spans / objects / etc.

💎 To cope with the problem of IR from non-English texts, I am happy to share bulk-translate 0.25.0. 🎊

โญ https://github.com/nicolay-r/bulk-translate

bulk-translate is a tiny Python 🐍 no-string framework that lets you translate a series of texts with pre-annotated fixed spans that are kept invariant by the translator.

It supports a 👨‍💻 Python 🐍 API for quick data translation with (optionally) annotated objects in texts (see figure below).
I made it as accessible as possible for RAG and / or LLM-powered downstream apps:
📘 https://github.com/nicolay-r/bulk-translate/wiki

All you have to do is provide an iterator of texts, where each text is either:
1. ✅ a string object, or
2. ✅ a list of strings and nested lists that represent spans (value + any ID data).

🤖 By default I provide a wrapper over googletrans, which you can override with your own 🔥
https://github.com/nicolay-r/bulk-translate/blob/master/models/googletrans_310a.py
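
To make the span-invariant idea concrete, here is a rough sketch of what the default googletrans-backed behaviour boils down to; this is not the actual bulk-translate API (see the wiki above for that), just an illustration of translating the plain string parts while passing annotated spans through untouched.

```python
# Rough sketch of the span-preserving idea, NOT the actual bulk-translate API
# (see the project wiki for the real interface). Requires the googletrans package.
from googletrans import Translator

translator = Translator()

def translate_with_spans(parts, dest="en"):
    """parts is a list where each item is either a plain string (translated)
    or a [span_value, span_id] list that must stay invariant."""
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(translator.translate(part, dest=dest).text)
        else:
            out.append(part)  # annotated span: keep as-is
    return out

# Example: the named-entity span ["Лувр", "LOC_1"] survives translation unchanged.
print(translate_with_spans(["Он посетил ", ["Лувр", "LOC_1"], " в прошлом году."]))
```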
reacted to sayakpaul's post with 🚀🔥 about 18 hours ago
Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0
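
As a reference point for the new quantizers, here is a minimal sketch of loading a GGUF-quantized Flux transformer; the class and argument names are written from memory of the 0.32 docs and the checkpoint URL is only an example, so verify both against the release notes linked above.

```python
# Minimal sketch of the GGUF quantizer path in diffusers 0.32 (class and argument
# names from memory; verify against the linked release notes). The GGUF checkpoint
# URL is an example, not an endorsement of a specific repo.
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

ckpt = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep VRAM usage manageable on consumer GPUs
image = pipe("a cozy cabin in the snow, watercolor", num_inference_steps=28).images[0]
image.save("cabin.png")
```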
reacted to ginipick's post with 🔥 about 18 hours ago
🎨 GiniGen Canvas-o3: Intelligent AI-Powered Image Editing Platform
Transform your images with precision using our next-generation tool that lets you extract anything from text to objects with simple natural language commands! 🚀
📌 Key Differentiators:

Intelligent Object Recognition & Extraction
• Freedom to select any target (text, logos, objects)
• Simple extraction via natural language commands ("dog", "signboard", "text")
• Ultra-precise segmentation powered by GroundingDINO + SAM (see the sketch after this list)
Advanced Background Processing
• AI-generated custom backgrounds for extracted objects
• Intuitive object size/position adjustment
• Multiple aspect ratio support (1:1, 16:9, 9:16, 4:3)
Progressive Text Integration
• Dual text placement: over or behind images
• Multi-language font support
• Real-time font style/size/color/opacity adjustment
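
To make the "natural language command → mask" flow concrete, here is a minimal sketch of a GroundingDINO + SAM pipeline built on the transformers checkpoints for both models; it illustrates the general technique rather than the Space's actual code, and the image path is a placeholder.

```python
# Minimal sketch of text-prompted extraction with Grounding DINO + SAM via
# transformers. Illustrates the general technique, not the Space's own code;
# "photo.jpg" is a placeholder image.
import torch
from PIL import Image
from transformers import (AutoModelForZeroShotObjectDetection, AutoProcessor,
                          SamModel, SamProcessor)

image = Image.open("photo.jpg").convert("RGB")
prompt = "a dog."  # Grounding DINO expects lower-case, period-terminated phrases

# 1) Detect boxes for the phrase with Grounding DINO.
gd_proc = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
gd_model = AutoModelForZeroShotObjectDetection.from_pretrained("IDEA-Research/grounding-dino-tiny")
gd_inputs = gd_proc(images=image, text=prompt, return_tensors="pt")
with torch.no_grad():
    gd_out = gd_model(**gd_inputs)
boxes = gd_proc.post_process_grounded_object_detection(
    gd_out, gd_inputs.input_ids, box_threshold=0.35, text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)[0]["boxes"].tolist()

# 2) Turn each detected box into a precise mask with SAM.
sam_proc = SamProcessor.from_pretrained("facebook/sam-vit-base")
sam_model = SamModel.from_pretrained("facebook/sam-vit-base")
sam_inputs = sam_proc(image, input_boxes=[boxes], return_tensors="pt")
with torch.no_grad():
    sam_out = sam_model(**sam_inputs)
masks = sam_proc.image_processor.post_process_masks(
    sam_out.pred_masks, sam_inputs["original_sizes"], sam_inputs["reshaped_input_sizes"]
)
print(len(boxes), "objects segmented")
```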

🎯 Use Cases:

Extract logos from product images
Isolate text from signboards
Select specific objects from scenes
Combine extracted objects with new backgrounds
Layer text in front of or behind images

💫 Technical Features:

Natural language-based object detection
Real-time image processing
GPU acceleration & memory optimization
User-friendly interface

🎉 Key Benefits:

User Simplicity: Natural language commands for object extraction
High Precision: AI-powered accurate object recognition
Versatility: From basic editing to advanced content creation
Real-Time Processing: Instant result visualization

Experience the new paradigm of image editing with GiniGen Canvas-o3:

Seamless integration of multiple editing functions
Professional-grade results with consumer-grade ease
Perfect for social media, e-commerce, and design professionals

Whether you're extracting text from complex backgrounds or creating sophisticated visual content, GiniGen Canvas-o3 provides the precision and flexibility you need for modern image editing!

GO! ginigen/CANVAS-o3
reacted to prithivMLmods's post with 🔥🤗❤️ about 18 hours ago
reacted to singhsidhukuldeep's post with 👀 about 18 hours ago
Fascinating insights from @Pinterest's latest research on improving feature interactions in recommendation systems!

Pinterest's engineering team has tackled a critical challenge in their Homefeed ranking system that serves 500M+ monthly active users. Here's what makes their approach remarkable:

>> Technical Deep Dive

Architecture Overview
• The ranking model combines dense features, sparse features, and embedding features to represent users, Pins, and context
• Sparse features are processed using learnable embeddings with size based on feature cardinality
• User sequence embeddings are generated using a transformer architecture processing past engagements

Feature Processing Pipeline
• Dense features undergo normalization for numerical stability
• Sparse and embedding features receive L2 normalization
• All features are concatenated into a single feature embedding

Key Innovations
• Implemented parallel MaskNet layers with 3 blocks (sketched below)
• Used a projection ratio of 2.0 and an output dimension of 512
• Stacked 4 DCNv2 layers on top for higher-order interactions
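
Roughly, each parallel MaskBlock applies an instance-guided, element-wise mask to the shared feature embedding before projecting it, and the stacked DCNv2 cross layers then model explicit higher-order interactions on top. The sketch below is a generic PyTorch rendition of those two building blocks using the hyperparameters quoted in this post (3 parallel blocks, projection ratio 2.0, output dimension 512, 4 cross layers); it is not Pinterest's production code.

```python
# Generic PyTorch sketch of parallel MaskNet blocks followed by stacked DCNv2
# cross layers, using the hyperparameters quoted above. Not Pinterest's code.
import torch
import torch.nn as nn

class MaskBlock(nn.Module):
    """Instance-guided mask: a small MLP produces an element-wise gate applied
    to the layer-normalized feature embedding, which is then projected."""
    def __init__(self, dim, out_dim=512, projection_ratio=2.0):
        super().__init__()
        hidden = int(dim * projection_ratio)
        self.mask_mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.hidden = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, out_dim), nn.ReLU())

    def forward(self, x):
        return self.hidden(x * self.mask_mlp(x))  # gate, then project

class CrossLayerV2(nn.Module):
    """DCNv2 cross layer: x_{l+1} = x_0 * (W x_l + b) + x_l."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x0, xl):
        return x0 * self.linear(xl) + xl

class FeatureInteractionTower(nn.Module):
    def __init__(self, feat_dim, n_mask_blocks=3, n_cross=4, out_dim=512):
        super().__init__()
        self.mask_blocks = nn.ModuleList([MaskBlock(feat_dim, out_dim) for _ in range(n_mask_blocks)])
        self.cross = nn.ModuleList([CrossLayerV2(n_mask_blocks * out_dim) for _ in range(n_cross)])

    def forward(self, features):  # features: concatenated dense/sparse/embedding feats
        x0 = torch.cat([blk(features) for blk in self.mask_blocks], dim=-1)  # parallel MaskNet
        x = x0
        for layer in self.cross:  # stacked DCNv2 for higher-order interactions
            x = layer(x0, x)
        return x

tower = FeatureInteractionTower(feat_dim=1024)
print(tower(torch.randn(8, 1024)).shape)  # torch.Size([8, 1536])
```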

Performance Improvements
• Achieved a +1.42% increase in Homefeed Save Volume
• Boosted Overall Time Spent by +0.39%
• Kept the memory consumption increase to just 5%

>> Industry Constraints Addressed

Memory Management
• Optimized for 60% GPU memory utilization
• Prevented OOM errors while maintaining batch size efficiency

Latency Optimization
• Removed input-output concatenation before MLP
• Reduced hidden layer sizes in MLP
• Achieved zero latency increase while improving performance

System Stability
• Ensured reproducible results across retraining
• Maintained model stability across different data distributions
• Successfully deployed in production environment

This work brilliantly demonstrates how to balance academic innovations with real-world industrial constraints. Kudos to the Pinterest team!
reacted to Kseniase's post with ❤️👍 about 18 hours ago
**15 Agentic Systems and Frameworks of 2024**

This year, we started our "AI Agents and Agentic Workflows" series (https://www.turingpost.com/t/AI-Agents) to explore everything about AI agents step by step: all the vocabulary, how they work, and how to build them.
The huge interest in this series and the large number of studies conducted on agents showed that it was one of the most popular and important themes of the year. In 2025, most likely, agents will reach new highs – we will be covering that for you. Now, let's review the agentic systems that have emerged this year.

Here is a list of 15 agentic systems and frameworks of 2024:

1. GUI Agents: A Survey (2412.13501)

2. Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level (2411.03562)

3. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2408.06292)

4. MALT: Improving Reasoning with Multi-Agent LLM Training (2412.01928)

5. Agent S: An Open Agentic Framework that Uses Computers Like a Human (2410.08164)

6. Automated Design of Agentic Systems (2408.08435)

7. AgentInstruct: Toward Generative Teaching with Agentic Flows (2407.03502)

8. AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant (2410.18603)

9. WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents (2410.07484)

10. Generative Agent Simulations of 1,000 People (2411.10109)

11. DynaSaur: Large Language Agents Beyond Predefined Actions (2411.01747)

12. PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking (2410.12375)

13. Generative World Explorer (2411.11844)

14. Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines (2412.14684)

15. AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions (2410.20424)

Thanks for reading Turing Post!
Subscribe to receive new posts straight into your inbox -> https://www.turingpost.com/subscribe
liked a Space about 18 hours ago