Kuldeep Singh Sidhu (singhsidhukuldeep)
AI & ML interests: 😃 TOP 3 on Hugging Face for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io
Recent Activity
Posted an update about 5 hours ago
Excited to share a groundbreaking development in recommendation systems - Legommenders, a comprehensive content-based recommendation library that revolutionizes how we approach personalized content delivery.
>> Key Innovations
End-to-End Training
The library enables joint training of content encoders alongside behavior and interaction modules, making it the first of its kind to offer truly integrated content understanding in recommendation pipelines.
Massive Scale
- Supports creation and analysis of over 1,000 distinct models
- Compatible with 15 diverse datasets
- Features 15 content operators, 8 behavior operators, and 9 click predictors
Advanced LLM Integration
Legommenders pioneers LLM integration in two crucial ways:
- As feature encoders for enhanced content understanding
- As data generators for high-quality training data augmentation
Superior Architecture
The system comprises four core components:
- Dataset processor for unified data handling
- Content operator for embedding generation
- Behavior operator for user sequence fusion
- Click predictor for probability calculations
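The four components above can be sketched as a minimal end-to-end pipeline. This is an illustrative toy, not Legommenders' actual API; all function names and the toy embedding scheme are assumptions.

```python
# Hypothetical sketch of the four-stage pipeline described above.
# Function names and logic are illustrative, not the library's real API.

def dataset_processor(raw_interactions):
    # Unified data handling: normalize raw logs into (history, candidate) pairs.
    return [(row["history"], row["candidate"]) for row in raw_interactions]

def content_operator(item):
    # Embedding generation: map an item's text to a toy fixed-size vector.
    vec = [0.0] * 4
    for i, ch in enumerate(item):
        vec[i % 4] += ord(ch) / 1000.0
    return vec

def behavior_operator(history_vecs):
    # User sequence fusion: mean-pool the history embeddings.
    n = len(history_vecs)
    return [sum(col) / n for col in zip(*history_vecs)]

def click_predictor(user_vec, item_vec):
    # Probability calculation: dot product squashed into (0, 1).
    score = sum(u * v for u, v in zip(user_vec, item_vec))
    return 1.0 / (1.0 + 2.718281828 ** -score)

raw = [{"history": ["cats", "dogs"], "candidate": "pets"}]
for history, candidate in dataset_processor(raw):
    user_vec = behavior_operator([content_operator(h) for h in history])
    prob = click_predictor(user_vec, content_operator(candidate))
    print(round(prob, 4))
```

In the real library the content operator would be a trainable encoder (possibly an LLM), which is exactly what end-to-end training makes jointly optimizable.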
Performance Optimization
The library introduces an innovative caching pipeline that achieves up to 50x speedup in evaluation compared to traditional approaches.
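The intuition behind such a caching pipeline is that item embeddings are recomputed many times across impressions during evaluation; memoizing them removes the redundancy. This is a generic sketch of the idea, not the library's actual mechanism.

```python
import functools
import time

# Generic sketch of embedding caching for evaluation speedups.
# lru_cache memoizes per-item encodings, so each item is encoded once
# no matter how many candidate lists it appears in.

@functools.lru_cache(maxsize=None)
def encode_item(item_id: str):
    time.sleep(0.001)  # stand-in for an expensive encoder forward pass
    return hash(item_id) % 1000

def evaluate(candidate_lists):
    return [[encode_item(i) for i in lst] for lst in candidate_lists]

# The same items repeated across many impressions hit the cache.
lists = [["a", "b", "c"], ["b", "c", "d"]] * 100
evaluate(lists)
info = encode_item.cache_info()
print(f"{info.misses} unique encodes, {info.hits} cache hits")
```

With 600 lookups but only 4 unique items, nearly all encoder calls are avoided, which is where evaluation-time speedups of this kind come from.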
Developed by researchers from The Hong Kong Polytechnic University, this open-source project represents a significant leap forward in recommendation system technology.
For those interested in content-based recommendation systems, this is a must-explore tool. The library is available on GitHub for implementation and experimentation.
Posted an update 2 days ago
Groundbreaking Survey on Large Language Models in Recommendation Systems!
Just read a comprehensive survey that maps out how LLMs are revolutionizing recommender systems. The authors have meticulously categorized existing approaches into two major paradigms:
Discriminative LLMs for Recommendation:
- Leverages BERT-like models for understanding user-item interactions
- Uses fine-tuning and prompt tuning to adapt pre-trained models
- Excels at tasks like user representation learning and ranking
Generative LLMs for Recommendation:
- Employs GPT-style models to directly generate recommendations
- Implements innovative techniques like in-context learning and zero-shot recommendation
- Supports natural language interaction and explanation generation
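The generative paradigm can be illustrated with a zero-shot prompting loop: history and candidates go into a prompt, and the model's free-text reply is parsed back into a ranking. This is a minimal sketch; `call_llm` is a hypothetical stand-in for any chat-completion client, and the prompt template is a generic illustration, not one from the survey.

```python
# Minimal sketch of zero-shot generative recommendation via prompting.

def build_prompt(history, candidates):
    # In-context formulation: no fine-tuning, just instructions in the prompt.
    return (
        "A user recently watched: " + ", ".join(history) + ".\n"
        "Rank these candidates from most to least relevant: "
        + ", ".join(candidates) + ".\n"
        "Answer with a comma-separated list of titles only."
    )

def call_llm(prompt: str) -> str:
    # Placeholder: a real system would call an LLM API here.
    return "Interstellar, The Martian, Titanic"

def recommend(history, candidates):
    reply = call_llm(build_prompt(history, candidates))
    ranked = [t.strip() for t in reply.split(",")]
    # Keep only valid candidates, preserving the model's order; this guards
    # against the controlled-generation problem noted in the challenges below.
    return [t for t in ranked if t in candidates]

result = recommend(["Gravity", "Apollo 13"],
                   ["Titanic", "Interstellar", "The Martian"])
print(result)
```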
Key Technical Insights:
- Novel taxonomy of modeling paradigms: LLM Embeddings + RS, LLM Tokens + RS, and LLM as RS
- Integration methods spanning from simple prompting to sophisticated instruction tuning
- Hybrid approaches combining collaborative filtering with LLM capabilities
- Advanced prompt engineering techniques for controlled recommendation generation
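The first paradigm in the taxonomy, LLM Embeddings + RS, keeps the recommender conventional and uses the LLM only as a text encoder. A toy cosine-similarity sketch, with made-up vectors standing in for LLM embeddings:

```python
import math

# Toy sketch of the "LLM Embeddings + RS" paradigm: a frozen LLM supplies
# text embeddings, and a conventional scorer ranks items on top of them.
# The vectors below are invented for illustration.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Pretend these came from an LLM embedding endpoint.
item_embeddings = {
    "space documentary": [0.9, 0.1, 0.0],
    "romance drama": [0.1, 0.9, 0.1],
    "mars mission film": [0.8, 0.2, 0.1],
}
user_profile = [0.85, 0.15, 0.05]  # e.g. mean of embeddings of watched items

ranked = sorted(item_embeddings,
                key=lambda t: cosine(user_profile, item_embeddings[t]),
                reverse=True)
print(ranked)
```

The other two paradigms (LLM Tokens + RS, LLM as RS) move progressively more of the recommendation logic into the language model itself.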
Critical Challenges Identified:
- Position and popularity bias in LLM recommendations
- Limited context length affecting user history processing
- Need for better evaluation metrics for generative recommendations
- Controlled output generation and personalization challenges
This work opens exciting possibilities for next-gen recommendation systems while highlighting crucial areas for future research.
Posted an update 5 days ago
Groundbreaking Research Alert: Correctness ≠ Faithfulness in RAG Systems
Fascinating new research from L3S Research Center, University of Amsterdam, and TU Delft reveals a critical insight into Retrieval Augmented Generation (RAG) systems. The study exposes that up to 57% of citations in RAG systems could be unfaithful, despite being technically correct.
>> Key Technical Insights:
Post-rationalization Problem
The researchers discovered that RAG systems often engage in "post-rationalization" - where models first generate answers from their parametric memory and then search for supporting evidence afterward. This means that while citations may be correct, they don't reflect the actual reasoning process.
Experimental Design
The team used Command-R+ (104B parameters) with 4-bit quantization on an NVIDIA A100 GPU, testing on the NaturalQuestions dataset. They employed BM25 for initial retrieval and ColBERT v2 for reranking.
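The retrieve-then-rerank shape of that setup can be sketched with a simplified BM25-style lexical retriever and a toy overlap-based reranker. This is only the shape of the pipeline, not the study's actual BM25 or ColBERT v2 implementations.

```python
import math
from collections import Counter

# Simplified retrieve-then-rerank: a BM25-style retriever narrows the
# corpus, then a (toy) reranker reorders the shortlist.

docs = [
    "the moon landing was in 1969",
    "cats are popular pets",
    "apollo 11 landed on the moon",
]

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    tokenized = [d.split() for d in corpus]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    df = Counter(t for d in tokenized for t in set(d))
    n = len(corpus)
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for q in query.split():
            if q not in tf:
                continue
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def rerank(query, shortlist):
    # Toy reranker: word overlap, a stand-in for a neural late-interaction model.
    qset = set(query.split())
    return sorted(shortlist, key=lambda d: len(qset & set(d.split())), reverse=True)

query = "when was the moon landing"
scores = bm25_scores(query, docs)
shortlist = [d for _, d in sorted(zip(scores, docs), reverse=True)[:2]]
reranked = rerank(query, shortlist)
print(reranked)
```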
Attribution Framework
The research introduces a comprehensive framework for evaluating RAG systems across multiple dimensions:
- Citation Correctness: Whether cited documents support the claims
- Citation Faithfulness: Whether citations reflect actual model reasoning
- Citation Appropriateness: Relevance and meaningfulness of citations
- Citation Comprehensiveness: Coverage of key points
Under the Hood
The evaluation pipeline involves four steps:
1. Document relevance prediction
2. Citation prediction
3. Answer generation without citations
4. Answer generation with citations
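The four steps above can be sketched as a probe for post-rationalization: if the model's citation-free answer already matches its cited answer, the citation may be rationalizing parametric knowledge rather than reflecting retrieval-grounded reasoning. All functions below are illustrative stubs, not the paper's implementation.

```python
# Hedged sketch of the four-step probe; stubs stand in for model calls.

def predict_relevance(question, doc):
    # Step 1: document relevance prediction (toy lexical-overlap stub).
    return len(set(question.split()) & set(doc.split())) > 0

def predict_citation(question, docs):
    # Step 2: citation prediction — cite the first relevant document.
    relevant = [i for i, d in enumerate(docs) if predict_relevance(question, d)]
    return relevant[:1]

def answer_without_citations(question):
    # Step 3: stand-in for the model answering from parametric memory alone.
    return "paris"

def answer_with_citations(question, docs, cited):
    # Step 4: stand-in for the retrieval-grounded, citing answer.
    return "paris", cited

question = "capital of france"
docs = ["paris is the capital of france", "berlin is in germany"]
cited = predict_citation(question, docs)
plain = answer_without_citations(question)
grounded, cites = answer_with_citations(question, docs, cited)
# A matching answer with and without retrieval is a post-rationalization signal.
print({"cited_docs": cites, "possible_post_rationalization": plain == grounded})
```

Comparing steps 3 and 4 is what lets the framework separate citation correctness from citation faithfulness.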
This work fundamentally challenges our understanding of RAG systems and highlights the need for more robust evaluation metrics in AI systems that claim to provide verifiable information.
Organizations
singhsidhukuldeep's activity
- "Update Request" (2 comments) — #2, opened about 2 months ago by singhsidhukuldeep
- "The model can be started using vllm, but no dialogue is possible." (3 comments) — #2, opened 6 months ago by SongXiaoMao
- "Adding chat_template to tokenizer_config.json file" (1 comment) — #3, opened 6 months ago by singhsidhukuldeep
- "Script request" (3 comments) — #1, opened 6 months ago by singhsidhukuldeep
- "Requesting script" — #1, opened 6 months ago by singhsidhukuldeep