Debugging tags: Imagine, Associated Thoughts, Dialectical Analysis, Backwards Induction, Metacognition, plus standard thought-process tags such as <think> or <begin_of_thought>
This Phi-4 model is part of a test project I called Micro-Dose. My goal was to activate reasoning and other cognitive processes with a small dataset instead of relying on large-scale training data.
I found that this was possible with a tiny dataset of just 90 rows, designed specifically as math problems. In the initial iterations, the dataset only activated reasoning when a math-related question was asked. I then made a few changes to the dataset's structure, including the order of information and the naming of tags. You can see the sample results in the pictures. Not really anything special, just thought I'd share.
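The post doesn't show the exact row format, but as a rough sketch, a training row pairing a math problem with a trace wrapped in the tags above might look like the following (the field names and lowercase tag spellings are my assumptions, not the actual dataset schema):

```python
# Hypothetical Micro-Dose row (field names and tag spellings are
# illustrative guesses; the actual 90-row dataset isn't shown in the post).
row = {
    "problem": "A train covers 120 km in 1.5 hours. What is its average speed?",
    "response": (
        "<imagine>Picture the train covering 120 km of track in 90 minutes.</imagine>\n"
        "<associated_thoughts>speed = distance / time, so the units must match.</associated_thoughts>\n"
        "<think>120 km / 1.5 h = 80 km/h.</think>\n"
        "Answer: 80 km/h"
    ),
}
```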
Made a few improvements on my custom GRPO trainer:
- added a sequence similarity reward (seems to work)
- improved vLLM support (5x inference speed)
- adjusted reward scores (this helped with format/accuracy)
- can now push to the HF Hub (already pushed mine lol: Jaward/smollm2_360m_grpo_gsm8k_reasoner)
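The trainer code isn't shown here, but a sequence similarity reward along these lines could be as simple as the following sketch (function name, inputs, and scaling are my guesses, not the actual implementation):

```python
import difflib

def sequence_similarity_reward(completions, references):
    """Hypothetical reward: rate each completion by its character-level
    similarity to the reference answer, giving a score in [0, 1]."""
    rewards = []
    for completion, reference in zip(completions, references):
        # SequenceMatcher.ratio() = 2 * matches / total_length, in [0, 1]
        rewards.append(difflib.SequenceMatcher(None, completion, reference).ratio())
    return rewards

print(sequence_similarity_reward(["the answer is 42"], ["The answer is 42."]))
```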
I just came across a groundbreaking paper titled "Hypencoder: Hypernetworks for Information Retrieval" by researchers from the University of Massachusetts Amherst that introduces a fundamentally new paradigm for search technology.
Most current retrieval models rely on simple inner product calculations between query and document vectors, which severely limits their expressiveness. The authors prove theoretically that inner product similarity functions fundamentally constrain what types of relevance relationships can be captured.
Hypencoder takes a radically different approach: instead of encoding a query as a vector, it generates a small neural network (called a "q-net") that acts as a learned relevance function. This neural network takes document representations as input and produces relevance scores.
Under the hood, Hypencoder uses:
- Attention-based hypernetwork layers (hyperhead layers) that transform contextualized query embeddings into the weights and biases of the q-net
- A document encoder that produces vector representations, similar to existing models
- A graph-based greedy search algorithm for efficient retrieval that can search 8.8M documents in under 60 ms
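To make the query-to-q-net step concrete, here is a minimal toy sketch (dimensions, layer counts, and class names are my assumptions; the paper's hyperhead layers are attention-based, whereas this version uses plain linear projections):

```python
import torch
import torch.nn as nn

class ToyHypencoder(nn.Module):
    """Toy illustration of the Hypencoder idea: map a query embedding to
    the weights of a small MLP (the "q-net"), then score documents with it.
    The real model uses attention-based hyperhead layers; this sketch uses
    simple linear projections for brevity."""

    def __init__(self, query_dim=768, doc_dim=768, hidden=64):
        super().__init__()
        self.doc_dim, self.hidden = doc_dim, hidden
        # Hypernetwork heads: emit the q-net's parameters from the query.
        self.w1_gen = nn.Linear(query_dim, doc_dim * hidden)
        self.b1_gen = nn.Linear(query_dim, hidden)
        self.w2_gen = nn.Linear(query_dim, hidden)

    def forward(self, query_emb, doc_embs):
        # query_emb: (query_dim,); doc_embs: (n_docs, doc_dim)
        w1 = self.w1_gen(query_emb).view(self.doc_dim, self.hidden)
        b1 = self.b1_gen(query_emb)
        w2 = self.w2_gen(query_emb)
        # Run every document representation through the generated q-net.
        h = torch.relu(doc_embs @ w1 + b1)  # (n_docs, hidden)
        return h @ w2                       # (n_docs,) relevance scores

scores = ToyHypencoder()(torch.randn(768), torch.randn(1000, 768))
```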
The results are impressive - Hypencoder significantly outperforms strong dense retrieval models on standard benchmarks like MS MARCO and TREC Deep Learning Track. The performance gap widens even further on complex retrieval tasks like tip-of-the-tongue queries and instruction-following retrieval.
What makes this approach particularly powerful is that neural networks are universal approximators, allowing Hypencoder to express far more complex relevance relationships than inner product similarity functions. The framework is also flexible enough to replicate any existing neural retrieval method while adding the ability to learn query-dependent weights.
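To see why it subsumes inner-product retrieval, note that a degenerate q-net consisting of a single linear layer whose weights are the query embedding itself reduces scoring to a plain dot product. A purely illustrative sketch (not from the paper):

```python
import torch

query_emb = torch.randn(768)   # output of a standard query encoder
doc_embs = torch.randn(1000, 768)

# Degenerate q-net: one linear layer whose weight vector is the query
# embedding itself. Applying it to a document vector gives exactly the
# inner-product score used by standard dense retrievers; richer q-nets
# add hidden layers and nonlinearities on top of this.
q_net = lambda docs: docs @ query_emb
scores = q_net(doc_embs)       # (1000,), same as doc_embs @ query_emb
```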
🌐 Multimodal
> OpenGVLab released InternVideo 2.5 Chat models, new video LMs with long context
> AIDC released the Ovis2 model family along with the Ovis dataset, new vision LMs in different sizes (1B, 2B, 4B, 8B, 16B, 34B), with video and OCR support
> ColQwenStella-2b is a multilingual visual retrieval model that is SOTA for its size
> Hoags-2B-Exp is a new multilingual vision LM with contextual reasoning and long-context video understanding
💬 LLMs
A lot of math models!
> The Open-R1 team released OpenR1-Math-220k, a large-scale math reasoning dataset, along with OpenR1-Qwen-7B, a Qwen2.5-Math fine-tune trained on it
> Nomic AI released a new Nomic Embed multilingual retrieval model, a MoE with 500M total params and 305M active params, outperforming other models
> DeepScaleR-1.5B-Preview is a new DeepSeek-R1-Distill fine-tune using distributed RL on math
> LIMO is a new fine-tune of Qwen2.5-32B-Instruct on math
🗣️ Audio
> Zonos-v0.1 is a new family of text-to-speech models, which includes the models themselves and speaker embeddings
🖼️ Vision and Image Generation
> We have ported Apple's DepthPro to transformers for your convenience!
> illustrious-xl-v1.0 is a new illustration generation model
🚀 Project Overview
3D Llama Studio is an all-in-one AI platform that generates high-quality 3D models and stylized images from text or image inputs.
✨ Key Features
Text/Image to 3D Conversion 🎯
- Generate 3D models from detailed text descriptions or reference images
- Intuitive user interface
Text to Styled Image Generation 🎨
- Customizable image generation settings
- Adjustable resolution, generation steps, and guidance scale
- Supports both English and Korean prompts
🛠️ Technical Features
- Gradio-based web interface
- Dark theme UI/UX
- Real-time image generation and 3D modeling
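As a rough illustration of how such a Gradio interface could be wired up (component choices, parameter ranges, and the generate_image stub are assumptions for this sketch, not the project's actual code):

```python
import gradio as gr

def generate_image(prompt, resolution, steps, guidance_scale):
    # Placeholder for the real text-to-image pipeline; echoes the settings
    # so the sketch stays self-contained and runnable.
    return (f"Would generate a {resolution}x{resolution} image for '{prompt}' "
            f"({steps} steps, guidance {guidance_scale})")

# Theme choice is a stand-in for the project's dark theme.
with gr.Blocks(theme=gr.themes.Monochrome()) as demo:
    prompt = gr.Textbox(label="Prompt (English or Korean)")
    resolution = gr.Slider(512, 2048, value=1024, step=64, label="Resolution")
    steps = gr.Slider(10, 100, value=30, step=1, label="Generation steps")
    guidance = gr.Slider(1.0, 20.0, value=7.5, label="Guidance scale")
    output = gr.Textbox(label="Result")
    gr.Button("Generate").click(
        generate_image, [prompt, resolution, steps, guidance], output
    )

demo.launch()
```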
💫 Highlights
- User-friendly interface
- Real-time preview
- Random seed generation
- High-resolution output support (up to 2048x2048)
🎯 Applications
- Product design
- Game asset creation
- Architectural visualization
- Educational 3D content