Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
- Faster and more accurate than Whisper
- Privacy-focused (no data leaves your device)
- WebGPU accelerated (w/ WASM fallback)
- Powered by ONNX Runtime Web and Transformers.js
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute.
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think".
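To make "step-wise reward models with tree search" concrete, here is a minimal sketch of beam search guided by a step-level scorer. It is a toy, not the released recipe: `propose_steps` stands in for an LLM sampling candidate next steps, `toy_prm` stands in for a process reward model, and the task (pick digits whose sum approaches a target) is invented purely for illustration. All names and parameters are hypothetical.

```python
import random
from typing import List, Tuple

# A partial "solution" is a tuple of steps; here each step is just a digit.
Path = Tuple[int, ...]


def propose_steps(path: Path, width: int, rng: random.Random) -> List[Path]:
    """Hypothetical stand-in for an LLM sampling `width` next steps."""
    return [path + (rng.randint(1, 9),) for _ in range(width)]


def toy_prm(path: Path, target: int) -> float:
    """Hypothetical stand-in for a step-wise reward model.

    Scores a partial path; higher is better. Here: closeness of the
    running sum to the target.
    """
    return -abs(target - sum(path))


def prm_beam_search(
    target: int, n_beams: int = 4, width: int = 4, depth: int = 5, seed: int = 0
) -> Path:
    """Expand every beam, score each partial path, keep the top `n_beams`."""
    rng = random.Random(seed)
    beams: List[Path] = [()]
    for _ in range(depth):
        candidates = [c for b in beams for c in propose_steps(b, width, rng)]
        # The reward model is consulted after every step, not only at the end.
        candidates.sort(key=lambda p: toy_prm(p, target), reverse=True)
        beams = candidates[:n_beams]
    return beams[0]


if __name__ == "__main__":
    best = prm_beam_search(target=20)
    print(best, sum(best))
```

Raising `n_beams`, `width`, or `depth` spends more compute at inference time, which is the knob the "time to think" framing refers to.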
We're open sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
- Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test time.
- Diverse Verifier Tree Search (DVTS): An unpublished extension we developed for verifier-guided tree search. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.
- Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.
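The post does not spell out how DVTS works, so the following is only one plausible reading, stated as an assumption: the budget of N candidates is split into N/M independent subtrees, and each subtree is expanded greedily under the verifier, so a single high-scoring branch cannot crowd out the others. `sample_steps` and `verifier_score` are hypothetical stand-ins for an LLM and a reward model, and the digit-sum task is a toy.

```python
import random
from typing import List, Tuple

Path = Tuple[int, ...]


def sample_steps(path: Path, m: int, rng: random.Random) -> List[Path]:
    """Hypothetical stand-in for an LLM sampling `m` candidate next steps."""
    return [path + (rng.randint(1, 9),) for _ in range(m)]


def verifier_score(path: Path, target: int) -> float:
    """Hypothetical stand-in for a verifier / reward model (higher is better)."""
    return -abs(target - sum(path))


def dvts_sketch(
    target: int, n: int = 8, m: int = 4, depth: int = 4, seed: int = 0
) -> List[Path]:
    """Assumed reading of DVTS: n // m subtrees, each expanded greedily.

    Subtrees never compete with each other during the search, which is
    what keeps the final candidates diverse.
    """
    if n % m != 0:
        raise ValueError("n must be a multiple of m")
    rng = random.Random(seed)
    subtrees: List[Path] = [() for _ in range(n // m)]
    for _ in range(depth):
        next_roots = []
        for root in subtrees:
            children = sample_steps(root, m, rng)
            # Greedy step: keep only the best-scoring child of this subtree.
            next_roots.append(max(children, key=lambda p: verifier_score(p, target)))
        subtrees = next_roots
    return subtrees


def best_of(paths: List[Path], target: int) -> Path:
    """Pick the final answer across subtrees with the same verifier."""
    return max(paths, key=lambda p: verifier_score(p, target))


if __name__ == "__main__":
    finals = dvts_sketch(target=20)
    print(finals, best_of(finals, 20))
```

Under this reading, a larger N at fixed M means more independent subtrees, which fits the post's claim that the method helps most at large test-time compute budgets.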