All HF Hub posts

inflatebot posted an update 2 days ago
!!SEE UPDATE BELOW!!
I don't know who still needs to hear this, but if you're using Mistral Nemo-based models, you might have been using the wrong completions format. This is a signal boost from MarinaraSpaghetti's model card for NemoMix-Unleashed: MarinaraSpaghetti/NemoMix-Unleashed-12B
A lot of people have been working with a version of Nemo that's been reconfigured for ChatML, and while that works great, simply using the right format might be just as effective at correcting the weirdness people in the AI RP scene sometimes run into with Nemo.

Huge ups to Marinara for pointing this out, and to the MistralAI team member who let her know.

Update: A PR has been merged to SillyTavern Staging with new corrected templates! If you don't want to switch or wait, I put them up on GitHub: https://github.com/inflatebot/SillyTavern-Mistral-Templates

PRs for KoboldCPP's chat adapters and KoboldAI Lite *have been merged* and are coming in their respective releases (probably the next time KoboldCPP updates -- it didn't make it for 1.75.1, but you could just grab 'em from the repo!)
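If you'd rather not guess at the format, one safe option is to pull it straight from the official tokenizer rather than hand-rolling ChatML. A minimal sketch, assuming the transformers library and the mistralai/Mistral-Nemo-Instruct-2407 tokenizer (my illustration, not code from the post or the linked templates):

# Render a conversation with the chat template the Nemo tokenizer actually ships with,
# instead of assuming a ChatML-style format.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")

messages = [
    {"role": "user", "content": "Write a two-line poem about rivers."},
]

# Produces the [INST] ... [/INST] style prompt that Nemo expects,
# with the assistant turn left open for generation.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)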
KingNish posted an update 2 days ago
reach-vb posted an update 2 days ago
Less than two days ago Kyutai Labs open sourced Moshi - a ~7.6B on-device Speech to Speech foundation model - and Mimi - a SoTA streaming speech codec! 🔥

The release includes:

1. Moshiko & Moshika - Moshi finetuned on synthetic data (CC-BY license) ( kyutai/moshi-v01-release-66eaeaf3302bef6bd9ad7acd)
2. Mimi - Streaming Audio Codec; processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps (CC-BY license) ( kyutai/mimi)
3. Model checkpoints & Inference codebase written in Rust (Candle), PyTorch & MLX (Apache license) (https://github.com/kyutai-labs/moshi)
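As a sanity check, the 1.1 kbps figure is consistent with a small stack of discrete codebooks at the 12.5 Hz frame rate. Back-of-the-envelope arithmetic below; the codebook count and size are my assumptions, not numbers from the post:

# Rough bitrate check for a 12.5 Hz codec.
frame_rate_hz = 12.5
num_codebooks = 8        # assumption: 8 residual codebooks
bits_per_codebook = 11   # assumption: 2048-entry codebooks -> log2(2048) = 11 bits
bitrate_kbps = frame_rate_hz * num_codebooks * bits_per_codebook / 1000
print(f"bitrate: {bitrate_kbps:.2f} kbps")               # 1.10 kbps, matching the post
print(f"frame duration: {1000 / frame_rate_hz:.0f} ms")  # 80 ms per 12.5 Hz frame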

How does Moshi work?

1. Moshi processes two audio streams: one for itself and one for the user, with the user's stream coming from audio input and Moshi's stream generated by the model.

2. Along with these audio streams, Moshi predicts text tokens for its speech, enhancing its generation quality.

3. The model uses a small Depth Transformer for codebook dependencies and a large 7B parameter Temporal Transformer for temporal dependencies.

4. The theoretical latency is 160ms, with a practical latency of around 200ms on an L4 GPU.
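Put together, one decoding step looks roughly like the sketch below. This is a toy schematic of the idea described above (stand-in functions, assumed codebook count), not Kyutai's actual implementation:

# Toy schematic of one Moshi-style decoding step:
# a large Temporal Transformer handles dependencies across time steps,
# a small Depth Transformer handles dependencies across codebooks within a step.
NUM_CODEBOOKS = 8   # assumption, not stated in the post
VOCAB = 2048        # assumption

def temporal_transformer(token_history):
    # Stand-in for the 7B model: summarizes both audio streams and past text tokens.
    return hash(tuple(token_history)) % VOCAB

def depth_transformer(context, text_token, partial_codes):
    # Stand-in for the small model that fills in codebook tokens one at a time.
    return (context + text_token + sum(partial_codes)) % VOCAB

def decode_step(history, user_audio_token):
    context = temporal_transformer(history + [user_audio_token])
    text_token = context                # Moshi first predicts a text token for its own speech
    audio_tokens = []
    for _ in range(NUM_CODEBOOKS):      # then the audio codebook tokens for this ~80 ms frame
        audio_tokens.append(depth_transformer(context, text_token, audio_tokens))
    return text_token, audio_tokens

print(decode_step([17, 42, 99], user_audio_token=7))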

Model size & inference:

Moshiko/ka are 7.69B param models

bf16 ~16GB VRAM
8-bit ~8GB VRAM
4-bit ~4GB VRAM

You can run inference via Candle 🦀, PyTorch and MLX - based on your hardware.
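Those VRAM figures line up with simple weights-only arithmetic (ignoring activations and the KV cache):

# Weights-only VRAM estimate from the 7.69B parameter count.
params = 7.69e9
for name, bytes_per_param in [("bf16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.1f} GiB")  # ~14.3 / ~7.2 / ~3.6 GiB -> roughly the 16 / 8 / 4 GB above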

The Kyutai team (@adefossez, @lmz, and colleagues) are cracked AF; they're bringing some serious firepower to the open source/science AI scene. Looking forward to what's next! 🐐
kz919 posted an update 2 days ago
Just for the meme.

But the clear lesson I learned from building these demos is that the more powerful the underlying base model, the closer you get to GPT-4 o1. CoT is nothing more than inducing the latent reasoning capability already in the model.

kz919/GPT4-O1-Proximas
MonsterMMORPG posted an update about 23 hours ago
I have run an extensive multi-GPU FLUX Full Fine-Tuning / DreamBooth training experiment on RunPod using 2x A100 80GB GPUs (PCIe), since this was commonly asked of me.

Full article here : https://medium.com/@furkangozukara/multi-gpu-flux-fu

Image 1
Image 1 shows that just the first part of the Kohya GUI installation took 30 minutes on such a powerful machine, on a very expensive Secure Cloud pod (3.28 USD per hour).
There was also a part 2, so the installation alone took a very long time.
On Massed Compute, it would take around 2–3 minutes.
This is why I suggest using Massed Compute over RunPod: RunPod machines have terrible hard disk speeds, and getting a good one is a lottery.



Image 2, 3 and 4
Image 2 shows the speed of our very best FLUX Fine-Tuning training config (shared below) when doing 2x multi-GPU training
https://www.patreon.com/posts/kohya-flux-fine-112099700
The config used is: Quality_1_27500MB_6_26_Second_IT.json
Image 3 shows the VRAM usage of this config during 2x multi-GPU training
Image 4 shows the GPUs of the pod


Image 5 and 6
Image 5 shows the speed of our very best FLUX Fine-Tuning training config (shared below) when doing single-GPU training
https://www.patreon.com/posts/kohya-flux-fine-112099700
The config used is: Quality_1_27500MB_6_26_Second_IT.json
Image 6 shows the VRAM usage of this setup


Image 7 and 8
Image 7 shows the speed of our very best FLUX Fine-Tuning training config (shared below) when doing single-GPU training with Gradient Checkpointing disabled
https://www.patreon.com/posts/kohya-flux-fine-112099700
The config used is: Quality_1_27500MB_6_26_Second_IT.json
Image 8 shows the VRAM usage of this setup


....
loztcontrol posted an update 1 day ago
I am developing a personal project to further support and help people living with depression and anxiety. As I suffer mainly from chronic depression, I would like to create an AI-based tool that can monitor my moods. First I will collect information about myself and my moods, and after collecting at least 6 months of my moods and my writings, I will be able to build a kind of recognition for when my emotions are "out of control", by which I mean those states or feelings of emptiness. I think that not all of us always have access to treatments and therapies, so I would like to develop this project freely; I have just started it today. I have already written code to register events of my moods. I will share the updates with you :D


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
import nltk
from nltk.corpus import stopwords
import string
import matplotlib.pyplot as plt
from datetime import datetime

nltk.download('stopwords')

data = {
    'text': [
        "Hoy me siento bien, aunque un poco cansado", 
        "Me siento triste y solo", 
        "Esto es frustrante, todo sale mal", 
        "Estoy nervioso por lo que va a pasar",
        "No puedo con este estrés", 
        "Todo está saliendo bien, me siento optimista", 
        "Siento miedo de lo que pueda suceder", 
        "Hoy fue un día horrible"
    ],
    'emotion': [
        'felicidad', 
        'tristeza', 
        'enojo', 
        'ansiedad', 
        'ansiedad', 
        'felicidad', 
        'miedo', 
        'tristeza'
    ]
}

df = pd.DataFrame(data)

# Function to clean the text: lowercase, strip punctuation, and drop Spanish stopwords
def clean_text(text):
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    words = [word for word in text.split() if word not in stopwords.words('spanish')]
    return ' '.join(words)

Yes, I speak Spanish :P too
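For anyone curious where the unused imports above are headed, here is one hedged way the snippet could continue: train and evaluate a small Naive Bayes emotion classifier, then log timestamped mood events. This is my continuation (it assumes the snippet above has already run), not loztcontrol's actual code, and with only 8 examples the scores are meaningless:

# Clean the toy texts and build bag-of-words features.
df['clean_text'] = df['text'].apply(clean_text)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['clean_text'])
y = df['emotion']

# Tiny train/test split just to exercise the pipeline.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = MultinomialNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))

# Register a timestamped mood event, as described in the post.
def log_mood(text):
    predicted = model.predict(vectorizer.transform([clean_text(text)]))[0]
    return {'timestamp': datetime.now().isoformat(), 'text': text, 'emotion': predicted}

print(log_mood("Hoy me siento tranquilo"))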
dylanebert posted an update 2 days ago
m-ric posted an update 2 days ago
🧠 This Stanford paper might hold the key to OpenAI o1's performance: what's so effective about Chain of Thought? ⇒ It unlocks radically deeper sequential tasks!

💭 Reminder: A Chain of Thought (CoT) means that you instruct the model to “think step by step”. Often it’s literally just putting in the prompt “let’s think step by step.”

🤔 This method has been shown to be unreasonably effective at increasing performance on benchmarks. However, why it works so well remains unclear.

Here's the scoop: Transformers are amazing at parallel processing, but they've always struggled with tasks that require sequential reasoning.

⛔️ For instance, if you ask them for the result of 3^2^2^2^…, with 20 iterations, they'll nearly always fail.

💡 Indeed, the researchers prove mathematically, by modeling transformer networks as logical circuits, that on their own they cannot solve sequential tasks requiring more than a certain number of serial steps.

But CoT enables sequential reasoning:

- 🧱 Each step in the CoT corresponds to simulating one operation in a complex circuit.
- 🔄 This allows the transformer to "reset" the depth of intermediate outputs, overcoming previous limitations.
- 🚀 Thus, with CoT, constant-depth transformers can now solve ANY problem computable by polynomial-size circuits! (That's a huge class of problems in computer science.)
- 🔑 Transformers can now handle tricky tasks like iterated squares (computing 3^2^2^2^2), composed permutations, and evaluating circuits - stuff that requires serial computation (see the small sketch after this list).
- 📊 The improvement is especially dramatic for transformers with a limited depth. Empirical tests on four arithmetic problems showed massive accuracy gains with CoT on inherently serial tasks.
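To make the "inherently serial" point concrete, here is a tiny illustration of the iterated-squaring task mentioned above: each step depends on the previous result, so there is no parallel shortcut, but writing out every intermediate value (exactly what a CoT does) makes each individual step trivial. The modulus is my simplification to keep the numbers small:

# Iterated squaring: x -> x^2 -> (x^2)^2 -> ...; step k needs the result of step k-1.
def iterated_square(x, steps, mod=10007):
    trace = [x]               # the "chain of thought": every intermediate value
    for _ in range(steps):
        x = (x * x) % mod     # one serial step, trivially easy in isolation
        trace.append(x)
    return x, trace

result, trace = iterated_square(3, steps=20)
print("final value:", result)
print("chain of thought:", trace)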

Main takeaway: Chain-of-thought isn't just a neat trick - it fundamentally expands what transformer models can do!

Read the paper 👉  Chain of Thought Empowers Transformers to Solve Inherently Serial Problems (2402.12875)
tomaarsen posted an update 3 days ago
🎉SetFit v1.1.0 is out! Training efficient classifiers on CPU or GPU now uses the Sentence Transformers Trainer, and we resolved a lot of issues caused by updates of third-party libraries (like Transformers). Details:

Training a SetFit classifier model consists of 2 phases:
1. Finetuning a Sentence Transformer embedding model
2. Training a Classifier to map embeddings -> classes

🔌The first phase now uses the SentenceTransformerTrainer that was introduced in the Sentence Transformers v3 update. This brings some immediate upsides like MultiGPU support, without any (intended) breaking changes.

➡️ Beyond that, we softly deprecated the "evaluation_strategy" argument in favor of "eval_strategy" (following a Transformers deprecation), and deprecated Python 3.7. In return, we add official support for Python 3.11 and 3.12.

✨ There's some more minor changes too, like max_steps and eval_max_steps now being a hard limit instead of an approximate one, training/validation losses now logging nicely in Notebooks, and the "device" parameter no longer being ignored in some situations.

Check out the full release notes here: https://github.com/huggingface/setfit/releases/tag/v1.1.0
Or read the documentation: https://huggingface.co./docs/setfit
Or check out the public SetFit models for inspiration: https://huggingface.co./models?library=setfit&sort=created

P.s. the model in the code snippet trained in 1 minute and it can classify ~6000 sentences per second on my GPU.
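The snippet itself doesn't show up in this post view, so for orientation, a minimal SetFit v1.1-style run looks roughly like the following. This is a sketch with a placeholder base model and dataset, not the exact snippet from the post:

from datasets import load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Phase 1: the Sentence Transformer to finetune; phase 2 (the classifier head)
# is handled internally by SetFit.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

# SetFit targets few-shot setups, so only a handful of labeled examples are used.
dataset = load_dataset("sst2")
train_ds = dataset["train"].shuffle(seed=42).select(range(64))
eval_ds = dataset["validation"].select(range(200))

args = TrainingArguments(
    batch_size=16,
    num_epochs=1,
    eval_strategy="epoch",  # "evaluation_strategy" is softly deprecated in v1.1.0
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    metric="accuracy",
    column_mapping={"sentence": "text", "label": "label"},  # map sst2 columns to SetFit's names
)
trainer.train()
print(trainer.evaluate())
print(model.predict(["a gripping, well-acted film", "a dull and lifeless mess"]))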
zolicsaki posted an update 2 days ago
We’ve open-sourced an app, powered by SambaNova Cloud and Llama 405B, that intelligently detects when a web search is needed—then answers directly or with RAG.

sambanovasystems/auto-web-search

🥚 A hidden Easter egg is that Auto Search detection is already trained into Llama 3.1 checkpoints. Simply use the tool usage system prompt below, and the model will either respond with a web search query if it deems necessary or respond to the query directly.🥚

Environment: IPython
Tools: Brave Search
Knowledge Cutoff Date: December 2023
Today's Date: September 2024
You are a helpful assistant. Reminder:
Search function calls MUST follow the specified format: "brave_search.call(query)"

You can see the documentation here
https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1#built-in-tooling
and read about how the tool usage was trained into Llama3.1 models in section 4.3.5 here https://arxiv.org/pdf/2407.21783
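A hedged sketch of how an app can act on that system prompt: send it along with the user's question, then check whether the model replied with a brave_search.call(...) tool call (run the web search and answer with RAG) or with a direct answer. The routing code below is my illustration, not SambaNova's implementation:

import re

SYSTEM_PROMPT = """Environment: IPython
Tools: Brave Search
Knowledge Cutoff Date: December 2023
Today's Date: September 2024
You are a helpful assistant. Reminder:
Search function calls MUST follow the specified format: "brave_search.call(query)"
"""

def route(model_output: str):
    """Decide between 'run a web search' and 'answer directly' from the model's reply."""
    match = re.search(r'brave_search\.call\(\s*(?:query=)?["\'](.+?)["\']\s*\)', model_output)
    if match:
        return {"action": "web_search", "query": match.group(1)}
    return {"action": "direct_answer", "answer": model_output}

# Example replies a Llama 3.1 checkpoint might produce with this prompt:
print(route('brave_search.call(query="llama 3.1 benchmark results september 2024")'))
print(route("Paris is the capital of France."))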