When you come across an interesting dataset, you often wonder: Which topics frequently appear in these documents? What is this data really about?
Topic modeling helps answer these questions by identifying recurring themes within a collection of documents. This process enables quick and efficient exploratory data analysis.
I've been working on an app that leverages BERTopic, a flexible framework designed for topic modeling. BERTopic's modularity is what makes it powerful: you can swap any component for your preferred algorithm. It also handles large datasets efficiently by merging models with BERTopic.merge_models.
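As a rough sketch of the merging idea, one option is to fit one model per chunk of documents and then combine them with BERTopic.merge_models. The helper names `chunked` and `fit_and_merge`, the chunk size, and the `min_similarity` value are illustrative assumptions, not taken from the app.

```python
from typing import Iterator, List


def chunked(docs: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `docs` with at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def fit_and_merge(docs: List[str], chunk_size: int = 10_000, min_similarity: float = 0.7):
    """Fit one BERTopic model per chunk, then merge them into a single model."""
    if not docs:
        raise ValueError("docs must not be empty")

    # Imported lazily so the chunking helper works without bertopic installed.
    from bertopic import BERTopic

    # Each chunk gets its own default model (hypothetical setup for illustration).
    models = [BERTopic().fit(chunk) for chunk in chunked(docs, chunk_size)]
    if len(models) == 1:
        return models[0]

    # Topics from later models that are not similar enough to an existing
    # topic are added to the first model as new topics.
    return BERTopic.merge_models(models, min_similarity=min_similarity)
```

Because each chunk is fitted independently, only one chunk's embeddings need to be held in memory at a time, which is the main appeal of this pattern for large datasets.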
How do we make this work? Here's the stack we're using:
Data Source ➡️ Hugging Face datasets with DuckDB for retrieval
Text Embeddings ➡️ Sentence Transformers (all-MiniLM-L6-v2)
Dimensionality Reduction ➡️ RAPIDS cuML UMAP for GPU-accelerated performance
Clustering ➡️ RAPIDS cuML HDBSCAN for fast clustering
Tokenization ➡️ CountVectorizer
Representation Tuning ➡️ KeyBERTInspired + Hugging Face Inference Client with Meta-Llama-3-8B-Instruct
Visualization ➡️ Datamapplot library
Check out the space and see how you can quickly generate topics from your dataset: datasets-topics/topics-generator
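For readers who want to try something similar, here is a minimal sketch of how the retrieval and modeling parts of such a stack could be wired together. It is not the app's actual code: the function names, the `default`/`train` config and split, and all hyperparameters are assumptions, and the Llama-based representation step and the Datamapplot visualization are left out.

```python
from typing import List


def parquet_query(dataset: str, column: str, config: str = "default",
                  split: str = "train", limit: int = 1000) -> str:
    """Build a DuckDB query over a Hub dataset's auto-converted Parquet files."""
    path = f"hf://datasets/{dataset}@~parquet/{config}/{split}/*.parquet"
    return (
        f'SELECT "{column}" FROM \'{path}\' '
        f'WHERE "{column}" IS NOT NULL LIMIT {int(limit)}'
    )


def load_docs(dataset: str, column: str, limit: int = 1000) -> List[str]:
    """Fetch one text column from the Hub as a list of strings."""
    import duckdb  # lazy import: only needed when data is actually fetched

    rows = duckdb.sql(parquet_query(dataset, column, limit=limit)).fetchall()
    return [row[0] for row in rows]


def build_topic_model(use_gpu: bool = True):
    """Assemble a BERTopic model from swappable components."""
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer

    if use_gpu:
        # RAPIDS cuML implementations (require a CUDA-capable GPU).
        from cuml.cluster import HDBSCAN
        from cuml.manifold import UMAP
    else:
        # CPU fallbacks with the same constructor arguments used below.
        from hdbscan import HDBSCAN
        from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),
        umap_model=UMAP(n_components=5, n_neighbors=15, min_dist=0.0),
        hdbscan_model=HDBSCAN(min_cluster_size=15, prediction_data=True),
        vectorizer_model=CountVectorizer(stop_words="english"),
        representation_model=KeyBERTInspired(),
    )
```

A typical call would be `topics, probs = build_topic_model().fit_transform(load_docs("user/dataset", "text"))`, where `user/dataset` and `text` stand in for a real dataset ID and column name.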
Excited to share the latest update to the Notebook Creator Tool!
Now with basic fine-tuning support using Supervised Fine-Tuning!
How it works:
1️⃣ Choose your Hugging Face dataset and notebook type (SFT)
2️⃣ Automatically generate your training notebook
3️⃣ Start fine-tuning with your data!
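To give a feel for what the steps above involve, here is a small sketch of generating an SFT notebook programmatically. This is not the tool's implementation: `make_sft_notebook` is a hypothetical helper, the cell contents are an assumption about what a minimal training notebook might include, and the TRL calls inside the generated cells follow that library's basic `SFTTrainer` usage.

```python
import json


def _code(source: str) -> dict:
    """Build a code cell in the nbformat 4 JSON layout."""
    return {"cell_type": "code", "metadata": {}, "execution_count": None,
            "outputs": [], "source": source}


def _markdown(source: str) -> dict:
    """Build a markdown cell in the nbformat 4 JSON layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": source}


def make_sft_notebook(dataset_id: str, model_id: str) -> dict:
    """Return a minimal notebook (as a dict) that fine-tunes `model_id` on `dataset_id`."""
    cells = [
        _markdown(f"# Supervised fine-tuning on `{dataset_id}`"),
        _code("%pip install -q trl datasets"),
        _code(
            "from datasets import load_dataset\n"
            f"dataset = load_dataset({dataset_id!r}, split='train')"
        ),
        _code(
            "from trl import SFTConfig, SFTTrainer\n"
            "trainer = SFTTrainer(\n"
            f"    model={model_id!r},\n"
            "    train_dataset=dataset,\n"
            "    args=SFTConfig(output_dir='sft-output'),\n"
            ")\n"
            "trainer.train()"
        ),
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def save_notebook(notebook: dict, path: str) -> None:
    """Write the notebook dict to disk as an .ipynb file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1)
```

For example, `save_notebook(make_sft_notebook("user/dataset", "org/base-model"), "sft.ipynb")` writes a file that Jupyter can open; both IDs are placeholders to replace with real ones.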
Link to the app: https://lnkd.in/e_3nmWrB
Want to contribute new notebooks? https://lnkd.in/eWcZ92dS
I've been working on a Space to make it super easy to create notebooks and help users quickly understand and manipulate their data! With just a few clicks, you can automatically generate notebooks for:
Exploratory Data Analysis
Text Embeddings
Retrieval-Augmented Generation (RAG)
Automatic training is coming soon! Check it out here: asoria/auto-notebook-creator
I'd appreciate any feedback to improve this tool.