Afonso Marques

marquesafonso

AI & ML interests

None yet

Recent Activity

upvoted a collection 5 days ago

NuExtract-2.0

reacted to davidberenstein1957's post with 👍 6 days ago

🫸 New release to push vector search to the Hub with vicinity and work with any serialisable objects. 🧑‍🏫 KNN, HNSW, USEARCH, ANNOY, PYNNDESCENT, FAISS, and VOYAGER. 🔗 Example Repo: https://huggingface.co/datasets/minishlab/my-vicinity-repo

reacted to singhsidhukuldeep's post with 👍 6 days ago

Exciting New Tool for Knowledge Graph Extraction from Plain Text!

I just came across a groundbreaking new tool called KGGen that's solving a major challenge in the AI world - the scarcity of high-quality knowledge graph data.

KGGen is an open-source Python package that leverages language models to extract knowledge graphs (KGs) from plain text. What makes it special is its innovative approach to clustering related entities, which significantly reduces sparsity in the extracted KGs.

The technical approach is fascinating:

1. KGGen uses a multi-stage process involving an LLM (GPT-4o in their implementation) to extract entities and relations from source text
2. It aggregates graphs across sources to reduce redundancy
3. Most importantly, it applies iterative LM-based clustering to refine the raw graph

The clustering stage is particularly innovative - it identifies which nodes and edges refer to the same underlying entities or concepts. This normalizes variations in tense, plurality, stemming, and capitalization (e.g., "labors" clustered with "labor").

The researchers from Stanford and University of Toronto also introduced MINE (Measure of Information in Nodes and Edges), the first benchmark for evaluating KG extractors. When tested against existing methods like OpenIE and GraphRAG, KGGen outperformed them by up to 18%.

For anyone working with knowledge graphs, RAG systems, or KG embeddings, this tool addresses the fundamental challenge of data scarcity that's been holding back progress in graph-based foundation models.

The package is available via pip install kg-gen, making it accessible to everyone. This could be a game-changer for knowledge graph applications!
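To make the clustering idea in the post concrete, here is a minimal sketch of merging graph nodes that differ only in capitalization or plurality. It is not KGGen's API and it uses no language model: the `normalize` and `cluster_graph` helpers are hypothetical, and the crude rule-based normalization is an assumed stand-in for the iterative LM-based clustering the post describes.

```python
from collections import defaultdict


def normalize(label: str) -> str:
    """Crude canonical form: lowercase and strip a trailing plural 's'.

    Hypothetical stand-in for LM-based clustering; real systems handle
    tense, stemming, and synonyms, which this rule does not.
    """
    word = label.strip().lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def cluster_graph(triples):
    """Merge (subject, relation, object) triples whose nodes share a canonical form."""
    clusters = defaultdict(set)  # canonical node -> surface forms seen
    merged = set()               # deduplicated edges over canonical nodes
    for subj, rel, obj in triples:
        s, o = normalize(subj), normalize(obj)
        clusters[s].add(subj)
        clusters[o].add(obj)
        merged.add((s, rel.strip().lower(), o))
    return dict(clusters), merged


# Illustrative triples, not taken from any real extraction run.
triples = [
    ("Labors", "performed by", "Workers"),
    ("labor", "performed by", "worker"),
    ("Labor", "regulated by", "law"),
]
clusters, merged = cluster_graph(triples)
print(sorted(merged))
```

In this toy input the three raw edges collapse to two, because "Labors", "labor", and "Labor" map to a single node. That reduction in duplicate nodes and edges is the sparsity effect the post attributes to the clustering stage.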

Organizations

None yet

Collections 3

spaces 4

pinned

Running

Multilang Asr Transcriber

👁

A multilingual automatic speech transcription tool

pinned

Running

Multilang Asr Subtitler

🐢

A multilingual ASR and video captioning tool

Sleeping

Albertina STS

🏆

A PT sentence similarity endpoint using albertina-sts

Sleeping

Bertimbau Large Ner Selective

📈

A Portuguese NER endpoint (PER, LOC, VAL, TEMP, ORG)

models 6

marquesafonso/albertina-sts

marquesafonso/bertimbau-large-ner-total

Token Classification • Updated Jan 5, 2024 • 353 • 1

marquesafonso/NuExtract-openvino-8bit

marquesafonso/NuExtract-1.5-tiny-openvino

marquesafonso/NuExtract-1.5-smol-openvino

marquesafonso/bertimbau-large-ner-selective

datasets 1

marquesafonso/wikipedia-pt-embeddings
