karmiq (Karel Minarik)

upvoted an article about 1 month ago

Article

🇨🇿 BenCzechMark - Can your LLM Understand Czech?

Oct 1 • 16

upvoted a paper 5 months ago

Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation

Paper • 2406.16678 • Published Jun 24 • 14

upvoted a collection 5 months ago

Nemotron 4 340B

Collection

Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 9 days ago • 157

upvoted a paper 5 months ago

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3 • 42

upvoted an article 6 months ago

Article

Training and Finetuning Embedding Models with Sentence Transformers v3

May 28 • 156

upvoted a paper 8 months ago

Anticipatory Music Transformer

Paper • 2306.08620 • Published Jun 14, 2023 • 9

upvoted a collection 8 months ago

Czech evaluation datasets

Collection

This collection should contain Czech evaluation datasets • 8 items • Updated Jan 14 • 3

upvoted 3 papers 9 months ago

Retrieval-Augmented Generation for Large Language Models: A Survey

Improving Text Embeddings with Large Language Models

Multilingual E5 Text Embeddings: A Technical Report

upvoted a paper 10 months ago

Text Embeddings Reveal (Almost) As Much As Text

Paper • 2310.06816 • Published Oct 10, 2023 • 1

upvoted 4 papers 11 months ago

Shai: A large language model for asset management

Paper • 2312.14203 • Published Dec 21, 2023 • 4

Borges and AI

Paper • 2310.01425 • Published Sep 27, 2023 • 2

Recursively Summarizing Books with Human Feedback

Paper • 2109.10862 • Published Sep 22, 2021 • 1

An In-depth Look at Gemini's Language Abilities

Paper • 2312.11444 • Published Dec 18, 2023 • 1

upvoted 2 papers 12 months ago

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Paper • 2101.00027 • Published Dec 31, 2020 • 6

Jailbroken: How Does LLM Safety Training Fail?

Paper • 2307.02483 • Published Jul 5, 2023 • 13

upvoted a collection 12 months ago

Zephyr 7B

Collection

Models, datasets, and demos associated with Zephyr 7B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 9 items • Updated Apr 12 • 145

upvoted a paper 12 months ago

FinGPT: Large Generative Models for a Small Language

Paper • 2311.05640 • Published Nov 3, 2023 • 27