diwank posted an update Jun 15, 2024
Just published "CryptGPT: A Simple Approach to Privacy-Preserving Language Models Using the Vigenere Cipher".

https://huggingface.co/blog/diwank/cryptgpt-part1

tl;dr - we pretrained a GPT-2 tokenizer and model from scratch on a dataset encrypted with the Vigenere cipher, and it performs as well as regular GPT-2. The catch: to use it, you need to know the encryption key.
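
For anyone who hasn't met the Vigenere cipher before, here is a minimal sketch of the textbook version in Python. It is an illustration only and not necessarily how the CryptGPT repo does it: the `vigenere` function name, the restriction to ASCII letters, and the choice to pass other characters through without advancing the key are all assumptions made for this example.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: only a-z / A-Z are shifted


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Textbook Vigenere cipher (hypothetical helper, not the CryptGPT code).

    Each letter is shifted by the matching key letter (a=0, b=1, ...),
    cycling through the key. Non-letters are copied unchanged and do not
    consume a key position.
    """
    key = key.lower()
    if not key or any(k not in ALPHABET for k in key):
        raise ValueError("key must be a non-empty string of ASCII letters")

    shifts = [ALPHABET.index(k) for k in key]
    sign = -1 if decrypt else 1
    out = []
    pos = 0  # index into the key, advanced only on letters

    for ch in text:
        if ch in string.ascii_letters:
            shift = sign * shifts[pos % len(shifts)]
            new = ALPHABET[(ALPHABET.index(ch.lower()) + shift) % 26]
            out.append(new.upper() if ch.isupper() else new)
            pos += 1
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    secret = vigenere("Attack at dawn!", "lemon")
    print(secret)                                   # Lxfopv ef rnhr!
    print(vigenere(secret, "lemon", decrypt=True))  # Attack at dawn!
```

In a setup like the one described above, the training corpus would be passed through the encrypt direction before tokenizer and model training, prompts would be encrypted with the same key at inference time, and model output would be decrypted with it.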

links:
https://github.com/creatorrr/cryptgpt
diwank/cryptgpt
diwank/cryptgpt-large

@Narsil @ArthurZ @anthony really appreciate your work on tokenizers which was waaaaay easier than sentencepiece to use. <3
