Llama 3 Youko 8B (rinna/llama-3-youko-8b)

Overview

We conduct continual pre-training of meta-llama/Meta-Llama-3-8B on 22B tokens from a mixture of Japanese and English datasets. The continual pre-training significantly improves the model's performance on Japanese tasks.

The name youko comes from the Japanese word 妖狐/ようこ/Youko, which is a kind of Japanese mythical creature (妖怪/ようかい/Youkai).

| Size | Continual Pre-Training | Instruction-Tuning |
| :- | :- | :- |
| 8B | Llama 3 Youko 8B [HF] [GPTQ] | Llama 3 Youko 8B Instruct [HF] [GPTQ] |
| 70B | Llama 3 Youko 70B [HF] [GPTQ] | Llama 3 Youko 70B Instruct [HF] [GPTQ] |

Benchmarking

Please refer to rinna's LM benchmark page.


How to use the model

import transformers
import torch

model_id = "rinna/llama-3-youko-8b"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto"
)
output = pipeline(
    "西田幾多郎は、",
    max_new_tokens=256,
    do_sample=True
)
print(output[0]["generated_text"])
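By default, the text-generation pipeline returns the prompt followed by the continuation in `generated_text`. The following is a minimal sketch of separating the two. The helper name `split_continuation` is hypothetical, and the sample output is an illustrative stand-in, not text produced by the model.

```python
from typing import Dict, List


def split_continuation(prompt: str, outputs: List[Dict[str, str]]) -> List[str]:
    """Return only the newly generated text for each pipeline result."""
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        # The pipeline output normally starts with the prompt; drop it if present.
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text)
    return continuations


# Stand-in for the list returned by `pipeline(...)`; illustrative text only.
example_prompt = "西田幾多郎は、"
example_output = [{"generated_text": "西田幾多郎は、日本の哲学者である。"}]
print(split_continuation(example_prompt, example_output))
```

Passing `return_full_text=False` to the pipeline call should give a similar result without post-processing; the helper is useful mainly when the full text has already been stored.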

Tokenization

The model uses the original meta-llama/Meta-Llama-3-8B tokenizer.
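One way to check this locally is to compare the two vocabularies. The sketch below assumes the `transformers` library is installed; the function names `same_vocabulary` and `load_and_compare` are hypothetical, and downloading the base tokenizer may require accepting Meta's license on the Hub.

```python
def same_vocabulary(tokenizer_a, tokenizer_b) -> bool:
    """True when both tokenizers map exactly the same tokens to the same ids."""
    return tokenizer_a.get_vocab() == tokenizer_b.get_vocab()


def load_and_compare(
    model_id: str = "rinna/llama-3-youko-8b",
    base_id: str = "meta-llama/Meta-Llama-3-8B",
) -> bool:
    """Download both tokenizers and compare them (needs network access)."""
    # Imported lazily so that same_vocabulary() can be used without transformers.
    from transformers import AutoTokenizer

    return same_vocabulary(
        AutoTokenizer.from_pretrained(model_id),
        AutoTokenizer.from_pretrained(base_id),
    )
```

Calling `load_and_compare()` is expected to return `True` if the statement above holds; it is not invoked here because it downloads files.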


How to cite

@misc{rinna-llama-3-youko-8b,
    title = {rinna/llama-3-youko-8b},
    author = {Mitsuda, Koh and Chen, Xinqi and Wakatsuki, Toshiaki and Sawada, Kei},
    url = {https://huggingface.co/rinna/llama-3-youko-8b}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

References

@article{llama3modelcard,
    title = {Llama 3 Model Card},
    author = {AI@Meta},
    year = {2024},
    url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}

@software{gpt-neox-library,
    title = {{GPT}-{N}eo{X}: Large Scale Autoregressive Language Modeling in {P}y{T}orch},
    author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
    doi = {10.5281/zenodo.5879544},
    month = {8},
    year = {2021},
    version = {0.0.1},
    url = {https://www.github.com/eleutherai/gpt-neox}
}

License

Meta Llama 3 Community License
