p1atdev/multi-tokenizers-processor-sample

See the Zenn article "Handling multiple tokenizers with a single processor in transformers" (transformers で複数のトークナイザーを一つのプロセッサーで扱う):

https://zenn.dev/platina/articles/732feb7c3e9852

Example usage

from transformers import AutoProcessor

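# trust_remote_code=True runs the custom processor code stored in the repository;
# revision pins that code to a specific commit.
processor = AutoProcessor.from_pretrained(
    "p1atdev/multi-tokenizers-processor-sample",
    trust_remote_code=True,
    revision="111e8a30609fb5bc13e16d08f7c49196b23d5056",
)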

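# text_1 is encoded by the first tokenizer and text_2 by the second;
# the second tokenizer's outputs are returned under keys with a "_2" suffix.
print(processor(
    text_1="テキスト1",
    text_2="テキスト2",
))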
# {'input_ids': tensor([[    1, 43412, 28745]]), 'attention_mask': tensor([[1, 1, 1]]), 'input_ids_2': tensor([[56833, 61803, 70534,    17]]), 'attention_mask_2': tensor([[1, 1, 1, 1]])}
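The output above keeps the first tokenizer's keys (`input_ids`, `attention_mask`) and adds the second tokenizer's results under `input_ids_2` and `attention_mask_2`. The dependency-free sketch below illustrates only that key-merging pattern. The class `TwoTokenizerProcessor`, the helper `encode`, and the toy character-code tokenizers are all hypothetical. They are not the repository's actual implementation, which is loaded through `trust_remote_code`, and they return plain lists rather than tensors.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a tokenizer: maps a string to a list of token ids.
Tokenizer = Callable[[str], List[int]]


def encode(tokenizer: Tokenizer, text: str) -> Dict[str, List[List[int]]]:
    """Encode one text as a batch of size 1, mimicking tokenizer output keys."""
    ids = tokenizer(text)
    return {"input_ids": [ids], "attention_mask": [[1] * len(ids)]}


class TwoTokenizerProcessor:
    """Hypothetical sketch: route text_1 and text_2 to separate tokenizers."""

    def __init__(self, tokenizer: Tokenizer, tokenizer_2: Tokenizer) -> None:
        self.tokenizer = tokenizer
        self.tokenizer_2 = tokenizer_2

    def __call__(self, text_1: str, text_2: str) -> Dict[str, List[List[int]]]:
        # The first tokenizer's keys are kept unchanged.
        outputs = encode(self.tokenizer, text_1)
        # The second tokenizer's keys get a "_2" suffix so they do not collide.
        for key, value in encode(self.tokenizer_2, text_2).items():
            outputs[f"{key}_2"] = value
        return outputs


# Toy tokenizers for illustration only (hypothetical):
def bos_char_tokenizer(text: str) -> List[int]:
    # A leading id of 1, then one Unicode code point per character.
    return [1] + [ord(c) for c in text]


def char_tokenizer(text: str) -> List[int]:
    # One Unicode code point per character.
    return [ord(c) for c in text]


processor = TwoTokenizerProcessor(bos_char_tokenizer, char_tokenizer)
print(processor(text_1="ab", text_2="xyz"))
# {'input_ids': [[1, 97, 98]], 'attention_mask': [[1, 1, 1]], 'input_ids_2': [[120, 121, 122]], 'attention_mask_2': [[1, 1, 1]]}
```

Suffixing the second tokenizer's keys keeps both encodings in one flat dictionary, matching the shape of the output shown for the real processor.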