🚩 Report
The JSON files in this repo are corrupted, and the author has not responded about it for about 2 weeks. This badly affects researchers.
c870e4446b7f38bee63704756d4c221d ./tokenizer.json
2c8b77848a6814cba288c53a9c162cb8 ./tokenizer_config.json
b09ba0283111cf1efbe04028785b98d2 ./special_tokens_map.json
efd77c944abc8fc3d9ae32bd33cf2a55 ./pytorch_model-00004-of-00007.bin
e99fc31c5e7ed4cac0a92d0a4f67ee4f ./pytorch_model.bin.index.json
736ddd6433e52787d742eb015626aca3 ./pytorch_model-00007-of-00007.bin
97cde98da890fb3bb38f7909cd4373dd ./config.json
f6206d9bb15fd412faab305190259f0e ./pytorch_model-00006-of-00007.bin
9f3c686d18ced6992975a985a962e06f ./generation_config.json
cc23c1272ad8fb9849451b1d47159af4 ./pytorch_model-00005-of-00007.bin
27b0dc092f99aa2efaf467b2d8026c3f ./added_tokens.json
dcd5b0d011322f6bb24d9485a51f8d4c ./pytorch_model-00002-of-00007.bin
aa8f44b013cbce3e52ece14d9d7e2557 ./pytorch_model-00003-of-00007.bin
35fb2f68c7e3ec4f434fd5309042dfcc ./pytorch_model-00001-of-00007.bin
cf45df715e9b7f473fccbc102a7c8bb4 ./tokenizer.model
These are the MD5 checksums of the files I pulled in the xor folder.
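For anyone who wants to reproduce a listing like the one above without relying on a particular shell tool, here is a minimal Python sketch. The helper names `md5_of_file` and `md5_listing` are made up for illustration and are not part of this repo; the output format only imitates the `<md5>  ./<path>` lines shown above.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks so large .bin shards fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_listing(folder):
    """Return one '<md5>  ./<relative path>' line per file under folder, sorted by path."""
    folder = Path(folder)
    lines = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        rel = path.relative_to(folder).as_posix()
        lines.append(f"{md5_of_file(path)}  ./{rel}")
    return lines
```

Calling `print("\n".join(md5_listing(".")))` from inside the downloaded folder should give a listing that can be compared line by line with the checksums posted in this thread or in the README.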
Just to make sure this is not related to line endings: Which OS are you using?
I am pretty sure this is a process issue rather than a problem with the JSONs. The JSONs are not corrupted; they are just XORs of JSON files rather than actual JSON files. I have updated the README with some new details; hopefully they are helpful.
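To illustrate why an XOR-encoded JSON file looks corrupted until it is decoded, here is a small self-contained sketch. It is not the decoding procedure for this repo (the README describes that); the key bytes and the `xor_bytes` and `is_valid_json` helpers are hypothetical and only show the general idea that XOR with the same key is its own inverse.

```python
import json


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key if it is shorter."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def is_valid_json(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 and parses as JSON."""
    try:
        json.loads(raw.decode("utf-8"))
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False


# A tiny stand-in for a real config file.
original = json.dumps({"model_type": "llama"}).encode("utf-8")

# Hypothetical key, chosen only for this example.
key = bytes([0x9C, 0x3A, 0xF1, 0x07])

encoded = xor_bytes(original, key)  # what an XOR-distributed file would contain
decoded = xor_bytes(encoded, key)   # applying the same XOR again restores the file

print(is_valid_json(encoded))  # False: the XOR output is not parseable JSON
print(decoded == original)     # True
```

So a `.json` file that fails to parse here is expected before decoding, and its checksum should be compared against the checksums listed for the XOR files, not for the final decoded files.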
Just to make sure this is not related to line endings: Which OS are you using?
Ubuntu 22.04
After running dos2unix *.json, the MD5 checksums of the JSON files under oasst-rlhf-2-llama-30b-7k-steps-xor remain the same.
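As a side note on why the OS question matters: a file whose line endings were converted from LF to CRLF has different bytes and therefore a different MD5. The sketch below shows this with made-up sample content; `to_unix` is a hypothetical helper that only imitates the basic CRLF-to-LF conversion and is not a reimplementation of dos2unix.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 of a byte string."""
    return hashlib.md5(data).hexdigest()


def to_unix(data: bytes) -> bytes:
    """Replace CRLF line endings with LF; data that already uses LF is returned unchanged."""
    return data.replace(b"\r\n", b"\n")


lf_text = b'{"a": 1}\n'                       # sample content with Unix line endings
crlf_text = lf_text.replace(b"\n", b"\r\n")   # same content with Windows line endings

print(md5_hex(lf_text) == md5_hex(crlf_text))           # False: the bytes differ
print(md5_hex(to_unix(crlf_text)) == md5_hex(lf_text))  # True: conversion restores the hash
print(to_unix(lf_text) == lf_text)                      # True: nothing to convert
```

The last line matches the observation above: if the conversion leaves the checksums unchanged, the files had no CRLF sequences to begin with, so line endings are unlikely to be the cause.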
I am pretty sure this is a process issue rather than a problem with the JSONs. The JSONs are not corrupted; they are just XORs of JSON files rather than actual JSON files. I have updated the README with some new details; hopefully they are helpful.
Thanks! I think I get the same MD5 checksums:
fdb311c39b8659a5d5c1991339bafc09 ./tokenizer.json
edd1a5897748864768b1fab645b31491 ./tokenizer_config.json
6b2e0a735969660e720c27061ef3f3d3 ./special_tokens_map.json
3eddc6fc02c0172d38727e5826181adb ./pytorch_model-00004-of-00007.bin
fecfda4fba7bfd911e187a85db5fa2ef ./pytorch_model.bin.index.json
462a2d07f65776f27c0facfa2affb9f9 ./pytorch_model-00007-of-00007.bin
598538f18fed1877b41f77de034c0c8a ./config.json
99762d59efa6b96599e863893cf2da02 ./pytorch_model-00006-of-00007.bin
aee09e21813368c49baaece120125ae3 ./generation_config.json
92754d6c6f291819ffc3dfcaf470f541 ./pytorch_model-00005-of-00007.bin
5cfcb78b908ffa02e681cce69dbe4303 ./pytorch_model-00002-of-00007.bin
e1dc8c48a65279fb1fbccff14562e6a3 ./pytorch_model-00003-of-00007.bin
9cffb1aeba11b16da84b56abb773d099 ./pytorch_model-00001-of-00007.bin
eeec4125e9c7560836b4873b6f8e3025 ./tokenizer.model