multiple_choice_score: there are 1132 tasks in prompt
multiple_choice_score: reading tasks.
multiple_choice_score: failed to read task 20 of 1132