sgugger George-Ogden committed on
Commit
bc2764f
1 Parent(s): ff46155

fix typos (#6)

- fix typos (c2a5e573587885ce23744cf330ee7c402f0df16f)


Co-authored-by: George Ogden <[email protected]>

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -42,7 +42,7 @@ interests you.
 
 Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
 to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at model like GPT2.
+generation you should look at a model like GPT2.
 
 ### How to use
 
@@ -166,14 +166,14 @@ The RoBERTa model was pretrained on the reunion of five datasets:
 - [Stories](https://arxiv.org/abs/1806.02847) a dataset containing a subset of CommonCrawl data filtered to match the
 story-like style of Winograd schemas.
 
-Together theses datasets weight 160GB of text.
+Together these datasets weigh 160GB of text.
 
 ## Training procedure
 
 ### Preprocessing
 
 The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of
-the model take pieces of 512 contiguous token that may span over documents. The beginning of a new document is marked
+the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked
 with `<s>` and the end of one by `</s>`
 
 The details of the masking procedure for each sentence are the following: