RaushanTurganbay (HF staff) committed
Commit c2756cb · verified · 1 Parent(s): ae30a20

add updated chat template example

Files changed (1): README.md (+27, −1)
README.md CHANGED

@@ -7,6 +7,9 @@ license: apache-2.0
 # Model Card for Model ID
 Transformers compatible pixtral checkpoints. Make sure to install from source or wait for v4.45!
 
+
+### Usage example
+
 ```python
 from PIL import Image
 from transformers import AutoProcessor, LlavaForConditionalGeneration
@@ -54,6 +57,8 @@ Each image captures a different scene, from a close-up of a dog to expansive nat
 """
 ```
 
+### Usage with chat template
+
 You can also use a chat template to format your chat history for Pixtral. Make sure that the `images` argument to the `processor` contains the images in the order
 that they appear in the chat, so that the model understands where each image is supposed to go.
 
@@ -86,6 +91,27 @@ generate_ids = model.generate(**inputs, max_new_tokens=500)
 output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
 ```
 
+-----------
+From transformers>=v4.48, you can also pass an image URL or a local path in the conversation history and let the chat template handle the rest.
+The chat template will load the images for you and return inputs as `torch.Tensor`, which you can pass directly to `model.generate()`.
+
+```python
+chat = [
+    {
+        "role": "user", "content": [
+            {"type": "text", "content": "Can this animal"},
+            {"type": "image", "url": url_dog},
+            {"type": "text", "content": "live here?"},
+            {"type": "image", "url": url_mountain},
+        ]
+    }
+]
+
+inputs = processor.apply_chat_template(chat, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device)
+generate_ids = model.generate(**inputs, max_new_tokens=500)
+output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+```
+
 You should get something like this:
 
 ```
@@ -108,4 +134,4 @@ Would you like more information on any specific aspect?
 ```
 
 While it may appear that spacing in the input is disrupted, this is caused by us skipping special tokens for display, and actually "Can this animal" and "live here" are
-correctly separated by image tokens. Try decoding with special tokens included to see exactly what the model sees!
+correctly separated by image tokens. Try decoding with special tokens included to see exactly what the model sees!
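
The image-ordering requirement from the diff above can be illustrated without loading the model: walking the chat history top to bottom yields exactly the sequence that the `images` argument must follow. This is a minimal sketch; `url_dog` and `url_mountain` are placeholder URLs standing in for the real image links used in the README.

```python
# Placeholder URLs (assumptions for illustration only; the README uses its own links).
url_dog = "https://example.com/dog.png"
url_mountain = "https://example.com/mountain.png"

# Same chat structure as in the diff above.
chat = [
    {
        "role": "user", "content": [
            {"type": "text", "content": "Can this animal"},
            {"type": "image", "url": url_dog},
            {"type": "text", "content": "live here?"},
            {"type": "image", "url": url_mountain},
        ]
    }
]

def image_urls_in_order(chat):
    """Collect image references in the order they appear across all messages."""
    return [
        item["url"]
        for message in chat
        for item in message["content"]
        if item["type"] == "image"
    ]

print(image_urls_in_order(chat))  # -> [url_dog, url_mountain]
```

Passing the loaded images to the processor in this order is what lets the model associate "Can this animal" with the dog image and "live here?" with the mountain image.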