czczup committed on
Commit bbead0c · verified · 1 Parent(s): 12279a7

Update README.md

Files changed (1)
  1. README.md +53 -20
README.md CHANGED
@@ -1,5 +1,7 @@
1
  ---
2
- license: mit
 
 
3
  pipeline_tag: image-text-to-text
4
  library_name: transformers
5
  base_model:
@@ -65,7 +67,7 @@ To construct this dataset, we propose an efficient data construction pipeline. S
65
 
66
  - **For samples with clear ground truths:**
67
  the model is prompted to first provide the reasoning process and then give the final answer in the format like `Final Answer: ***`.
68
- Responses matching the ground truth answer constitute the positive set \\(mathcal{Y}_p\\), while those that do not match make up the negative set \\(\mathcal{Y}_n\\). Additionally, responses that fail to provide a clear final answer are also merged into \\(\mathcal{Y}_n\\).
69
  Given these responses labeled as positive or negative, we build the preference pairs by selecting a chosen response \\(y_c\\) from \\(\mathcal{Y}_p\\) and a negative response \\(y_r\\) from \\(\mathcal{Y}_n\\).
70
 
71
  - **For samples without clear ground truths:**
@@ -160,7 +162,7 @@ To comprehensively compare InternVL's performance before and after MPO, we emplo
160
 
161
  ## Quick Start
162
 
163
- We provide an example code to run `InternVL2_5-1B` using `transformers`.
164
 
165
  > Please use transformers>=4.37.2 to ensure the model works normally.
166
 
@@ -171,7 +173,7 @@ We provide an example code to run `InternVL2_5-1B` using `transformers`.
171
  ```python
172
  import torch
173
  from transformers import AutoTokenizer, AutoModel
174
- path = "OpenGVLab/InternVL2_5-1B"
175
  model = AutoModel.from_pretrained(
176
  path,
177
  torch_dtype=torch.bfloat16,
@@ -185,7 +187,7 @@ model = AutoModel.from_pretrained(
185
  ```python
186
  import torch
187
  from transformers import AutoTokenizer, AutoModel
188
- path = "OpenGVLab/InternVL2_5-1B"
189
  model = AutoModel.from_pretrained(
190
  path,
191
  torch_dtype=torch.bfloat16,
@@ -230,8 +232,8 @@ def split_model(model_name):
230
 
231
  return device_map
232
 
233
- path = "OpenGVLab/InternVL2_5-1B"
234
- device_map = split_model('InternVL2_5-1B')
235
  model = AutoModel.from_pretrained(
236
  path,
237
  torch_dtype=torch.bfloat16,
@@ -244,6 +246,7 @@ model = AutoModel.from_pretrained(
244
  ### Inference with Transformers
245
 
246
  ```python
 
247
  import numpy as np
248
  import torch
249
  import torchvision.transforms as T
@@ -326,14 +329,44 @@ def load_image(image_file, input_size=448, max_num=12):
326
  pixel_values = torch.stack(pixel_values)
327
  return pixel_values
328
 
329
- # If you want to load a model using multiple GPUs, please refer to the `Multiple GPUs` section.
330
- path = 'OpenGVLab/InternVL2_5-1B'
331
  model = AutoModel.from_pretrained(
332
  path,
333
  torch_dtype=torch.bfloat16,
 
334
  low_cpu_mem_usage=True,
335
  use_flash_attn=True,
336
- trust_remote_code=True).eval().cuda()
 
337
  tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
338
 
339
  # set the max number of tiles in `max_num`
@@ -510,9 +543,9 @@ LMDeploy abstracts the complex inference process of multi-modal Vision-Language
510
  from lmdeploy import pipeline, TurbomindEngineConfig
511
  from lmdeploy.vl import load_image
512
 
513
- model = 'OpenGVLab/InternVL2_5-1B'
514
  image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
515
- pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
516
  response = pipe(('describe this image', image))
517
  print(response.text)
518
  ```
@@ -528,8 +561,8 @@ from lmdeploy import pipeline, TurbomindEngineConfig
528
  from lmdeploy.vl import load_image
529
  from lmdeploy.vl.constants import IMAGE_TOKEN
530
 
531
- model = 'OpenGVLab/InternVL2_5-1B'
532
- pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
533
 
534
  image_urls=[
535
  'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
@@ -550,8 +583,8 @@ Conducting inference with batch prompts is quite straightforward; just place the
550
  from lmdeploy import pipeline, TurbomindEngineConfig
551
  from lmdeploy.vl import load_image
552
 
553
- model = 'OpenGVLab/InternVL2_5-1B'
554
- pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
555
 
556
  image_urls=[
557
  "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg",
@@ -570,8 +603,8 @@ There are two ways to do the multi-turn conversations with the pipeline. One is
570
  from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
571
  from lmdeploy.vl import load_image
572
 
573
- model = 'OpenGVLab/InternVL2_5-1B'
574
- pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
575
 
576
  image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
577
  gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)
@@ -586,7 +619,7 @@ print(sess.response.text)
586
  LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:
587
 
588
  ```shell
589
- lmdeploy serve api_server OpenGVLab/InternVL2_5-1B --server-port 23333
590
  ```
591
 
592
  To use the OpenAI-style interface, you need to install OpenAI:
@@ -625,7 +658,7 @@ print(response)
625
 
626
  ## License
627
 
628
- This project is released under the MIT License. This project uses the pre-trained Qwen2.5-0.5B-Instruct as a component, which is licensed under the Apache License 2.0.
629
 
630
  ## Citation
631
 
 
1
  ---
2
+ license: other
3
+ license_name: qwen
4
+ license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
5
  pipeline_tag: image-text-to-text
6
  library_name: transformers
7
  base_model:
 
67
 
68
  - **For samples with clear ground truths:**
69
  the model is prompted to first provide the reasoning process and then give the final answer in the format like `Final Answer: ***`.
70
+ Responses matching the ground truth answer constitute the positive set \\(\mathcal{Y}_p\\), while those that do not match make up the negative set \\(\mathcal{Y}_n\\). Additionally, responses that fail to provide a clear final answer are also merged into \\(\mathcal{Y}_n\\).
71
  Given these responses labeled as positive or negative, we build the preference pairs by selecting a chosen response \\(y_c\\) from \\(\mathcal{Y}_p\\) and a negative response \\(y_r\\) from \\(\mathcal{Y}_n\\).
72
 
73
  - **For samples without clear ground truths:**
 
162
 
163
  ## Quick Start
164
 
165
+ We provide an example code to run `InternVL2_5-78B-MPO` using `transformers`.
166
 
167
  > Please use transformers>=4.37.2 to ensure the model works normally.
168
 
 
173
  ```python
174
  import torch
175
  from transformers import AutoTokenizer, AutoModel
176
+ path = "OpenGVLab/InternVL2_5-78B-MPO"
177
  model = AutoModel.from_pretrained(
178
  path,
179
  torch_dtype=torch.bfloat16,
 
187
  ```python
188
  import torch
189
  from transformers import AutoTokenizer, AutoModel
190
+ path = "OpenGVLab/InternVL2_5-78B-MPO"
191
  model = AutoModel.from_pretrained(
192
  path,
193
  torch_dtype=torch.bfloat16,
 
232
 
233
  return device_map
234
 
235
+ path = "OpenGVLab/InternVL2_5-78B-MPO"
236
+ device_map = split_model('InternVL2_5-78B')
237
  model = AutoModel.from_pretrained(
238
  path,
239
  torch_dtype=torch.bfloat16,
 
246
  ### Inference with Transformers
247
 
248
  ```python
249
+ import math
250
  import numpy as np
251
  import torch
252
  import torchvision.transforms as T
 
329
  pixel_values = torch.stack(pixel_values)
330
  return pixel_values
331
 
332
+ def split_model(model_name):
333
+ device_map = {}
334
+ world_size = torch.cuda.device_count()
335
+ num_layers = {
336
+ 'InternVL2_5-1B': 24, 'InternVL2_5-2B': 24, 'InternVL2_5-4B': 36, 'InternVL2_5-8B': 32,
337
+ 'InternVL2_5-26B': 48, 'InternVL2_5-38B': 64, 'InternVL2_5-78B': 80}[model_name]
338
+ # Since the first GPU will be used for ViT, treat it as half a GPU.
339
+ num_layers_per_gpu = math.ceil(num_layers / (world_size - 0.5))
340
+ num_layers_per_gpu = [num_layers_per_gpu] * world_size
341
+ num_layers_per_gpu[0] = math.ceil(num_layers_per_gpu[0] * 0.5)
342
+ layer_cnt = 0
343
+ for i, num_layer in enumerate(num_layers_per_gpu):
344
+ for j in range(num_layer):
345
+ device_map[f'language_model.model.layers.{layer_cnt}'] = i
346
+ layer_cnt += 1
347
+ device_map['vision_model'] = 0
348
+ device_map['mlp1'] = 0
349
+ device_map['language_model.model.tok_embeddings'] = 0
350
+ device_map['language_model.model.embed_tokens'] = 0
351
+ device_map['language_model.output'] = 0
352
+ device_map['language_model.model.norm'] = 0
353
+ device_map['language_model.lm_head'] = 0
354
+ device_map[f'language_model.model.layers.{num_layers - 1}'] = 0
355
+
356
+ return device_map
357
+
358
+ # If you set `load_in_8bit=True`, you will need two 80GB GPUs.
359
+ # If you set `load_in_8bit=False`, you will need at least three 80GB GPUs.
360
+ path = 'OpenGVLab/InternVL2_5-78B-MPO'
361
+ device_map = split_model('InternVL2_5-78B')
362
  model = AutoModel.from_pretrained(
363
  path,
364
  torch_dtype=torch.bfloat16,
365
+ load_in_8bit=False,
366
  low_cpu_mem_usage=True,
367
  use_flash_attn=True,
368
+ trust_remote_code=True,
369
+ device_map=device_map).eval()
370
  tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
371
 
372
  # set the max number of tiles in `max_num`
 
543
  from lmdeploy import pipeline, TurbomindEngineConfig
544
  from lmdeploy.vl import load_image
545
 
546
+ model = 'OpenGVLab/InternVL2_5-78B-MPO'
547
  image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
548
+ pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=4))
549
  response = pipe(('describe this image', image))
550
  print(response.text)
551
  ```
 
561
  from lmdeploy.vl import load_image
562
  from lmdeploy.vl.constants import IMAGE_TOKEN
563
 
564
+ model = 'OpenGVLab/InternVL2_5-78B-MPO'
565
+ pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=4))
566
 
567
  image_urls=[
568
  'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
 
583
  from lmdeploy import pipeline, TurbomindEngineConfig
584
  from lmdeploy.vl import load_image
585
 
586
+ model = 'OpenGVLab/InternVL2_5-78B-MPO'
587
+ pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=4))
588
 
589
  image_urls=[
590
  "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg",
 
603
  from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
604
  from lmdeploy.vl import load_image
605
 
606
+ model = 'OpenGVLab/InternVL2_5-78B-MPO'
607
+ pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=4))
608
 
609
  image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
610
  gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)
 
619
  LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:
620
 
621
  ```shell
622
+ lmdeploy serve api_server OpenGVLab/InternVL2_5-78B-MPO --server-port 23333 --tp 4
623
  ```
624
 
625
  To use the OpenAI-style interface, you need to install OpenAI:
 
658
 
659
  ## License
660
 
661
+ This project is released under the MIT License. This project uses the pre-trained Qwen2.5-72B-Instruct as a component, which is licensed under the Qwen License.
662
 
663
  ## Citation
664
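
Note on the `split_model` helper added in this commit: the diff view drops Python indentation, so the added lines are hard to read as code. Below is a minimal sketch of the same layer-assignment logic with indentation restored. It is not the README's exact code: `split_model_sketch` and `NUM_LAYERS` are names chosen here for illustration, `world_size` is passed as a parameter instead of being read from `torch.cuda.device_count()` so the sketch runs without GPUs, and a cap on the layer index is added so no keys are created for layers that do not exist.

```python
import math

# Layer counts copied from the `num_layers` table in the diff.
NUM_LAYERS = {
    'InternVL2_5-1B': 24, 'InternVL2_5-2B': 24, 'InternVL2_5-4B': 36,
    'InternVL2_5-8B': 32, 'InternVL2_5-26B': 48, 'InternVL2_5-38B': 64,
    'InternVL2_5-78B': 80,
}


def split_model_sketch(model_name, world_size):
    """Map each module name to a GPU index, mirroring the diff's logic.

    Assumption: `world_size` is supplied by the caller; the README version
    reads it from `torch.cuda.device_count()`.
    """
    device_map = {}
    num_layers = NUM_LAYERS[model_name]

    # GPU 0 also hosts the ViT, so it is treated as half a GPU.
    per_gpu = math.ceil(num_layers / (world_size - 0.5))
    layers_per_gpu = [per_gpu] * world_size
    layers_per_gpu[0] = math.ceil(per_gpu * 0.5)

    layer_cnt = 0
    for gpu, count in enumerate(layers_per_gpu):
        for _ in range(count):
            # Cap added in this sketch: rounding up can overshoot num_layers.
            if layer_cnt >= num_layers:
                break
            device_map[f'language_model.model.layers.{layer_cnt}'] = gpu
            layer_cnt += 1

    # Vision tower, projector, embeddings, final norm, and output heads
    # all stay on GPU 0, as in the diff.
    for name in ('vision_model', 'mlp1',
                 'language_model.model.tok_embeddings',
                 'language_model.model.embed_tokens',
                 'language_model.output',
                 'language_model.model.norm',
                 'language_model.lm_head'):
        device_map[name] = 0

    # The last decoder layer is moved back to GPU 0, next to the output head.
    device_map[f'language_model.model.layers.{num_layers - 1}'] = 0
    return device_map


if __name__ == '__main__':
    mapping = split_model_sketch('InternVL2_5-78B', world_size=4)
    print(mapping['language_model.model.layers.12'])
```

With four GPUs and the 80-layer language model, this places 12 decoder layers on GPU 0 and up to 23 on each of the others, which is the "first GPU counts as half" rule stated in the diff's comment.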