---
tags:
- npu
- amd
- llama3.1
- RyzenAI
- translation
---

This model is a finetuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co./meta-llama/Meta-Llama-3.1-8B-Instruct), AWQ-quantized and converted to run on an [NPU-equipped Ryzen AI PC](https://github.com/amd/RyzenAI-SW/issues/18), for example one with a Ryzen 9 7940HS processor.  

Supports translation between English, French, Chinese (Mandarin), and Japanese.  

There are many varieties of Chinese, and Mandarin is the most widely used, so write "Mandarin" rather than "Chinese" in your prompts.  
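
For instance, with the `translation` helper from the sample script below:

```
translation("Translate Mandarin to English.", "要功夫深,铁杵磨成针")
```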

To set up Ryzen AI for LLMs on Windows 11, see [Running LLM on AMD NPU Hardware](https://www.hackster.io/gharada2013/running-llm-on-amd-npu-hardware-19322f).  

The following sample assumes that the setup on the above page has been completed.  

This model has only been tested with Ryzen AI on Windows 11. It does not work in Linux environments such as WSL.  

RoPE support is not yet complete, but the perplexity has been confirmed to be lower than that of Llama 3.  

2024/07/30  
- [Ryzen AI Software 1.2](https://ryzenai.docs.amd.com/en/latest/) has been released. Please note that this model is based on [Ryzen AI Software 1.1](https://ryzenai.docs.amd.com/en/1.1/index.html).
- [amd/RyzenAI-SW 1.2](https://github.com/amd/RyzenAI-SW) was announced on July 29, 2024. This sample is for [amd/RyzenAI-SW 1.1](https://github.com/amd/RyzenAI-SW/tree/1.1). Please note that the folder layout and script contents changed completely between the two versions.

2024/08/04  
- This model was created with the 1.1 driver, but it has been confirmed to work with 1.2. Please check the [setup for the 1.2 driver](https://huggingface.co./dahara1/llama-translate-amd-npu) below.



### setup for 1.1 driver 
In a cmd window:
```
conda activate ryzenai-transformers
<your_install_path>\RyzenAI-SW\example\transformers\setup.bat

pip install transformers==4.43.3
# Updating the Transformers library will cause the Llama 2 sample to stop working.
# If you want to run Llama 2, revert to pip install transformers==4.34.0.
pip install tokenizers==0.19.1
pip install -U "huggingface_hub[cli]"

huggingface-cli download dahara1/llama-translate-amd-npu --revision main --local-dir llama-translate-amd-npu

copy <your_ryzen_ai-sw_install_path>\RyzenAI-SW\example\transformers\models\llama2\modeling_llama_amd.py .

# set up Runtime. see https://ryzenai.docs.amd.com/en/latest/runtime_setup.html
set XLNX_VART_FIRMWARE=<your_firmware_install_path>\voe-4.0-win_amd64\1x4.xclbin
set NUM_OF_DPU_RUNNERS=1

# save the sample script below as llama-translate-amd-npu-test.py (UTF-8 encoded)
python llama-translate-amd-npu-test.py
```
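
Before launching the script, you can sanity-check that the runtime variables are set (a quick manual check, not part of the official instructions):

```
echo %XLNX_VART_FIRMWARE%
echo %NUM_OF_DPU_RUNNERS%
```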

### Sample Script

```
import torch
import psutil
import transformers
from transformers import AutoTokenizer, set_seed
import qlinear
import logging


def translation(instruction, input_text):
    system =  """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a highly skilled professional translator. You are a native speaker of English, Japanese, French and Mandarin. Translate the given text accurately, taking into account the context and specific instructions provided. Steps may include hints enclosed in square brackets [] with the key and value separated by a colon:. If no additional instructions or context are provided, use your expertise to consider what the most appropriate context is and provide a natural translation that aligns with that context. When translating, strive to faithfully reflect the meaning and tone of the original text, pay attention to cultural nuances and differences in language usage, and ensure that the translation is grammatically correct and easy to read. For technical terms and proper nouns, either leave them in the original language or use appropriate translations as necessary. Take a deep breath, calm down, and start translating.<|eot_id|><|start_header_id|>user<|end_header_id|>"""

    prompt = f"""{system}
### Instruction:
{instruction}

### Input:
{input_text}

### Response:
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

    # Tokenize the prompt; long inputs are truncated to stay within the context budget
    tokenized_input = tokenizer(prompt, return_tensors="pt",
        padding=True, max_length=1600, truncation=True)

    # Stop generation at the end-of-sequence token
    terminators = [
        tokenizer.eos_token_id,
    ]

    # Low temperature / top_p for stable, mostly deterministic translations
    outputs = model.generate(tokenized_input['input_ids'],
            max_new_tokens=600,
            eos_token_id=terminators,
            attention_mask=tokenized_input['attention_mask'],
            do_sample=True,
            temperature=0.3,
            top_p=0.5)
    # Decode only the newly generated tokens, skipping the prompt
    response = outputs[0][tokenized_input['input_ids'].shape[-1]:]
    response_message = tokenizer.decode(response, skip_special_tokens=True)
    return response_message


if __name__ == "__main__":

    transformers.logging.set_verbosity_error()
    logging.disable(logging.CRITICAL)

    set_seed(123)
    # Pin the process to four CPU cores; the quantized matmuls run on the NPU
    p = psutil.Process()
    p.cpu_affinity([0, 1, 2, 3])
    torch.set_num_threads(4)

    tokenizer = AutoTokenizer.from_pretrained("llama-translate-amd-npu")
    # Register the Llama 3.1 padding token; add_special_tokens sets
    # tokenizer.pad_token (and therefore pad_token_id) as a side effect
    tokenizer.add_special_tokens({'pad_token': '<|finetune_right_pad_id|>'})
    ckpt = r"llama-translate-amd-npu\llama3.1_8b_translate_w_bit_4_awq_amd.pt"

    # Load the AWQ-quantized checkpoint (saved as a whole model object)
    model = torch.load(ckpt)
    model.eval()
    model = model.to(torch.bfloat16)

    # Move the quantized linear layers onto the NPU ("aie" device)
    for n, m in model.named_modules():
        if isinstance(m, qlinear.QLinearPerGrp):
            print(f"Preparing weights of layer : {n}")
            m.device = "aie"
            m.quantize_weights()

    print(translation("Translate Japanese to English.", "1月1日は日本の祝日です。その日は日曜日で、5日ぶりに雨が降りました"))
    print(translation("Translate English to Japanese.", "It’s raining cats and dogs."))
    print(translation("Translate French to Japanese.", "Après la pluie, le beau temps"))
    print(translation("Translate Mandarin to Japanese.", "要功夫深,铁杵磨成针"))
```
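
The system prompt allows translation hints in square brackets as `key: value` pairs. A hypothetical call using that convention (the key name `writing_style` is an illustrative guess, not a documented key):

```
print(translation("Translate Japanese to English.", "[writing_style: casual] 明日も雨かな"))
```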

### setup for 1.2 driver 

The 1.2 setup may not work even if you follow the official instructions, so here are some tips for getting it running.  
For the first half, see [Appendix: Tips for running Ryzen AI Software 1.2 in Running LLM on AMD NPU Hardware](https://www.hackster.io/gharada2013/running-llm-on-amd-npu-hardware-19322f).  

Then, 
- Uninstall VC 2019

I am not sure whether this is the cause, but compilation sometimes failed when VC 2019 was installed.

- Delete the previous virtual environment for 1.1

This may not be necessary, but do it just to be safe.

- Follow the instructions in [LLMs on RyzenAI with Pytorch](https://github.com/amd/RyzenAI-SW/blob/main/example/transformers/models/llm/docs/README.md) and create the conda environment.

After creating the Z drive and compiling, delete the Z drive before running the script; otherwise the script can fail because modules are no longer found.  
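
Assuming the Z drive was mapped with `subst` during compilation, it can be removed like this (check how the drive was created in your environment):

```
subst Z: /D
```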

- Exit the CMD and restart it.

- Enable the conda environment

```
conda activate ryzenai-transformers
```

- cd into the transformers example directory before running setup_phx.bat; if you skip the cd, setup_phx.bat will fail.

```
cd <your_RyzenA-SW_install_path>\RyzenAI-SW\example\transformers
.\setup_phx.bat
```

- Set the environment variables. The batch file you should run will differ depending on the CPU you are using, so please refer to the [official instructions](https://ryzenai.docs.amd.com/en/latest/runtime_setup.html).   

```
set XLNX_VART_FIRMWARE=%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins\phoenix\1x4.xclbin
set XLNX_TARGET_NAME=AMD_AIE2_Nx4_Overlay
```

- Copy [modeling_llama_amd.py](https://github.com/amd/RyzenAI-SW/blob/1.1/example/transformers/models/llama2/modeling_llama_amd.py) from the version 1.1 tree.  
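
One way to fetch the file (assuming curl is available; copying from a local 1.1 checkout works just as well):

```
curl -L -o modeling_llama_amd.py https://raw.githubusercontent.com/amd/RyzenAI-SW/1.1/example/transformers/models/llama2/modeling_llama_amd.py
```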

There are no changes to the model download and sample scripts.  
In Version 1.2, you can see the NPU usage in the Task Manager.  
Good luck. 

![chat_image](trans-sample.png)

## Acknowledgements
- [amd/RyzenAI-SW](https://github.com/amd/RyzenAI-SW)  
Sample Code and Drivers.
- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)  
Thanks for the AWQ quantization method.  
- [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co./meta-llama/Meta-Llama-3-8B-Instruct)  
[Built with Meta Llama 3](https://llama.meta.com/llama3/license/)  
Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.