How to convert the original model to q4f16 or q4 for the web?
#1 by nickelshh - opened
How do you convert the original model to q4f16 or q4 for the web? It seems that converting with the Optimum CLI + `quantize_dynamic` with QInt4 does not work in onnxruntime-web.
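For context, this is roughly the path I tried (a hedged sketch; the model ID and file names are placeholders, and QInt4 support in `quantize_dynamic` depends on your onnxruntime version):

```python
# 1) Export to ONNX with the Optimum CLI (run in a shell):
#    optimum-cli export onnx --model microsoft/Phi-3.5-mini-instruct ./phi35-onnx
#
# 2) Then apply dynamic 4-bit weight quantization:
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="./phi35-onnx/model.onnx",     # output of the Optimum export
    model_output="./phi35-onnx/model_q4.onnx",
    weight_type=QuantType.QInt4,               # 4-bit weights; needs a recent onnxruntime
)
```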
I converted the model with this script, which also handles the 4-bit quantization:
https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/builder.py
I ran `python -m onnxruntime_genai.models.builder -m ~/models/Phi-3.5-mini-instruct/ -o ./model/ms -p int4 -e web`, but the inference results are all incorrect compared to the model I downloaded from you.
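To narrow down whether the bad output comes from the int4 quantization itself or from the web execution provider, a local sanity check might help. The sketch below assumes a second build made with `-e cpu` (the `./model/cpu` output path and the prompt are my placeholders) and follows the examples in the onnxruntime-genai repo; the exact generator API differs a bit between versions:

```python
# Hedged sketch: verify the int4 build locally before testing in the browser.
# Assumes a CPU build, e.g.:
#   python -m onnxruntime_genai.models.builder \
#       -m ~/models/Phi-3.5-mini-instruct/ -o ./model/cpu -p int4 -e cpu
import onnxruntime_genai as og

model = og.Model("./model/cpu")   # placeholder path to the CPU int4 build
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Phi-3.5 chat template, per the model card
prompt = "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Stream tokens; garbage here would point at the quantization,
# sane output would point at the web runtime instead.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```

If this CPU build produces sensible text, the int4 weights are probably fine and the problem is more likely on the onnxruntime-web side.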