Can we run this model on CPU?
I think a lot of users will have this question, since they'd like to run it locally on a laptop or a modest desktop.
Could we have a simpler alternative version, maybe trained on the same corpus but with fewer parameters, so it can run on CPUs?
By the way, congrats on this amazing job! Keep rocking!
You can, but it would be very very slow. You really want a GPU.
The training code for "v2" will be on the repo soon, and you could use that to train from a smaller Pythia model.
Maybe the team will just do that. But models small enough to run on CPUs are <100M params, and those may not perform well on the kind of text-generation QA tasks people expect to use this for.
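For scale, a back-of-the-envelope parameter count shows why "CPU-friendly" ends up meaning well under 100M params. This is just a sketch using the standard ~12·L·d² per transformer block plus untied input/output embeddings; the example shape (6 layers, hidden size 512, ~50K vocab) is the published pythia-70m configuration:

```python
def estimate_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough GPT-style transformer parameter count.

    Each block contributes ~12 * d_model^2 weights (attention QKV/out
    projections plus the 4x-expansion MLP), and the input embedding and
    output unembedding matrices are counted separately (untied).
    """
    embeddings = 2 * vocab_size * d_model   # input embed + output unembed
    blocks = 12 * n_layers * d_model ** 2   # attention + MLP per layer
    return embeddings + blocks

# pythia-70m-ish shape: 6 layers, d_model=512, GPT-NeoX vocab of ~50304
print(estimate_params(6, 512, 50304))  # ≈ 70M, matching the "70m" name
```

Note how at this scale most of the parameters are in the embedding matrices, not the transformer blocks — one reason tiny models underperform on generation quality.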
Can we quantize the model, like someone did with llama.cpp? (Pardon my ignorance)
Sure, you can try. See https://github.com/databrickslabs/dolly for source code (2.0 training code coming soon)
People have already done it here: https://github.com/ggerganov/llama.cpp/discussions/569. Looks like it runs on CPU just fine :)
Direct link to a CPU-ready version is here: https://huggingface.co./geemili/dolly-v2-12b/tree/main
How do you run https://huggingface.co./geemili/dolly-v2-12b/tree/main (the quantized 12B model, a .ggml file) with AutoTokenizer and AutoModelForCausalLM?
I'm having trouble installing bitsandbytes on a Databricks E8as_v4 GPU cluster.
Hm, what issue? bitsandbytes has been working fine for me
Library installation attempted on the driver node of cluster 0413-233703-4jtufovq and failed. Please refer to the following error message to fix the library or contact Databricks support.
Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE.
Error Message: org.apache.spark.SparkException: Process List(bash, /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip, install, bitsandbytes==0.38.1, --index-url, https://github.com/timdettmers/bitsandbytes, --disable-pip-version-check) exited with code 1.
ERROR: Could not find a version that satisfies the requirement bitsandbytes==0.38.1 (from versions: none)
ERROR: No matching distribution found for bitsandbytes==0.38.1
Works fine for me: %pip install bitsandbytes==0.38.1, using the 13.0 ML runtime. Looking at your error, the --index-url pointing at the GitHub repo is likely the problem: a GitHub repo URL is not a pip package index, so pip can't find any versions there. Try installing from the default PyPI index. How are you installing, exactly?
using %pip install bitsandbytes==0.38.1
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/
/local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/databricks/jars/*')}
warn(msg)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function
errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
Then when I run print(pipe("Explain to me the difference between nuclear fission and fusion."))
RuntimeError: probability tensor contains either inf, nan or element < 0
Yeah, that's all working - it's just that the model hits numeric overflow on that input in 8-bit. This can happen. IIRC this seemed to occur on the V100 but not the A10, though that may just be coincidence. Try an A10, or a smaller model.
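To illustrate the failure mode: when a logit overflows to inf in reduced precision, the softmax that turns logits into sampling probabilities degrades to NaN, and sampling from that tensor (e.g. torch.multinomial) raises exactly the "probability tensor contains either inf, nan or element < 0" error. A minimal numpy sketch of the mechanism, not specific to dolly or bitsandbytes:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Standard max-subtracted softmax over a 1-D logit vector."""
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()

# Healthy logits produce a valid probability distribution.
ok = softmax(np.array([1.0, 2.0, 3.0], dtype=np.float32))
print(ok.sum())  # ~1.0

# If one logit has overflowed to inf (as can happen in 8-bit/fp16 matmuls),
# max() is inf, inf - inf is nan, and the whole distribution becomes NaN.
bad = softmax(np.array([1.0, 2.0, np.float32("inf")], dtype=np.float32))
print(bad)  # contains NaN -> sampling from it raises the RuntimeError above
```

That is why switching to a GPU or dtype that doesn't overflow on that input, or to a smaller model, makes the error go away.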