Can we run this model on CPU?
I think a lot of users will have this question, since they'd like to run it locally on a laptop or a modest desktop.
Could we have a simpler alternative version, maybe trained on the same corpus but with fewer parameters, so it can run on CPUs?
By the way, congrats on this amazing job! Keep rocking!
You can, but it would be very very slow. You really want a GPU.
The training code for "v2" will be on the repo soon, and you could use that to train from a smaller Pythia model.
Maybe the team will just do that. But models small enough to run on CPUs are <100M params, and those may not perform well on the kind of text-generation QA tasks people expect to use this for.
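For scale, a back-of-the-envelope parameter count shows why "CPU-friendly" ends up meaning well under 100M params. This is just a sketch using the standard ~12·L·d² per transformer block plus untied input/output embeddings; the example shape (6 layers, hidden size 512, ~50K vocab) is the published pythia-70m configuration:

```python
def estimate_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough GPT-style transformer parameter count.

    Each block contributes ~12 * d_model^2 weights (attention QKV/out
    projections plus the 4x-expansion MLP), and the input embedding and
    output unembedding matrices are counted separately (untied).
    """
    embeddings = 2 * vocab_size * d_model   # input embed + output unembed
    blocks = 12 * n_layers * d_model ** 2   # attention + MLP per layer
    return embeddings + blocks

# pythia-70m-ish shape: 6 layers, d_model=512, GPT-NeoX vocab of ~50304
print(estimate_params(6, 512, 50304))  # ≈ 70M, matching the "70m" name
```

Note how at this scale most of the parameters are in the embedding matrices, not the transformer blocks — one reason tiny models underperform on generation quality.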
Can we quantize the model, like someone did with llama.cpp? (Pardon my ignorance)
Sure, you can try. See https://github.com/databrickslabs/dolly for source code (2.0 training code coming soon)
People have already done it here: https://github.com/ggerganov/llama.cpp/discussions/569. Looks like it runs on CPU just fine :)
Direct link to a CPU-ready version is here: https://huggingface.co./geemili/dolly-v2-12b/tree/main
How do you run https://huggingface.co./geemili/dolly-v2-12b/tree/main (the quantized 12B model, a .ggml file) with AutoTokenizer and AutoModelForCausalLM?
I'm having trouble installing bitsandbytes on a Databricks E8as_v4 GPU cluster.
Hm, what issue? bitsandbytes has been working fine for me
Library installation attempted on the driver node of cluster 0413-233703-4jtufovq and failed. Please refer to the following error message to fix the library or contact Databricks support.
Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE.
Error Message: org.apache.spark.SparkException: Process List(bash, /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip, install, bitsandbytes==0.38.1, --index-url, https://github.com/timdettmers/bitsandbytes, --disable-pip-version-check) exited with code 1.
ERROR: Could not find a version that satisfies the requirement bitsandbytes==0.38.1 (from versions: none)
ERROR: No matching distribution found for bitsandbytes==0.38.1
Works fine for me: %pip install bitsandbytes==0.38.1, using the 13.0 ML runtime. Looking at your error, the --index-url pointing at the GitHub repo is likely the problem: a GitHub repo URL is not a pip package index, so pip can't find any versions there. Try installing from the default PyPI index. How are you installing, exactly?
using %pip install bitsandbytes==0.38.1
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/
/local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/databricks/jars/*')}
warn(msg)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function
errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
Then when I run print(pipe("Explain to me the difference between nuclear fission and fusion."))
RuntimeError: probability tensor contains either inf, nan or element < 0
Yeah, that's all working - it's just that the model hits numeric overflow on that input in 8-bit. This can happen. IIRC this seemed to occur on the V100 but not the A10, though that may just be coincidence. Try an A10, or a smaller model.
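To illustrate the failure mode: when a logit overflows to inf in reduced precision, the softmax that turns logits into sampling probabilities degrades to NaN, and sampling from that tensor (e.g. torch.multinomial) raises exactly the "probability tensor contains either inf, nan or element < 0" error. A minimal numpy sketch of the mechanism, not specific to dolly or bitsandbytes:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Standard max-subtracted softmax over a 1-D logit vector."""
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()

# Healthy logits produce a valid probability distribution.
ok = softmax(np.array([1.0, 2.0, 3.0], dtype=np.float32))
print(ok.sum())  # ~1.0

# If one logit has overflowed to inf (as can happen in 8-bit/fp16 matmuls),
# max() is inf, inf - inf is nan, and the whole distribution becomes NaN.
bad = softmax(np.array([1.0, 2.0, np.float32("inf")], dtype=np.float32))
print(bad)  # contains NaN -> sampling from it raises the RuntimeError above
```

That is why switching to a GPU or dtype that doesn't overflow on that input, or to a smaller model, makes the error go away.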