Remove GGUF from this main repo please!
Unnecessary download! Maybe offer it as a separate repo?
I disagree that it's unnecessary. For me, when I go to check out a model, GGUF is the first thing I look for, because I can actually use it easily (usually)! The slight problem is that it's huge: a 4-bit version is usually good enough, e.g. the Q4_K_M models from TheBloke are heavily downloaded. People don't need to clone the whole repo; there are other ways to get just the files you want, even with git, as sketched below.
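For instance, one such git-based approach, sketched here with standard Git LFS mechanics (the `*.safetensors` filter is just an example pattern, not something this repo prescribes):

```
# Clone pointers only (skip the large LFS payloads), then fetch just what you need
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co./google/gemma-7b
cd gemma-7b
git lfs pull --include "*.safetensors"
```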
How much quantization is applied to the model, and how large is it?
Hi @migtissera. Why do you want to remove it from here? `transformers` does not download all the files in the repo, only the files that are used.
GGUF should be a separate repo.
Or at least a separate branch.
We expect the main repo to contain only the 16-bit weights; otherwise we download lots of stuff we don't need.
absolutely!
why do we have gguf here? it's just unnecessary
> We expect the main repo to contain only the 16-bit weights; otherwise we download lots of stuff we don't need.

Thanks for sharing! Where is this expectation? Which library? There are many repos with GGUF and transformers weights in the main branch.
We usually do a git pull; some files don't need to be included in the same repo.
See this? This is how we open source guys usually do repos: https://huggingface.co./alpindale/gemma-7b/tree/main
This is by convention, and it's emerged due to a myriad of methods and tools and workflows, not all of which work exactly the same.
This is our way.
Basically this (https://huggingface.co./alpindale/gemma-7b/tree/main) kind of thing happens when we don't make things easy for people.
@TheBloke forged a lot of this territory.
This is our way
It's also a bit arrogant to think that all of us just use Transformers to download repos.
I think that's a bit of a dismissive stance to take toward a suggestion that we come together on a common approach. This isn't just a problem for people who use the transformers library to download; it affects the thousands of community-developed apps that have to struggle whenever a new model comes onto the scene with some unicorn repo format.
Hi all, to clarify from my side:
We've seen quite often people releasing in the same repository the GGUF files and the transformers files, such as in https://huggingface.co./stabilityai/stablelm-2-zephyr-1_6b, https://huggingface.co./defog/sqlcoder-7b-2, and https://huggingface.co./TinyLlama/TinyLlama-1.1B-Chat-v0.6, and there were no requests for separate repos in those, if I recall correctly.
If you're a `transformers` user, it does not download all the files, just the ones being used. Above, I was asking if there's a tool for GGUF that downloads all the files, as many tools, such as `llama.cpp` or `lmstudio`, download a specific GGUF file from the repository. Likewise, using `huggingface_hub` and `huggingface-cli`, users download one specific file, or can even filter by file extension if they want to download all `*.gguf` files. I was not trying to imply that all users use `transformers`, but trying to understand which non-`transformers` tools rely on having a repo with only GGUF files. We're discussing with the Google team to see if we add a separate repo for the GGUF files; thanks for explaining the reasoning behind wanting a separate repo.
Downloading files using `git pull` is fine but usually less efficient, since it requires downloading the full repository (so you are tied to the repo owner's decisions) and the download speed is usually slower than the HTTP-based methods (used by `transformers`, but also by most of the other libraries interacting with the Hub via `huggingface_hub`). You can expect a 2-4x speed-up on a good connection.

A suggestion to replace a `git pull` in your workflow is to use `huggingface-cli` like so:
```
# Download full repo (similar to "git pull")
huggingface-cli download google/gemma-7b

# Download only the gguf file
huggingface-cli download google/gemma-7b gemma-7b.gguf

# Download all safetensors files + config.json
huggingface-cli download google/gemma-7b --include "*.safetensors" --include config.json
```
By default, files are downloaded to the HF cache folder, and are reusable by any library using `huggingface_hub` under the hood, to avoid re-downloads (the HF cache location can be configured like this). If you want to manage yourself where the files are downloaded, you can use the `--local-dir` option to provide a destination path. Files will be downloaded to the cache and symlinked to this location. To disable symlinks and actually download files to the local dir, you should also set `--local-dir-use-symlinks=0` (this workflow has actually been re-discussed recently -here- and will soon be simplified).

Hope this helps you all use this repo with less friction :)
For more details, check out the CLI docs page.
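A short sketch of that `--local-dir` variant (the destination path here is just an example, and `False` is the boolean form the flag accepts):

```
# Materialize real files under ./gemma-7b instead of symlinking into the HF cache
huggingface-cli download google/gemma-7b \
    --local-dir ./gemma-7b \
    --local-dir-use-symlinks False
```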
I've actually found it easier to keep using the git LFS workflow, but simply mask any *.gguf binaries. This is what we used to do when new devs started putting zips and tars into source code repos: we generally don't store binaries inside the source repository, since they are considered build artifacts.
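A minimal sketch of that masking approach, assuming a standard Git LFS clone (`lfs.fetchexclude` is a stock Git LFS setting, not Hub-specific):

```
# Tell Git LFS to never fetch GGUF binaries in this clone
git config lfs.fetchexclude "*.gguf"
# Fetch the remaining LFS objects
git lfs pull
```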
I understand the needs of certain workflows, and how this 34GB of bloat for each 7B model is just to help the gemma.cpp project's defaults on Kaggle. It's just that a 32-bit float model in a GGUF is not even the optimal format for a 7B, and users will most likely use this repo to make their own GGUFs anyhow. Seems like a moot point. Really happy to see another foundational model. Thank you.
> Downloading files using `git pull` is fine but usually less efficient [...] A suggestion to replace a `git pull` in your workflow is to use `huggingface-cli` [...]
This is not a functional solution if you want to use the tokenizer that comes with the model. Perhaps your suggestion would work better by just excluding *.gguf and retaining everything else (understanding that, like you said, you're at the mercy of the repo maintainer's choices).
Maybe this:

```
# Download the full repo, excluding *.gguf files
huggingface-cli download google/gemma-7b --exclude "*.gguf"
```
The thing is, on one hand we have a few repos that do indeed mix them, and we can forgive that to a certain extent given their file sizes; but when we are talking about file sizes as huge as this, that's another subject entirely.
Also, it's a bit weird to say that because a few repos do it, everyone should.
> This is not a functional solution if you want to use the tokenizer that comes with the model. Perhaps your suggestion would work better by just excluding *.gguf and retaining everything else.
Definitely, yes! I just wanted to point out the different possibilities, but it can be adapted depending on the use case.
There are definitely much better options than git for cloning the model files. Something with the CLI like Wauplin said, or `huggingface_hub`, works; even requests-based scripts like oobabooga's download-model.py. I would still recommend repository managers create different branches for different frameworks and list them on the model card (see the sketch below). I don't blame them for not knowing, as many older official repositories have models for every framework (pytorch, tensorflow, onnx, jax, rust, transformers, and more) in just one branch.
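For reference, a sketch of pulling from such a framework-specific branch with the CLI (the branch name `gguf` here is hypothetical; this repo doesn't ship one):

```
# --revision accepts a branch name, tag, or commit hash
huggingface-cli download google/gemma-7b --revision gguf --include "*.gguf"
```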
> Hi all, to clarify from my side: We've seen quite often people releasing in the same repository the GGUF files and the transformers files [...] I was not trying to imply that all users use `transformers`, but trying to understand which non-`transformers` tools rely on having a repo with only GGUF files. [...]
Your examples are either tiny and/or not major base models that will be used by everyone.
Please accept that:
- We have been doing this for a while, and we know what we are talking about
- Our concern is legitimate
We are making a suggestion that will reduce confusion and fragmentation and keep traffic on your repo.
Migel closed the issue, clearly he's done talking about it, and so am I.
Nice model, thank you!
Hey all. We're looking into moving the weights into separate repos in the coming days; hopefully we'll share them today or on Monday.
That said, even with the separate GGUF files, please consider not using git for this workflow. As shared by @Wauplin and @Anthonyg5005, using the HTTP-based approach will be a few times faster, and llama.cpp has a new HF script to download just the needed files.
Hey @ehartford et al. I forgot to update here, but there are now also separate repos just with the GGUFs as discussed. See https://huggingface.co./collections/google/gemma-release-65d5efbccdbb8c4202ec078b
I have the opposite problem: `git clone [email protected]:google/gemma-7b-it` only clones the repo without the GGUF. Not sure why, but can I continue the clone into the existing folder, or do I need to download the file manually?
@ducknificient you most likely have `GIT_LFS_SKIP_SMUDGE=1` set as an environment variable on your machine, which prevents you from downloading the LFS files when cloning the repo.
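If that's the case, a likely fix (standard Git LFS commands, run from inside the existing clone) would be:

```
cd gemma-7b-it
unset GIT_LFS_SKIP_SMUDGE
# Fetch all missing LFS objects for the current checkout
git lfs pull
```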