google/gemma-7b · Remove GGUF from this main repo please!

Feb 21

Unnecessary download! Maybe offer it as a separate repo?

Feb 21

I disagree that it is unnecessary. When I check out a model, GGUF is the first thing I look for, because I can (usually) use it easily! The slight problem is that this one is huge. A 4-bit version is usually good enough, e.g., the Q4_K_M models from TheBloke are heavily downloaded. People don't need to clone the whole repo; there are other ways to get the files you want, even with git.

miloradg

Feb 21

What quantization does the model use, and how large is the file?

osanseviero

Google org Feb 21

Hi @migtissera. Why do you want to remove it from here? transformers does not download all files in the repo, only the files that are used.

ehartford

Feb 21

GGUF should be in a separate repo

Or at least a separate branch

We expect the main repo to contain only the 16-bit weights; otherwise we download lots of stuff we don't need

albusdd

Feb 21

Absolutely!
Why do we have GGUF here? It's just unnecessary.

osanseviero

Google org Feb 21

edited Feb 21

We expect the main repo to contain only the 16-bit weights; otherwise we download lots of stuff we don't need

Thanks for sharing! Where is this expectation? Which library? There are many repos with GGUF and transformers weights in the main branch.

migtissera

Feb 21

We usually do a git pull — some files are unnecessary to be included in the same repo.

migtissera

Feb 21

See this? This is how we open source guys usually do repos: https://huggingface.co/alpindale/gemma-7b/tree/main

ehartford

Feb 21

@osanseviero

This is by convention, and it's emerged due to a myriad of methods and tools and workflows, not all of which work exactly the same.

This is our way.

Basically this (https://huggingface.co/alpindale/gemma-7b/tree/main) kind of thing happens when we don't make things easy for people.

@TheBloke forged a lot of this territory.

migtissera

Feb 21

This is our way

migtissera

Feb 21

It’s also a bit arrogant to think that all of us just use Transformers to download repos

Suparious

Feb 22

edited Feb 22

I think that is a rather dismissive response to a suggestion that we come together on a common approach. This doesn't seem to be a problem only for people who use the transformers library to download; it affects the thousands of community-developed apps that have to struggle whenever a new model arrives on the scene with some unicorn repo format.

osanseviero

Google org Feb 22

Hi all, to clarify from my side:

We've quite often seen people release the GGUF files and the transformers files in the same repository, such as in https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b, https://huggingface.co/defog/sqlcoder-7b-2, and https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6, and there were no requests for separate repos in those, if I recall correctly.

If you're a transformers user, it does not download all the files, just the ones being used. Above, I was asking if there's a tool for GGUF that downloads all the files, as many tools, such as llama.cpp or lmstudio, download a specific GGUF file from the repository. Likewise, using huggingface_hub and huggingface-cli, users download one specific file or can even filter for file extensions if they want to download all *.gguf. I was not trying to imply all users use transformers, but trying to understand which non-transformers tools rely on having a repo with only GGUF files. We're discussing with the Google team to see if we add a separate repo for the GGUF files, thanks for explaining the reasoning behind wanting a separate repo.

Wauplin

Feb 22

edited Feb 22

Downloading files using git pull is fine but usually less efficient, since it requires downloading the full repository (so you are tied to the repo owner's decisions), and the download speed is usually slower than the HTTP-based methods (used by transformers and by most other libraries that interact with the Hub through huggingface_hub). You can expect a 2x-4x speed-up on a good connection.

A suggestion to replace a git pull in your workflow is to use huggingface-cli as such:

# Download full repo (similar to "git pull")
huggingface-cli download google/gemma-7b

# Download only the gguf file
huggingface-cli download google/gemma-7b gemma-7b.gguf

# Download all safetensors files + config.json
huggingface-cli download google/gemma-7b --include "*.safetensors"  --include config.json

By default, files are downloaded to the HF cache folder and can be reused by any library that uses huggingface_hub under the hood, which avoids re-downloads (the HF cache location is configurable). If you want to control where the files are downloaded, use the --local-dir option to provide a destination path. Files will be downloaded to the cache and symlinked to this location. To disable symlinks and actually download the files to the local dir, set --local-dir-use-symlinks=0 as well (this workflow has been re-discussed recently and will soon be simplified).

Hope this will help you all getting to use this repo with less friction :)

For more details, check out the CLI docs page.
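As a rough sketch of the --local-dir option described above: the function name `download_file_to` and the `DRY_RUN` switch are illustrative helpers, not part of huggingface-cli, and the exact symlink behavior depends on the installed huggingface_hub version.

```shell
# Download one file from a Hub repo into a chosen directory.
# With DRY_RUN set, the command is printed instead of executed (hypothetical switch).
download_file_to() {
  repo_id="$1"      # e.g. google/gemma-7b
  filename="$2"     # e.g. config.json
  target_dir="$3"   # where the file (or a symlink to the cache) should appear
  set -- huggingface-cli download "$repo_id" "$filename" --local-dir "$target_dir"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `download_file_to google/gemma-7b config.json ./gemma-7b` would place config.json under ./gemma-7b. Setting the `HF_HOME` environment variable before the call is one way to move the cache itself.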

Suparious

Feb 22

edited Feb 22

I've actually found it easier to continue using the git LFS workflow and simply mask any *.gguf binaries. This is what we used to do when new devs started putting zips and tars into source code repos: we usually do not store binaries inside the source repository, because they are considered artifacts.
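A minimal sketch of that masking approach, assuming Git LFS is installed. The names `run` and `clone_without_gguf` and the `DRY_RUN` switch are illustrative only; `GIT_LFS_SKIP_SMUDGE` and `git lfs pull --exclude` are standard Git LFS features.

```shell
# Print the command instead of running it when DRY_RUN is set (hypothetical helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clone with LFS pointer files only, then fetch every LFS object except *.gguf.
clone_without_gguf() {
  repo_url="$1"   # e.g. https://huggingface.co/google/gemma-7b
  dest="$2"       # local directory to clone into
  run env GIT_LFS_SKIP_SMUDGE=1 git clone "$repo_url" "$dest" || return 1
  run git -C "$dest" lfs pull --exclude="*.gguf"
}
```

The skipped GGUF files stay in the working tree as small pointer files, so the rest of the repo (weights, tokenizer, config) is usable as usual.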

I understand the need to support certain workflows, and that this 34 GB of bloat for each 7B model is there to help the gemma.cpp project defaults with Kaggle. It's just that a 32bit float model in a 7bit gguf is not even the optimal format for a 7B, and users will most likely use this repo to make their own GGUF anyhow, so it seems like a moot point. Really happy to see another foundational model. Thank you.

Suparious

Feb 22

edited Feb 22

Downloading files using git pull is fine but usually less efficient, since it requires downloading the full repository (so you are tied to the repo owner's decisions), and the download speed is usually slower than the HTTP-based methods (used by transformers and by most other libraries that interact with the Hub through huggingface_hub). You can expect a 2x-4x speed-up on a good connection.

A suggestion to replace a git pull in your workflow is to use huggingface-cli as such:

# Download full repo (similar to "git pull")
huggingface-cli download google/gemma-7b

# Download only the gguf file
huggingface-cli download google/gemma-7b gemma-7b.gguf

# Download all safetensors files + config.json
huggingface-cli download google/gemma-7b --include "*.safetensors"  --include config.json

By default, files are downloaded to the HF cache folder and can be reused by any library that uses huggingface_hub under the hood, which avoids re-downloads (the HF cache location is configurable). If you want to control where the files are downloaded, use the --local-dir option to provide a destination path. Files will be downloaded to the cache and symlinked to this location. To disable symlinks and actually download the files to the local dir, set --local-dir-use-symlinks=0 as well (this workflow has been re-discussed recently and will soon be simplified).

Hope this will help you all getting to use this repo with less friction :)

For more details, check out the CLI docs page.

This is not a functional solution if you want to use the tokenizer that comes with the model. Perhaps your suggestions would work better by just excluding *.gguf, and retaining everything else (understanding, like you said, being victim to the repo maintainer's choices).

Maybe this:

# Download full repo excluding *.gguf files
huggingface-cli download google/gemma-7b --exclude "*.gguf"

pandora-s

Feb 22

edited Feb 22

The thing is, there are indeed a few repos that mix them, and we can forgive that to a certain extent because their files are small. When we are talking about files as huge as these, that's another matter.

Also, it's a bit weird to say that because a few do it everyone should.

Wauplin

Feb 22

This is not a functional solution if you want to use the tokenizer that comes with the model. Perhaps your suggestions would work better by just excluding *.gguf, and retaining everything else (understanding, like you said, being victim to the repo maintainer's choices).

Definitely yes! Just wanted to point out the different possibilities but it can be adapted depending on the use case.

Anthonyg5005

Feb 22

edited Feb 22

There are definitely much better options than git for cloning the model files. The CLI, as Wauplin said, or huggingface_hub both work, and so do plain requests, as in oobabooga's download-model.py. I would still recommend that repository maintainers create different branches for different frameworks and list them on the model card. I don't blame them for not knowing, as many older official repositories keep models for every framework (pytorch, tensorflow, onnx, jax, rust, transformers, and more) in a single branch.

migtissera changed discussion status to closed Feb 22

ehartford

Feb 22

Hi all, to clarify from my side:

We've quite often seen people release the GGUF files and the transformers files in the same repository, such as in https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b, https://huggingface.co/defog/sqlcoder-7b-2, and https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6, and there were no requests for separate repos in those, if I recall correctly.

If you're a transformers user, it does not download all the files, just the ones being used. Above, I was asking if there's a tool for GGUF that downloads all the files, as many tools, such as llama.cpp or lmstudio, download a specific GGUF file from the repository. Likewise, using huggingface_hub and huggingface-cli, users download one specific file or can even filter for file extensions if they want to download all *.gguf. I was not trying to imply all users use transformers, but trying to understand which non-transformers tools rely on having a repo with only GGUF files. We're discussing with the Google team to see if we add a separate repo for the GGUF files, thanks for explaining the reasoning behind wanting a separate repo.

Your examples are tiny and/or not major base models that will be used by everyone.

Please accept that:

We have been doing this for a while, and we know what we are talking about.
Our concern is legitimate.

We are making a suggestion that will reduce confusion and fragmentation and keep traffic on your repo.

Migel closed the issue; clearly he's done talking about it, and so am I.

Nice model, thank you!

osanseviero

Google org Feb 23

Hey all. We're looking into moving the weights into separate repos in the coming days; hopefully we can share them today or on Monday.

That said, even with the separate GGUF files, please consider not using git for this workflow. As shared by @Wauplin and @Anthonyg5005, the HTTP-based approach will be a few times faster, and llama.cpp has a new HF script that downloads just the needed files.
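To illustrate the HTTP-based approach without any Hugging Face tooling, here is a small sketch that builds the Hub's `resolve` URL for one file and fetches it with curl. The function names are hypothetical, and `HF_TOKEN` is assumed to hold an access token, which is only needed for gated or private repos.

```shell
# Build the direct-download URL for one file in a Hub repo.
hf_resolve_url() {
  repo_id="$1"            # e.g. google/gemma-7b
  filename="$2"           # e.g. gemma-7b.gguf
  revision="${3:-main}"   # branch, tag, or commit; defaults to main
  printf 'https://huggingface.co/%s/resolve/%s/%s\n' "$repo_id" "$revision" "$filename"
}

# Download a single file over HTTP into the current directory.
hf_fetch_file() {
  url="$(hf_resolve_url "$@")"
  if [ -n "${HF_TOKEN:-}" ]; then
    curl -L --fail -O -H "Authorization: Bearer ${HF_TOKEN}" "$url"
  else
    curl -L --fail -O "$url"
  fi
}
```

Calling `hf_fetch_file google/gemma-7b config.json` would fetch just that one file, with no clone and no LFS involved.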

osanseviero

Google org Mar 1

Hey @ehartford et al. I forgot to update here, but there are now also separate repos with just the GGUFs, as discussed. See https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b

ducknificient

Mar 26

I have the opposite problem: git clone [email protected]:google/gemma-7b-it only clones the repo without the GGUF. Not sure why, but can I continue the clone into the existing folder, or do I need to download the file manually?

Wauplin

Mar 26

@ducknificient you most likely have GIT_LFS_SKIP_SMUDGE=1 set as an environment variable on your machine, which prevents the LFS files from being downloaded when you clone the repo.
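If that is the cause, the existing folder can usually be completed in place rather than re-cloned. A minimal sketch, assuming Git LFS is installed; the function name `fetch_lfs_files` and the `DRY_RUN` switch are illustrative only.

```shell
# Fetch the real LFS files into a clone that currently holds only pointer files.
fetch_lfs_files() {
  repo_dir="$1"               # path to the existing clone, e.g. ./gemma-7b-it
  unset GIT_LFS_SKIP_SMUDGE   # so later checkouts in this shell also fetch LFS files
  if [ -n "${DRY_RUN:-}" ]; then
    echo "git -C $repo_dir lfs pull"   # dry run: show the command only
  else
    git -C "$repo_dir" lfs pull        # download LFS objects and replace the pointers
  fi
}
```

Note that the `unset` only affects the current shell session; if the variable is exported from a shell profile, it has to be removed there as well.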