THE THREAD OF DOOM

#12
by jukofyork - opened

Just realised I deleted the old "thread of doom" as it was attached to the earliest alpha version of the control vectors :(

jukofyork pinned discussion

Okay, I was wondering if we crossed some sort of line.

Anyway.. the INCREDIBLY important thing I was saying before the thread disappeared was... I have a feeling it is going to be just like they say. They are going to be liberal with grants. I suspect they will target people who are using the space outside the purpose that was intended... somewhere out there, someone has all their RAW 8k videos of their cats...

Yeah, it's a pity it got deleted (I should have checked more carefully what was linked), but it was getting a bit out of hand with all that scrolling so perhaps not such a bad thing.

I'm just gonna keep up the models that people have downloaded the most and get rid of all the "experimental, but likely broken" stuff with 15 downloads as they really weren't serving much of a purpose.

Also, all the old versions of the control vectors were vastly inferior to the final version due to me figuring out how to get them working as I went along, so it's probably better to just keep up the final v3.0 ones to avoid a lot of the confusion.



It looks a lot more like I'm just uploading quality models that people like/use now at least... The creative-writer-v0.1-35b and creative-writer-v0.2-35b models will be going as soon as I get the v1.0 version uploaded, and possibly Dusk-Miqu-70B if they do set a hard limit (I still think Dark-Miqu-70B is worth keeping whatever happens though).


Also, if anybody really misses any of the models I have uploaded, then I can in theory recreate them and upload a LoRA created from the delta using extract_lora.py, but I strongly suspect that for most of the models nobody will even notice they have gone... Of all the models I've created, I've only ever used Dark-Miqu-70B myself!
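For reference, the general idea behind extracting a LoRA from a delta is just an SVD of the weight difference, keeping the top singular components. A minimal sketch (the function name and rank are illustrative, not necessarily extract_lora.py's actual interface):

import torch

def extract_lora_from_delta(w_base, w_tuned, rank=32):
    # Factor the fine-tuning delta into low-rank matrices such that
    # w_base + b @ a approximates w_tuned.
    delta = (w_tuned - w_base).float()
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    b = u[:, :rank] * s[:rank]  # (out_features, rank)
    a = vh[:rank, :]            # (rank, in_features)
    return a, b

Storing just the A/B pair for each 2D weight is a small fraction of the size of re-uploading a whole model, which is why recreate-on-demand is viable.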

:( Damn there was some good info in that thread.

If you've still got Firefox tabs open somewhere, you'll be able to save some of the thread.

Unfortunately, I cleaned my browser tabs up about an hour ago.

And yeah, if people were using it as free cloud storage then it makes sense. I just think they could have gone about it better, rather than having us wake up and see the limit.

I'm curious, did your quota drop after deleting that? I wonder if all the PNG files attached there were "billed" to you.

@jukofyork I think you're good man. If they start enforcing it, you'll get an exemption for sure.

I come across your contributions randomly all over the place, even on GitHub repos like some fine-tuning tool lol

I should probably deduplicate my quants. Often I was making one because I could not find what I was looking for, then it would turn out a few of us just happened to be making them at the same time. Then I started getting requests, so I just decided I would make a bunch. We need a global Huggingverse quant dedupe...

I haven't tested it yet, but the new initialization / optimisation may let me bump the Entropy up even further than I could before, but for now I'm just using stock Cross-Entropy loss and no attempt to increase Entropy until I get the hyper-parameters dialed in properly...
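For concreteness, the "attempt to increase Entropy" amounts to adding an entropy bonus on top of the stock loss. A minimal sketch (the entropy_weight hyper-parameter is illustrative, not my actual training code):

import torch.nn.functional as F

def ce_with_entropy_bonus(logits, labels, entropy_weight=0.01):
    # Standard token-level cross-entropy (the "stock" loss).
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
    # Mean predictive entropy; subtracting it rewards flatter output
    # distributions, i.e. pushes the model's entropy up.
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - entropy_weight * entropy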

I'm still running on the 1.1M random paragraphs dataset and using the "hack" I posted above to avoid the special tokens getting nerfed:

https://github.com/tdrussell/qlora-pipe/discussions/41

I'll be buggered if I can make this work in PyTorch without using 10GB of extra VRAM (for no apparent reason - even using "chunking"???), but the Triton kernel modification works...

If anybody has any suggestions I'd be very grateful, as currently this dodgy hack will mean the code needs to be edited for every different model :/
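In case it helps anyone suggest something: the PyTorch equivalent of the hack is roughly to mask the special-token logits out of the softmax so they receive no gradient pressure from the loss. The token IDs below are placeholders - they differ per tokenizer, which is exactly why the code currently needs editing for every model:

import torch

SPECIAL_TOKEN_IDS = [0, 1, 2]  # placeholder IDs - model/tokenizer specific

def mask_special_tokens(logits):
    # Setting the logits to -inf gives those tokens zero softmax
    # probability and zero gradient, so cross-entropy can't "nerf" them.
    logits = logits.clone()  # clone to avoid in-place autograd issues
    logits[..., SPECIAL_TOKEN_IDS] = float("-inf")
    return logits

(The clone of the full logits tensor is presumably part of where the extra VRAM goes in the pure PyTorch version.)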

Merry Christmas!


Merry Christmas @jukofyork @ChuckMcSneed @gghfez and lurkers!

Merry Christmas!

https://huggingface.co./spaces/Xenova/the-tokenizer-playground

This looks useful. I've got a tokenizer issue to investigate myself. I've been using the standard approach, e.g.:

from transformers import AutoTokenizer

# Load the tokenizer and check how paragraph breaks get encoded.
writer_tokenizer = AutoTokenizer.from_pretrained("gghfez/Writer-Large-2411-v2.1")
print(writer_tokenizer.encode("""<BOS>paragraph1

paragraph2

paragraph3"""))

So it looks like for command-r, 206 is 1 linefeed and 2126 is 2 linefeeds.
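A quick way to sanity-check those IDs directly against the command-r tokenizer (assuming the CohereForAI/c4ai-command-r-v01 repo is the right one):

from transformers import AutoTokenizer

cr_tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
print(repr(cr_tokenizer.decode([206])))   # expect '\n'
print(repr(cr_tokenizer.decode([2126])))  # expect '\n\n'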

If anybody has any suggestions I'd be very grateful, as currently this dodgy hack will mean the code needs to be edited for every different model :/

Sorry, what you're doing is beyond my level right now.

Merry Christmas!

Not related to creative writing, but the new QVQ-72B model is insanely impressive:

  1. I gave it an obscure picture of a train-line map I took at a museum a few months ago: a horrible photo, with glare reflecting off the perspex in front of it, etc. Then I asked it to estimate the date, and it absolutely nailed it by looking at the place names, the dates the lines were created and cut, the style of the fonts, and so on!
  2. I gave it a picture of my brother and his wife sitting in front of a waterfall in New Zealand and it looked at the foliage, lighting, water colour and so on to narrow it down and actually got the exact place!
  3. I gave it a picture of my confusing 3-phase electric meter and asked for the reading, and it managed to ignore all the distractions and read the exact value!

I think GeoGuessr will have to start working on their anti-cheat as it's likely better than 99% of the population!!!

Merry Christmas all! Have a great day!
