Resources


Error in Flash Attention

#34 opened about 1 month ago by

Sayan01

Resolve - 196 [rank0]: triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.

#33 opened 3 months ago by

moidhassan

Issue with RoPE scaling

#31 opened 5 months ago by

ritwickchaudhryamazon

AssertionError: Flash Attention is not available, but is needed for dense attention

#30 opened 5 months ago by

tpadhi1

Fast Tokenization for Phi3-small

#29 opened 5 months ago by

mfajcik

Update positional_embedding.py

#28 opened 5 months ago by

mwirth-epo

"OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 101376. "

#27 opened 5 months ago by

joker26

Slower generation with larger batch sizes

#26 opened 6 months ago by

Satandon1999

Why does only the small model use MuP?

#24 opened 6 months ago by

xdseunghyun

Tokenizer gives unexpected results

#23 opened 6 months ago by

jpiabrantes

Does it support a system prompt?

#22 opened 6 months ago by

lucasjin

Upload triton_flash_blocksparse_attn.py

#21 opened 6 months ago by

barcelosallan

"TypeError: phi3 isn't supported yet": could not quantize the phi3 model with the AWQ quantization method

#20 opened 6 months ago by

rabinghimire737

Device mismatch in the multi-GPU case while fine-tuning

#19 opened 6 months ago by

Satandon1999

CheckpointError in `triton_flash_blocksparse_attn.py` while fine-tuning

#18 opened 6 months ago by

FremyCompany

Fix for FlashAttention RuntimeError and Triton multi-GPU issue

#17 opened 7 months ago by

Satandon1999

Out of resource: shared memory

#16 opened 7 months ago by

iszhaoxin

RuntimeError: FlashAttention only support fp16 and bf16 data type

#15 opened 7 months ago by

Satandon1999

Adding sample_finetune.py to the small model

#9 opened 7 months ago by

hackint0sh

The attention mask and the pad token id were not set

#8 opened 7 months ago by

hamidpalangi