CUDA out of memory when applying an LLM to a dataset of texts

#4
by fcivardi - opened

I do:

def look4menicusriss(report):
    # llm_chain, system_prompt and user_prompt are defined earlier
    response = llm_chain.run(
        system_prompt=system_prompt,
        user_prompt=user_prompt,
        report=report,
    )
    # print(response)
    tear = response[0]  # keep only the first character of the generated answer
    return tear


# apply the function to every report ('Befund' is the report text column)
new_dataset = dataset.map(
    lambda row: {'prediction': look4menicusriss(row['Befund'])}
)

It starts applying the function to the reports stored in the dataset, but at about 20% I get an OutOfMemoryError: CUDA out of memory. This didn't happen with the original Llama 2. (GPU: V100, 32 GB)

OutOfMemoryError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 31.75 GiB total capacity; 30.43 GiB already allocated; 33.50 MiB free; 30.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
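
The error also suggests setting max_split_size_mb to avoid fragmentation. A minimal sketch of that (the 128 MiB value is only an example, and the variable has to be set before CUDA is initialized):

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value; set before importing torch
import torch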

fcivardi changed discussion status to closed
fcivardi changed discussion status to open
LAION LeoLM org

Our models support 2x the context size of the original Llama. Perhaps you're having issues with some samples being too long, or much longer than the others? Maybe try filtering those out, or alternatively try using a quantized model.
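
For example, filtering on token count could look something like this (a rough sketch; the checkpoint id and the 4096-token threshold are placeholders to adapt to your setup):

from transformers import AutoTokenizer

# placeholder checkpoint id, use the model you are actually running
tokenizer = AutoTokenizer.from_pretrained("LeoLM/leo-hessianai-13b")

MAX_TOKENS = 4096  # example threshold, leave headroom for the prompt and the generated answer

def short_enough(row):
    # count tokens in the report and keep only samples that fit comfortably
    return len(tokenizer(row["Befund"])["input_ids"]) <= MAX_TOKENS

filtered_dataset = dataset.filter(short_enough)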

Thank you. Indeed, using a retriever to pass only the relevant parts of each document to the LLM solved the problem.
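
Roughly, such a retrieval step could look like this (a simplified sketch using LangChain's text splitter and a FAISS vector store; the query string, chunk sizes and embedding model are placeholders, not the exact setup used):

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
embeddings = HuggingFaceEmbeddings()  # default sentence-transformers model

def relevant_parts(report, query="Meniskusriss", k=3):
    # split the report into chunks and keep only the k chunks closest to the query
    chunks = splitter.split_text(report)
    store = FAISS.from_texts(chunks, embeddings)
    docs = store.similarity_search(query, k=k)
    return "\n".join(doc.page_content for doc in docs)

new_dataset = dataset.map(
    lambda row: {'prediction': look4menicusriss(relevant_parts(row['Befund']))}
)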

fcivardi changed discussion status to closed
