CUDA out of memory when applying an LLM to a dataset of texts

#4
by fcivardi - opened

I do:

def look4menicusriss(report):
    # llm_chain, system_prompt and user_prompt are defined earlier
    response = llm_chain.run(
        system_prompt=system_prompt,
        user_prompt=user_prompt,
        report=report,
    )
    # print(response)
    tear = response[0]  # keep only the first character of the generated answer
    return tear


# apply the function to every report ('Befund' is the report text column)
new_dataset = dataset.map(
    lambda row: {'prediction': look4menicusriss(row['Befund'])}
)

It starts applying the function to the reports stored in the dataset, but at about 20% I get an OutOfMemoryError: CUDA out of memory. This didn't happen with the original Llama 2. (GPU: V100, 32 GB)

OutOfMemoryError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 31.75 GiB total capacity; 30.43 GiB already allocated; 33.50 MiB free; 30.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
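
The error also suggests setting max_split_size_mb to avoid fragmentation. A minimal sketch of that (the 128 MiB value is only an example, and the variable has to be set before CUDA is initialized):

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value; set before importing torch
import torch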

fcivardi changed discussion status to closed
fcivardi changed discussion status to open
LAION LeoLM org

Our models support 2x the context size of the original Llama. Perhaps you're having issues with some samples being too long, or much longer than the others? Maybe try filtering those out, or alternatively try using a quantized model.
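
For example, filtering on token count could look something like this (a rough sketch; the checkpoint id and the 4096-token threshold are placeholders to adapt to your setup):

from transformers import AutoTokenizer

# placeholder checkpoint id, use the model you are actually running
tokenizer = AutoTokenizer.from_pretrained("LeoLM/leo-hessianai-13b")

MAX_TOKENS = 4096  # example threshold, leave headroom for the prompt and the generated answer

def short_enough(row):
    # count tokens in the report and keep only samples that fit comfortably
    return len(tokenizer(row["Befund"])["input_ids"]) <= MAX_TOKENS

filtered_dataset = dataset.filter(short_enough)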

Thank you. Indeed, using a retriever to pass only the relevant parts of each document to the LLM solved the problem.
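
Roughly, such a retrieval step could look like this (a simplified sketch using LangChain's text splitter and a FAISS vector store; the query string, chunk sizes and embedding model are placeholders, not the exact setup used):

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
embeddings = HuggingFaceEmbeddings()  # default sentence-transformers model

def relevant_parts(report, query="Meniskusriss", k=3):
    # split the report into chunks and keep only the k chunks closest to the query
    chunks = splitter.split_text(report)
    store = FAISS.from_texts(chunks, embeddings)
    docs = store.similarity_search(query, k=k)
    return "\n".join(doc.page_content for doc in docs)

new_dataset = dataset.map(
    lambda row: {'prediction': look4menicusriss(relevant_parts(row['Befund']))}
)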

fcivardi changed discussion status to closed
