@Jaward on Hugging Face: "Triton nanoGPT now has a custom cross entropy loss kernel 🚀 Next: matmul…"

Triton nanoGPT now has a custom cross entropy loss kernel 🚀
Next: matmul, gradually overthrowing all major PyTorch ops :)

Simplified pseudocode for the parallel cross-entropy loss computation:
- init program: get pid, compute offsets, load targets.
- init row_max and row_sum.
- for-loop 1 (find the max logit): update row_max with the largest logit seen so far.
- for-loop 2 (compute softmax and loss): accumulate row_sum, update loss.
- add log(row_sum) and store loss.
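
The steps above can be sketched in plain Python to make the arithmetic concrete. This is not the Triton kernel from the notebook: the function names, the `block_size` parameter, and the final mean over rows are all assumptions made for illustration. Each call to `cross_entropy_row` stands in for one program (one pid) handling one row of logits.

```python
import math

def cross_entropy_row(logits, target, block_size=4):
    """Loss for a single row, mimicking the work of one kernel program.

    Hypothetical helper for illustration; not taken from the notebook.
    """
    n = len(logits)
    row_max = -math.inf
    row_sum = 0.0

    # Loop 1: walk the row block by block and track the max logit.
    for start in range(0, n, block_size):
        block = logits[start:start + block_size]
        row_max = max(row_max, max(block))

    # Loop 2: accumulate sum(exp(logit - row_max)) and pick up the
    # shifted target logit when its block comes by.
    loss = 0.0
    for start in range(0, n, block_size):
        block = logits[start:start + block_size]
        row_sum += sum(math.exp(x - row_max) for x in block)
        if start <= target < start + block_size:
            loss = -(logits[target] - row_max)

    # Final step: add log(row_sum), giving -log(softmax(logits)[target]).
    return loss + math.log(row_sum)

def cross_entropy(logits_2d, targets, block_size=4):
    """Mean loss over rows (the mean reduction is an assumption)."""
    # Rows are independent, which is what lets a GPU process them in parallel.
    losses = [cross_entropy_row(row, t, block_size)
              for row, t in zip(logits_2d, targets)]
    return sum(losses) / len(losses)
```

Subtracting row_max before exponentiating keeps `exp` from overflowing on large logits, which is why the max is found in a separate first pass.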

Code: https://github.com/Jaykef/ai-algorithms/blob/main/triton_nanoGPT.ipynb
