TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arXiv 2410.23168, published Oct 30, 2024)
nGPT: Normalized Transformer with Representation Learning on the Hypersphere (arXiv 2410.01131, published Oct 1, 2024)
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling (arXiv 2410.07145, published Oct 9, 2024)
Round and Round We Go! What makes Rotary Positional Encodings useful? (arXiv 2410.06205, published Oct 8, 2024)
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices (arXiv 2410.00531, published Oct 1, 2024)
Aria: An Open Multimodal Native Mixture-of-Experts Model (arXiv 2410.05993, published Oct 8, 2024)