opt-peter-1.3B / latest
pszemraj
add new checkpoint trained for a hundred steps with smaller max grad norm and weight decay
7a20e92
14 Bytes
global_step126