CamemBERT 2.0: A Smarter French Language Model Aged to Perfection Paper • 2411.08868 • Published Nov 13 • 12
Awesome Document AI Collection A collection of open-source document AI • 27 items • Updated Mar 11 • 75
Harvesting Textual and Structured Data from the HAL Publication Repository Paper • 2407.20595 • Published Jul 30 • 21
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus Paper • 2406.08707 • Published Jun 13 • 15