Preference Leakage: A Contamination Problem in LLM-as-a-judge • Paper • 2502.01534 • Published 7 days ago • 34
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment • Paper • 2411.17188 • Published Nov 26, 2024 • 21
What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks • Paper • 2305.18365 • Published May 27, 2023 • 4