BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
When and Where
Speakers
Description
In this talk, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries across 7 domains in 1,140 fine-grained programming tasks. Our evaluation of 60 LLMs shows that current models are not yet capable of following complex instructions to use function calls precisely: they achieve scores of up to 60%, significantly lower than the human performance of 97%.
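To give a rough sense of what such a task can look like, here is a minimal sketch in Python. It is not taken from the benchmark: the task, the function name `task_func`, and the checker `passes_tests` are hypothetical and chosen only for illustration. The sketch assumes that a solution has to combine calls from several libraries (here `re`, `collections`, and `json` from the standard library) and is judged by whether it passes a set of test cases.

```python
import json
import re
from collections import Counter


def task_func(text: str, top_n: int = 3) -> str:
    """Hypothetical task: return the top_n most frequent words as a JSON object."""
    # re: split the lowercased text into word tokens
    words = re.findall(r"[a-z']+", text.lower())
    # collections: count the tokens and keep the most frequent ones
    most_common = Counter(words).most_common(top_n)
    # json: serialize the word-to-count mapping
    return json.dumps(dict(most_common))


def passes_tests(candidate) -> bool:
    """Hypothetical checker: True only if the candidate passes every test case."""
    try:
        result = json.loads(candidate("the cat and the hat and the bat", top_n=2))
        assert result == {"the": 3, "and": 2}
        assert json.loads(candidate("", top_n=2)) == {}
        return True
    except Exception:
        # Any failed assertion or runtime error counts as a failed task
        return False
```

Under these assumptions, a model's score would be the fraction of tasks for which its generated solution passes all test cases, so a single wrong library call is enough to fail a task.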
Please join us for the event. Everyone is welcome: attendance is free, and you do not need to be affiliated with the university.
About Terry Yue Zhuo
Terry Yue Zhuo is a PhD candidate in Computer Science at Monash University and CSIRO’s Data61, where he also works as a research technician. He holds a Bachelor of Computer Science (Honours) from Monash University. He is also an associate member of the Sea AI Lab and a visiting scholar at Singapore Management University. His research has been published at venues including EMNLP, ICLR, EACL, and TMLR.