BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

When and Where

Thursday, July 11, 2024, 9:00 am to 10:00 am

Speakers

Terry Yue Zhuo, Monash University

Description

In this talk, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools, drawn from 139 libraries across 7 domains, to solve 1,140 fine-grained programming tasks. Our evaluation of 60 LLMs shows that they are not yet capable of following complex instructions to use function calls precisely: scores reach at most 60%, significantly lower than the human performance of 97%.

Please join the event. Everyone is welcome: it is free, and you do not need to be affiliated with the university.

About Terry Yue Zhuo

Terry Yue Zhuo is a PhD candidate in Computer Science at Monash University and CSIRO’s Data61. He holds a Bachelor of Computer Science (Honours) from Monash University. He is also an associate member of the Sea AI Lab, a visiting scholar at Singapore Management University, and a research technician at CSIRO’s Data61. His research has been published at venues including EMNLP, ICLR, EACL, and TMLR.