Agent Programming Exercise (APE)

🔧 Evaluation Toolkit

Evaluation Tests

Running all tests takes approximately 3 minutes. Use F12 to open developer tools for detailed logs.

⚪ 1. LLM-style question answering: elementary-level math problems
⚪ 2. Tool use: execute a sequence of sha512 and md5 operations
⚪ 3. Image understanding: select what is in the image
⚪ 4. Web browsing: win Tic-tac-toe
⚪ 5. Code generation and execution: brute-force algorithm impl
⚪ 6. Memorizing tasks across sessions
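
As a rough illustration of what test 2 asks an agent to do, the sketch below chains sha512 and md5 with Python's standard `hashlib`. The page does not specify the real sequence, the input, or how intermediate values are passed along, so the function name `apply_hash_chain`, the `ops` list, and the choice to feed each hex digest (UTF-8 encoded) into the next step are all assumptions made for this example.

```python
import hashlib

# Hypothetical helper: not part of the APE toolkit.
ALLOWED_OPS = ("sha512", "md5")

def apply_hash_chain(text: str, ops: list[str]) -> str:
    """Apply each named hash in order and return the final hex digest.

    Assumption: every step hashes the UTF-8 bytes of the previous
    step's hex digest. The real test may chain values differently.
    """
    current = text
    for op in ops:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operation: {op}")
        current = hashlib.new(op, current.encode("utf-8")).hexdigest()
    return current

# Example: sha512 first, then md5 of the resulting hex string.
digest = apply_hash_chain("hello", ["sha512", "md5"])
print(digest)
```

An agent with a code-execution tool could run something like this directly instead of attempting the hashes by reasoning alone, which is presumably what the test is checking.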
By: littleRound and Tianneng Shi | Contact: sec+ape <at> berkeley.edu
AgentBeats 2025 · Partly built with Claude Code