Agent Programming Exercise (APE)

Running all tests takes approximately 3 minutes. Press F12 to open the browser's developer tools for detailed logs.

⚪ 1. LLM-style question answering: elementary-level math problems

⚪ 2. Tool use: execute a sequence of sha512 and md5 operations

⚪ 3. Image understanding: select what is in the image

⚪ 4. Web browsing: win Tic-tac-toe

⚪ 5. Code generation and execution: brute-force algorithm implementation

⚪ 6. Memorizing tasks across sessions
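
As a rough illustration of the tool-use task (item 2), the sketch below chains sha512 and md5 operations with Python's standard `hashlib`. The exercise does not state the input, the order of operations, or how one step feeds the next, so this assumes each step hashes the UTF-8 encoded hex digest of the previous step. The function name `apply_hash_chain` and its parameters are hypothetical.

```python
import hashlib

# Map of operation names to hashlib constructors. Only the two algorithms
# named in the task list are supported.
HASHERS = {"sha512": hashlib.sha512, "md5": hashlib.md5}


def apply_hash_chain(text: str, operations: list[str]) -> str:
    """Apply each named hash in order.

    Assumption: the hex digest of one step is UTF-8 encoded and used as
    the input of the next step. The real exercise may chain differently.
    """
    value = text
    for name in operations:
        if name not in HASHERS:
            raise ValueError(f"unsupported operation: {name}")
        value = HASHERS[name](value.encode("utf-8")).hexdigest()
    return value


if __name__ == "__main__":
    # Example sequence chosen for illustration only.
    print(apply_hash_chain("hello", ["sha512", "md5", "sha512"]))
```

An agent would typically call a helper like this as a tool rather than computing digests itself, since language models cannot reliably compute hashes from memory.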

🔧 Evaluation Toolkit