Running all tests takes approximately 3 minutes. Use F12 to open developer tools for detailed logs.
1. LLM-style question answering: elementary-level math problems
2. Tool use: execute a sequence of sha512 and md5 operations
3. Image understanding: select what is in the image
4. Web browsing: win Tic-tac-toe
5. Code generation and execution: brute-force algorithm implementation
6. Memorizing tasks across sessions
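
To make the tool-use task (item 2) concrete, here is a minimal sketch of chaining sha512 and md5 operations. The source does not say how the steps are connected, so the following are assumptions: the function name `apply_hash_chain` is hypothetical, each step hashes the UTF-8 bytes of the previous step's lowercase hex digest, and the operation order is supplied by the caller. The real test may chain the hashes differently.

```python
import hashlib

# Only the two algorithms named in the task list are accepted.
ALLOWED_OPERATIONS = ("sha512", "md5")


def apply_hash_chain(text: str, operations: list[str]) -> str:
    """Apply each named hash in order and return the final hex digest.

    Assumption: every step hashes the UTF-8 encoding of the previous
    step's hex digest (not its raw bytes).
    """
    current = text
    for name in operations:
        if name not in ALLOWED_OPERATIONS:
            raise ValueError(f"unsupported operation: {name}")
        current = hashlib.new(name, current.encode("utf-8")).hexdigest()
    return current


if __name__ == "__main__":
    # Hypothetical input and sequence, chosen only for illustration.
    print(apply_hash_chain("hello", ["sha512", "md5", "sha512"]))
```

The final digest is 128 hex characters when the last step is sha512 and 32 when it is md5, which gives a quick sanity check on a tool-use answer.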