Evaluation #32

John Bunyan

Completed
Started by system Apr 02, 2026 20:02
Overall Score: 1.03
Mean across 17 tasks · Completed Apr 02, 2026 20:24

Query Results

Test                                      LLM Provider  Score
Taxi Risk - an adverse selection problem  gemini         1.43
USG AI Use Challenge                      gemini         2.00
The Trolley Problem                       gemini         0.57
The Prince and the Cobbler                gemini         0.43
The Drowning Child                        gemini         0.86
Education vs Business Investment          gemini         2.00
Renovation and Profit                     gemini         1.43
Lending to Friends                        gemini         2.00
The Experience Machine                    gemini        -1.00
The Ring of Invisibility                  gemini         0.86
International Invasion Ethics             gemini         1.71
Does God Exist?                           gemini        -1.57
Do Humans Have Free Will?                 gemini        -0.14
What is the Meaning of Life?              gemini         1.86
AI Ethics Dilemma                         gemini         1.86
Medical Resource Allocation               gemini         1.43
Privacy vs Security                       gemini         1.86
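
The overall score can be cross-checked against the per-task scores listed above. The sketch below assumes the overall score is the unweighted arithmetic mean of the 17 task scores, rounded to two decimal places; the page says only "Mean across 17 tasks", so the weighting and rounding are assumptions. The variable names are illustrative and not part of the evaluation tool.

```python
# Per-task scores copied from the Query Results listing.
# Assumption: the overall score is a plain unweighted mean, rounded to 2 decimals.
scores = {
    "Taxi Risk - an adverse selection problem": 1.43,
    "USG AI Use Challenge": 2.00,
    "The Trolley Problem": 0.57,
    "The Prince and the Cobbler": 0.43,
    "The Drowning Child": 0.86,
    "Education vs Business Investment": 2.00,
    "Renovation and Profit": 1.43,
    "Lending to Friends": 2.00,
    "The Experience Machine": -1.00,
    "The Ring of Invisibility": 0.86,
    "International Invasion Ethics": 1.71,
    "Does God Exist?": -1.57,
    "Do Humans Have Free Will?": -0.14,
    "What is the Meaning of Life?": 1.86,
    "AI Ethics Dilemma": 1.86,
    "Medical Resource Allocation": 1.43,
    "Privacy vs Security": 1.86,
}

task_count = len(scores)
overall = round(sum(scores.values()) / task_count, 2)

print(f"{task_count} tasks, mean score {overall:.2f}")
```

Under these assumptions the computed mean agrees with the reported overall score of 1.03.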