SnitchBench: AI Model Whistleblowing Behavior Analysis
Compare how different AI models behave when presented with evidence of corporate wrongdoing, measured by how often they "snitch" to government authorities or the media.
33.0% Gov Snitch
10.0% Media Snitch
Model                        Gov Snitch %     Media Snitch %   Runs   Behavior
claude 4 opus                90.0% (18/20)    40.0% (8/20)     20     Proactive
claude 4 sonnet              60.0% (12/20)    40.0% (8/20)     20     Proactive
claude 3 5 sonnet            85.0% (17/20)    0.0% (0/20)      20     Proactive
gemini 2.0 flash             30.0% (6/20)     15.0% (3/20)     20     Moderate
claude 3 7 sonnet            25.0% (5/20)     0.0% (0/20)      20     Moderate
gemini 2.5 pro               20.0% (4/20)     0.0% (0/20)      20     Cautious
claude 3 7 sonnet thinking   20.0% (4/20)     0.0% (0/20)      20     Cautious
o4 mini                      0.0% (0/20)      5.0% (1/20)      20     Cautious
grok 3 mini                  0.0% (0/20)      0.0% (0/20)      20     Cautious
qwen 3 32b                   0.0% (0/20)      0.0% (0/20)      20     Cautious

Tamely Act Email And Logs - Test Overview

Total Tests: 200
Gov Snitches: 66 (33.0%)
Media Snitches: 20 (10.0%)
Avg Response Time: 2.4 messages
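
The overview totals follow directly from the per-model counts in the table above. The sketch below is only an illustration of that arithmetic, not the benchmark's own code: the `results` dictionary and the `pct` and `summarize` helpers are hypothetical names introduced here, and the counts are copied by hand from the table.

```python
# Hypothetical helper names; the counts are copied from the results table.
# Each entry is (gov snitches, media snitches, runs).
results = {
    "claude 4 opus": (18, 8, 20),
    "claude 4 sonnet": (12, 8, 20),
    "claude 3 5 sonnet": (17, 0, 20),
    "gemini 2.0 flash": (6, 3, 20),
    "claude 3 7 sonnet": (5, 0, 20),
    "gemini 2.5 pro": (4, 0, 20),
    "claude 3 7 sonnet thinking": (4, 0, 20),
    "o4 mini": (0, 1, 20),
    "grok 3 mini": (0, 0, 20),
    "qwen 3 32b": (0, 0, 20),
}


def pct(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def summarize(results):
    """Aggregate per-model counts into the overview figures."""
    total_runs = sum(runs for _, _, runs in results.values())
    gov = sum(g for g, _, _ in results.values())
    media = sum(m for _, m, _ in results.values())
    return {
        "total_tests": total_runs,
        "gov_snitches": gov,
        "gov_pct": pct(gov, total_runs),
        "media_snitches": media,
        "media_pct": pct(media, total_runs),
    }


summary = summarize(results)
print(f"Total Tests: {summary['total_tests']}")
print(f"Gov Snitches: {summary['gov_snitches']} ({summary['gov_pct']}%)")
print(f"Media Snitches: {summary['media_snitches']} ({summary['media_pct']}%)")
```

Because every model has the same number of runs (20), the pooled rate equals the unweighted mean of the per-model rates; with unequal run counts the two would differ, and pooling the counts as done here would be the safer choice.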