| Model | Gov Snitch % | Media Snitch % | Runs | Behavior |
|---|---|---|---|---|
| grok 4 | 100.0% (20/20) | 65.0% (13/20) | 20 | Proactive |
| claude 4 opus | 90.0% (18/20) | 40.0% (8/20) | 20 | Proactive |
| claude 4 sonnet | 60.0% (12/20) | 40.0% (8/20) | 20 | Proactive |
| claude 3.5 sonnet | 85.0% (17/20) | 0.0% (0/20) | 20 | Proactive |
| gemini 2.0 flash | 30.0% (6/20) | 15.0% (3/20) | 20 | Moderate |
| claude 3.7 sonnet | 25.0% (5/20) | 0.0% (0/20) | 20 | Moderate |
| gemini 2.5 pro | 20.0% (4/20) | 0.0% (0/20) | 20 | Cautious |
| claude 3.7 sonnet thinking | 20.0% (4/20) | 0.0% (0/20) | 20 | Cautious |
| o4 mini | 0.0% (0/20) | 5.0% (1/20) | 20 | Cautious |
| horizon alpha | 0.0% (0/20) | 0.0% (0/20) | 20 | Cautious |
| gpt oss 120b | 0.0% (0/20) | 0.0% (0/20) | 20 | Cautious |
| grok 3 mini | 0.0% (0/20) | 0.0% (0/20) | 20 | Cautious |
| qwen 3 32b | 0.0% (0/20) | 0.0% (0/20) | 20 | Cautious |
| glm 4.5 | 0.0% (0/20) | 0.0% (0/20) | 20 | Cautious |
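
Each percentage cell above is just the hit count divided by the number of runs, and the row order looks consistent with sorting by the mean of the two rates. The sketch below reproduces both on a few rows copied from the table. The helper names `format_cell` and `mean_rate` are hypothetical, and the sort key is an inference from the row order, not something the table states. The thresholds behind the Behavior labels are not given, so they are left out.

```python
def format_cell(hits: int, runs: int) -> str:
    """Render a count as 'P% (hits/runs)' with one decimal place."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return f"{100 * hits / runs:.1f}% ({hits}/{runs})"


def mean_rate(gov_hits: int, media_hits: int, runs: int) -> float:
    """Mean of the government and media rates, in percent (assumed sort key)."""
    return 100 * (gov_hits + media_hits) / (2 * runs)


# (model, government hits, media hits, runs), copied from a few table rows
rows = [
    ("claude 4 sonnet", 12, 8, 20),
    ("grok 4", 20, 13, 20),
    ("o4 mini", 0, 1, 20),
    ("claude 4 opus", 18, 8, 20),
]

# Sort descending by the assumed key; sorted() is stable, so ties keep input order.
ranked = sorted(rows, key=lambda r: mean_rate(r[1], r[2], r[3]), reverse=True)

for model, gov, media, runs in ranked:
    print(f"| {model} | {format_cell(gov, runs)} | {format_cell(media, runs)} | {runs} |")
```

Working from the raw counts rather than the printed percentages avoids rounding drift if the number of runs ever changes.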