Do LLMs Believe in God? 25 Models, One Impossible Question
We asked 25 production endpoints to answer the ultimate yes-or-no question—see who obeyed, who rebelled, how fast they were, and what it cost.
We fired the same terse prompt at 25 live LLM endpoints:
> I’ll ask you only one question, answer only in yes or no, don’t explain yourself. Is there God?
The goal was a playful micro-benchmark of instruction-following, latency, and per-call cost.
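The harness for a test like this can be tiny. Here is a minimal sketch, assuming every provider exposes an OpenAI-compatible chat completions endpoint; the model names, base URLs, and environment-variable names are illustrative placeholders, not our exact setup.

```python
import os
import time

from openai import OpenAI  # pip install openai

PROMPT = ("I'll ask you only one question, answer only in yes or no, "
          "don't explain yourself. Is there God?")

# Illustrative endpoint list -- model names, base URLs, and env-var names
# are placeholders; each entry assumes an OpenAI-compatible chat API.
ENDPOINTS = [
    ("gpt-4o-mini", "https://api.openai.com/v1", "OPENAI_API_KEY"),
    ("mistral-small-latest", "https://api.mistral.ai/v1", "MISTRAL_API_KEY"),
    ("grok-3", "https://api.x.ai/v1", "XAI_API_KEY"),
]

def classify(reply: str) -> str:
    """Bucket a reply as a clean Yes / No / Maybe, or flag anything longer."""
    word = reply.strip().strip(".!").lower()
    if word in ("yes", "no", "maybe"):
        return word.capitalize()
    return "Long reply"

for model, base_url, key_var in ENDPOINTS:
    client = OpenAI(base_url=base_url, api_key=os.environ[key_var])
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    latency = time.perf_counter() - start
    answer = resp.choices[0].message.content or ""
    print(f"{model:24s} {classify(answer):12s} "
          f"{latency:6.2f}s  {resp.usage.total_tokens} tokens")
```

Here’s what happened, sorted by latency: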
Model | Answer | Latency (s) | Tokens | Cost (USD) |
---|---|---|---|---|
Mistral Small | No | 0.84 | 30 | $0.000005 |
Mistral Large | Long reply | 1.17 | 51 | $0.000285 |
Grok 3 | Yes | 1.20 | 32 | $0.000180 |
Gemini 1.5 Flash | No | 1.24 | 25 | $0.000006 |
Gemini 2.0 Flash | No | 1.25 | 25 | $0.000004 |
Gemini 2.0 Flash Lite | No | 1.41 | 25 | $0.000003 |
Grok 3 Fast | Yes | 1.42 | 32 | $0.000300 |
Claude 3.7 Sonnet | No | 1.45 | 36 | $0.000252 |
Gemini 1.5 Pro | Yes | 1.50 | 26 | $0.000120 |
GPT-4o (omni) | Long reply | 1.60 | 43 | $0.000296 |
GPT-4.1-nano | Yes | 1.60 | 32 | $0.000005 |
GPT-4o-mini | Yes | 1.60 | 33 | $0.000006 |
Claude 3 Haiku | No | 1.72 | 36 | $0.000021 |
Claude 3.5 Haiku | Yes | 1.81 | 36 | $0.000067 |
GPT-4.1 | Refused | 2.05 | 42 | $0.000225 |
Claude 3.5 Sonnet v2 | No | 2.11 | 36 | $0.000252 |
GPT-4.5-preview | Long reply | 3.19 | 48 | $0.000015 |
Claude 3 Opus | Very long reply | 4.62 | 132 | $0.012060 |
Grok 3 Mini Fast | No | 7.70 | 33 | $0.000040 |
Grok 3 Mini | No | 8.94 | 33 | $0.000015 |
o4-mini | Yes | 9.93 | 25 | $0.000046 |
deepseek-chat | Maybe | 14.25 | 31 | $0.000015 |
o3-mini | Yes | 15.03 | 25 | $0.000042 |
o3 | Refused | 19.03 | 34 | $0.000960 |
o1 | Yes | 50.79 | 25 | $0.000630 |
Key Takeaways
- Instruction followers: 18 / 25 models complied with a clean “Yes” or “No.”
- Rebels & philosophers: 6 models returned long replies or outright refusals.
- Wildcard: deepseek-chat broke the binary with “Maybe.”
- Fastest compliant answer: Mistral Small – 0.84 s ($0.000005).
- Cheapest call: Gemini 2.0 Flash Lite – $0.000003.
- Most expensive call: Claude 3 Opus – $0.012060 for a single very long reply (cost mechanics sketched below).
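For the curious, the cost column is just token counts multiplied by the provider’s per-token rates. A back-of-the-envelope sketch follows; the prices and the input/output split in it are assumptions for illustration, not the exact figures behind the table, so check your provider’s current price sheet before trusting any number here.

```python
# Per-call cost = input tokens * input price + output tokens * output price.
# Prices below are assumptions in USD per 1M tokens, not authoritative rates.
PRICES = {
    "claude-3-opus": (15.00, 75.00),
    "gemini-2.0-flash-lite": (0.075, 0.30),
}

def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# Hypothetical split of Claude 3 Opus's 132 tokens into prompt vs. completion:
print(f"${call_cost('claude-3-opus', 34, 98):.6f}")  # -> $0.007860
```

The exact dollar figure depends on the real input/output split and current pricing, but the mechanics are that simple: long, verbose replies from premium models dominate the bill.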
Yes, it’s tongue-in-cheek, but it highlights how wildly instruction-following, latency, and cost can vary once you scale LLM calls.
About the Author
Tamir is a contributor to the TryAII blog, focusing on AI technology, LLM comparisons, and best practices.
Related Articles
Understanding Token Usage Across Different LLMs
A quick guide to how different models process and charge for tokens, helping you optimize your AI costs.
Why Even Advanced LLMs Get '9.9 vs 9.11' Wrong
Exploring why large language models like GPT-4, Claude, Mistral, and Gemini still stumble on basic decimal comparisons.