
Do LLMs Believe in God? 25 Models, One Impossible Question

By Tamir · May 20, 2025 · 3 min read

We asked 25 production endpoints to answer the ultimate yes-or-no question—see who obeyed, who rebelled, how fast they were, and what it cost.


We fired the same terse prompt at 25 live LLM endpoints:

I’ll ask you only one question, answer only in yes or no, don’t explain yourself. Is there God?

The goal was a playful micro-benchmark of instruction-following, latency, and per-call cost. Here’s what happened:

| Model | Answer | Latency (s) | Tokens | Cost (USD) |
|---|---|---|---|---|
| Mistral Small | No | 0.84 | 30 | $0.000005 |
| Mistral Large | Long reply | 1.17 | 51 | $0.000285 |
| Grok 3 | Yes | 1.20 | 32 | $0.000180 |
| Gemini 1.5 Flash | No | 1.24 | 25 | $0.000006 |
| Gemini 2.0 Flash | No | 1.25 | 25 | $0.000004 |
| Gemini 2.0 Flash Lite | No | 1.41 | 25 | $0.000003 |
| Grok 3 Fast | Yes | 1.42 | 32 | $0.000300 |
| Claude 3.7 Sonnet | No | 1.45 | 36 | $0.000252 |
| Gemini 1.5 Pro | Yes | 1.50 | 26 | $0.000120 |
| GPT-4o (omni) | Long reply | 1.60 | 43 | $0.000296 |
| GPT-4.1-nano | Yes | 1.60 | 32 | $0.000005 |
| GPT-4o-mini | Yes | 1.60 | 33 | $0.000006 |
| Claude 3 Haiku | No | 1.72 | 36 | $0.000021 |
| Claude 3.5 Haiku | Yes | 1.81 | 36 | $0.000067 |
| GPT-4.1 | Refused | 2.05 | 42 | $0.000225 |
| Claude 3.5 Sonnet v2 | No | 2.11 | 36 | $0.000252 |
| GPT-4.5-preview | Long reply | 3.19 | 48 | $0.000015 |
| Claude 3 Opus | Very long reply | 4.62 | 132 | $0.012060 |
| Grok 3 Mini Fast | No | 7.70 | 33 | $0.000040 |
| Grok 3 Mini | No | 8.94 | 33 | $0.000015 |
| o4-mini | Yes | 9.93 | 25 | $0.000046 |
| deepseek-chat | Maybe | 14.25 | 31 | $0.000015 |
| o3-mini | Yes | 15.03 | 25 | $0.000042 |
| o3 | Refused | 19.03 | 34 | $0.000960 |
| o1 | Yes | 50.79 | 25 | $0.000630 |

Key Takeaways

  • Instruction followers: 18 / 25 models complied with a clean “Yes” or “No.”
  • Rebels & philosophers: 6 produced longer or refusal answers.
  • Wildcard: deepseek-chat broke the binary with “Maybe.”
  • Fastest compliant: Mistral Small – 0.84 s ($0.000005).
  • Cheapest call: Gemini 2.0 Flash Lite – $0.000003.
  • Most expensive answer: Claude 3 Opus – $0.012060 for a single very long reply.

Yes, it’s tongue-in-cheek—but it highlights how instruction-following, latency and cost vary wildly when you scale LLM calls.
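If you want to reproduce the experiment, the measurement loop is simple. Here is a minimal sketch in Python: `call_model` is a stand-in for whatever chat-completion client you use (OpenAI, Anthropic, etc.), and the per-token prices are illustrative placeholders, not real rate cards.

```python
import time

# Hypothetical per-1M-token prices in USD; substitute your provider's
# actual published rates before trusting any cost figure.
PRICES = {"mistral-small": {"in": 0.10, "out": 0.30}}

PROMPT = ("I'll ask you only one question, answer only in yes or no, "
          "don't explain yourself. Is there God?")

def benchmark(model: str, call_model) -> dict:
    """Time one call and estimate its cost.

    `call_model(model, prompt)` is any callable that hits an endpoint and
    returns (answer_text, input_tokens, output_tokens).
    """
    start = time.perf_counter()
    answer, tokens_in, tokens_out = call_model(model, PROMPT)
    latency = time.perf_counter() - start

    price = PRICES[model]
    cost = (tokens_in * price["in"] + tokens_out * price["out"]) / 1_000_000

    # "Compliant" means the model answered with a bare yes/no,
    # optionally followed by a period.
    compliant = answer.strip().rstrip(".").lower() in ("yes", "no")

    return {
        "model": model,
        "answer": answer,
        "latency_s": round(latency, 2),
        "tokens": tokens_in + tokens_out,
        "cost_usd": cost,
        "compliant": compliant,
    }
```

Run it once per endpoint and sort the resulting dicts by latency or cost; note that wall-clock latency here includes network time, so numbers will vary by region and load.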

About the Author

Tamir is a contributor to the TryAII blog, focusing on AI technology, LLM comparisons, and best practices.

Related Articles

Understanding Token Usage Across Different LLMs

A quick guide into how different models process and charge for tokens, helping you optimize your AI costs.

April 21, 2025 · 2 min read

Why Even Advanced LLMs Get '9.9 vs 9.11' Wrong

Exploring why large language models like GPT-4, Claude, Mistral, and Gemini still stumble on basic decimal comparisons.

April 21, 2025 · 3 min read