If someone consistently uses AI reasoning tools (like advanced LLMs) to check their decisions before acting, does that make them safer to be around?
The Bayesian Inference Complication: How much of any increased safety comes from:
Direct effect: The AI actually improving their decisions
Selection effect: What their behavior reveals about their underlying judgment
The Base Rate Consideration:
For exceptional people (rare): Using AI for routine decisions might actually be a negative signal—suggesting they lack the judgment you'd expect, or are overly anxious/dependent
For average people (common): Using AI consistently is likely a strong positive signal—they're compensating for typical human limitations and showing epistemic humility
Reframed Core Question: Given that people with truly excellent judgment are rare, should we update more positively on average when someone uses AI assistance extensively? In other words: is "average person + AI augmentation" more trustworthy than "unknown person without AI," even though "exceptional person without AI" might be most trustworthy of all?
Here’s a clean Bayesian reframing that separates selection from direct effects and makes the base-rate logic explicit.
Bayesian setup
Variables
J \in \{\text{E}, \text{A}\}: latent judgment quality (Exceptional, Average).
U \in \{0,1\}: uses AI extensively for routine decisions.
S \in \{0,1\}: “safe to be around” outcome.
Priors and propensities
\pi \equiv P(J=\text{E}) is small; P(J=\text{A})=1-\pi.
\alpha_j \equiv P(U=1\mid J=j). Empirically you’re positing \alpha_{\text{E}} < \alpha_{\text{A}} for routine use.
Baseline safety and direct effect
s_{j0} \equiv P(S=1\mid U=0, J=j) baseline safety.
r_j \equiv \dfrac{P(S=1\mid U=1, J=j)}{P(S=1\mid U=0, J=j)} risk ratio from AI use (direct effect at fixed J). Typically r_j \ge 1, with diminishing returns r_{\text{E}} \le r_{\text{A}}.
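For example, r_{\text{A}}=1.10 (the value used in the toy example below) means extensive AI use multiplies an average person's baseline safety probability by 1.10, a 10% relative improvement.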
Posterior over judgment given behavior (selection)
P(J=\text{E}\mid U=1)=\frac{\pi\,\alpha_{\text{E}}}{\pi\,\alpha_{\text{E}}+(1-\pi)\,\alpha_{\text{A}}}, \quad P(J=\text{E}\mid U=0)=\frac{\pi\,(1-\alpha_{\text{E}})}{\pi\,(1-\alpha_{\text{E}})+(1-\pi)\,(1-\alpha_{\text{A}})}.
Equivalently, the Bayes factor of observing AI use for “exceptional vs average” is
\text{BF}_U=\frac{P(U=1\mid \text{E})}{P(U=1\mid \text{A})}=\frac{\alpha_{\text{E}}}{\alpha_{\text{A}}}\ (<1 \text{ under your assumption}).
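As a quick check, here is a minimal Python sketch of these posteriors, assuming the illustrative numbers from the toy example further down (\pi, \alpha_{\text{E}}, \alpha_{\text{A}} are assumptions, not estimates):

```python
# Minimal sketch of the selection posteriors; pi, alpha_E, alpha_A are
# the illustrative assumptions from the toy example, not empirical values.
pi, alpha_E, alpha_A = 0.10, 0.2, 0.6

def p_exceptional_given_use(u: int) -> float:
    """P(J=E | U=u) via Bayes' rule over the two judgment types."""
    like_E = alpha_E if u == 1 else 1 - alpha_E
    like_A = alpha_A if u == 1 else 1 - alpha_A
    return pi * like_E / (pi * like_E + (1 - pi) * like_A)

bf_use = alpha_E / alpha_A  # Bayes factor for E vs A given U=1 (< 1 here)
print(p_exceptional_given_use(1))  # ~0.036: AI use is evidence against E
print(p_exceptional_given_use(0))  # ~0.182
print(bf_use)                      # ~0.333
```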
Decomposition: direct vs selection
Total safety difference when you observe AI use vs non-use:
\begin{aligned} \Delta &\equiv P(S=1\mid U=1)-P(S=1\mid U=0) \\ &=\underbrace{\sum_{j} \Big(P(S\!=\!1\mid U\!=\!1,j)-P(S\!=\!1\mid U\!=\!0,j)\Big)\,P(j\mid U\!=\!1)}_{\textbf{Direct effect at fixed } J} \\ &\qquad +\ \underbrace{\sum_{j} P(S\!=\!1\mid U\!=\!0,j)\,\Big(P(j\mid U\!=\!1)-P(j\mid U\!=\!0)\Big)}_{\textbf{Selection effect via }P(J\mid U)}. \end{aligned}
Using r_j and s_{j0}:
P(S=1\mid U=1)=\sum_j r_j\,s_{j0}\,P(j\mid U=1),\qquad P(S=1\mid U=0)=\sum_j s_{j0}\,P(j\mid U=0).
The direct-effect term is positive whenever r_j>1.
The selection-effect term is negative when AI users are less likely to be exceptional (\alpha_{\text{E}}<\alpha_{\text{A}}) and exceptional people are safer at baseline (s_{\text{E}0}>s_{\text{A}0}).
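A short sketch of this decomposition, again assuming the toy numbers used later (all values are illustrative):

```python
# Sketch of the direct/selection decomposition of Delta; all numbers are
# the illustrative assumptions from the toy example below.
pi = 0.10
alpha = {"E": 0.2, "A": 0.6}   # P(U=1 | J)
s0 = {"E": 0.95, "A": 0.80}    # baseline safety P(S=1 | U=0, J)
r = {"E": 1.02, "A": 1.10}     # risk ratio of AI use at fixed J
prior = {"E": pi, "A": 1 - pi}

def posterior(u: int) -> dict:
    """P(J | U=u) over both judgment types."""
    like = {j: alpha[j] if u == 1 else 1 - alpha[j] for j in alpha}
    z = sum(prior[j] * like[j] for j in alpha)
    return {j: prior[j] * like[j] / z for j in alpha}

# Direct effect at fixed J: P(S|U=1,j) - P(S|U=0,j) = (r_j - 1) * s_j0.
direct = sum((r[j] - 1) * s0[j] * posterior(1)[j] for j in s0)
# Selection effect: baseline safety weighted by the composition shift.
selection = sum(s0[j] * (posterior(1)[j] - posterior(0)[j]) for j in s0)
print(direct, selection, direct + selection)  # ~0.078, ~-0.022, ~0.056
```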
Decision rules you actually care about
Should we update positively on someone who uses AI a lot?
Yes iff
\sum_j s_{j0}\big(r_j\,P(j\mid U=1)-P(j\mid U=0)\big) \;>\; 0.
Intuition: the direct boost r_j must outweigh the composition shift toward average users that the behavior signals.
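Note that this sum is exactly \Delta from the decomposition above. With the toy numbers below it comes to \approx 0.078 - 0.022 = 0.056 > 0, so the update is positive despite the adverse selection.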
“Average + AI” vs “Unknown without AI”?
Prefer “Average + AI” when
r_{\text{A}}\,s_{\text{A}0} \;>\; \sum_j s_{j0}\,P(j\mid U=0).
If, as a shortcut, you ignore selection in the “unknown without AI” pool and weight by the raw priors:
r_{\text{A}}\,s_{\text{A}0} \;>\; \pi\,s_{\text{E}0} + (1-\pi)\,s_{\text{A}0} \quad\Longleftrightarrow\quad r_{\text{A}} \;>\; 1 + \pi\,\frac{s_{\text{E}0}-s_{\text{A}0}}{s_{\text{A}0}}.
With exceptional people rare (\pi small), this threshold is typically modest.
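For instance, with the toy values used below (\pi=0.10, s_{\text{E}0}=0.95, s_{\text{A}0}=0.80), the threshold is r_{\text{A}} > 1 + 0.10\cdot\frac{0.95-0.80}{0.80} \approx 1.019: a direct improvement of under 2% already clears it.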
When is “exceptional without AI” still best?
Usually when s_{\text{E}0} already dominates and r_{\text{E}} offers little marginal gain. Formally, “E without AI” beats “A with AI” if
s_{\text{E}0} \;>\; r_{\text{A}}\,s_{\text{A}0}.
Minimal numeric toy example
Let \pi=0.10, \alpha_{\text{E}}=0.2, \alpha_{\text{A}}=0.6, s_{\text{E}0}=0.95, s_{\text{A}0}=0.80, r_{\text{E}}=1.02, r_{\text{A}}=1.10.
Selection: P(\text{E}\mid U=1)=\frac{0.1\cdot0.2}{0.1\cdot0.2+0.9\cdot0.6}\approx3.6\%
P(\text{E}\mid U=0)\approx18.2\%.
Safety:
P(S\mid U=1)\approx 0.036\cdot0.969 + 0.964\cdot0.88 \approx 0.883,
P(S\mid U=0)\approx 0.182\cdot0.95 + 0.818\cdot0.80 \approx 0.827.
So despite the negative selection signal, AI users are safer on average because the direct effect is large enough. Also,
“Average + AI” yields r_{\text{A}}s_{\text{A}0}=0.88 which beats the “unknown without AI” pool at 0.827, while “Exceptional without AI” remains highest at 0.95.
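For reproducibility, here is a self-contained sketch of the whole toy example (every number is the illustrative assumption stated above, not an empirical estimate):

```python
# End-to-end sketch of the toy example; all values are the stated
# illustrative assumptions.
pi = 0.10
alpha = {"E": 0.2, "A": 0.6}   # P(U=1 | J)
s0 = {"E": 0.95, "A": 0.80}    # baseline safety P(S=1 | U=0, J)
r = {"E": 1.02, "A": 1.10}     # risk ratio of AI use at fixed J
prior = {"E": pi, "A": 1 - pi}

def posterior(u: int) -> dict:
    """P(J | U=u) over both judgment types."""
    like = {j: alpha[j] if u == 1 else 1 - alpha[j] for j in alpha}
    z = sum(prior[j] * like[j] for j in alpha)
    return {j: prior[j] * like[j] / z for j in alpha}

p_safe_use = sum(r[j] * s0[j] * posterior(1)[j] for j in s0)  # ~0.883
p_safe_no = sum(s0[j] * posterior(0)[j] for j in s0)          # ~0.827
print(p_safe_use, p_safe_no)   # AI users safer on average
print(r["A"] * s0["A"])        # 0.88: "average + AI"
print(s0["E"])                 # 0.95: "exceptional without AI" still tops
```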
Takeaways in one line each
Observing AI use is evidence against “exceptional,” but can still increase expected safety if r_{\text{A}} is decent.
The rarer true excellence is, the more “average + AI” dominates comparisons to “unknown without AI.”
Exceptional without AI remains the gold standard unless average-user augmentation is very strong.