
Why warmup vendor scores don't match real inbox placement in 2026

Fifty anonymised audits from the first four months of 2026. Vendor dashboards say 92% inbox. Independent seed tests, same domains, same week, say 47%. The gap is structural, not anecdotal.

When the vendor selling you the warmup service is also the party scoring it, you don't have a measurement — you have a marketing number. That isn't an accusation. It's a structural observation, and the 2026 data makes it impossible to ignore.

The 2026 gap, in one sentence

Across fifty anonymised audits we ran between January and April 2026, vendor-reported scores averaged 92% inbox. Independent placement tests on the same domains, same campaigns, same week, averaged 47% inbox. A 45-point average gap across fifty cases isn't noise.

What a vendor score actually measures

A warmup vendor score is a function of two inputs: emails delivered into other accounts inside the same pool, and engagement actions performed by accounts inside the same pool. Both inputs are produced by the vendor's own infrastructure. Both can be tuned, by the vendor, to produce any number the dashboard needs to display. The sketch after this list shows how the same arithmetic yields two very different numbers.

  • Pool-internal delivery is not Gmail's decision about your sender reputation. It's Gmail's decision about a few hundred specific recipient addresses inside the pool.
  • Pool-internal engagement is not human engagement. Synthetic replies, mark-as-important flags, and tracking-pixel fires were already discounted by major providers in 2024 and are essentially worthless in 2026.
  • The vendor controls both the test and the grade. There is no third party to verify.
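
To make the structural point concrete, here is a minimal Python sketch. The addresses and outcomes are illustrative, not vendor internals or audit data; the point is that the scoring arithmetic is identical in both cases, and the only thing that changes is who controls the recipient population.

```python
# Illustrative only: not vendor code, not audit data.
def placement_score(results):
    """results maps a recipient address to the folder the message landed in."""
    inboxed = sum(1 for folder in results.values() if folder == "inbox")
    return round(100 * inboxed / len(results), 1)

# Recipients inside the vendor's own pool: the vendor controls both sides.
pool_results = {
    "warm1@pool.example": "inbox",
    "warm2@pool.example": "inbox",
    "warm3@pool.example": "inbox",
}

# Independent seeds with organic histories: only the provider decides.
seed_results = {
    "seed1@gmail.example": "inbox",
    "seed2@gmail.example": "spam",
    "seed3@outlook.example": "spam",
}

print(placement_score(pool_results))  # 100.0 -- the dashboard number
print(placement_score(seed_results))  # 33.3  -- the provider's real decision
```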

What an independent placement test measures

An independent test sends the actual campaign — same template, same auth, same time of day — to seed addresses that are not part of any warmup network, that have organic histories, and that are read by real IMAP polling. The result is the provider's real-time decision on the real content.

It's the same decision the provider makes on a real prospect. That's the only number that maps to revenue.
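
For illustration, here is a minimal sketch of the polling half using Python's standard-library imaplib. The host, credentials, campaign tag and spam-folder name are all placeholders; real spam-folder names vary by provider ("[Gmail]/Spam", "Junk", and so on), and a production seed reader adds retries, more folders and header checks.

```python
import imaplib

# Placeholders throughout: host, credentials, tag and folder names are
# assumptions, not any specific provider's required values.
def find_placement(host, user, password, campaign_tag, spam_folder):
    """Return 'inbox', 'spam' or 'missing' for one seed mailbox."""
    imap = imaplib.IMAP4_SSL(host)
    imap.login(user, password)
    try:
        for folder, verdict in (("INBOX", "inbox"), (spam_folder, "spam")):
            imap.select(folder, readonly=True)
            # Look for the unique tag embedded in the campaign's subject line.
            _, data = imap.search(None, f'(SUBJECT "{campaign_tag}")')
            if data[0].split():
                return verdict
        return "missing"
    finally:
        imap.logout()

# find_placement("imap.gmail.com", "seed@example.com",
#                "app-password", "cmp-2026-04", "[Gmail]/Spam")
```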

Inside the fifty 2026 cases

We can't name the senders. We can describe the shape of the data.

  1. Vendor score distribution: 88-98%, mean 92%, median 93%.
  2. Independent placement distribution: 19-71%, mean 47%, median 44%.
  3. Per-provider breakdown of the gap (mean): Gmail vendor 91% / real 38%; Outlook vendor 93% / real 56%; Yahoo vendor 94% / real 41%.
  4. Auth quality of the underperformers: 41 of 50 had clean SPF/DKIM/DMARC alignment. The auth was not the problem (a minimal presence check follows this list).
  5. Time on warmup: mean 11.4 weeks. Longer pool tenure did not correlate with a smaller gap.
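
Finding 4 is checkable from the outside. Below is a minimal presence check; it assumes the dnspython package is installed, and "s1" is a placeholder since the DKIM selector varies per sender. Note that it verifies the records exist, while alignment itself has to be confirmed from the headers of a delivered message.

```python
import dns.resolver  # assumption: dnspython is installed (pip install dnspython)

def txt_records(name):
    try:
        return [r.to_text().strip('"') for r in dns.resolver.resolve(name, "TXT")]
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return []

def auth_snapshot(domain, dkim_selector="s1"):
    """Presence check only; alignment is judged from delivered-message headers."""
    return {
        "spf":   any(t.startswith("v=spf1") for t in txt_records(domain)),
        "dkim":  any("v=DKIM1" in t
                     for t in txt_records(f"{dkim_selector}._domainkey.{domain}")),
        "dmarc": any(t.startswith("v=DMARC1")
                     for t in txt_records(f"_dmarc.{domain}")),
    }

# auth_snapshot("example.com")  # -> {'spf': True, 'dkim': False, 'dmarc': True}
```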

Why 2026 widened the gap

Google and Yahoo's 2024 Bulk Sender Rules began discounting low-quality engagement signals at scale. The 2025 quiet updates went further. By 2026 the cost of synthetic engagement isn't neutral — it's slightly negative for unestablished senders. Pool engagement that boosts a vendor dashboard can depress a real-prospect placement at the same time.

That's how a 92% vendor score and a 47% real placement coexist on the same domain in the same week. The vendor isn't lying. The vendor is measuring a different thing.

How to close the gap (without firing anyone)

  1. Run an independent placement test before, during and after every campaign. Same template, same time of day.
  2. Track only the independent number. Treat the vendor score as one input, not the truth.
  3. If the independent number is below 70% on Gmail, fix content and authentication first — those are the largest 2026 levers.
  4. If the gap between vendor and independent stays above 30 points for four consecutive weeks, you are paying for a metric, not an outcome (a minimal weekly check follows this list).
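
Rules 3 and 4 are mechanical enough to automate. A minimal sketch, assuming you log one (vendor, independent) score pair per week; the thresholds are the ones from the list above, not industry constants.

```python
def weekly_verdict(history, gmail_independent):
    """history: list of (vendor_score, independent_score) pairs, oldest first."""
    alerts = []
    # Rule 3: the independent Gmail number is the one to fix first.
    if gmail_independent < 70:
        alerts.append("Gmail under 70%: fix content and authentication first")
    # Rule 4: a 30+ point vendor/independent gap held for four straight weeks.
    recent = history[-4:]
    if len(recent) == 4 and all(v - i > 30 for v, i in recent):
        alerts.append("30+ point gap for 4 weeks: paying for a metric, not an outcome")
    return alerts

# weekly_verdict([(92, 47), (93, 45), (91, 44), (92, 46)], gmail_independent=38)
```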

Free, outside every warmup pool

Inbox Check sends your real campaign to seed accounts that are not part of any warmup network. The number you see is the decision Gmail, Outlook, Yahoo, Mail.ru, Yandex, GMX and T-Online actually made.

FAQ

Are you saying warmup is fraud?

No. Vendor-reported scores often don't reflect real-world placement because they measure a different thing — pool-internal delivery, not provider classification on real content. That's a measurement design issue, not a fraud claim.

Is the 2026 data published?

The 50 cases are anonymised audit files we ran for clients between January and April 2026. We can't share named domains; we publish the aggregate distribution and per-provider means here.

What if my vendor and independent numbers agree?

Then your warmup is, at minimum, not lying to you. Keep measuring both. Agreement is the goal — drift is the warning sign.

Does this apply to all warmup tools?

It applies to any tool that grades its own output without a third-party reference. The structural conflict of interest is the same regardless of brand.
About the author
Artem Berezin
B2B Deliverability Specialist

B2B deliverability specialist with 5+ years of hands-on outreach experience. Built campaigns reaching 90,000+ inboxes across 20+ countries — and fixed the deliverability problems that came with that scale.

Check your deliverability across 20+ providers

Gmail, Outlook, Yahoo, Mail.ru, Yandex, GMX, ProtonMail and more. Real inbox screenshots, SPF/DKIM/DMARC, spam engine verdicts. Free, no signup.

Run Free Test →

Unlimited tests · 20+ seed mailboxes · Live results · No account required