How to design A/B tests for push that you can actually trust
Uneven sample sizes, leaky variants, no statistical significance: most push A/B tests are theatre. Here is how to set them up properly.
A push A/B test is only useful if it would actually change what you do next. Most of the tests we audit fail this bar: the sample is too small to be conclusive, the variants aren't comparable, or the success metric is the wrong one.
Three rules we apply to every test we set up:
1. Pick a meaningful primary metric. Open rate is almost never it. Tap-through-to-conversion is usually the one that matters; open rate without downstream conversion is a vanity number. (Both metrics are sketched after this list.)
2. Pre-compute the required sample size. If your daily push volume is 5,000 and you want to detect a 10% relative lift in tap-through with 95% confidence, the minimum sample is a number you can work out before launch (see the sample-size sketch after this list). We calculate it during test design; that saves running a test for two weeks only to discover it was underpowered.
3. Lock variant allocation at send time. Decide who gets A vs B *before* you press send, not by hashing user IDs at delivery time. Hashed allocation is fine for randomness, but the assignment should be persisted on the user record so the same person sees the same variant across every push in the test (sketched below).
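On rule 1, a minimal sketch of the distinction, assuming your analytics can attribute a conversion to a tap on a specific push. The `tap_conversions` name is a placeholder for that attributed count; the attribution window is your call:

```python
def open_rate(sends: int, opens: int) -> float:
    # The vanity number: measures attention, not outcomes.
    return opens / sends if sends else 0.0

def tap_through_to_conversion(sends: int, tap_conversions: int) -> float:
    # The metric that usually matters: pushes that led to a tap AND a
    # downstream conversion, as a share of everything sent.
    # `tap_conversions` is a placeholder for conversions your analytics
    # attributes to a tap on this specific push.
    return tap_conversions / sends if sends else 0.0
```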
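On rule 2, a minimal sketch of the pre-computation, using the standard two-proportion z-test sample-size formula. The 4% baseline tap-through rate and 80% power in the example are assumptions for illustration; substitute your own baseline:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p_base: float, rel_lift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Minimum users per variant to detect a relative lift in a rate
    with a two-sided two-proportion z-test."""
    p_var = p_base * (1 + rel_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 at 95% confidence
    z_b = NormalDist().inv_cdf(power)          # 0.84 at 80% power
    pooled = (p_base + p_var) / 2
    numerator = (z_a * math.sqrt(2 * pooled * (1 - pooled))
                 + z_b * math.sqrt(p_base * (1 - p_base)
                                   + p_var * (1 - p_var))) ** 2
    return math.ceil(numerator / (p_base - p_var) ** 2)

# Assuming a 4% baseline tap-through rate and a 10% relative lift:
# ~39,500 users per variant. At 5,000 pushes/day split 50/50, that is
# roughly 16 days of sends, more than a casual one-week test delivers.
print(sample_size_per_variant(0.04, 0.10))
```

Note the example detects a *relative* lift (4.0% to 4.4%); an absolute 10-point lift needs far fewer users, so be explicit about which one you mean.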
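And on rule 3, a sketch of persisted assignment, assuming `store` stands in for whatever user-record storage you already have (a column on the user row, a key-value entry, anything durable):

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, store: dict) -> str:
    """Assign A or B once, at send time, and persist it so the same
    user sees the same variant on every push in the test."""
    key = (experiment_id, user_id)
    if key in store:
        return store[key]  # assigned on an earlier send; never re-roll
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    variant = "A" if int(digest, 16) % 2 == 0 else "B"
    store[key] = variant   # write to the user record *before* sending
    return variant
```

Salting the hash with the experiment ID keeps assignments independent across tests, which also matters for the bonus rule below.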
Bonus: never run more than one test at a time on the same segment. We've seen teams run 3 concurrent tests on their active users and wonder why the results don't add up. They literally cannot: each test's variants change the very behaviour the other tests are measuring, so the effects interact and no single comparison is clean.
All Pro and Enterprise plans include test framework setup with these rules baked in.