Our Methodology

TL;DR: We don't trust casinos. We don't trust claims. We test raw cryptographic output against government-grade statistical standards (NIST SP 800-22) using 100,000+ rounds per game, and add PractRand and TestU01 for Deep Audits. Every dataset is published. Every test is reproducible. If you can do math, you can check our work.


Why This Page Exists

I spent 23 years as a pit boss in land-based casinos. I watched players lose money, sure — but they could see the wheel spin. They could watch the cards being dealt. Trust wasn't blind. It was built into the physical process.

Online crypto casinos removed all of that. They replaced it with a promise: "It's provably fair."

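Here's the problem: provably fair proves integrity, not fairness. It shows that each result was computed from seeds the casino committed to in advance; it doesn't show that the seeds, their timing, or the game formula treat you fairly. A casino can pass every hash check and still rig outcomes through seed timing exploits. That's not theory. That's a documented vulnerability with working proof-of-concept code.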

So we built something different. A testing framework that doesn't care what the casino says. It only cares what the numbers show.

What We Actually Test

Raw Floats, Not Game Results

This is the most important architectural decision we made — and it's the same one used by GLI and eCOGRA, the gold standard in regulated gambling.

Most HMAC-based provably fair games work the same way under the hood:

HMAC-SHA256(server_seed, client_seed:nonce:round) → 32 bytes → float [0,1) → game result
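To make the pipeline concrete, here is a minimal Python sketch of the derivation, assuming a Stake-style layout: the server seed string is the HMAC key, the message is `client_seed:nonce:round`, and each 4-byte slice of the 32-byte digest becomes one float. The function name `raw_floats` and the example seeds are ours, and other casinos (Bustabit's hash chain, for example) derive their bytes differently.

```python
import hashlib
import hmac


def raw_floats(server_seed: str, client_seed: str, nonce: int, count: int = 8) -> list[float]:
    """Derive `count` uniform floats in [0, 1) for one bet.

    Assumes a Stake-style scheme: HMAC-SHA256 keyed with the server seed over
    "client_seed:nonce:round". Each 32-byte digest yields 8 floats (4 bytes
    each); `round` increments when more than 8 floats are needed.
    """
    floats: list[float] = []
    round_ = 0
    while len(floats) < count:
        message = f"{client_seed}:{nonce}:{round_}".encode()
        digest = hmac.new(server_seed.encode(), message, hashlib.sha256).digest()
        for i in range(0, 32, 4):
            # 4 bytes read as a big-endian integer, divided by 2**32; this equals
            # b0/256 + b1/256**2 + b2/256**3 + b3/256**4 and is always < 1.
            floats.append(int.from_bytes(digest[i:i + 4], "big") / 2**32)
            if len(floats) == count:
                break
        round_ += 1
    return floats


# Example: the first two floats for a hypothetical seed pair.
print(raw_floats("example-server-seed", "example-client-seed", nonce=1, count=2))
```

Because the derivation is deterministic, anyone holding the revealed seeds can regenerate exactly the floats we test.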

Dice, Crash, Limbo, CoinFlip, Roulette: they all start from the same raw uniform float. The game result is just a deterministic transformation of that float. If the source bytes are uniformly distributed and the transformation matches the game's published formula, the game delivers exactly the odds it advertises. Period.

We test the source. Not the transformation.

Why? Because testing game results introduces noise from the transformation function itself. A crash multiplier distribution should look skewed — that's by design. But the underlying bytes should be perfectly uniform. Testing at the byte level is cleaner, more powerful, and catches manipulation that game-level tests would miss.
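The transformation step is where game-level skew comes from. The sketch below maps one uniform float to several game results. The dice and roulette mappings assume Stake-style formulas, and the crash curve is a generic textbook form with a hypothetical 1% house edge, not any specific casino's published formula.

```python
import math


def dice_roll(u: float) -> float:
    # Assumed Stake-style dice: 10,001 equally likely results, 0.00 to 100.00.
    return math.floor(u * 10001) / 100


def roulette_pocket(u: float) -> int:
    # Single-zero wheel: 37 equally likely pockets, 0 to 36.
    return math.floor(u * 37)


def coin_flip(u: float) -> str:
    # Two equally likely outcomes.
    return "heads" if u < 0.5 else "tails"


def crash_point(u: float, house_edge: float = 0.01) -> float:
    # Hypothetical generic crash curve, not any casino's published formula:
    # P(multiplier >= x) = (1 - house_edge) / x for x >= 1, truncated to 2 decimals,
    # with everything below 1.00x clamped to an instant bust at 1.00x.
    raw = (1 - house_edge) / (1 - u)
    return max(1.0, math.floor(raw * 100) / 100)


u = 0.5
print(dice_roll(u), roulette_pocket(u), coin_flip(u), crash_point(u))
```

Push uniform floats through `crash_point` and the outputs pile up near 1x with a long tail (under this curve, half of all rounds end below about 1.98x), while the floats themselves stay flat. That built-in skew is exactly the noise we avoid by testing the source.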

Three Testing Frameworks, One Verdict

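We don't rely on a single test or a single framework. We use three independent, complementary test suites: NIST SP 800-22 (plus four classical statistical tests) runs on every dataset, and PractRand and TestU01 are added in a Deep Audit.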

Framework 1: NIST SP 800-22 Rev. 1a — Complete Suite

The National Institute of Standards and Technology published SP 800-22 as a statistical test suite for random and pseudorandom number generators used in cryptographic applications. It's the reference battery that governments, military contractors, and financial institutions use to evaluate their cryptographic random number generators.

We run the complete battery — all 15 tests. Not a subset. Not a "simplified version." The full thing.

# | Test | NIST Section | What It Catches
1 | Monobit (Frequency) | § 2.1 | Overall bias: are there more 0s than 1s in the bitstream?
2 | Block Frequency | § 2.2 | Local bias: does the balance hold in smaller sub-sequences?
3 | Runs Test | § 2.3 | Patterns in consecutive values: too many or too few streaks?
4 | Longest Run of Ones | § 2.4 | Suspicious clustering: are the longest streaks within normal range?
5 | Binary Matrix Rank | § 2.5 | Linear dependencies: hidden structure in the bit matrix?
6 | DFT Spectral | § 2.6 | Periodic patterns: Fourier analysis reveals hidden cycles
7 | Non-overlapping Template | § 2.7 | Specific bit patterns appearing too often or too rarely
8 | Overlapping Template | § 2.8 | Same as above, but with overlapping pattern windows
9 | Maurer's Universal | § 2.9 | Compressibility: can the output be compressed? (If yes: not random)
10 | Linear Complexity | § 2.10 | Predictability: could a short linear feedback shift register reproduce this?
11 | Serial Test | § 2.11 | Pair and triplet uniformity: are bit combinations evenly distributed?
12 | Approximate Entropy | § 2.12 | Entropy in overlapping patterns: is the output truly unpredictable?
13 | Cumulative Sums | § 2.13 | Drift over time: does the output trend in one direction?
14 | Random Excursions | § 2.14 | Cycle analysis: abnormal patterns in cumulative sum walks
15 | Random Excursions Variant | § 2.15 | State visit frequency: does the random walk visit states evenly?

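Each test produces a p-value. We use a significance level of α = 0.01 (99% confidence). A p-value below 0.01 means a result at least this extreme would show up less than 1% of the time if the output were truly random, so the test flags it as a deviation. The flip side is that perfectly random data still fails about 1 test in 100, so a single isolated failure is a signal to investigate, not proof of manipulation.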
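As an illustration of how a p-value comes out of one of these tests, here is the simplest one, the Frequency (Monobit) test from § 2.1, as a short Python sketch. It maps bits to ±1, sums them, and converts the normalized sum to a p-value with the complementary error function. The helper names are ours, and reading bytes most-significant-bit first is an assumed convention.

```python
import math


def bits_from_bytes(data: bytes) -> list[int]:
    # Expand bytes into bits, most-significant bit first (an assumed convention).
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def monobit_p_value(bits: list[int]) -> float:
    # NIST SP 800-22 § 2.1 Frequency (Monobit) test:
    # S_n = sum of (+1 for each 1, -1 for each 0); s_obs = |S_n| / sqrt(n);
    # p = erfc(s_obs / sqrt(2)).
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


ALPHA = 0.01  # significance level used on this page


def passes(p_value: float, alpha: float = ALPHA) -> bool:
    # A p-value below alpha is flagged as a deviation from randomness.
    return p_value >= alpha


p = monobit_p_value([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
print(f"p = {p:.6f}, pass = {passes(p)}")
```

On the short sequence `1011010101` the sketch gives p ≈ 0.527, well above α; real runs use millions of bits. Across 25 tests at α = 0.01, treated as independent, the chance that perfectly random data trips at least one is 1 − 0.99^25, about 22%.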

Additional Statistical Tests

Beyond NIST, we run four more tests from standard statistics — different mathematical lenses on the same data:

# | Test | What It Catches
16 | Chi-Square Goodness of Fit | Are outcomes distributed as uniformly as they should be?
17 | Kolmogorov-Smirnov | Does the empirical distribution match the theoretical one?
18 | Serial Correlation (Lag-1) | Can you predict the next value from the previous one?
19 | Runs Up/Down (Wald-Wolfowitz) | Are there suspicious trends: too many ups or downs in a row?
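Three of these reduce to a few lines each. The Python sketch below computes a chi-square statistic over 10 equal-width bins, a lag-1 serial correlation scaled to an approximate z-score, and a runs-up-and-down z-score using the textbook mean (2n − 1)/3 and variance (16n − 29)/90. The bin count, the large-sample normal approximations, and the function names are our assumptions, not a description of the production tooling.

```python
import math

# 99th percentile of the chi-square distribution with 9 degrees of freedom (10 bins).
CHI2_CRIT_DF9_ALPHA01 = 21.666


def chi_square_uniform(floats: list[float], bins: int = 10) -> float:
    # Chi-square statistic for uniformity of floats in [0, 1) over equal-width bins.
    counts = [0] * bins
    for u in floats:
        # Clamp guards against u * bins rounding up to `bins` for u just below 1.
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(floats) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def lag1_z(values: list[float]) -> float:
    # Lag-1 autocorrelation r, scaled by sqrt(n); roughly N(0, 1) for i.i.d. data.
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return (num / den) * math.sqrt(n)


def runs_up_down_z(values: list[float]) -> float:
    # Count maximal runs of consecutive increases or decreases (ties count as "down").
    n = len(values)
    ups = [values[i + 1] > values[i] for i in range(n - 1)]
    runs = 1 + sum(1 for i in range(1, len(ups)) if ups[i] != ups[i - 1])
    mean = (2 * n - 1) / 3
    var = (16 * n - 29) / 90
    return (runs - mean) / math.sqrt(var)
```

Under these assumptions, a dataset fails at α = 0.01 when the chi-square statistic exceeds 21.666 or when either z-score falls outside ±2.576.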

Framework 2: PractRand

NIST is the industry standard. PractRand is the industry nightmare.

Developed by Chris Doty-Humphrey, PractRand is widely regarded as the most demanding PRNG test suite in existence. Where NIST tests might pass a mediocre generator, PractRand will tear it apart.

PractRand works differently from NIST. It consumes a raw binary stream and runs progressively harder tests at increasing data volumes — from kilobytes to terabytes. It doesn't just check for bias. It hunts for subtle correlations, periodicities, and structural weaknesses that standard tests miss entirely.

If NIST is a medical check-up, PractRand is an MRI. It finds things you didn't know were there.

We convert casino outcomes into raw binary streams and feed them directly into PractRand. A generator that passes both NIST and PractRand shows no deviation from true randomness that either suite can detect.
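Here is a minimal sketch of that conversion in Python, assuming the casino outcomes are the raw floats described above. Each float is scaled by 2^32 and packed as a little-endian unsigned 32-bit word; for floats built from 4 HMAC bytes this recovers the same 32-bit value exactly. The function name is ours.

```python
import struct


def floats_to_uint32_stream(floats: list[float]) -> bytes:
    """Pack floats in [0, 1) as little-endian unsigned 32-bit words."""
    words = []
    for u in floats:
        if not 0.0 <= u < 1.0:
            raise ValueError(f"float outside [0, 1): {u!r}")
        # u * 2**32 is exact in binary floating point; int() keeps the top 32 bits.
        # For floats built as k / 2**32 (the 4-byte scheme) this recovers k exactly.
        words.append(int(u * 2**32))
    return struct.pack(f"<{len(words)}I", *words)


stream = floats_to_uint32_stream([0.0, 0.25, 0.5])
print(len(stream), stream.hex())
```

Written to a file, the stream can be piped into PractRand's `RNG_test` using its `stdin32` source (for example `cat stream.bin | RNG_test stdin32`); check the usage text of your PractRand build for the exact options.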

Framework 3: TestU01 (BigCrush)

TestU01 is the academic gold standard, developed at the Université de Montréal. Its BigCrush battery runs 106 statistical tests over 3–4 hours — the most comprehensive single-run analysis of a random number generator that exists in peer-reviewed literature.

Where NIST gives you the government stamp and PractRand hunts for subtle structural flaws, BigCrush throws everything academia has developed over decades at your data. If a generator survives all three, none of the major statistical batteries can distinguish it from true randomness.

Audit Tiers

Not every audit needs the same depth. We run two tiers:

Standard Audit (Every Report)

Every published audit report runs through our 25-test battery:

  • 15 NIST SP 800-22 tests (complete suite)
  • 4 additional statistical tests (Chi-Square, K-S, Serial Correlation, Runs Up/Down)
  • 6 game-specific validation tests

This already exceeds what any competitor runs. It covers everything a well-implemented provably fair system should pass.

Deep Audit (On Request)

For casinos that want to prove they’re beyond reproach — or players who need absolute certainty — we go further:

  • PractRand — progressive binary stream analysis, from kilobytes to terabytes
  • TestU01 BigCrush — 106 academic-grade tests, 3–4 hour runtime

A Deep Audit is available on request. We run it when the stakes are high, the dataset is large, or someone challenges our findings. Three independently developed scientific frameworks, each with its own approach, one verdict.

If NIST is the medical check-up, PractRand is the MRI, and BigCrush is the full autopsy. Most patients only need the check-up. But we have the operating room ready.

Game-Specific Validation

On top of the raw-float analysis, we run game-specific tests on the actual outcomes. These verify that the transformation from raw float to game result is implemented correctly — a casino could have a perfect RNG but a broken game formula.

# | Test | Game | What It Verifies
20 | Crash Instant Rate (Stake) | Crash | ~4.0% of rounds bust at 1.00x (matches Stake's house edge)
21 | Crash Instant Rate (Roobet) | Crash | ~5.95% of rounds bust at 1.00x (matches Roobet's house edge)
22 | Crash Instant Rate (Bustabit) | Crash | ~4.0% of rounds bust at 1.00x (matches Bustabit's house edge)
23 | Coin Fairness | CoinFlip | 50/50 split between heads and tails within expected variance
24 | Roulette Distribution | Roulette | Chi-square across all 37 slots (0–36)
25 | Dice Distribution | Dice | Uniform distribution across the 0–100 range
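The rate-based checks (tests 20–23) come down to comparing an observed count with an expected rate. A minimal Python sketch using the normal approximation to the binomial follows; the function names are ours, and the expected rate has to come from each casino's published formula.

```python
import math


def binomial_z(successes: int, trials: int, expected_rate: float) -> float:
    # Normal approximation to the binomial: (observed - n*p) / sqrt(n*p*(1-p)).
    mean = trials * expected_rate
    sd = math.sqrt(trials * expected_rate * (1 - expected_rate))
    return (successes - mean) / sd


def two_sided_p(z: float) -> float:
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))


# Coin fairness: 50,600 heads in 100,000 hypothetical flips.
z = binomial_z(50_600, 100_000, 0.5)
print(f"z = {z:.2f}, p = {two_sided_p(z):.5f}")
```

In that hypothetical example, z ≈ 3.79 and the two-sided p-value is roughly 0.00015, a clear failure at α = 0.01. The roulette and dice checks (tests 24–25) use a chi-square statistic instead, with one bin per pocket or result range.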

Total: 25 individual tests per audit — 15 NIST + 4 additional statistical + 6 game-specific. For Deep Audits, add PractRand and TestU01 BigCrush (106 additional tests) on top.

Show me another casino review site that runs even five of these.

Sample Sizes

We don't do spot checks. Our minimum sample size is 100,000 rounds per game. For major audits, we go to 250,000 or more. Our Bustabit audit analyzed 100 million rounds.

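Why does sample size matter? Because small samples hide manipulation. A rigged coin that lands heads 52% of the time looks normal after 100 flips, where the 2-point bias is less than half a standard error. After 100,000 flips, the same bias sits more than 12 standard errors out, and it screams. Statistical power grows with sample size: at our 100,000-round minimum, a 1-percentage-point bias on a 50/50 outcome is about 6 standard errors from expectation, while reliably detecting a deviation as small as 0.1 percentage points takes roughly 3 million rounds. That's why our larger audits go far beyond the minimum.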
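The arithmetic behind the coin example is the standard z-score for a proportion, and the same formula run in reverse gives the sample size needed to detect a given bias. A minimal Python sketch, assuming α = 0.01 and 80% power (the power target is our assumption; the page doesn't state one):

```python
import math
from statistics import NormalDist


def z_at(n: int, p0: float, delta: float) -> float:
    # How many standard errors a bias of `delta` sits from p0 after n rounds.
    return delta / math.sqrt(p0 * (1 - p0) / n)


def rounds_needed(p0: float, delta: float, alpha: float = 0.01, power: float = 0.80) -> int:
    # Approximate sample size for a two-sided one-sample z-test on a proportion:
    # n = ((z_{1 - alpha/2} + z_{power}) * sqrt(p0 * (1 - p0)) / delta) ** 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * math.sqrt(p0 * (1 - p0)) / delta) ** 2)


print(z_at(100, 0.5, 0.02), z_at(100_000, 0.5, 0.02))
print(rounds_needed(0.5, 0.01), rounds_needed(0.5, 0.001))
```

A 2-point bias is 0.4 standard errors out after 100 flips and about 12.6 after 100,000; detecting a 1-point bias takes roughly 29,000 rounds, and a 0.1-point bias roughly 2.9 million.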

Data Integrity

Every audit report includes:

  • SHA-256 dataset hash: a cryptographic fingerprint that lets anyone confirm the data they download is byte-for-byte the dataset we tested
  • Complete seed parameters: server seed, client seed, nonce range
  • Reproducibility instructions: a step-by-step guide so anyone can regenerate our results
  • Raw data download: the actual outcomes as JSON, available for independent verification
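Checking the dataset hash yourself takes a few lines. A minimal Python sketch follows; it hashes the file's raw bytes in 1 MiB chunks and compares the result with the hash printed in the report. Hash the file exactly as downloaded: parsing and re-saving the JSON changes whitespace and key order, and with them the hash. The function names are ours.

```python
import hashlib
from typing import Iterable


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    # Feed bytes incrementally so large datasets never need to fit in memory.
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    # Hash the file's raw bytes exactly as downloaded, in 1 MiB chunks.
    with open(path, "rb") as f:
        return sha256_of_chunks(iter(lambda: f.read(chunk_size), b""))


def matches_published(path: str, published_hex: str) -> bool:
    # Compare against the hash printed in the audit report (case-insensitive hex).
    return file_sha256(path) == published_hex.strip().lower()
```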

We don't ask you to trust us. We give you the tools to verify us. That's the difference between an audit and an opinion.

What We Don't Do

Transparency means being honest about limitations too:

  • We can't test live server behavior in real-time. We audit historical data. A casino could theoretically behave differently for specific players or time periods. Statistical analysis catches systematic manipulation, not targeted single-round rigging.
  • We don't audit smart contracts. On-chain games with published Solidity code are a different beast. Our focus is HMAC-SHA256 based provably fair systems.
  • We don't guarantee future fairness. An audit is a snapshot. That's why we advocate for continuous monitoring and regular re-audits.
  • We don't test withdrawal speed or customer support. Our scope is mathematical fairness. For business practices, read why math alone isn't enough.

The Scoring System

Each audit produces a FairPlay Score from 0 to 10:

Score | Rating | Meaning
9.0–10.0 | EXCELLENT | All tests passed. No statistical anomalies detected.
7.0–8.9 | GOOD | Minor deviations within acceptable variance. No evidence of manipulation.
5.0–6.9 | MARGINAL | Some tests show borderline results. Warrants closer monitoring.
3.0–4.9 | CONCERNING | Multiple statistical anomalies. Expanded testing recommended.
0.0–2.9 | FAILED | Systematic deviations detected. Data inconsistent with fair RNG.

The score is calculated from the pass/fail ratio across all applicable tests, weighted by severity. A failed NIST Monobit test (fundamental bias) weighs heavier than a marginal Runs test result.
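The page doesn't publish the weights, so the following is only a hypothetical Python sketch of how a severity-weighted pass ratio could map onto the 0–10 scale and the rating bands in the table above. The weights in the example are invented for illustration; they are not FairPlay's actual weighting.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    name: str
    passed: bool
    weight: float  # hypothetical severity weight; heavier = more fundamental


def fairplay_score(results: list[AuditResult]) -> float:
    # Hypothetical: severity-weighted share of passed tests, scaled to 0-10.
    total = sum(r.weight for r in results)
    passed = sum(r.weight for r in results if r.passed)
    return round(10 * passed / total, 1)


def rating(score: float) -> str:
    # Bands from the scoring table on this page.
    if score >= 9.0:
        return "EXCELLENT"
    if score >= 7.0:
        return "GOOD"
    if score >= 5.0:
        return "MARGINAL"
    if score >= 3.0:
        return "CONCERNING"
    return "FAILED"


# Invented example: a failed Monobit test weighted 3x each of seven passing tests.
results = [AuditResult("Monobit", False, 3.0)] + [AuditResult(f"test {i}", True, 1.0) for i in range(7)]
score = fairplay_score(results)
print(score, rating(score))
```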

Open Source Commitment

Our testing tools will be published as open source on GitHub. Once they're up, you'll be able to read the code, run it yourself, and file issues if you find a bug.

This isn't generosity — it's strategy. Open source means our methodology is under permanent peer review. If our tests are flawed, someone will find it. That pressure keeps us honest. And honestly? That's exactly how it should work.

Not every casino appreciates this level of scrutiny. We’ve documented the seven most common excuses casinos give when asked about independent audits — and why none of them hold up.

Challenge Us

If you think our methodology has a gap, our math is wrong, or our conclusions don't follow from the data — tell us. We publish a standing invitation to challenge any audit we've ever produced. Bring data, not opinions, and we'll respond in kind.

That's the whole point. We're not asking you to believe us. We're asking you to check.