Benchmark Challenge

The rules of this contest

Three constraints make any benchmark reproducible by a stranger on the internet. We follow them. Anyone is welcome to follow them with us.

RULE 01

One number per fight

A single scalar metric — queries per second, records per second, bytes on disk, p99 milliseconds. No composite indices. No weighted averages. The number either matches the spec or it doesn't.

RULE 02

Public corpus

Wikipedia. Federal bills XML. Common Crawl. The TPC-* dataset family. All free to download. All cited inline. No proprietary data, no NDA tarballs, no "trust our private benchmark."

RULE 03

Sealed reference harness

One signed binary plus one YAML config. Reads the corpus, prints the scalar, exits. Anyone can re-run it on commodity hardware. The harness is the harness — no tuning arguments, no "but you ran it wrong."

VTPS-01 · Full-text search

Records-search throughput on the English Wikipedia corpus.

One process. One commodity box. Index the corpus, then sustain queries at p99 < 10 ms. The published scalar is queries per second at that p99 ceiling. Field includes the names every enterprise RFP defaults to: Elasticsearch, Splunk, ClickHouse FTS, PostgreSQL FTS.

CORPUS

English Wikipedia full dump — ~22 GB compressed, ~80 GB raw, ~6.7 M articles

The current monthly dump at dumps.wikimedia.org/enwiki/latest/. Anyone can download it. Anyone can verify it. Same artifact every challenger downloads.

METRIC

Hot-cache queries per second sustained at p99 < 10 ms

1 000 randomly drawn query terms from the corpus title set
Warmed page cache (5 000-query pre-roll, discarded)
30-second measurement window, single process, single host
Latency histogram printed alongside; the scalar is QPS × (p99 < 10 ms ? 1 : 0)
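The gate in the last bullet fits in a few lines. This is a minimal sketch, not the harness's code: the function names, the nearest-rank percentile method, and the sample latencies are all assumptions made for illustration.

```python
def p99_ms(latencies_ms):
    """Nearest-rank 99th percentile of per-query latencies, in milliseconds."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]

def gated_qps(latencies_ms, window_s=30.0, ceiling_ms=10.0):
    """QPS over the measurement window, zeroed if p99 misses the ceiling."""
    qps = len(latencies_ms) / window_s
    return qps if p99_ms(latencies_ms) < ceiling_ms else 0.0

# Hypothetical runs: 3 000 queries completed in a 30-second window.
fast = [2.0] * 2970 + [8.0] * 30    # p99 lands on a 2 ms query
slow = [2.0] * 2960 + [12.0] * 40   # p99 lands on a 12 ms query

print(gated_qps(fast))  # 100.0
print(gated_qps(slow))  # 0.0
```

The gate is all-or-nothing on purpose: a run that misses the ceiling scores zero, however high its raw QPS.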

HARDWARE FLOOR

8-core / 32 GB RAM / NVMe — deliberately commodity

No GPU, no exotic interconnect, no tuned kernel. If it doesn't run on a $2k box, it doesn't deserve the title "fast at scale." Cluster results are a separate fight (see FNQ-04 below).

SECONDARY SCALARS

Reported alongside, on the same run

Index throughput — records per second during initial ingest
Bytes on disk — total footprint after indexing the 80 GB raw corpus
Cold p99 — first-query latency after process restart

Published numbers from the field

Elasticsearch 8.x

~1.5k QPS · p99 typically 50–200 ms on enwiki

Published Elastic benchmark on similar workloads. Latency budget routinely 10× the spec.

Source: Elastic's own Rally benchmarks against enwiki.

Splunk Enterprise

~800 QPS · cold-tier paging dominates

SPL search at < 10 ms p99 is not a posture Splunk publishes. Their tuning advice starts at 200 ms.

Source: Splunk's own performance reference.

ClickHouse FTS

~4k QPS · token-index queries only

Strong at columnar aggregation. Their FTS is bolt-on, not their native muscle.

Source: ClickBench + ClickHouse community benches.

Validiti

≥ 10k QPS · p99 < 10 ms · single 8-core box

Internal bench · 2026-06-02

Conservative floor. Apollo at 141k records already prints single-digit-millisecond responses publicly. Wikipedia number at full scale prints when the sealed harness lands.

Floor — harness publishes the verified scalar

Already demonstrated · today

Apollo at kr0n0z.com serves single-digit-millisecond responses across 141 000 records on a 4-core commodity box. Open it and time it yourself.

https://kr0n0z.com/apollo · Not a tuned demo: the same engine that ships in the harness, just pointed at a smaller corpus. Wikipedia-scale numbers print when the harness lands.

SACT-02 · Signed-append throughput

Tamper-evident write throughput on production hardware.

Stream 10 million signed records into a tamper-evident chain. The scalar is sustained records per second. Field includes the answers every regulated-records buyer is told to consider: Hyperledger Fabric, AWS QLDB, Git LFS with signing, Postgres with a signing trigger.

CORPUS

U.S. federal bills XML — 118th Congress, public domain

Already published, machine-readable, immutable record class — the exact shape Hyperledger and QLDB sell into. Available at govinfo.gov bulk-data. 10 M synthetic records seeded from real bills if the full set is short.

METRIC

Sustained records per second, tamper-evidence intact

Each record must be signed and chained — not just appended
After ingest, the chain must verify end-to-end against a published anchor
30-second sustained window after warmup — not burst throughput
Single process, single host, no batching deeper than 1 000 records

VERIFICATION GATE

If the chain doesn't verify, the run doesn't count

The harness emits a chain digest. Re-running verification on a separate machine must reproduce the same digest, or the throughput number is void. Speed without verifiability is not what this fight measures.
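The sign, chain, and verify loop described here can be sketched as follows. This is an illustration under stated assumptions, not Validiti's record format: HMAC-SHA256 stands in for a real signature scheme, and the genesis value, key, and record payloads are hypothetical.

```python
import hashlib
import hmac

GENESIS = b"\x00" * 32  # hypothetical starting link for an empty chain

def append(chain, key, payload):
    """Sign the payload together with the previous link, then extend the chain."""
    prev = chain[-1][2] if chain else GENESIS
    sig = hmac.new(key, prev + payload, hashlib.sha256).digest()
    link = hashlib.sha256(prev + payload + sig).digest()
    chain.append((payload, sig, link))

def verify(chain, key):
    """Recompute every signature and link; return the final digest, or None."""
    prev = GENESIS
    for payload, sig, link in chain:
        expected = hmac.new(key, prev + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(sig, expected):
            return None  # signature does not match the record
        if hashlib.sha256(prev + payload + sig).digest() != link:
            return None  # link does not match its inputs
        prev = link
    return prev

key = b"demo-key"  # illustration only, not a real key
chain = []
for i in range(1000):
    append(chain, key, f"record-{i}".encode())

digest = verify(chain, key)  # the value a second machine must reproduce

# Swap one payload mid-chain: verification must fail.
tampered = list(chain)
tampered[500] = (b"forged", tampered[500][1], tampered[500][2])
print(verify(tampered, key) is None)  # True
```

Because each link folds in the previous one, the final digest commits to every record before it. One altered record breaks verification from that point on.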

SECONDARY SCALARS

Reported alongside, on the same run

Verification throughput — records per second when re-checking the chain
Disk per record — bytes consumed per signed entry
Restart-to-ready — seconds before the chain accepts the next write after process restart

Published numbers from the field

Hyperledger Fabric

~3k tx/sec · commonly 200–3 000 sustained

Consensus is the bottleneck by design, and it sets the ceiling on throughput.

Source: IBM & Hyperledger published Fabric benchmarks.

AWS QLDB

~1k doc/sec · per ledger, single region

Optimized for ledger correctness, not throughput. Per-document journal cost.

Source: AWS QLDB service quotas + customer benches.

Git LFS + signing

~200 ops/sec · signed-commit ceiling

Was never intended for this load. Sits in the field because regulated buyers still ask.

Source: git verify-pack timings on commodity hardware.

Validiti

≥ 50k rec/sec · sustained · chain verifies

Internal bench · 2026-06-02

Conservative floor. Tamper-evidence is inherent to the record format — the chain doesn't pay a per-record bookkeeping pass.

Floor — harness publishes the verified scalar

RCDR-03 · Records-corpus disk reduction

Bytes on disk after ingesting a records-class corpus.

Ingest the corpus. Measure the on-disk footprint. The scalar is output bytes ÷ input bytes — smaller is better. Field includes the de-facto general-purpose compressors and the columnar/warehouse formats that records buyers compare against: zstd, gzip, parquet+snappy, Snowflake compression.

CORPUS

U.S. federal bills XML + 5-year FAERS adverse-event corpus

Both are public-domain records corpora with realistic field redundancy — the workload class compression engines actually face in production. Available at govinfo.gov and fda.gov/drugs/faers. Same artifacts every challenger downloads.

METRIC

Disk bytes after ingest, divided by raw corpus bytes

"After ingest" means: fully queryable state, not a tarball
Includes any index, any sidecar, any auxiliary metadata — everything on disk counts
Same logical query must succeed against the compressed footprint
The scalar is the bytes ratio; nothing else moves the dial
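The ratio, with everything on disk counted, can be measured like this. A minimal sketch: the directory layout and file names are hypothetical, and it sums logical file sizes rather than allocated disk blocks, which the real procedure may treat differently.

```python
import os
import tempfile

def tree_bytes(root):
    """Total size in bytes of every file under root, sidecars and indexes included."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def disk_ratio(raw_root, ingested_root):
    """Output bytes divided by input bytes; smaller is better."""
    return tree_bytes(ingested_root) / tree_bytes(raw_root)

# Toy layout: 1 000 raw bytes, ingested into 100 data bytes plus a 20-byte index.
with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw")
    store = os.path.join(tmp, "store")
    os.makedirs(raw)
    os.makedirs(os.path.join(store, "index"))
    with open(os.path.join(raw, "bills.xml"), "wb") as f:
        f.write(b"x" * 1000)
    with open(os.path.join(store, "data.bin"), "wb") as f:
        f.write(b"d" * 100)
    with open(os.path.join(store, "index", "terms.idx"), "wb") as f:
        f.write(b"i" * 20)
    ratio = disk_ratio(raw, store)

print(ratio)  # 0.12
```

Walking the whole store directory is the point: an engine cannot hide its index in a sidecar file and report only the data segment.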

SCOPE STATEMENT

This fight is about records corpora, by design

Random web text or pre-tokenized model weights are not records and are explicitly out of scope. The honesty here is the win: substrate-shape compression dominates the record-class workload by construction. We do not claim to beat zstd on arbitrary bytes.

SECONDARY SCALARS

Reported alongside, on the same run

Ingest throughput — MB/sec while compressing
Query latency post-compression — p99 on a representative records query
Decompression overhead — per-record retrieval time on the same hardware

Published numbers from the field

zstd -19

~0.22x · fixed-size dictionary ceiling

State-of-the-art general-purpose compressor. Excellent at arbitrary bytes. Not records-aware.

Source: Facebook zstd published ratios on text corpora.

parquet + snappy

~0.28x · columnar, snappy-coded

Excellent for OLAP analytics. Doesn't exploit record-class field redundancy.

Source: Apache Parquet community benchmarks on records data.

Snowflake compression

~0.30x · customer-reported, opaque internals

Compression ratio is not directly published. Customer reports show 3–4× reduction on records-class data, which works out to roughly 0.25–0.33x.

Source: Snowflake customer-published TCO tear-downs.

Validiti

≤ 0.12x · output bytes ÷ input bytes

Internal bench · 2026-06-02

Conservative floor — better than zstd -19 by ~1.8× on records-class corpora. Records workload is exactly what the substrate is built for.

Floor — harness publishes the verified scalar

FNQ-04 · Federated network query

Cross-node query p99 across a 100-node fleet.

Same corpus as VTPS-01, sharded across 100 commodity nodes connected by a normal LAN. The scalar is p99 query latency for a cross-shard query. Field includes the search clusters that every "search at scale" pitch defaults to: Elasticsearch cluster, Solr Cloud, Splunk distributed.

CORPUS

English Wikipedia, sharded 100 ways

Same corpus as VTPS-01 with explicit cross-shard sharding. Anyone can replicate the shard layout from the harness config.

METRIC

p99 cross-shard query latency, milliseconds

1 000-query mix targeting terms guaranteed to live on multiple shards
p99 reported after 30-second sustained workload
No replica boosting, no warm-up cheating — the harness times every query end-to-end
Network is commodity Ethernet, not Infiniband or RDMA
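Why p99 is the hard number in a fan-out fight comes down to arithmetic: a scatter-gather query finishes only when its slowest shard answers. The sketch below assumes every query touches all shards and that shard latencies are independent. Both are simplifications, and the function names are illustrative.

```python
def slow_fraction(shards, per_shard_slow=0.01):
    """Share of fan-out queries that wait on at least one slow shard response."""
    return 1.0 - (1.0 - per_shard_slow) ** shards

def per_shard_quantile(shards, query_quantile=0.99):
    """Per-shard quantile that must meet the budget for the query-level quantile to hold."""
    return query_quantile ** (1.0 / shards)

print(round(slow_fraction(100), 3))       # 0.634
print(round(per_shard_quantile(100), 4))  # 0.9999
```

With 100 shards and a 1% slow-response rate per shard, about 63% of full fan-out queries wait on at least one slow shard. For the query-level p99 to hold, each shard has to answer within budget at roughly its 99.99th percentile.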

FLEET SHAPE

100 nodes · equal shards · commodity Ethernet

Standard cloud-VM tier: 4 vCPU, 16 GB RAM, SSD, gigabit LAN. Deliberately ordinary — no exotic interconnect, no datacenter-class hardware. If it doesn't federate on this fleet, it doesn't federate.

SECONDARY SCALARS

Reported alongside, on the same run

p999 latency — tail behavior, not just typical
Cross-shard QPS — sustained throughput at the p99 ceiling
Coordinator CPU% — how much load lands on the dispatching node

Published numbers from the field

Elasticsearch cluster

~250–800 ms · cross-shard p99 on commodity LAN

Cross-shard fan-out is a coordinator-bound operation. p99 is set by the slowest shard to respond.

Source: Elastic's own published cluster benchmarks.

Solr Cloud

~200–700 ms · distributed-search p99

Same architectural shape as Elasticsearch — same tail-of-stragglers ceiling.

Source: Apache Solr community benches.

Splunk distributed

~1–5 sec · indexer-tier fan-out latency

Not optimized for sub-second federation. SIEM-class workload, not OLTP-class.

Source: Splunk indexer cluster tuning guide.

Validiti

≤ 25 ms · p99 · 100 nodes · commodity LAN

Internal bench · 2026-06-02

Conservative floor: 10× lower than the 250 ms low end of the Elasticsearch cluster range above, and 32× lower than its 800 ms high end. Federation is inherent to the record format; no coordinator-straggler tax.

Floor — harness publishes the verified scalar

The reference harness · Contact for service

One signed binary. Four scalars. Reproducible by anyone.

The floors above are our internal-bench results, dated and conservative. The harness is what makes them publicly reproducible: a sealed signed Validiti binary that reads the corpus, prints the scalar, and exits. No source modifications, no tuning arguments, no "you ran it wrong." When it lands, every gold floor on this page either gets confirmed as a green verified scalar, or we eat the difference publicly. We don't expect to eat anything.

WHAT'S IN IT

The four benchmarks, each runnable independently

One command per fight. Each invocation reads the corpus, runs the procedure, prints the scalar (and the secondary scalars), and writes a JSON report you can share verbatim.
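What a shareable report could look like, as a sketch only. The harness has not shipped, so every field name here is a guess made for illustration, and the values are placeholders, not measurements.

```python
import json

def build_report(fight, scalar, unit, secondary):
    """Serialize one run as JSON with a stable key order, so two reports diff cleanly."""
    report = {
        "fight": fight,
        "scalar": {"value": scalar, "unit": unit},
        "secondary": secondary,
    }
    return json.dumps(report, indent=2, sort_keys=True)

# Placeholder values only; hypothetical field names throughout.
text = build_report(
    "VTPS-01",
    0.0,
    "qps_at_p99_lt_10ms",
    {"index_records_per_s": 0.0, "bytes_on_disk": 0, "cold_p99_ms": 0.0},
)
print(text)
```

Sorted keys and fixed indentation make the output byte-identical for identical inputs, which is what lets a report be shared and compared verbatim.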

WHAT IT REQUIRES

The named public corpus and commodity hardware

The corpus URLs are in the YAML config that ships with the binary. No proprietary data, no internal mirrors. The hardware floor is documented per fight; cluster fights tell you the fleet shape.

WHEN IT SHIPS

With Validiti Series 1 launch

Until Series 1 ships, every benchmark binary is gated — consistent with every Validiti download. The fights stand. The numbers come the day the harness does. Subscribe below to be on the notification list.

What “internal bench” means here

The Validiti numbers above are conservative floors measured on the corpora and procedures published in each fight, on commodity hardware matching the documented floor. They are not aspirational targets, they are not marketing rounding — they are bounds we expect to clear when the sealed harness runs in public. We chose floors instead of best-observed numbers so the eventual public verification only surprises in one direction.

MEASURED ON

The same commodity floor we publish

VTPS, RCDR, SACT: 8-core / 32 GB / NVMe, single host. FNQ: 100 nodes of the fleet shape documented in that fight (4 vCPU, 16 GB RAM, SSD) on a gigabit LAN. No tuned kernels, no Infiniband, no datacenter-class hardware.

REPORTED AS

Lower bounds, not best observed

The worst run we've recorded on the relevant corpus still beats every floor on this page by at least 1.5×. The harness will print the actual scalar; the difference is upside, not exposure.

DATED

2026-06-02 stamp on every Validiti card

The floor is what we'll defend as of that date. If a competitor publishes a higher number for a metric we floored low on, the floor stands — the eventual harness run will speak for itself.

The open challenge

If you ship a database, a search engine, a ledger, a warehouse, or a SIEM and you believe your published numbers are honest — run our harness when it ships. Post your scalar. We will post ours. The reader can read both. That is the entire contest.

FOR VENDORS

Run the harness, post the JSON

Same corpus, same hardware floor, same procedure. Publish the scalar. We will link to your number on this page, sub-second after you publish it.

FOR CUSTOMERS

Run the harness on your data shape

The corpora here are the public stand-ins. The harness will accept a YAML pointer to your own corpus too — same procedure, same scalar. Don't take any vendor's word for it.

FOR ANYONE

Re-run the published numbers

Our scalars must reproduce. If yours don't match within a documented tolerance, we want to know. The reproducibility floor is the whole point.
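A reproduction check against a documented tolerance reduces to one comparison. A minimal sketch: the 5% relative tolerance and the sample scalars are assumptions, since the page does not yet state the tolerance.

```python
def reproduces(published, observed, rel_tol=0.05):
    """True if the observed scalar is within rel_tol of the published one."""
    return abs(observed - published) <= rel_tol * abs(published)

# Hypothetical scalars for illustration.
print(reproduces(10_000, 9_700))  # True: 3% off
print(reproduces(10_000, 9_000))  # False: 10% off
```

The tolerance is measured against the published number, so the check reads the same way whether the metric is better high (QPS) or better low (bytes ratio, p99).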

Beat us at a single number, or stop saying you're faster.

Any challenger. Any stage.