ADR-0019: Compiler Performance Dashboard
Status
Implemented
Summary
Create a system for tracking Gruel compiler performance over time and displaying it on the website. This includes a benchmark corpus of representative Gruel programs, infrastructure to collect and store timing data, and interactive visualizations for the website.
Context
As Gruel develops, we want to track compiler performance to:
- Detect regressions - Notice when changes slow down compilation
- Track improvements - See the impact of optimization work
- Provide transparency - Show the community our performance characteristics
The existing --time-passes flag provides per-pass timing, but there's no:
- Standardized benchmark corpus
- Historical data storage
- Visualization infrastructure
Decision
Part 1: Benchmark Corpus
Location: benchmarks/ directory at repository root
Design choices:
Decision: Hand-crafted benchmark programs
Spec tests are too small to meaningfully benchmark. We'll create purpose-built programs that exercise different compiler phases at scale.
Corpus structure:
benchmarks/
├── README.md # Describes the benchmark suite
├── stress/ # Hand-crafted stress tests
│ ├── many_functions.gruel # 100+ functions
│ ├── deep_nesting.gruel # Deeply nested blocks/expressions
│ ├── large_structs.gruel # Many struct types with fields
│ ├── arithmetic_heavy.gruel # Lots of arithmetic expressions
│ └── control_flow.gruel # Complex if/while/match patterns
└── manifest.toml # Benchmark metadata
Manifest format:
[[benchmark]]
name = "spec_tests"
source = "spec_aggregate"   # Run all spec tests, aggregate timing
description = "Aggregate timing across all spec tests"

[[benchmark]]
name = "many_functions"
source = "stress/many_functions.gruel"
description = "100 trivial functions to stress function handling"

[[benchmark]]
name = "deep_nesting"
source = "stress/deep_nesting.gruel"
description = "20 levels of nested blocks"
Part 2: Data Collection & Storage
CLI addition: --benchmark-json <file> flag
When provided, the compiler emits structured JSON timing data instead of the human-readable report.
JSON output format:
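The exact schema is not fixed by this ADR; one plausible shape, with every field name below being an illustrative assumption, is:

```json
{
  "commit": "abc1234",
  "timestamp": "2025-01-15T12:00:00Z",
  "iterations": 5,
  "benchmarks": [
    {
      "name": "many_functions",
      "total_ms": { "mean": 12.4, "std": 0.3 },
      "passes": [
        { "name": "parse", "mean_ms": 3.1, "std_ms": 0.1 },
        { "name": "typecheck", "mean_ms": 5.2, "std_ms": 0.2 }
      ]
    }
  ]
}
```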
Data storage:
Decision: Dedicated perf branch
Since benchmarks run on every commit to trunk, storing results directly in the main branch would create noise. Instead:
- A dedicated `perf` branch holds benchmark history
- CI runs benchmarks on each trunk commit and pushes results to the `perf` branch
- `website/static/benchmarks/history.json` is built from the `perf` branch during website deploy
- Keep the last 100 runs to limit file size
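The append-and-truncate step could be a small helper along these lines. This is a sketch under assumptions: the entry shape and the helper's name are illustrative, not the final tool.

```python
# Sketch of a history-append helper with the 100-run retention policy.
import json
from pathlib import Path

MAX_RUNS = 100  # retention limit chosen in this ADR


def append_run(history_path: Path, new_run: dict) -> list:
    """Append one benchmark run, keeping only the most recent MAX_RUNS."""
    history = json.loads(history_path.read_text()) if history_path.exists() else []
    history.append(new_run)
    history = history[-MAX_RUNS:]  # drop the oldest entries beyond the limit
    history_path.write_text(json.dumps(history, indent=2))
    return history
```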
Benchmark runner script: ./bench.sh
#!/bin/bash
# Build the release compiler, run the benchmark suite, and append
# results to history (via a small Rust tool or script).
Part 3: Website Visualization
Page: /performance/ on the Gruel website
Decision: Static charts generated at build time
Keep scope minimal with static SVG charts generated during website build. We can add Chart.js for interactivity later if needed.
Charts to generate:
- Time-series chart - Total compilation time over commits
- Pass breakdown chart - Stacked bar showing time per pass
Implementation:
- A Rust tool or Python script reads `history.json` and generates SVG charts
- Charts are placed in `website/static/benchmarks/` during build
- Zola includes them in the performance page
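A chart generator in this spirit needs no dependencies at all; the sketch below emits a bare SVG polyline from history data. The `history.json` entry shape (`commit`, `total_ms`) is an assumption, and the styling is deliberately minimal.

```python
# Sketch: render a time-series SVG from benchmark history (field names assumed).
import json


def timeline_svg(history, width=600, height=200, pad=20):
    """Plot total compilation time per run as an SVG polyline."""
    times = [run["total_ms"] for run in history]
    t_max = max(times) or 1  # avoid division by zero for all-zero data
    n = len(times)
    step = (width - 2 * pad) / max(n - 1, 1)
    points = " ".join(
        f"{pad + i * step:.1f},{height - pad - (t / t_max) * (height - 2 * pad):.1f}"
        for i, t in enumerate(times)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="steelblue" points="{points}"/></svg>'
    )


history = [{"commit": "abc", "total_ms": 120.0}, {"commit": "def", "total_ms": 90.0}]
svg = timeline_svg(history)
```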
Template structure:
website/
├── content/
│ └── performance.md # Performance page content
├── templates/
│ └── performance.html # Template including chart images
├── static/
│ └── benchmarks/
│ ├── history.json # Historical data (from perf branch)
│ ├── timeline.svg # Generated time-series chart
│ └── breakdown.svg # Generated pass breakdown
Future enhancement: Add Chart.js for hover tooltips and filtering when scope allows
Implementation Phases
Epic: gruel-a5ah
Phase 1: Benchmark corpus - gruel-a5ah.1
- Create `benchmarks/` directory structure
- Write initial stress test programs (many_functions, deep_nesting, etc.)
- Create `manifest.toml` format
Phase 2: Data collection - gruel-a5ah.2
- Implement `--benchmark-json` flag in the CLI
- Extend `TimingData` to output JSON format
- Support multiple iterations with mean/std calculation
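The mean/std aggregation across iterations amounts to a few lines; this sketch assumes 5 timing samples per benchmark and uses illustrative names rather than the real `TimingData` fields.

```python
# Sketch: summarize per-iteration timings as mean and sample std deviation.
import statistics


def summarize(samples_ms):
    """Aggregate repeated timing samples (milliseconds) into mean/std."""
    mean = statistics.mean(samples_ms)
    # Sample std needs at least two samples; report 0.0 for a single run.
    std = statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0
    return {"mean_ms": mean, "std_ms": std}


result = summarize([10.0, 12.0, 11.0, 13.0, 14.0])
```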
Phase 3: Runner & storage - gruel-a5ah.3
- Create `./bench.sh` script
- Create `scripts/append-benchmark.py` to manage history
- Set up `perf` branch structure
- Document benchmark workflow
- Add release/debug build modes via `--release` flag
Phase 4: CI integration - gruel-a5ah.4
- GitHub Actions workflow to run benchmarks on trunk commits
- Push results to `perf` branch
- Configure consistent benchmark environment
Phase 5: Website visualization - gruel-a5ah.5
- Create SVG chart generator (Rust or Python)
- Create `/performance/` page template
- Integrate chart generation into website build
Consequences
Positive
- Clear visibility into compiler performance over time
- Ability to detect regressions before they accumulate
- Public transparency about performance characteristics
- Foundation for future optimization work
- Running benchmarks on every trunk commit ensures continuous tracking
Negative
- CI runner variability may add noise (mitigated by multiple iterations)
- Adds maintenance burden (corpus, scripts, visualization)
- History file will grow over time (mitigated by limiting to 100 entries)
Neutral
- Benchmarks measure compilation time, not runtime performance
- Initial corpus may not represent "real-world" usage until we have users
- Dedicated `perf` branch keeps the main branch clean but adds complexity
Resolved Questions
Multiple iterations? Yes, 5 iterations with mean/std reported.
What stress tests? Initial set: many_functions, deep_nesting, large_structs, arithmetic_heavy, control_flow.
History retention? 100 most recent runs.
Visualization approach? Static SVG charts for now, Chart.js as future enhancement.
When to run benchmarks? On every commit to trunk via CI.
Data storage? Dedicated `perf` branch to avoid noise in main.
Future Work
- Interactive charts with Chart.js (hover tooltips, filtering, zoom)
- Runtime performance benchmarks (execution speed of compiled programs)
- Memory usage tracking
- Code size tracking (binary size over time)
- Comparison with other compilers (Rust, Zig, etc.)
References
- ADR-0018: Tracing Infrastructure - The timing layer this builds on
- rustc-perf - Inspiration from Rust's approach
- Zig perf - Inspiration from Zig's approach