Performance
The Gruel compiler is designed for fast compilation. This dashboard tracks compilation performance over time, helping detect regressions and measure the impact of optimizations.
Benchmark Coverage
Recent Benchmark Runs
Compilation Time Trend
Total compilation time across the last 20 commits. Lower is better.
Hot vs Cold Compilation
Each benchmark is compiled multiple times in succession. The first run is "cold" (OS page cache empty for source files and the gruel binary); subsequent runs are "hot" with caches warm. The gap reflects what cache warmth alone is worth today, before any compiler-level incremental work lands.
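The cold/hot split comes down to a small calculation over one benchmark's successive compile times. This is a minimal sketch, not the dashboard's actual code. The function name `cold_hot_gap` and the choice to average the hot runs are assumptions.

```python
from statistics import mean


def cold_hot_gap(times_ms: list[float]) -> tuple[float, float, float]:
    """Split successive compile times for one benchmark into cold and hot.

    The first sample is treated as the cold run and every later sample as a
    hot run. Returns (cold, hot_mean, gap), where gap = cold - hot_mean.
    """
    if len(times_ms) < 2:
        raise ValueError("need one cold run and at least one hot run")
    cold = times_ms[0]
    hot_mean = mean(times_ms[1:])
    return cold, hot_mean, cold - hot_mean


# Illustrative timings only, not measured data.
cold, hot, gap = cold_hot_gap([420, 310, 300, 320])
print(f"cold={cold} ms, hot={hot} ms, gap={gap} ms")
```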
Time by Compiler Pass
Breakdown of where compilation time is spent in the most recent benchmark run.
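To get the per-pass breakdown, each pass's time is divided by the total. Below is a minimal sketch. The pass names in the example are hypothetical placeholders and may not match the passes Gruel actually reports.

```python
def pass_shares(pass_times_ms: dict[str, float]) -> list[tuple[str, float]]:
    """Return (pass, percent of total compile time) pairs, largest share first."""
    total = sum(pass_times_ms.values())
    if total <= 0:
        raise ValueError("total compile time must be positive")
    shares = [(name, 100.0 * t / total) for name, t in pass_times_ms.items()]
    # Sort so the most expensive pass is listed first.
    shares.sort(key=lambda item: item[1], reverse=True)
    return shares


# Hypothetical pass names and illustrative timings.
for name, pct in pass_shares({"parse": 20, "typecheck": 30, "codegen": 50}):
    print(f"{name:<10} {pct:5.1f}%")
```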
Peak Memory Usage
Peak memory consumption during compilation. Lower is better.
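On Linux and macOS, you can read the peak memory of a child process such as a compiler invocation from its resource usage. This shows one possible approach and is not necessarily what the benchmark harness does. `peak_rss_bytes` is a hypothetical helper that relies on `os.wait4`, which is Unix-only. Note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import os
import subprocess
import sys


def peak_rss_bytes(argv: list[str]) -> tuple[int, int]:
    """Run argv to completion; return (exit code, peak resident set size in bytes).

    Unix-only: os.wait4 collects resource usage for this one child, so other
    child processes do not affect the result.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL)
    _, status, usage = os.wait4(proc.pid, 0)
    # We reaped the child ourselves, so record its exit code on the Popen object.
    proc.returncode = os.waitstatus_to_exitcode(status)
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    scale = 1 if sys.platform == "darwin" else 1024
    return proc.returncode, usage.ru_maxrss * scale


# Stand-in command; a real harness would run the compiler on a benchmark file.
code, peak = peak_rss_bytes([sys.executable, "-c", "x = b'x' * 10_000_000"])
print(f"exit={code}, peak={peak / 1e6:.1f} MB")
```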
Output Binary Size
Size of the compiled binary. Smaller binaries are generally preferable.
Runtime Performance
Execution time of compiled binaries over recent commits. Lower is better. This reflects how well the compiler optimizes the code it generates.
Detailed Metrics
Source metrics, throughput, memory usage, and binary size for the latest benchmark run.
Methodology
These benchmarks run automatically against commits on trunk across all supported platforms. Runs are batched, so a single run may cover several commits. Each benchmark is executed multiple times to reduce noise, and both the mean and the standard deviation are recorded.
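The recorded mean and standard deviation also give a simple way to flag regressions. This sketch uses the sample standard deviation. It also applies a hypothetical rule: a run whose mean sits more than `k` standard deviations above the baseline mean counts as a regression. The threshold and the function names are assumptions, not the dashboard's documented policy.

```python
from statistics import mean, stdev


def summarize(samples_ms: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of repeated timings (needs >= 2 samples)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    return mean(samples_ms), stdev(samples_ms)


def is_regression(baseline_ms: list[float], latest_ms: list[float], k: float = 2.0) -> bool:
    """Flag the latest run if its mean exceeds baseline mean + k * baseline stdev."""
    base_mean, base_sd = summarize(baseline_ms)
    return mean(latest_ms) > base_mean + k * base_sd


baseline = [100, 102, 98, 100]  # illustrative timings in ms
print(summarize(baseline))
print(is_regression(baseline, [110, 111, 109]))
print(is_regression(baseline, [101, 100, 102]))
```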
Platforms
Benchmarks run on the following platforms using GitHub Actions:
- Linux x86-64 - Ubuntu runner (ubuntu-latest)
- Linux ARM64 - Ubuntu ARM runner (ubuntu-24.04-arm)
- macOS ARM64 - Apple Silicon runner (macos-latest)
Benchmark Suite
The benchmark corpus includes hand-crafted stress tests that exercise different parts of the compiler:
- many_functions - 1000 functions to stress function handling and symbol resolution
- deep_nesting - 150 functions with deep block/if/while nesting (up to 40 levels)
- large_structs - 700 struct types with 4-8 fields each to stress type handling
- arithmetic_heavy - 250 functions with long arithmetic chains to stress parsing/codegen
- control_flow - 390 functions with complex if/while/match patterns to stress CFG construction
- array_heavy - 200 functions with array declarations, indexing, and modifications
- register_pressure - 210 functions with many simultaneous live variables
- comptime_heavy - 68 comptime blocks with loops, recursion, structs, arrays, enums, and pattern matching
Optimization Levels
Each benchmark is compiled at both -O0 (no optimization) and -O3 (full optimization). This tracks the cost of LLVM optimization passes and the quality of generated code. Use the "Opt Level" dropdown to switch between views.
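With eight benchmarks and two optimization levels, each run compiles sixteen configurations. For any one benchmark, the compile-time cost of optimization is its -O3 time minus its -O0 time. Below is a minimal sketch. The benchmark names come from the suite listed above. The function names and the dictionary layout are assumptions.

```python
from itertools import product

BENCHMARKS = [
    "many_functions", "deep_nesting", "large_structs", "arithmetic_heavy",
    "control_flow", "array_heavy", "register_pressure", "comptime_heavy",
]
OPT_LEVELS = ["-O0", "-O3"]


def benchmark_matrix() -> list[tuple[str, str]]:
    """Every (benchmark, optimization level) pair compiled in one run."""
    return list(product(BENCHMARKS, OPT_LEVELS))


def optimization_overhead(compile_ms: dict[tuple[str, str], float]) -> dict[str, float]:
    """Extra compile time spent at -O3 relative to -O0, per benchmark.

    Benchmarks missing either level are skipped.
    """
    return {
        bench: compile_ms[(bench, "-O3")] - compile_ms[(bench, "-O0")]
        for bench in BENCHMARKS
        if (bench, "-O0") in compile_ms and (bench, "-O3") in compile_ms
    }


print(len(benchmark_matrix()))  # 16 configurations
print(optimization_overhead({("many_functions", "-O0"): 100, ("many_functions", "-O3"): 250}))
```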
Runtime Measurement
After compiling each benchmark, the resulting binary is executed multiple times to measure runtime performance, which tracks the quality of the code the compiler generates. The programs perform deterministic computation (no I/O, no randomness), so every run does the same work and the measurements are comparable.
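A harness script can time repeated executions of a compiled benchmark. This is a minimal sketch, not the project's harness. `time_binary` is hypothetical, and it measures wall-clock time including process startup. Because the programs are deterministic, it also checks that every run exits with the same status.

```python
import subprocess
import sys
import time
from statistics import mean, stdev


def time_binary(argv: list[str], runs: int = 5) -> tuple[float, float]:
    """Run argv `runs` times; return mean and sample stdev of wall-clock seconds.

    Deterministic programs should exit with the same status on every run;
    a mismatch points at a miscompilation or a non-deterministic program.
    """
    if runs < 2:
        raise ValueError("need at least two runs for a standard deviation")
    samples: list[float] = []
    statuses: set[int] = set()
    for _ in range(runs):
        start = time.perf_counter()
        result = subprocess.run(argv, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
        statuses.add(result.returncode)
    if len(statuses) != 1:
        raise RuntimeError(f"exit status varied across runs: {sorted(statuses)}")
    return mean(samples), stdev(samples)


# Stand-in program; a real harness would run a compiled benchmark binary instead.
avg, sd = time_binary([sys.executable, "-c", "sum(range(10**6))"], runs=3)
print(f"mean={avg * 1000:.1f} ms, stdev={sd * 1000:.1f} ms")
```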
Environment
Benchmarks run on GitHub Actions runners. While there is some variability between runs, running multiple iterations helps smooth out noise. Cross-platform comparisons should focus on trends rather than absolute numbers, as different architectures have different performance characteristics.
Batching and Triggers
To handle high commit velocity, the performance testing system uses time-based batching: benchmarks run every 15 minutes, potentially covering multiple commits in a single run. The "Benchmark Coverage" section shows which commits have been benchmarked and tracks the commit ranges covered by each benchmark run.
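This sketch assumes each scheduled run benchmarks the current head of trunk. Under that assumption, a run covers every commit that landed since the last benchmarked commit. `commits_covered` and the list-of-SHAs representation are hypothetical.

```python
from typing import Optional


def commits_covered(trunk_shas: list[str], last_benchmarked: Optional[str]) -> list[str]:
    """Commits a new run at trunk HEAD covers, oldest first.

    trunk_shas lists trunk history from oldest to newest. An empty result
    means nothing new landed, so the run can be skipped.
    """
    if last_benchmarked is None:
        return list(trunk_shas)  # no earlier run: every commit is uncovered
    try:
        idx = trunk_shas.index(last_benchmarked)
    except ValueError:
        raise ValueError(f"{last_benchmarked} is not on trunk") from None
    return trunk_shas[idx + 1:]


history = ["a1", "b2", "c3", "d4", "e5"]  # illustrative short SHAs
covered = commits_covered(history, "b2")
print(f"run covers {covered[0]}..{covered[-1]} ({len(covered)} commits)")
```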
Benchmark runs are triggered by three mechanisms:
- Scheduled - Automatic runs every 15 minutes via GitHub Actions
- Manual - On-demand runs triggered by developers
- Push - Triggered by pushes to trunk (subject to queue-based throttling)
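The text above does not explain how queue-based throttling works. The model below is purely hypothetical. It keeps at most one run in progress and one waiting, and folds any further requests into the waiting run. That is safe under time-based batching: the waiting run will benchmark trunk HEAD and therefore covers every commit pushed in the meantime.

```python
from typing import Optional


class RunQueue:
    """Hypothetical throttle: one run in progress, at most one waiting."""

    def __init__(self) -> None:
        self.running: Optional[str] = None  # trigger of the active run
        self.waiting: Optional[str] = None  # trigger of the queued run

    def submit(self, trigger: str) -> str:
        """Accept a 'scheduled', 'manual', or 'push' trigger."""
        if self.running is None:
            self.running = trigger
            return "started"
        if self.waiting is None:
            self.waiting = trigger
            return "queued"
        # The waiting run will benchmark trunk HEAD, so it already covers
        # whatever commits this request was for.
        return "coalesced"

    def finish(self) -> None:
        """Mark the active run done and promote the waiting run, if any."""
        self.running, self.waiting = self.waiting, None


q = RunQueue()
print(q.submit("scheduled"), q.submit("push"), q.submit("push"))
q.finish()
print(q.running, q.waiting)
```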