Performance

The Gruel compiler is designed for fast compilation. This dashboard tracks compilation performance over time, helping detect regressions and measure the impact of optimizations.

Methodology

These benchmarks run automatically against commits to the main branch on all supported platforms. Runs are batched as described under Benchmark Coverage, so a single run may cover several commits. Each benchmark is executed multiple times to reduce noise, and both the mean and the standard deviation are recorded.
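As a rough illustration of how repeated timings reduce to the two recorded statistics, the sketch below computes a mean and a standard deviation from a handful of samples. The timing values and the `summarize` helper are hypothetical, and the use of the sample (n-1) standard deviation is an assumption; the dashboard may compute its spread differently.

```python
import statistics

def summarize(samples_ms):
    """Return (mean, sample standard deviation) for a list of timings in ms."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to estimate spread")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)

# Hypothetical timings from five runs of one benchmark, in milliseconds.
timings = [102.0, 98.0, 101.0, 99.0, 100.0]
mean_ms, stdev_ms = summarize(timings)
print(f"{mean_ms:.1f} ms ± {stdev_ms:.2f} ms")
```

A large standard deviation relative to the mean suggests that a run was noisy and that small differences between commits should not be read as regressions.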

Platforms

Benchmarks run on the following platforms using GitHub Actions:

Linux x86-64 - Ubuntu runner (ubuntu-latest)
Linux ARM64 - Ubuntu ARM runner (ubuntu-24.04-arm)
macOS ARM64 - Apple Silicon runner (macos-latest)

Benchmark Suite

The benchmark corpus includes hand-crafted stress tests that exercise different parts of the compiler:

many_functions - 1000 functions to stress function handling and symbol resolution
deep_nesting - 150 functions with deep block/if/while nesting (up to 40 levels)
large_structs - 700 struct types with 4-8 fields each to stress type handling
arithmetic_heavy - 250 functions with long arithmetic chains to stress parsing/codegen
control_flow - 390 functions with complex if/while/match patterns to stress CFG construction
array_heavy - 200 functions with array declarations, indexing, and modifications
register_pressure - 210 functions with many simultaneous live variables
comptime_heavy - 68 comptime blocks with loops, recursion, structs, arrays, enums, and pattern matching

Optimization Levels

Each benchmark is compiled at both -O0 (no optimization) and -O3 (full optimization). Comparing the two tracks the compile-time cost of the LLVM optimization passes and the quality of the generated code. Use the "Opt Level" dropdown to switch between views.
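One simple way to read the two optimization levels together is to look at how much longer the -O3 build takes than the -O0 build. The sketch below is illustrative only: the `optimization_overhead` helper and the timings are hypothetical and are not taken from the dashboard.

```python
def optimization_overhead(o0_ms, o3_ms):
    """Return (absolute overhead in ms, O3/O0 ratio) for one benchmark."""
    if o0_ms <= 0:
        raise ValueError("O0 time must be positive")
    return o3_ms - o0_ms, o3_ms / o0_ms

# Hypothetical compile times for a single benchmark, in milliseconds.
extra_ms, ratio = optimization_overhead(o0_ms=80.0, o3_ms=200.0)
print(f"-O3 adds {extra_ms:.0f} ms ({ratio:.1f}x the -O0 time)")
```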

Runtime Measurement

After each benchmark is compiled, the resulting binary is executed multiple times to measure runtime performance. This tracks how well the compiler's generated code performs. The benchmark programs perform deterministic computation (no I/O, no randomness) so that measurements are repeatable.
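A minimal sketch of timing repeated executions of a binary follows. It is not the project's harness: the `time_binary` helper is hypothetical, and the Python interpreter running a no-op stands in for a compiled benchmark binary so that the example is runnable anywhere.

```python
import subprocess
import sys
import time

def time_binary(cmd, runs=5):
    """Run cmd `runs` times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the program exits with a non-zero status.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations

# Stand-in for a compiled benchmark binary (assumption for illustration).
stand_in = [sys.executable, "-c", "pass"]
samples = time_binary(stand_in, runs=3)
print(f"fastest {min(samples):.4f} s, slowest {max(samples):.4f} s")
```

Wall-clock timing includes process startup, which is why short-running programs benefit most from repeated runs.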

Environment

Benchmarks run on GitHub-hosted Actions runners, so timings vary somewhat from run to run; running multiple iterations helps smooth out that noise. Cross-platform comparisons should focus on trends rather than absolute numbers, because different architectures have different performance characteristics.
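To compare trends rather than absolute numbers, one option is to express each platform's measurements as a percentage change from that platform's own first measurement. The sketch below assumes this normalization; the platform keys and timings are hypothetical, and the dashboard may present trends differently.

```python
def relative_change(series):
    """Express each value as a percentage change from the first value."""
    baseline = series[0]
    return [(value - baseline) / baseline * 100.0 for value in series]

# Hypothetical compile times in ms for three consecutive runs per platform.
history = {
    "linux-x86-64": [200.0, 204.0, 220.0],
    "macos-arm64": [120.0, 121.2, 132.0],
}
trends = {platform: relative_change(values) for platform, values in history.items()}
for platform, changes in trends.items():
    print(platform, [f"{c:+.1f}%" for c in changes])
```

In this made-up data the absolute times differ widely between platforms, but both show a similar relative slowdown by the third run, which is the kind of signal a cross-platform comparison should look for.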

Benchmark Coverage

To handle high commit velocity, the performance testing system uses time-based batching: benchmarks run every 15 minutes, potentially covering multiple commits in a single run. The "Benchmark Coverage" section shows which commits have been benchmarked and tracks the commit ranges covered by each benchmark run.

Benchmark runs are triggered by three mechanisms:

Scheduled - Automatic runs every 15 minutes via GitHub Actions
Manual - On-demand runs triggered by developers
Push - Triggered by pushes to trunk (subject to queue-based throttling)
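The time-based batching described above can be sketched as assigning each commit to the first scheduled run at or after its commit time. This is an illustration under stated assumptions, not the project's scheduler: the `batch_commits` helper, the commit identifiers, and the timestamps are hypothetical, and manual or push-triggered runs are ignored.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)  # schedule interval stated in the text

def batch_commits(commits, first_run):
    """Group (sha, timestamp) pairs by the first scheduled run at or after each commit."""
    batches = {}
    for sha, committed_at in sorted(commits, key=lambda c: c[1]):
        elapsed = committed_at - first_run
        # Ceiling division on timedeltas; commits before first_run go to first_run.
        steps = max(0, -(-elapsed // INTERVAL))
        run_time = first_run + steps * INTERVAL
        batches.setdefault(run_time, []).append(sha)
    return batches

first_run = datetime(2024, 1, 1, 12, 0)
commits = [
    ("aaa111", datetime(2024, 1, 1, 12, 3)),
    ("bbb222", datetime(2024, 1, 1, 12, 10)),
    ("ccc333", datetime(2024, 1, 1, 12, 20)),
]
for run_time, shas in batch_commits(commits, first_run).items():
    print(run_time.strftime("%H:%M"), shas)
```

Here the first two commits fall into the 12:15 run and the third into the 12:30 run, which is the commit-range grouping that a coverage view would need to display.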

Dashboard panels (charts not reproduced here): Benchmark Coverage, Recent Benchmark Runs, Compilation Time Trend, Hot vs Cold Compilation, Time by Compiler Pass, Peak Memory Usage, Output Binary Size, Runtime Performance, Detailed Metrics.