ADR-0012: Compiler Optimization Passes

Status

Superseded by ADR-0034

Summary

Add optimization passes to the Gruel compiler operating at the CFG level. Initial passes include constant folding (at CFG level to complement existing RIR-level evaluation) and dead code elimination. These improve codegen quality without changing program semantics. Optimization levels follow standard compiler conventions (-O0 through -O3).

Context

Currently, the Gruel compiler has minimal optimization:

  1. Constant evaluation at RIR level (ADR-0003): try_evaluate_const() in sema.rs folds arithmetic on literal constants during semantic analysis. This handles let x = 2 + 3 by computing 5 at compile time.

  2. No CFG-level optimization: After CFG construction, no simplification occurs before lowering to MIR.

This means:

  • Dead stores (variables written but never read) are still computed and stored
  • Unreachable code after unconditional returns is still emitted

While these don't affect correctness, they produce suboptimal machine code. For a systems language targeting performance-conscious users, better codegen quality is expected.

Current Pipeline

Source -> Lexer -> Parser -> AstGen -> Sema -> CfgBuilder -> CfgLower -> RegAlloc -> Emit
                                        |           |            |
                                   try_evaluate_  Creates      Lowers to
                                   const (RIR)    CFG          MIR

Proposed Pipeline

Source -> Lexer -> Parser -> AstGen -> Sema -> CfgBuilder -> [Optimize] -> CfgLower -> RegAlloc -> Emit
                                        |           |             |            |
                                   try_evaluate_  Creates     CFG->CFG     Lowers to
                                   const (RIR)    CFG        transforms    MIR

Decision

Add a CFG optimization pass that runs after CFG construction and before lowering to MIR. The pass operates as CFG -> CFG transformations, preserving the CFG structure while simplifying instructions.

Architecture

Create a new module crates/gruel-cfg/src/opt/ with:

gruel-cfg/src/opt/
├── mod.rs           # Optimization pipeline orchestration
├── constfold.rs     # Constant folding pass
└── dce.rs           # Dead code elimination

Pass 1: Constant Folding (constfold.rs)

Fold operations on constants that weren't caught at RIR level:

// Before optimization:
v0 = const 5
v1 = const 3
v2 = add v0, v1

// After optimization:
v0 = const 5      // May be DCE'd if unused
v1 = const 3      // May be DCE'd if unused
v2 = const 8      // Folded result

Foldable operations:

  • Arithmetic: Add, Sub, Mul, Div, Mod (with overflow/div-zero checks)
  • Comparisons: Eq, Ne, Lt, Gt, Le, Ge
  • Bitwise: BitAnd, BitOr, BitXor, Shl, Shr
  • Unary: Neg, Not, BitNot

Why CFG-level when RIR already folds?

  • RIR folding catches literal expressions (2 + 3)
  • CFG folding catches patterns that emerge after CFG construction
  • Block parameters (phi-like values) with constant arguments
  • Future: interprocedural constants after inlining

Pass 2: Dead Code Elimination (dce.rs)

Remove unused values and unreachable blocks:

Value-level DCE:

// Before:
v0 = const 42
v1 = alloc 0, v0   // Store to slot 0
v2 = const 10
ret v2             // v1 (slot 0) never read

// After:
v2 = const 10
ret v2             // Dead store eliminated

Block-level DCE:

// Before:
bb0:
  ret v0
bb1:                // Unreachable (no predecessors)
  v1 = const 42
  goto bb2

// After:
bb0:
  ret v0
// bb1 removed

Algorithm:

  1. Compute liveness: mark values reachable from terminators
  2. Mark side-effecting instructions as live (calls, stores to escaping slots)
  3. Remove unmarked instructions
  4. Remove blocks with no predecessors (except entry)

Optimization Levels

Follow standard compiler conventions:

LevelFlagBehavior
0-O0No optimization (default for now)
1-O1Basic optimizations (constant folding, DCE)
2-O2All optimizations (same as -O1 for now)
3-O3Aggressive optimizations (same as -O2 for now)

Future optimization passes will be added at appropriate levels.

API

// In gruel-cfg/src/opt/mod.rs

/// Optimization level, following standard compiler conventions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OptLevel {
    /// No optimization (-O0)
    #[default]
    O0,
    /// Basic optimizations (-O1): constant folding, DCE
    O1,
    /// Standard optimizations (-O2): all of O1
    O2,
    /// Aggressive optimizations (-O3): all of O2
    O3,
}

/// Run optimization passes on a CFG at the given level.
pub fn optimize(cfg: &mut Cfg, level: OptLevel) {
    match level {
        OptLevel::O0 => {
            // No optimization
        }
        OptLevel::O1 | OptLevel::O2 | OptLevel::O3 => {
            constfold::run(cfg);
            dce::run(cfg);
        }
    }
}

Integration

In gruel-compiler/src/lib.rs, add optimization after CFG building:

// Build CFG
let cfg = CfgBuilder::new(&air, &struct_defs, &array_types).build()?;

// Optimize based on -O level
let mut cfg = cfg;
gruel_cfg::opt::optimize(&mut cfg, options.opt_level);

// Lower to MIR
let mir = CfgLower::new(&cfg, ...).lower()?;

Testing Strategy

Optimizations don't change program semantics, so existing spec tests continue to pass. For testing the optimization passes themselves:

  1. Functional tests (spec tests): All existing tests remain valid - programs produce the same results.

  2. Golden tests (IR dumps): Update crates/gruel-spec/cases/golden/ir-dumps.toml to reflect optimized CFG/MIR output. Add new cases specifically for optimization.

  3. UI tests: Add crates/gruel-ui-tests/cases/optimization/ for:

    • Verifying specific optimizations trigger
    • Testing optimization flags (-O0, -O1, etc.)
    • Regression tests for miscompilations

Example UI test:

[[case]]
name = "constant_folding_at_O1"
opt_level = 1
source = """
fn main() -> i32 {
    let x = 2 + 3;
    x
}
"""
exit_code = 5
cfg_contains = ["const 5"]      # Verify folded constant
cfg_not_contains = ["add"]      # Verify no add instruction

Compiler Flags

Add -O flags to control optimization level:

  • -O0: No optimization (default)
  • -O1: Basic optimizations (constant folding, DCE)
  • -O2: Standard optimizations (same as -O1 for now)
  • -O3: Aggressive optimizations (same as -O2 for now)

Tests can specify opt_level = N to control which level is used for that test case.

Multi-Backend Considerations

CFG optimization is target-independent - it operates on typed CFG instructions before lowering to architecture-specific MIR. The optimization passes work identically for x86_64 and aarch64.

Implementation Phases

  • Phase 1: Optimization framework - gruel-aapc.1

    • Create gruel-cfg/src/opt/mod.rs with OptLevel enum and pass orchestration
    • Add -O0 through -O3 flags to CLI
    • Add UI test infrastructure with opt_level support
    • Wire up in gruel-compiler
  • Phase 2: Constant folding - gruel-aapc.2

    • Implement constfold.rs
    • Add tests verifying folding occurs at -O1+
    • Update golden tests as needed
  • Phase 3: Dead code elimination - gruel-aapc.3

    • Implement dce.rs with liveness analysis
    • Handle side-effects correctly (calls, escaping stores)
    • Add tests for dead store/block elimination

Consequences

Positive

  • Better code quality: Obvious inefficiencies are eliminated
  • Foundation for future optimization: Framework supports adding more passes
  • Predictable behavior: Optimizations are conservative and well-defined
  • Debuggability: -O0 preserves direct correspondence to source
  • Standard interface: -O levels familiar to developers from gcc/clang

Negative

  • Golden test maintenance: IR dump tests need updating when optimization changes
  • Compilation time: Minor increase (optimization passes take time)
  • Debugging complexity: Optimized code may differ from source structure

Neutral

  • No semantic changes: All existing tests pass unchanged
  • No preview gate needed: Optimizations don't change language behavior

Open Questions

None remaining - decisions made:

  • Optimization levels use standard -O0 through -O3 flags
  • No diagnostics emitted for optimizations
  • Strength reduction deferred to future work

Future Work

  • Strength reduction: Multiply/divide by power of 2 -> shifts
  • Loop optimizations: Unrolling, invariant hoisting
  • Inlining: For small functions
  • Common subexpression elimination: Reuse computed values
  • Register allocation hints: From optimization analysis
  • Profile-guided optimization: Hot path optimization

References