ADR-0012: Compiler Optimization Passes
Status
Superseded by ADR-0034
Summary
Add optimization passes to the Gruel compiler operating at the CFG level. Initial passes include constant folding (at CFG level to complement existing RIR-level evaluation) and dead code elimination. These improve codegen quality without changing program semantics. Optimization levels follow standard compiler conventions (-O0 through -O3).
Context
Currently, the Gruel compiler has minimal optimization:
Constant evaluation at RIR level (ADR-0003):
try_evaluate_const()in sema.rs folds arithmetic on literal constants during semantic analysis. This handleslet x = 2 + 3by computing 5 at compile time.No CFG-level optimization: After CFG construction, no simplification occurs before lowering to MIR.
This means:
- Dead stores (variables written but never read) are still computed and stored
- Unreachable code after unconditional returns is still emitted
While these don't affect correctness, they produce suboptimal machine code. For a systems language targeting performance-conscious users, better codegen quality is expected.
Current Pipeline
Source -> Lexer -> Parser -> AstGen -> Sema -> CfgBuilder -> CfgLower -> RegAlloc -> Emit
| | |
try_evaluate_ Creates Lowers to
const (RIR) CFG MIR
Proposed Pipeline
Source -> Lexer -> Parser -> AstGen -> Sema -> CfgBuilder -> [Optimize] -> CfgLower -> RegAlloc -> Emit
| | | |
try_evaluate_ Creates CFG->CFG Lowers to
const (RIR) CFG transforms MIR
Decision
Add a CFG optimization pass that runs after CFG construction and before lowering to MIR. The pass operates as CFG -> CFG transformations, preserving the CFG structure while simplifying instructions.
Architecture
Create a new module crates/gruel-cfg/src/opt/ with:
gruel-cfg/src/opt/
├── mod.rs # Optimization pipeline orchestration
├── constfold.rs # Constant folding pass
└── dce.rs # Dead code elimination
Pass 1: Constant Folding (constfold.rs)
Fold operations on constants that weren't caught at RIR level:
// Before optimization:
v0 = const 5
v1 = const 3
v2 = add v0, v1
// After optimization:
v0 = const 5 // May be DCE'd if unused
v1 = const 3 // May be DCE'd if unused
v2 = const 8 // Folded result
Foldable operations:
- Arithmetic: Add, Sub, Mul, Div, Mod (with overflow/div-zero checks)
- Comparisons: Eq, Ne, Lt, Gt, Le, Ge
- Bitwise: BitAnd, BitOr, BitXor, Shl, Shr
- Unary: Neg, Not, BitNot
Why CFG-level when RIR already folds?
- RIR folding catches literal expressions (
2 + 3) - CFG folding catches patterns that emerge after CFG construction
- Block parameters (phi-like values) with constant arguments
- Future: interprocedural constants after inlining
Pass 2: Dead Code Elimination (dce.rs)
Remove unused values and unreachable blocks:
Value-level DCE:
// Before:
v0 = const 42
v1 = alloc 0, v0 // Store to slot 0
v2 = const 10
ret v2 // v1 (slot 0) never read
// After:
v2 = const 10
ret v2 // Dead store eliminated
Block-level DCE:
// Before:
bb0:
ret v0
bb1: // Unreachable (no predecessors)
v1 = const 42
goto bb2
// After:
bb0:
ret v0
// bb1 removed
Algorithm:
- Compute liveness: mark values reachable from terminators
- Mark side-effecting instructions as live (calls, stores to escaping slots)
- Remove unmarked instructions
- Remove blocks with no predecessors (except entry)
Optimization Levels
Follow standard compiler conventions:
| Level | Flag | Behavior |
|---|---|---|
| 0 | -O0 | No optimization (default for now) |
| 1 | -O1 | Basic optimizations (constant folding, DCE) |
| 2 | -O2 | All optimizations (same as -O1 for now) |
| 3 | -O3 | Aggressive optimizations (same as -O2 for now) |
Future optimization passes will be added at appropriate levels.
API
// In gruel-cfg/src/opt/mod.rs
/// Optimization level, following standard compiler conventions.
/// Run optimization passes on a CFG at the given level.
Integration
In gruel-compiler/src/lib.rs, add optimization after CFG building:
// Build CFG
let cfg = new.build?;
// Optimize based on -O level
let mut cfg = cfg;
optimize;
// Lower to MIR
let mir = new.lower?;
Testing Strategy
Optimizations don't change program semantics, so existing spec tests continue to pass. For testing the optimization passes themselves:
Functional tests (spec tests): All existing tests remain valid - programs produce the same results.
Golden tests (IR dumps): Update
crates/gruel-spec/cases/golden/ir-dumps.tomlto reflect optimized CFG/MIR output. Add new cases specifically for optimization.UI tests: Add
crates/gruel-ui-tests/cases/optimization/for:- Verifying specific optimizations trigger
- Testing optimization flags (
-O0,-O1, etc.) - Regression tests for miscompilations
Example UI test:
[[]]
= "constant_folding_at_O1"
= 1
= """
fn main() -> i32 {
let x = 2 + 3;
x
}
"""
= 5
= ["const 5"] # Verify folded constant
= ["add"] # Verify no add instruction
Compiler Flags
Add -O flags to control optimization level:
-O0: No optimization (default)-O1: Basic optimizations (constant folding, DCE)-O2: Standard optimizations (same as -O1 for now)-O3: Aggressive optimizations (same as -O2 for now)
Tests can specify opt_level = N to control which level is used for that test case.
Multi-Backend Considerations
CFG optimization is target-independent - it operates on typed CFG instructions before lowering to architecture-specific MIR. The optimization passes work identically for x86_64 and aarch64.
Implementation Phases
Phase 1: Optimization framework - gruel-aapc.1
- Create
gruel-cfg/src/opt/mod.rswithOptLevelenum and pass orchestration - Add
-O0through-O3flags to CLI - Add UI test infrastructure with
opt_levelsupport - Wire up in gruel-compiler
- Create
Phase 2: Constant folding - gruel-aapc.2
- Implement
constfold.rs - Add tests verifying folding occurs at -O1+
- Update golden tests as needed
- Implement
Phase 3: Dead code elimination - gruel-aapc.3
- Implement
dce.rswith liveness analysis - Handle side-effects correctly (calls, escaping stores)
- Add tests for dead store/block elimination
- Implement
Consequences
Positive
- Better code quality: Obvious inefficiencies are eliminated
- Foundation for future optimization: Framework supports adding more passes
- Predictable behavior: Optimizations are conservative and well-defined
- Debuggability:
-O0preserves direct correspondence to source - Standard interface:
-Olevels familiar to developers from gcc/clang
Negative
- Golden test maintenance: IR dump tests need updating when optimization changes
- Compilation time: Minor increase (optimization passes take time)
- Debugging complexity: Optimized code may differ from source structure
Neutral
- No semantic changes: All existing tests pass unchanged
- No preview gate needed: Optimizations don't change language behavior
Open Questions
None remaining - decisions made:
- Optimization levels use standard
-O0through-O3flags - No diagnostics emitted for optimizations
- Strength reduction deferred to future work
Future Work
- Strength reduction: Multiply/divide by power of 2 -> shifts
- Loop optimizations: Unrolling, invariant hoisting
- Inlining: For small functions
- Common subexpression elimination: Reuse computed values
- Register allocation hints: From optimization analysis
- Profile-guided optimization: Hot path optimization
References
- ADR-0003: Constant Expression Evaluation - Existing RIR-level folding
- Engineering a Compiler, Chapter 8 - Optimization theory
- LLVM Optimization Passes - Reference for pass design