ADR-0007: Hindley-Milner Type Inference
Status
Implemented
Summary
Replace the current bidirectional type checking with constraint-based Hindley-Milner type inference. This eliminates ad-hoc "peek" heuristics and default-to-i32 fallbacks, providing consistent, predictable type inference across all expressions.
Context
The current type system (ADR-0002) uses bidirectional type checking with several limitations:
Current Limitations
Default i32 fallback: Integer literals without context default to
i32:// In TypeExpectation::integer_type() _ => I32 // The hackThis appears in 5+ places in sema.rs (lines 195, 811, 2583, 2907, 2949).
Inconsistent inference: The
peek_typefunction (lines 2895-2987) tries to guess types without full analysis, but can't handle complex cases:let x = 1 + 2; // Both literals -> defaults to i32, can't infer from later useOrder-dependent inference: The type of
1 < xdepends on whether x's type is already known. If x is also a literal expression, inference fails and falls back to i32.No postponed decisions: The single-pass design can't postpone type decisions when information comes later in the expression.
Why Hindley-Milner?
HM inference:
- Handles all type decisions uniformly through constraint solving
- Is principal: finds the most general type
- Is decidable: Algorithm W has proven termination
- Scales to generics/polymorphism if we add them later
- Is well-understood: decades of research and implementations
Note: We're implementing HM for inference consistency, not for polymorphism. Gruel currently has no generics, and this ADR doesn't add them.
Decision
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ RIR (input) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase 1: Constraint Generation │
│ - Walk RIR, assign type variables to unknowns │
│ - Generate constraints: τ₁ = τ₂, τ ∈ {integers} │
│ - Build substitution environment │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase 2: Constraint Solving (Unification) │
│ - Apply Algorithm W/J unification │
│ - Resolve type variables to concrete types │
│ - Detect and report type errors │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase 3: AIR Generation │
│ - Walk RIR again with resolved types │
│ - Emit typed AIR instructions │
│ - All types are now concrete │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ AIR (output) │
└─────────────────────────────────────────────────────────────────┘
Type Language
Extend the internal type representation (NOT the user-facing Type enum):
/// Internal type representation during inference.
/// Separate from the final Type enum to support type variables.
/// Unique identifier for type variables.
;
Constraint System
/// A type constraint generated during analysis.
Note: We don't need IsInteger because integer literals use the IntLiteral type directly, which encodes "must be an integer" in its unification rules.
Algorithm
Phase 1: Constraint Generation
Walk the RIR and for each expression:
- If type is known (annotation, parameter, etc.) → use
Concrete(Type) - If it's an integer literal → use
IntLiteral - If type is unknown (unannotated let, etc.) → create fresh type variable
Var(id) - Generate
Equalconstraints based on how expressions combine
Example:
fn foo(x: i64) -> i64 {
let y = 1 + x; // y's type unknown, 1 is IntLiteral
y
}
Generates:
type(1) = IntLiteral
type(x) = Concrete(i64)
β = fresh() // type of y
Equal(IntLiteral, i64) // 1 + x requires same types
Equal(β, i64) // y = result of addition
Equal(β, i64) // returned value must match return type
Phase 2: Unification
Apply unification with special rules for IntLiteral:
- Process constraints in order
- For
Equal(τ₁, τ₂):- If both
Concrete: check equality, error if different - If one is
Var: substitute the variable - If both
Var: unify (pick one) - If one is
IntLiteraland other isConcrete(integer):IntLiteralbecomes that integer type - If one is
IntLiteraland other isConcrete(non-integer): error - If both are
IntLiteral: they stay asIntLiteral(will be resolved later)
- If both
- Apply substitutions transitively
Integer defaulting (the key improvement):
- After all constraints are processed, any remaining
IntLiteraltypes default toi32 - This happens once, at the end, not scattered throughout
Phase 3: AIR Generation
Walk RIR again with resolved substitution:
- Look up each expression's type variable
- Apply substitution to get concrete type
- Emit AIR with concrete types
Changes to sema.rs
The main Sema struct gains new fields:
The analyze_inst function splits into:
generate_constraints- Phase 1- (Unification happens between phases)
emit_air- Phase 3
Error Messages
With full constraint context, we can provide better error messages:
Before:
error: type mismatch: expected i32, found i64
--> file.gruel:3:5
After:
error: type mismatch
--> file.gruel:3:5
|
3 | let y = 1 + x;
| ^ this has type i64 (from parameter x: i64)
|
note: literal 1 was inferred as i64 to match the other operand
Backwards Compatibility
This is an internal refactor. The surface language and semantics are unchanged:
- Same types: i8, i16, i32, i64, u8, u16, u32, u64, bool, etc.
- Same inference behavior (integer literals default to i32 when unconstrained)
- Same error messages (or better)
- Same AIR output
Code that compiled before will compile after with the same types.
Implementation Phases
Phase 1: Type variable infrastructure - gruel-xvc
- Add
InferType,TypeVarIdtypes - Add type variable allocation and substitution
- Add constraint types
- Unit tests for unification algorithm
- Add
Phase 2: Constraint generation - gruel-5os
- Add
generate_constraintsthat walks RIR - Generate constraints for all expression types
- Preserve span information for error reporting
- Tests for constraint generation
- Add
Phase 3: Unification - gruel-8sl
- Implement Algorithm W unification
- Handle
IntLiteralspecial unification rules - Apply integer defaulting at the end
- Error collection and reporting with
Type::Errorrecovery - Tests for unification edge cases
Phase 4: AIR emission - gruel-048
- Split current
analyze_instinto constraint gen + emission - Emit AIR with resolved types
- Verify all existing spec tests pass
- Legacy peek_type and integer_type kept as defensive fallbacks (may remove in Phase 5)
- Split current
Phase 5: Cleanup and stabilization - gruel-rvm
- Remove TypeExpectation enum and peek_type function
- Add InferType::Array for structural array type handling during inference
- Update FunctionSig and ParamVarInfo to use InferType (not Type)
- Pure HM inference without fallbacks
Consequences
Positive
- Consistent inference: No more scattered i32 defaults
- Principled approach: Well-understood algorithm with proven properties
- Better errors: Full constraint context for diagnostics
- Foundation for generics: If we add polymorphism later, the infrastructure exists
- Simpler mental model: One algorithm handles all inference
Negative
- Two-pass analysis: Constraint generation + AIR emission (vs current single pass)
- Memory overhead: Store constraints and type variable mappings
- Complexity: More code in sema.rs initially
- Learning curve: HM inference is less intuitive than bidirectional checking
Neutral
- Same behavior: No user-visible semantic changes
- Same performance: Two passes, but each is simpler
Design Decisions
IntLiteral type for integer literals: Integer literals get a special
IntLiteraltype rather than a type variable with anIsIntegerconstraint.IntLiteralcan unify with any concrete integer type (i8, i16, i32, i64, u8, u16, u32, u64), and multipleIntLiteraltypes unify with each other (staying asIntLiteraluntil a concrete type forces resolution). UnconstrainedIntLiteraltypes default toi32at the end. This is more principled - it models exactly what an integer literal is: a value that can become any integer type.No mixed-width widening:
1i32 + 2i64remains an error. Inference never widens types implicitly - explicit conversion is required. This matches Rust's behavior and avoids surprising implicit conversions.Error recovery: When unification fails, substitute
Type::Errorfor the failing type variable and continue processing remaining constraints. This provides better diagnostics by catching multiple errors in one pass, matching the current behavior.
Future Work
- Polymorphic functions:
fn identity<T>(x: T) -> T- this ADR lays groundwork - Type aliases:
type Int = i32- straightforward extension - Trait bounds:
fn print<T: Display>(x: T)- requires trait system first - Associated types: Complex, depends on traits
References
- Principal Type Schemes for Functional Programs - Damas & Milner, 1982
- ADR-0002: Single-Pass Bidirectional Types - Current system
- Typing Haskell in Haskell - Jones, 1999
- Bidirectional Typing - Dunfield & Krishnaswami
- Rust's type inference implementation (similar hybrid approach)