ADR-0037: Enum Data Variants and Full Pattern Matching
Status
Implemented
Summary
Extend Gruel enums to carry associated data per variant (algebraic data types), and extend pattern matching so match arm patterns can destructure and bind that data. This enables idiomatic sum types like Option and Result without generics, using concrete types initially.
Context
What Exists
ADR-0004 implemented C-style enums: discriminated unions with no per-variant data. Match expressions can branch on which variant is selected, but cannot extract any value from the variant:
enum Color { Red, Green, Blue }
match c {
Color::Red => 1,
_ => 0,
}
ADR-0036 implemented struct let-destructuring, establishing the ownership framework for decomposing composite values into independently-owned fields.
What's Missing
There is no way to associate data with a variant:
// Not yet possible:
enum IntOption { Some(i32), None }
let x = IntOption::Some(42);
match x {
IntOption::Some(v) => v,
IntOption::None => 0,
}
This gap forces users to simulate sum types with structs + enums, sacrificing type safety and expressiveness. Sum types are fundamental to writing safe, expressive code without null pointers or error codes.
Scope of This ADR
This ADR covers:
- Tuple-style data variants: Variant(T1, T2, ...) in enum definitions
- Variant construction with data: Enum::Variant(expr1, expr2, ...)
- Binding patterns in match arms: Enum::Variant(x, y) =>
- Ownership of extracted data: consistent with ADR-0036/ADR-0008
Out of scope (future ADRs):
- Generic enums (Option<T>, Result<T, E>)
- Struct-style variant data (Variant { field: T })
- Nested patterns (Some(Some(x)))
- Or-patterns (A | B =>)
- Pattern matching in let bindings for enums
Decision
1. Enum Variant Data Syntax
Variants may optionally carry a tuple of typed fields:
enum IntOption {
None,
Some(i32),
}
enum Outcome {
Ok(i32),
Err(i32),
}
enum Tagged {
Unit,
One(i32),
Two(i32, i32),
}
All existing C-style enums remain valid and unaffected.
Grammar Changes
enum_def = "enum" IDENT "{" enum_variants "}" ;
enum_variants = enum_variant { "," enum_variant } [ "," ] ;
enum_variant = IDENT [ "(" type_list ")" ] ; -- NEW: optional tuple data
type_list = type { "," type } ;
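As a rough illustration of the enum_variant production, the following Rust sketch parses a single variant from a string. It is not the Gruel parser: VariantAst and parse_enum_variant are hypothetical names, and type names are simplified to bare identifiers.

```rust
/// Hypothetical AST node for one enum variant: a name plus zero or more
/// field type names. (Assumption: types are plain identifiers here.)
#[derive(Debug, PartialEq)]
pub struct VariantAst {
    pub name: String,
    pub fields: Vec<String>,
}

/// True if `s` looks like an identifier: letter or `_`, then letters, digits, `_`.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Parses `IDENT [ "(" type_list ")" ]` from text such as "Two(i32, i32)".
/// Returns None on malformed input.
pub fn parse_enum_variant(src: &str) -> Option<VariantAst> {
    let src = src.trim();
    match src.find('(') {
        // No parenthesis: a unit variant, which must be a lone identifier.
        None => is_ident(src).then(|| VariantAst {
            name: src.to_string(),
            fields: Vec::new(),
        }),
        Some(open) => {
            let name = src[..open].trim();
            // The text after "(" must end with ")".
            let inner = src[open + 1..].strip_suffix(')')?;
            let fields: Vec<String> =
                inner.split(',').map(|t| t.trim().to_string()).collect();
            // type_list needs at least one type, so "Variant()" is rejected
            // (splitting "" yields one empty string, which is not an identifier).
            if !is_ident(name) || fields.iter().any(|f| !is_ident(f)) {
                return None;
            }
            Some(VariantAst { name: name.to_string(), fields })
        }
    }
}
```

Under these assumptions, "Unit" parses to a variant with no fields, "Two(i32, i32)" to a variant with two, and "Bad()" is rejected because the grammar requires a non-empty type_list inside the parentheses.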
Variant Construction
Data variants are constructed using a call-like syntax:
let x = IntOption::Some(42);
let e = Outcome::Err(-1);
let t = Tagged::Two(10, 20);
Unit variants retain their existing syntax: IntOption::None, Tagged::Unit.
Updated Match Pattern Grammar
pattern = "_" | INTEGER | BOOL | enum_variant_pattern | enum_data_pattern ;
enum_variant_pattern = IDENT "::" IDENT ;
enum_data_pattern = IDENT "::" IDENT "(" binding_list ")" ;
binding_list = binding { "," binding } ;
binding = [ "mut" ] IDENT | "_" ;
Bindings in data patterns are simple identifiers or wildcards — no nested patterns in this ADR.
Match Arm Body Scoping
Each binding in a data pattern introduces a new local variable in the arm's body, with the type of the corresponding variant field:
match x {
IntOption::Some(v) => v + 1, // v: i32
IntOption::None => 0,
}
match t {
Tagged::Two(a, b) => a + b, // a: i32, b: i32
Tagged::One(n) => n,
Tagged::Unit => 0,
}
Wildcard _ discards the field (immediately drops it if the type has a destructor):
match x {
IntOption::Some(_) => 1,
IntOption::None => 0,
}
Mutability is per-binding:
match x {
IntOption::Some(mut v) => { v += 1; v },
IntOption::None => 0,
}
Exhaustiveness
Data variants are treated the same as unit variants for exhaustiveness: an arm with Enum::Variant(...) (any binding pattern for each field) exhausts that variant. The binding contents do not affect exhaustiveness.
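The coverage rule can be sketched in Rust as follows. This is only an illustration, not the Gruel sema code: ArmPattern and missing_variants are hypothetical names, and patterns are reduced to the one piece of information exhaustiveness needs, the variant index.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified match-arm pattern.
pub enum ArmPattern {
    /// `_` covers every remaining variant.
    Wildcard,
    /// `Enum::Variant` or `Enum::Variant(...)`; the bindings are ignored here
    /// because they do not affect exhaustiveness.
    Variant { variant_index: usize },
}

/// Returns the indices of variants not covered by any arm.
/// An empty result means the match is exhaustive.
pub fn missing_variants(variant_count: usize, arms: &[ArmPattern]) -> Vec<usize> {
    let mut covered = HashSet::new();
    for arm in arms {
        match arm {
            // A wildcard arm makes the whole match exhaustive.
            ArmPattern::Wildcard => return Vec::new(),
            ArmPattern::Variant { variant_index } => {
                covered.insert(*variant_index);
            }
        }
    }
    (0..variant_count).filter(|i| !covered.contains(i)).collect()
}
```

For Tagged (Unit, One, Two at indices 0, 1, 2), arms for Two and One alone would report index 0 (Unit) as missing, regardless of whether the Two arm binds its fields or discards them with wildcards.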
Ownership of Variant Data
When a data variant is matched and its fields are bound by name, those fields are moved out of the enum value. The enum value itself is consumed by the match expression (the scrutinee's slot is forgotten). This mirrors how struct destructuring works in let bindings (ADR-0036).
If a field is bound to _, it is immediately dropped (destructor runs if applicable).
Copy types: fields of copy types are copied out; the original enum value is still consumed at the match expression level (the scrutinee is used up).
Non-copy types: fields are moved out and become independent values owned by the arm body. If the arm body exits without consuming them, they are dropped at scope exit.
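Rust has closely related move semantics for named bindings, so a Rust analogy can make the non-copy rule concrete. The sketch below is Rust, not Gruel, and all names (Noisy, Holder, consume) are hypothetical. It covers only named bindings: Rust's _ pattern does not move or drop the field, which differs from the wildcard rule above.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// A non-copy payload that records when its destructor runs.
pub struct Noisy {
    pub name: &'static str,
    pub log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

pub enum Holder {
    Full(Noisy),
    Empty,
}

/// Matches on `h` by value. The payload is moved into `n`, which is owned by
/// the arm body and dropped when the arm's scope ends.
pub fn consume(h: Holder, log: &Rc<RefCell<Vec<String>>>) {
    match h {
        Holder::Full(n) => {
            log.borrow_mut().push(format!("bound {}", n.name));
            // `n` is not consumed by the body, so it is dropped here.
        }
        Holder::Empty => {
            log.borrow_mut().push("empty".to_string());
        }
    }
    log.borrow_mut().push("after match".to_string());
}
```

Calling consume with a Full value named "a" records "bound a", then "drop a", then "after match": the field is dropped at the end of the arm body, and the enum value itself is not dropped again afterwards.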
2. LLVM Memory Representation
Current Representation (C-style enums)
Currently, enums are represented as their discriminant integer (u8, u16, u32 as needed). No struct is created; the LLVM value is just iN.
New Representation (data variants)
For enums with at least one data variant, the LLVM type becomes a struct:
%EnumName = type { iD, [N x i8] }
Where:
- iD is the discriminant type (u8 for ≤256 variants, etc.)
- N is the size in bytes of the largest variant's payload, aligned to the largest field alignment
For unit-only enums (no data variants), the representation is unchanged (just iD).
Rationale: An opaque byte array is safe as long as the payload is properly aligned, and it avoids the complexity of modeling a union in LLVM IR. Individual field reads and writes use getelementptr + bitcast to access the payload as the appropriate field type. This is similar to the approach Rust's MIR-to-LLVM lowering takes for enums.
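One way to read the definitions of iD and N is the following Rust sketch. It is a minimal model, not the compiler's layout code: FieldLayout, discriminant_bits and payload_size are hypothetical names, field sizes and alignments are supplied by the caller rather than taken from a target data layout, and the u16/u32 thresholds are an assumed continuation of the "u8 for ≤256 variants" rule.

```rust
/// Size and alignment of one field, in bytes. (Assumption: supplied by the
/// caller; a real compiler would query the target data layout.)
#[derive(Clone, Copy)]
pub struct FieldLayout {
    pub size: u64,
    pub align: u64,
}

/// Rounds `value` up to the next multiple of `align` (align must be nonzero).
fn round_up(value: u64, align: u64) -> u64 {
    (value + align - 1) / align * align
}

/// Width in bits of the smallest discriminant for `variant_count` variants.
pub fn discriminant_bits(variant_count: u64) -> u32 {
    match variant_count {
        0..=256 => 8,
        257..=65_536 => 16,
        _ => 32,
    }
}

/// Payload size N: lay out each variant's fields in order with padding,
/// take the largest result, and round it up to the largest field alignment.
pub fn payload_size(variants: &[Vec<FieldLayout>]) -> u64 {
    let mut max_size: u64 = 0;
    let mut max_align: u64 = 1;
    for fields in variants {
        let mut offset: u64 = 0;
        for f in fields {
            // Pad so this field starts at a multiple of its alignment.
            offset = round_up(offset, f.align) + f.size;
            max_align = max_align.max(f.align);
        }
        max_size = max_size.max(offset);
    }
    round_up(max_size, max_align)
}
```

With i32 modeled as size 4 and alignment 4, Tagged (Unit, One(i32), Two(i32, i32)) gets an 8-byte payload and an 8-bit discriminant under this model.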
Variant Construction
IntOption::Some(42) lowers to:
- alloca %IntOption to get a stack slot
- Store discriminant 1 into field 0 (the iD field)
- GEP into the payload byte array, bitcast to i32*, store 42
- Load the result as %IntOption
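The lowering steps can be mimicked in Rust on a plain byte array, which may help when reasoning about what the generated IR does. This is a simulation under stated assumptions, not codegen: the offsets assume a one-byte discriminant followed directly by a four-byte payload, and construct_some and construct_none are hypothetical names. Byte-wise copies sidestep the alignment questions that Phase 5 addresses.

```rust
/// Byte offsets assumed by this sketch for an IntOption value:
/// a one-byte discriminant followed directly by a four-byte payload.
pub const DISCRIMINANT_OFFSET: usize = 0;
pub const PAYLOAD_OFFSET: usize = 1;
pub const INT_OPTION_SIZE: usize = 5;

/// Mirrors the lowering steps for `IntOption::Some(value)`.
pub fn construct_some(value: i32) -> [u8; INT_OPTION_SIZE] {
    // Reserve a stack slot (stands in for the alloca).
    let mut slot = [0u8; INT_OPTION_SIZE];
    // Store discriminant 1 (Some) into field 0.
    slot[DISCRIMINANT_OFFSET] = 1;
    // Address the payload bytes and store the i32 there.
    slot[PAYLOAD_OFFSET..PAYLOAD_OFFSET + 4].copy_from_slice(&value.to_ne_bytes());
    // Return the whole aggregate (stands in for the final load).
    slot
}

/// `IntOption::None` only needs its discriminant (0); the payload is unused.
pub fn construct_none() -> [u8; INT_OPTION_SIZE] {
    let mut slot = [0u8; INT_OPTION_SIZE];
    slot[DISCRIMINANT_OFFSET] = 0;
    slot
}
```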
Match Dispatch
Match on data enums lowers to:
- Extract the discriminant from field 0
- Use LLVM switch on the discriminant value (as before)
- In each arm's basic block, GEP into the payload to extract bound fields
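The dispatch steps can be simulated the same way. The Rust sketch below evaluates the earlier IntOption match over raw bytes; lower_match is a hypothetical name, and the layout (discriminant at byte 0, i32 payload in bytes 1 to 4) is an assumption of the sketch rather than a statement about the real codegen.

```rust
/// Evaluates `match x { IntOption::Some(v) => v + 1, IntOption::None => 0 }`
/// over the raw bytes of an IntOption value.
/// Assumed layout: discriminant in byte 0, i32 payload in bytes 1..5.
pub fn lower_match(slot: &[u8; 5]) -> i32 {
    // Extract the discriminant from field 0.
    let discriminant = slot[0];
    // The Rust `match` stands in for the LLVM `switch`.
    match discriminant {
        1 => {
            // Arm block for `Some(v)`: read the bound field out of the payload.
            let v = i32::from_ne_bytes([slot[1], slot[2], slot[3], slot[4]]);
            v + 1
        }
        // Arm block for `None`: no payload access is needed.
        0 => 0,
        // A well-formed value never carries any other discriminant.
        _ => unreachable!("invalid IntOption discriminant"),
    }
}
```

Note that the payload is only read inside the arm whose discriminant has already been checked, which is why the field extraction belongs in each arm's basic block rather than before the switch.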
Enums with No Data Variants
C-style enums retain their integer representation. No layout change. This ensures backward compatibility.
3. Type System Changes
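EnumDef in gruel-air/src/types.rs is extended so that each variant records its field types (Vec<EnumVariantDef> replaces Vec<String>; see Phase 1).
AirPattern is extended with an EnumDataVariant pattern that carries per-field bindings (see Phase 3).
RirPattern is extended similarly, with a DataVariant pattern.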
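The shape of these extensions might look roughly like the Rust sketch below. Only the names EnumDef, EnumVariantDef, has_data_variants and AirPattern::EnumDataVariant (with enum_id, variant_index and bindings) come from the implementation phases in this ADR; the id aliases, the other fields and the other AirPattern variants are assumptions made so the sketch compiles on its own.

```rust
/// Hypothetical stand-ins for compiler-internal ids.
pub type TypeId = u32;
pub type EnumId = u32;
pub type Symbol = u32;

/// One variant: its name plus the types of its tuple fields
/// (empty for a unit variant). Field names are assumptions.
pub struct EnumVariantDef {
    pub name: String,
    pub fields: Vec<TypeId>,
}

/// Enum definition holding full variant definitions instead of bare names.
pub struct EnumDef {
    pub name: String,
    pub variants: Vec<EnumVariantDef>,
}

impl EnumDef {
    /// True if at least one variant carries data, which selects the
    /// struct representation over the plain integer one.
    pub fn has_data_variants(&self) -> bool {
        self.variants.iter().any(|v| !v.fields.is_empty())
    }
}

/// Simplified pattern type; only EnumDataVariant is taken from this ADR.
pub enum AirPattern {
    Wildcard,
    EnumVariant { enum_id: EnumId, variant_index: u32 },
    /// `None` in `bindings` marks a `_` field; `Some(sym)` a named binding.
    EnumDataVariant {
        enum_id: EnumId,
        variant_index: u32,
        bindings: Vec<Option<Symbol>>,
    },
}
```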
4. Interaction with Existing Features
C-style enums: fully backward compatible. Unit-only enums continue to be represented as integers.
Match exhaustiveness: unchanged logic — exhaustiveness is determined by variant coverage, not by data patterns.
Struct methods on enums: not affected; method dispatch continues to work.
Copy vs non-copy: data variant fields inherit their type's copy/move semantics. An enum in which any variant has a non-copy field is itself non-copy.
Destructors (ADR-0010): if an enum value with a data variant goes out of scope without being matched, its destructor must run. The destructor must check the discriminant and call the appropriate field destructors. This requires a new runtime pattern: a per-enum drop function that dispatches on discriminant.
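The discriminant-based dispatch can be modeled as a pure function that decides which field destructors to run. The Rust sketch below is illustrative only: VariantDrops and fields_to_drop are hypothetical names, and whether a field type has a destructor is passed in as a precomputed flag.

```rust
/// Per-variant drop information: one flag per field saying whether that
/// field's type has a destructor. (Hypothetical helper type.)
pub struct VariantDrops {
    pub field_needs_drop: Vec<bool>,
}

/// Given the active variant's discriminant, returns the indices of the
/// fields whose destructors must run, in field order.
/// Panics if `discriminant` is out of range, which a well-formed value
/// should never cause.
pub fn fields_to_drop(variants: &[VariantDrops], discriminant: usize) -> Vec<usize> {
    let mut to_drop = Vec::new();
    // Only the fields of the active variant are live; other variants'
    // fields do not exist in this value and must not be touched.
    for (index, needs_drop) in variants[discriminant].field_needs_drop.iter().enumerate() {
        if *needs_drop {
            to_drop.push(index);
        }
    }
    to_drop
}
```

A unit variant yields an empty list, so dropping a value in that state does no field work beyond the discriminant check.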
Implementation Phases
Epic: TBD
Phase 1: Enum variant data declarations (parsing + type system)
- Parser: parse Variant(Type, ...) in enum definitions; add fields: Vec<TypeExpr> to AST enum variant
- RIR: extend InstData::EnumDecl to store per-variant field types
- AIR type system: replace Vec<String> with Vec<EnumVariantDef> in EnumDef; update all EnumDef construction sites
- Sema gather pass: collect field types per variant; type-check them
- Add PreviewFeature::EnumDataVariants in gruel-error
- Gate the entire data-variant path behind this preview feature
- No codegen yet; just ensure C-style enums still compile
Phase 2: Variant construction with data
- Parser: parse Enum::Variant(expr, ...) as a variant construction call
- RIR: InstData::EnumVariantConstruct { type_name, variant, args: Vec<InstRef> } (distinct from the existing unit EnumVariant)
- AIR: new instruction AirInstData::EnumCreate { enum_id, variant_index, fields: Vec<AirRef> } (or extend existing)
- LLVM codegen: change gruel_type_to_llvm for data enums to emit { iD, [N x i8] }; implement construction via alloca + field stores
- Unit-only enums retain integer representation
- Add preview-gated spec tests: Option::Some(42) compiles but exit_code tests are not yet meaningful (no way to extract the value)
Phase 3: Match patterns with binding
- Parser: parse Enum::Variant(x, y) and Enum::Variant(mut z, _) as patterns
- RIR: RirPattern::DataVariant { type_name, variant, bindings } where each binding is Option<(bool, Spur)> (is_mut, name)
- Sema: resolve pattern bindings; add bound variables to arm body scope; type-check each binding against the variant's field type
- AIR: AirPattern::EnumDataVariant { enum_id, variant_index, bindings: Vec<Option<Symbol>> }
- CFG: for each bound field, emit GEP into payload + field load into new StorageLive slot; forget the scrutinee slot (ownership transferred)
- Wildcard fields: emit Drop if the field type has a destructor
- Add preview_should_pass = true spec tests for basic binding
Phase 4: Drop dispatch for data enums
- Implement enum destructor dispatch: when a data enum value is dropped, the CFG must emit a match on the discriminant, then drop each live field of the matched variant
- Update gruel-cfg drop logic: when dropping a value of an enum type that has_data_variants(), emit a MatchDrop sequence (or inline discriminant check + conditional field drops)
- Add spec tests: non-copy data in enum variant is properly dropped at scope exit
Phase 5: LLVM layout correctness and ABI
- Verify that the [N x i8] payload is correctly sized and aligned for all field type combinations
- Use LLVM getelementptr + appropriate pointer types for field access
- Ensure that LLVM does not incorrectly alias the payload accesses (use !noalias if needed)
- Add spec tests for multi-field variants, alignment-sensitive types
Phase 6: Spec, tests, stabilization
- Write spec paragraphs in docs/spec/src/06-items/03-enums.md for data variant declarations
- Write spec paragraphs in docs/spec/src/04-expressions/07-match-expressions.md for binding patterns
- Full test coverage: unit variants still work, data variants construct and match, ownership is correct, wildcards drop, mutability works
- Run traceability check (cargo run -p gruel-spec -- --traceability)
- When all tests pass and feature is stable: remove preview fields from spec tests, remove require_preview() call, remove PreviewFeature::EnumDataVariants
Consequences
Positive
- Sum types: users can define Option, Result, event enums, AST nodes; these are the cornerstone of safe, expressive programming
- No null pointers: Option<T> is the idiomatic alternative
- Match exhaustiveness: still enforced; compiler rejects non-exhaustive patterns
- Backward compatible: C-style enums are unchanged in representation and behavior
- Foundation for generics: once a type parameter system exists (separate ADR), making enums generic (Option<T>) becomes an extension of this work
Negative
- LLVM layout complexity: the tagged union approach requires more codegen machinery than the current integer-only approach
- Drop dispatch for enums: adds a new drop pattern (conditional drop based on discriminant)
- No nested patterns: users must use intermediate let bindings for nested sum types until nested pattern matching is added
Neutral
- Non-generic only: Option and Result must be defined with concrete types (IntOption, StrResult, etc.) until a generics ADR is implemented
Resolved Questions
- Pattern arity mismatch: Variant(a, b) used for a variant with 3 fields is an error. Should this be a parse-time or sema-time error? Sema-time is simpler (parser doesn't know field counts).
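A sema-time arity check needs only the variant's declared field count and the number of bindings in the pattern. A minimal Rust sketch, with the hypothetical names ArityMismatch and check_pattern_arity:

```rust
/// Error describing a pattern whose binding count does not match the
/// variant's declared field count. (Hypothetical diagnostic type.)
#[derive(Debug, PartialEq)]
pub struct ArityMismatch {
    pub expected: usize,
    pub found: usize,
}

/// Checks a data pattern against its variant once sema has resolved the
/// variant and therefore knows how many fields it declares.
pub fn check_pattern_arity(
    variant_field_count: usize,
    binding_count: usize,
) -> Result<(), ArityMismatch> {
    if variant_field_count == binding_count {
        Ok(())
    } else {
        Err(ArityMismatch {
            expected: variant_field_count,
            found: binding_count,
        })
    }
}
```

The check depends on the resolved variant definition, which is information the parser does not have; that is the reason given above for placing it in sema.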
Open Questions
- Enum drop dispatch strategy: Should drop dispatch be emitted inline in the CFG (a switch on discriminant followed by per-variant field drops), or should a generated per-enum __drop_EnumName function be emitted and called? An inline approach is simpler initially; a function approach is more LLVM-IR-friendly (enables sharing and inlining decisions by LLVM). Start inline.
- Payload size calculation: Should the payload size be computed at compile time in gruel-air (using type sizes), or delegated to the LLVM data layout? Computing it in gruel-air allows the type pool to know the enum's size, which may be needed for other purposes. Delegate to LLVM for now (LLVM knows target sizes).
Future Work
- Generic enums (Option<T>, Result<T, E>): requires a type parameter system (separate ADR)
- Struct-style variant data (Variant { field: T }): straightforward extension once tuple-style is done
- Nested patterns (Some(Some(x))): requires recursive pattern descent in sema
- Or-patterns (A | B =>): independent extension to the pattern grammar
- Pattern matching in let bindings for enums (consistent with struct let-destructuring)
- if let syntax (if let Some(x) = expr { ... }): syntactic sugar over match
- Range patterns for integers (1..=5)
References
- ADR-0004: Enum Types — C-style enum foundation
- ADR-0036: Struct Destructuring and Partial Move Ban — Ownership model for destructuring
- ADR-0008: Affine Types and MVS — Ownership foundation
- ADR-0010: Destructors — Drop infrastructure
- ADR-0005: Preview Features — Feature gating system
- Rust Reference: Enum Types
- Rust Reference: Patterns
- Austral Language: Sum Types