ADR-0024: Type Intern Pool
Status
Implemented
Summary
Replace Gruel's current type representation (a Type enum with separate ID registries for structs, enums, and arrays) with a unified Type Intern Pool inspired by Zig's InternPool. All types become 32-bit indices into a canonical, thread-safe pool, enabling O(1) type equality, efficient memory usage, and clean parallel compilation.
Context
Current Architecture
Gruel currently has three separate ID systems for composite types:
| Type | ID Type | Storage | Creation Point |
|---|---|---|---|
| Structs | StructId(u32) | Vec<StructDef> in TypeContext | Declaration collection (pre-analysis) |
| Enums | EnumId(u32) | Vec<EnumDef> in TypeContext | Declaration collection (pre-analysis) |
| Arrays | ArrayTypeId(u32) | Vec<ArrayTypeDef> in FunctionAnalysisState | During function body analysis |
The Type enum is:
Problems
Dynamic array type creation: Arrays are created during type inference, requiring mutable state during what should be parallel function analysis. This led to
FunctionAnalysisStatewith per-function array registries that must be merged afterward (see gruel-wcg7).Inconsistent creation patterns: Structs/enums use one path, arrays use another. This asymmetry complicates the codebase.
Type comparison overhead: Comparing types requires matching on enum variants. For composite types, you then need to compare IDs, which requires knowing they came from the same registry.
Future generics: When we add
Vec<T>orOption<T>, we'll need to intern instantiated generic types likeVec<i32>vsVec<String>. The current architecture has no path to this.
What Zig Does
Zig uses an InternPool - a canonical, thread-safe, sharded hash table that deduplicates and interns all types and values:
- Types are represented as 32-bit indices
- The
Keyunion defines all possible type variants (includingArrayType { len, child, sentinel }) - Content-addressed deduplication: identical types share the same index
- Type equality is just
u32 == u32 - Thread-safe via sharded hash maps with atomic operations
Decision
Implement a unified TypeInternPool for all types in Gruel.
Core Design
/// Interned type index - 32 bits, Copy, cheap comparison.
///
/// Reserved indices 0-15 are primitives (no lookup needed).
/// Index 16+ are composite types stored in the pool.
;
/// Type data stored in the intern pool.
///
/// This is NOT Copy - it lives in the pool. You work with `Type` indices.
/// Thread-safe intern pool for all composite types.
Key Properties
O(1) type equality:
type_a == type_bis justu32 == u32Primitives are free: No lookup for
i32,bool, etc. - they're encoded in the index itselfStructural deduplication for arrays:
[i32; 5]interns to the sameTyperegardless of where it's createdNominal identity for structs/enums: Two structs with the same fields but different names are different types (as they should be)
Thread-safe: RwLock (or sharded locks) for concurrent access during parallel compilation
Self-contained: Arrays can reference other types via their
Typeindex, enabling nested types like[[i32; 3]; 4]
API
Migration Strategy
The key insight is that Type remains a small, Copy value - we're just changing its internal representation from an enum to an index. Most code that uses Type doesn't need to change semantically; it just needs mechanical updates.
Implementation Phases
Epic: gruel-igt6
Phase 1: Introduce TypeInternPool alongside existing system (gruel-3mjg)
Create the new TypeInternPool infrastructure without removing the old system. Both coexist temporarily.
- Create
gruel-air/src/intern_pool.rswithTypeInternPool,TypeData - Add
TypeInternPooltoSemaandSemaContext - Populate pool during declaration collection (structs, enums)
- Verify pool contents match existing registries (test coverage)
Ship criterion: All existing tests pass, pool is populated but not yet used.
Phase 2: Migrate array types to the pool (gruel-9e5t)
Replace ArrayTypeId and the per-function array type handling with pool interning.
- Replace
FunctionAnalysisState.array_typeswithTypeInternPool.intern_array() - Update
MergedAnalysisState- array merging becomes a no-op (pool handles dedup) - Update AIR instructions that reference
ArrayTypeIdto useType -
ArrayTypeIdnow wraps pool indices (kept for type safety)
Ship criterion: Arrays work, parallel function analysis uses shared pool, no per-function array registries.
Phase 3: Migrate struct/enum IDs to pool indices (gruel-ej3x)
Replace StructId and EnumId with Type indices directly.
- Change
Type::Struct(StructId)→ compositeTypeindex with tag encoding - Change
Type::Enum(EnumId)→ compositeTypeindex with tag encoding - Update all
StructId/EnumIdusages in AIR, CFG, codegen -
StructId/EnumIdnow wrap pool indices (kept for type safety)
Ship criterion: Type is now a u32 index. All lookups go through the pool.
Phase 4: Unify Type representation (gruel-wsny)
Replace the Type enum entirely with the Type(u32) newtype.
- Remove old
Typeenum variants - nowstruct Type(u32)with tag encoding - Update all pattern matches on
Typeto usekind()method returningTypeKind - Add helper methods to
Typefor common checks (is_integer(),is_signed(), etc.) - Optimize: inline primitive checks (no pool lookup for
Type::I32.is_integer())
Ship criterion: Single unified type representation. Clean API.
Phase 5: Performance optimization (gruel-ynk5, optional)
If profiling shows lock contention:
- Implement sharded locks (Zig uses 16 shards)
- Consider lock-free reads for the common case (append-only types array)
- Benchmark and tune
Ship criterion: No regression from current performance. Improvements for parallel compilation.
Note: Phase 5 is deferred until profiling shows a need. The current RwLock implementation works well for the current workload.
Consequences
Positive
Correctness: Single source of truth for types eliminates registry synchronization bugs
Performance:
- O(1) type comparison (u32 equality vs enum matching)
- Better cache locality (types are indices, pool is contiguous)
- Clean parallel compilation (no per-function merging for arrays)
Simplicity:
- One way to create and compare types
StructId,EnumId,ArrayTypeIdare now thin wrappers around pool indicesFunctionAnalysisStatebecomes much simpler (no array handling)
Future-ready:
- Clear path to generic type instantiation (
Vec<i32>) - Pointer types, function types, etc. fit naturally
- Foundation for incremental compilation (stable type hashes)
- Clear path to generic type instantiation (
Negative
Large refactor: ~4000-5000 lines across ~15-20 files
Indirection for type queries:
pool.get(ty)instead of direct enum match- Mitigated by helper methods and inlined primitive checks
Lock overhead for type creation: RwLock for concurrent access
- Mitigated by read-heavy workload (types created once, queried many times)
- Can shard if profiling shows contention
Learning curve: Contributors need to understand the interning pattern
Open Questions
Sharding from the start? Zig uses 16 shards. We could start with a single RwLock and shard later if needed. The API doesn't change.
Primitive representation: Reserved indices 0-15 vs. separate enum? Reserved indices are simpler and avoid branching.
Error handling for invalid Type indices: Panic (current approach) or Result? Panic is fine for compiler internals - an invalid Type index is always a bug.
Future Work
- Generic types:
Vec<T>instantiation will internVec<i32>etc. The pool design supports this naturally. - Pointer types:
&T,&mut Twould be interned composite types - Function types: For function pointers or closures
- Incremental compilation: Pool contents can be serialized/hashed for incremental cache keys