ADR-0020: Built-in Types as Synthetic Structs
Status
Implemented
Summary
Refactor built-in types like String from hardcoded Type enum variants into synthetic structs that are injected by the compiler before user code is analyzed. This removes ~50 scattered Type::String special cases across the compiler, centralizes built-in type metadata, and establishes an architecture that scales to future built-in types (Vec<T>, HashMap<K,V>, etc.).
Context
The Problem: Scattered Special-Casing
Today, String is a primitive variant in the Type enum:
// gruel-air/src/types.rs
Because String is "magic" (a built-in with heap semantics, 3-slot ABI, runtime methods), every compiler phase needs explicit Type::String checks. A grep shows ~50 locations:
| Phase | Location | What it does |
|---|---|---|
| Sema | sema.rs:4428 | Dispatches String::new(), methods |
| Sema | sema.rs:4864 | Disallows <, > on strings |
| Sema | sema.rs:4748 | Returns slot count (3) |
| Type inference | generate.rs:264 | Infers StringConst → String |
| CFG builder | build.rs:1682 | Marks String as needing drop |
| Codegen (x86) | cfg_lower.rs:1229 | Emits __gruel_str_eq call |
| Codegen (x86) | cfg_lower.rs:1371 | Handles 3-slot alloc |
| Codegen (arm) | cfg_lower.rs:946 | Same, duplicated |
| Drop glue | drop_glue.rs:47 | String needs drop |
| Drop glue | drop_glue.rs:85 | String has 3 slots |
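The pattern in the table can be illustrated with a small sketch. This is not the actual Gruel source: the enum is a simplified stand-in, and the function names `slot_count`, `needs_drop`, and `eq_runtime_fn` are hypothetical. Only the slot count (3), the need for drop, and the `__gruel_str_eq` symbol come from the table above.

```rust
// Hypothetical, simplified stand-in for the pre-refactor Type enum.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Type {
    I32,
    Bool,
    String, // hardcoded built-in variant
}

// Sema-style query: how many ABI slots a value of this type occupies.
pub fn slot_count(ty: Type) -> u32 {
    match ty {
        Type::String => 3, // special case #1
        Type::I32 | Type::Bool => 1,
    }
}

// CFG-builder / drop-glue-style query: does the value need a destructor call?
pub fn needs_drop(ty: Type) -> bool {
    matches!(ty, Type::String) // special case #2
}

// Codegen-style query: which runtime function implements `==`, if any.
pub fn eq_runtime_fn(ty: Type) -> Option<&'static str> {
    match ty {
        Type::String => Some("__gruel_str_eq"), // special case #3
        Type::I32 | Type::Bool => None,
    }
}
```

Each function lives in a different phase, so every new built-in type would add one more arm to every one of them.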
Why This Doesn't Scale
If we wanted to add Vec<T>, HashMap<K,V>, or even &str, we'd need to:
- Add new Type variants
- Hunt down every match on Type and add new cases
- Duplicate logic across both codegen backends (x86_64, aarch64)
- Handle special ABIs (multi-slot representations)
- Wire up runtime functions manually
How Zig and Rust Handle This
Zig: No built-in String type at all. Strings are []u8 (a compiler primitive slice). Growable strings are std.ArrayList(u8) — pure library code using comptime generics. The compiler only knows primitives, pointers, slices, and user-defined types.
Rust: Uses "lang items" — markers like #[lang = "owned_box"] that tell the compiler "this library type implements this language concept." The compiler knows about traits (Drop, Eq, Deref) but String and Vec<T> are plain library structs that use those traits. The compiler doesn't special-case them directly.
Gruel Today: Hard-codes String as a compiler primitive, requiring scattered special-case code everywhere.
The Insight
String isn't fundamentally different from a user-defined struct — it's just a struct whose methods are implemented in the runtime rather than generated from Gruel source. If the compiler sees it as "just a struct," we can unify the handling.
Decision
Core Idea: Synthetic Structs
Introduce the concept of synthetic structs: struct types that are injected by the compiler before user code is analyzed, with methods that map to runtime functions rather than generated code.
From the type system's perspective, String becomes:
// Conceptually what the compiler "sees"
The Type enum loses its String variant:
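Both changes can be sketched together. The following is a minimal illustration, not the real definitions: `StructId`, the `U64` variant, and the field types are assumptions made for the example, and each field is assumed to occupy exactly one slot. The field names `ptr`, `len`, and `cap` are the ones this ADR uses elsewhere.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct StructId(pub u32);

// Simplified Type enum after the refactor: no String variant.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Type {
    U64, // placeholder scalar type for this sketch
    Bool,
    Struct(StructId), // String is now just one of these
}

pub struct StructDef {
    pub name: &'static str,
    pub fields: Vec<(&'static str, Type)>,
}

// Conceptually what the compiler "sees" for String: three plain fields.
// The field types here are an assumption.
pub fn synthetic_string() -> StructDef {
    StructDef {
        name: "String",
        fields: vec![("ptr", Type::U64), ("len", Type::U64), ("cap", Type::U64)],
    }
}

// Slot counting no longer mentions String; it falls out of the field list.
// Assumes one slot per field, which is a simplification.
pub fn slot_count(ty: Type, structs: &[StructDef]) -> usize {
    match ty {
        Type::Struct(StructId(i)) => structs[i as usize].fields.len(),
        Type::U64 | Type::Bool => 1,
    }
}
```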
Builtin Type Registry
Create a central registry that describes built-in types:
// New module: gruel-builtins or within gruel-air
/// Descriptor for a built-in type's properties
The String type is defined as:
pub static STRING_TYPE: BuiltinTypeDef = BuiltinTypeDef { /* ... */ };
pub static BUILTIN_TYPES: &[&BuiltinTypeDef] = &[&STRING_TYPE];
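As a rough sketch of what such a registry could look like: the field layout of `BuiltinTypeDef`, the `BuiltinMethod` type, the lookup helpers, and the runtime symbols for `len` and `push_str` are all assumptions made for illustration. The names `STRING_TYPE`, `BUILTIN_TYPES`, `__gruel_drop_String`, and `__gruel_str_eq` are taken from this ADR.

```rust
// Hypothetical descriptor for one method backed by a runtime function.
pub struct BuiltinMethod {
    pub name: &'static str,
    pub runtime_fn: &'static str,
}

// Hypothetical descriptor for a built-in type's properties.
pub struct BuiltinTypeDef {
    pub name: &'static str,
    pub fields: &'static [&'static str],
    pub destructor: Option<&'static str>,
    pub eq_fn: Option<&'static str>,
    pub methods: &'static [BuiltinMethod],
}

pub static STRING_TYPE: BuiltinTypeDef = BuiltinTypeDef {
    name: "String",
    fields: &["ptr", "len", "cap"],
    destructor: Some("__gruel_drop_String"),
    eq_fn: Some("__gruel_str_eq"),
    methods: &[
        // Runtime symbol names below are placeholders, not confirmed.
        BuiltinMethod { name: "len", runtime_fn: "__gruel_str_len" },
        BuiltinMethod { name: "push_str", runtime_fn: "__gruel_str_push_str" },
    ],
};

pub static BUILTIN_TYPES: &[&BuiltinTypeDef] = &[&STRING_TYPE];

// Look up a built-in type by name.
pub fn find_builtin(name: &str) -> Option<&'static BuiltinTypeDef> {
    BUILTIN_TYPES.iter().copied().find(|def| def.name == name)
}

// Look up the runtime symbol that implements a method, if the type has it.
pub fn find_method(def: &BuiltinTypeDef, method: &str) -> Option<&'static str> {
    def.methods.iter().find(|m| m.name == method).map(|m| m.runtime_fn)
}
```

Keeping everything in `static` data means every phase and both backends read the same table, with no per-phase copies of the String metadata.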
StructDef Changes
Add a flag to identify synthetic structs:
Injection Point
During Sema::gather_declarations(), before processing user code:
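A minimal sketch of the injection step, assuming a much simpler `Sema` than the real one: the `structs`/`by_name` tables, the `inject_builtins` helper, and the use of `usize` as a struct ID are hypothetical. The `is_builtin` flag, `builtin_string_id`, and the reserved-name error text follow this ADR.

```rust
use std::collections::HashMap;

pub struct StructDef {
    pub name: String,
    pub field_names: Vec<String>,
    pub destructor: Option<String>,
    pub is_builtin: bool, // true only for compiler-injected structs
}

#[derive(Default)]
pub struct Sema {
    pub structs: Vec<StructDef>,
    pub by_name: HashMap<String, usize>,
    pub builtin_string_id: Option<usize>,
}

impl Sema {
    // Register the synthetic String struct and remember its ID.
    fn inject_builtins(&mut self) {
        let id = self.structs.len();
        self.structs.push(StructDef {
            name: "String".to_string(),
            field_names: vec!["ptr".into(), "len".into(), "cap".into()],
            destructor: Some("__gruel_drop_String".to_string()),
            is_builtin: true,
        });
        self.by_name.insert("String".to_string(), id);
        self.builtin_string_id = Some(id);
    }

    // Builtins go in first, so user declarations can be checked against them.
    // Duplicate user definitions are not handled in this sketch.
    pub fn gather_declarations(&mut self, user: Vec<StructDef>) -> Result<(), String> {
        self.inject_builtins();
        for def in user {
            if let Some(&existing) = self.by_name.get(&def.name) {
                if self.structs[existing].is_builtin {
                    return Err(format!(
                        "cannot define type {}: name is reserved for built-in type",
                        def.name
                    ));
                }
            }
            let id = self.structs.len();
            self.by_name.insert(def.name.clone(), id);
            self.structs.push(def);
        }
        Ok(())
    }
}
```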
Sema Changes
Replace Type::String checks with builtin struct queries:
// Before:
if ty == Type::String { /* ... */ }
// After:
if self.is_builtin_type(ty) { /* ... */ }
// Or for slot counts:
Method dispatch becomes uniform:
// Before (sema.rs:4428):
if receiver_type == Type::String { /* ... */ }
// After:
if let Type::Struct(struct_id) = receiver_type { /* ... */ }
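To make the uniform dispatch concrete, here is a small sketch under simplifying assumptions. `MethodDef`, `Callee`, `resolve_method`, and the `__gruel_str_len` symbol are hypothetical names for this example; the point is only that built-in and user-defined structs go through the same lookup and produce the same style of error.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Type {
    I32,
    Struct(usize),
}

pub struct MethodDef {
    pub name: &'static str,
    // Some(symbol) for runtime-backed (built-in) methods, None for generated code.
    pub runtime_fn: Option<&'static str>,
}

pub struct StructDef {
    pub name: &'static str,
    pub methods: Vec<MethodDef>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Callee {
    Runtime(&'static str), // call into the runtime library
    Generated(String),     // call a function compiled from source
}

pub fn resolve_method(
    structs: &[StructDef],
    receiver: Type,
    method: &str,
) -> Result<Callee, String> {
    let def = match receiver {
        Type::Struct(id) => &structs[id],
        Type::I32 => return Err(format!("no method {method} on type i32")),
    };
    match def.methods.iter().find(|m| m.name == method) {
        Some(m) => Ok(match m.runtime_fn {
            Some(sym) => Callee::Runtime(sym),
            None => Callee::Generated(format!("{}::{}", def.name, m.name)),
        }),
        // Same message shape for built-in and user-defined structs.
        None => Err(format!("no method {} on type {}", method, def.name)),
    }
}
```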
Codegen Changes
The codegen doesn't need to know about Type::String at all. It sees a struct with 3 fields and generates code accordingly. The only special handling is for runtime function calls:
// Before (cfg_lower.rs):
if lhs_ty == Type::String { /* ... */ }
// After:
if let Some(runtime_fn) = self.get_builtin_operator(/* ... */) { /* ... */ }
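One way the operator lookup might work is sketched below. The signature of `get_builtin_operator`, the `BinOp` enum, the `operators` table, and the textual "instructions" returned by `lower_binop` are assumptions for illustration; the real backends emit machine code, not strings.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum BinOp {
    Eq,
    Lt,
}

pub struct StructDef {
    pub name: &'static str,
    // Operators implemented by a runtime function, keyed by operator.
    pub operators: &'static [(BinOp, &'static str)],
}

// Returns the runtime symbol implementing `op` for this struct, if any.
pub fn get_builtin_operator(def: &StructDef, op: BinOp) -> Option<&'static str> {
    def.operators
        .iter()
        .find(|(candidate, _)| *candidate == op)
        .map(|(_, sym)| *sym)
}

// Backend-agnostic lowering decision: call the runtime function when the
// registry provides one, otherwise report that no operator exists.
pub fn lower_binop(def: &StructDef, op: BinOp) -> Result<String, String> {
    match get_builtin_operator(def, op) {
        Some(sym) => Ok(format!("call {sym}")),
        None => Err(format!("operator {:?} is not defined for {}", op, def.name)),
    }
}
```

Because the decision depends only on registry data, the x86_64 and aarch64 lowerings can share it instead of each checking for a String type.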
Drop Glue Changes
The existing drop glue system already handles structs with destructors. With is_builtin: true and destructor: Some("__gruel_drop_String"), the drop glue synthesizer will correctly generate calls to the runtime drop function.
// drop_glue.rs - no changes needed for String specifically!
// The existing code handles structs with destructors:
if let Some(destructor) = &struct_def.destructor { /* ... */ }
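A stripped-down sketch of why no String-specific branch is needed. The `drop_glue_calls` function and its string output are hypothetical; only the `destructor` field and the `__gruel_drop_String` symbol come from this ADR.

```rust
pub struct StructDef {
    pub name: &'static str,
    pub destructor: Option<&'static str>,
    pub is_builtin: bool,
}

// Produces the calls drop glue would emit for one struct value.
// Note that `is_builtin` is never consulted: built-ins take the same path.
pub fn drop_glue_calls(struct_def: &StructDef) -> Vec<String> {
    let mut calls = Vec::new();
    if let Some(destructor) = &struct_def.destructor {
        calls.push(format!("call {destructor}"));
    }
    calls
}
```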
StringConst Handling
String literals still need special handling because they create values from data in .rodata. The StringConst AIR instruction remains, but its type becomes the synthetic String struct:
// In sema.rs, when analyzing a string literal:
let string_idx = self.add_string(/* ... */);
let air_ref = self.air.add_inst(/* ... */);
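The typing of string literals could look roughly like this. The `Inst`, `InstData`, and `Sema` shapes, the interning strategy in `add_string`, and the use of `u32` IDs are assumptions for the sketch; what it shows is only that `StringConst` survives while its result type is the injected struct.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Type {
    I32,
    Struct(u32),
}

#[derive(Debug, PartialEq, Eq)]
pub enum InstData {
    StringConst(u32), // index into the string table (.rodata data)
    IntConst(i64),
}

pub struct Inst {
    pub data: InstData,
    pub ty: Type,
}

#[derive(Default)]
pub struct Sema {
    pub strings: Vec<String>,
    pub insts: Vec<Inst>,
    pub builtin_string_id: u32, // set during builtin injection
}

impl Sema {
    // Intern a literal and return its index in the string table.
    pub fn add_string(&mut self, s: &str) -> u32 {
        if let Some(i) = self.strings.iter().position(|x| x.as_str() == s) {
            return i as u32;
        }
        self.strings.push(s.to_string());
        (self.strings.len() - 1) as u32
    }

    // The instruction stays StringConst; only its type changes.
    pub fn analyze_string_literal(&mut self, s: &str) -> usize {
        let string_idx = self.add_string(s);
        self.insts.push(Inst {
            data: InstData::StringConst(string_idx),
            ty: Type::Struct(self.builtin_string_id),
        });
        self.insts.len() - 1
    }
}
```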
Implementation Phases
Epic: gruel-c8lp
Phase 1: Builtin Registry Infrastructure
Issues: gruel-fgx3 (crate), gruel-cbsc (injection)
Goal: Create the builtin type registry without changing existing behavior.
Tasks:
- Create gruel-builtins crate
- Define BuiltinTypeDef and related types
- Define STRING_TYPE with all current String operations
- Add is_builtin field to StructDef
- Add builtin injection to Sema::gather_declarations()
- Store the synthetic String's StructId for later reference
- Emit an error if the user defines a type with a reserved name
Verification: All existing tests pass. String is now also a synthetic struct (but Type::String still exists in parallel).
Phase 2: Migrate Sema
Issue: gruel-hp13
Goal: Replace Type::String checks in semantic analysis with struct-based queries.
Tasks:
- Add helper methods: is_builtin_type(), get_builtin_operator(), get_builtin_method()
- Migrate analyze_type_name() to recognize String as a struct
- Migrate associated function dispatch (String::new, String::with_capacity)
- Migrate method dispatch (.len(), .push_str(), etc.)
- Migrate operator restriction (no <, > on strings)
- Migrate slot counting to use struct fields
Verification: All spec tests and unit tests pass.
Phase 3: Migrate Codegen
Issues: gruel-s6mk (x86_64), gruel-tco7 (aarch64), gruel-5cfw (other)
Goal: Replace Type::String checks in both backends and remaining crates.
Tasks:
- Add get_builtin_operator() lookup to codegen context
- Migrate x86_64 cfg_lower.rs:
  - Eq/Ne operators → runtime call lookup
  - Alloc for strings → struct field handling
  - Load/Store for strings → struct handling
  - Call with string args/returns → struct ABI
  - Drop → existing struct drop path
- Mirror all changes in aarch64 cfg_lower.rs
- Migrate gruel-cfg, gruel-compiler/drop_glue, gruel-codegen/types
Verification: All tests pass on both architectures.
Phase 4: Remove Type::String
Issue: gruel-bmje
Goal: Delete the Type::String variant entirely.
Tasks:
- Remove Type::String from the enum
- Remove Type::is_string() method
- Fix any remaining compile errors (there should be none if phases 2-3 were thorough)
- Update type name formatting to use struct name
Verification: Compiler builds, all tests pass, Type::String no longer exists.
Phase 5: Documentation and Cleanup
Issue: gruel-n20l
Goal: Document the new architecture for future contributors.
Tasks:
- Add documentation to gruel-builtins explaining how to add new built-in types
- Update CLAUDE.md with builtin type information
- Remove any dead code from the migration
Consequences
Positive
- Scalability: Adding Vec<T> becomes "add an entry to BUILTIN_TYPES" instead of editing ~50 locations
- Consistency: Built-in types follow the same code paths as user-defined types
- Maintainability: Builtin type behavior is centralized in one registry
- Backend uniformity: Both x86_64 and aarch64 share the same builtin definitions
- Foundation for generics: When generics land, Vec<T> follows the same pattern
- Foundation for lang items: The registry is a stepping stone toward Rust-style lang items
Negative
- Initial complexity: Adding the registry infrastructure before removing Type::String
- Migration risk: Phased migration requires careful testing at each step
- Indirection: Looking up builtin properties is slightly more indirect than match Type::String
Neutral
- No user-visible change: The language behaves identically
- Different from Zig: Zig has no built-in String; we still do, but it's internally a struct
- Similar to Rust's outcome: Rust's String is also "just a struct" in the type system
Design Decisions
- Where does the registry live? New gruel-builtins crate. This provides the cleanest separation of concerns and makes the builtin type definitions easy to find and modify.
- How do we handle String literals in inference? The StringConst instruction needs to know the String struct's ID. The registry will return the StructId after injection, and we'll store it in a well-known field (e.g., Sema::builtin_string_id) for fast access.
- Should builtin structs be visible to users? Yes, for now. Users can technically construct String { ptr: 0, len: 0, cap: 0 }. Hiding the fields is intentionally deferred: when the standard library and privacy/visibility rules land, those mechanisms will cleanly hide the internal fields. No need for a special is_opaque flag.
- How do we prevent users from defining their own String type? During declaration gathering, after injecting builtins, we check user-defined type names against the builtin registry. If a collision is found, emit an error: "cannot define type String: name is reserved for built-in type".
- String literal optimization: String literals from .rodata use cap: 0 as a sentinel: the drop function checks this and skips freeing. This is the current behavior and remains correct.
- Builtin method error messages: Treat uniformly. When a user calls a non-existent method on String, the error message should be the same as for any other struct (e.g., "no method foo on type String"). No special "this is a built-in type" messaging.
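The cap: 0 sentinel from the string literal decision can be sketched as follows. This is not the actual runtime: `RawString`, the `drop_string` name, its boolean return value, and the assumption that heap strings are allocated like a Rust `Vec<u8>` are all illustrative. Only the three fields and the "cap of 0 means skip freeing" rule come from this ADR.

```rust
// Hypothetical in-memory layout of a String value: three slots.
pub struct RawString {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Frees the buffer unless it is a literal. Returns true if memory was freed.
///
/// # Safety
/// If `cap != 0`, then `ptr`, `len`, and `cap` must describe a buffer that
/// was allocated as a `Vec<u8>` and has not been freed yet.
pub unsafe fn drop_string(s: RawString) -> bool {
    if s.cap == 0 {
        // Literal backed by read-only data: nothing to free.
        return false;
    }
    // Rebuild the owning vector so its destructor releases the buffer.
    unsafe { drop(Vec::from_raw_parts(s.ptr, s.len, s.cap)) };
    true
}
```

The sentinel keeps literals and heap strings in one representation, so callers never need to know which kind they hold.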
Open Questions
None at this time.
Future Work
- Phase 2+ built-in types: Vec<T>, HashMap<K,V>, Box<T> following the same pattern
- Lang items: Evolve the registry toward trait-based lang items when traits land
- Opaque types: Prevent users from constructing built-in types directly
- Generic builtins: When generics land, extend the registry to support type parameters
References
- Zig Documentation - No built-in string type
- Zig ArrayList implementation
- Rust Lang Items
- Rust Tidbits: What Is a Lang Item?
- ADR-0010: Destructors — Drop glue infrastructure
- ADR-0014: Mutable Strings — Current String implementation