ADR-0014: Mutable Strings
Status
Implemented
Summary
Add a mutable String type that owns a heap-allocated byte buffer. String literals create immutable views into read-only data; mutation methods promote data to the heap. The type is affine (move semantics), uses a destructor for cleanup, and enables common string operations like appending and building strings dynamically.
Context
Current String Situation
Gruel currently has string literals but they're severely limited:
let s = "hello"; // Type: String (but immutable, points to rodata)
@dbg_str(s); // Can print
if s == "hello" { } // Can compare
// That's it - no appending, no building, no mutation
The current String type is a fat pointer (ptr + len) pointing into read-only memory. There's no way to build strings dynamically or modify them.
What We Need
For a practical programming language, we need:
- String building: Construct strings from parts
- Appending:
s.push_str("more") - Dynamic content: Build from user input or computation
Design Philosophy: Mutable Value Semantics
Following ADR-0008 (Affine Types and MVS), strings should be values:
- No aliasing: When you have a String, you own it exclusively
- Affine: Used at most once, then moved or dropped
- Mutation via ownership: You can mutate what you own
- Explicit duplication: Use
.clone()if you need a copy
This differs from languages with garbage collection where strings are often immutable and cheap to share (Java, Python) or reference-counted (Swift). In Gruel, a String is like Rust's String: an owned, mutable buffer.
Byte String Semantics (like bstr/Go)
Gruel strings are conventionally UTF-8 rather than strictly validated:
- String literals are valid UTF-8 (validated at compile time)
- At runtime, strings are byte sequences
- Methods like
push_straccept any bytes - Display/debug may show invalid UTF-8 as replacement characters or escapes
- No runtime UTF-8 validation overhead
This matches Go's string and Rust's bstr crate: UTF-8 is the convention, but the type doesn't enforce it at runtime.
Dependencies
This feature builds on infrastructure that is already implemented:
- ADR-0008 (Affine Types): Strings are affine, move on use ✓
- ADR-0009 (Struct Methods): Methods on String ✓
- ADR-0010 (Destructors): String destructor frees the buffer ✓
- ADR-0011 (Runtime Heap):
__gruel_alloc/__gruel_freefor buffer management ✓ - ADR-0013 (Borrowing Modes):
borrow selffor query methods ✓
Decision
String Representation
// Conceptually, though users don't see the internals:
struct String {
ptr: *mut u8, // Pointer to data (heap or rodata)
len: u64, // Current length in bytes
cap: u64, // Allocated capacity (0 for rodata strings)
}
Key insight: cap == 0 indicates a string literal (points to rodata). Any mutation requires heap promotion first.
Size: 24 bytes (3 × 8-byte fields).
Literal vs Heap Strings
There is only one String type. Literals simply have cap=0:
let s = "hello"; // cap=0, ptr points to rodata, len=5
var t = "world"; // cap=0, ptr points to rodata, len=5
t.push_str("!"); // push_str promotes to heap, then appends
// t is now "world!" with cap>0
String literals are cheap (no allocation). Mutation methods check cap and promote to heap automatically:
- Allocate a heap buffer with room to grow
- Copy the existing content (from rodata or old heap)
- Update ptr/cap
- Perform the mutation
This is transparent to the user - there's no "owned" vs "borrowed" distinction.
Type System
String is:
- Not
@copy: Copying a String would require allocating and copying the buffer - Affine: Consumed on use, cannot be used twice without explicit clone
- Has destructor: Frees heap buffer when dropped (if cap > 0)
fn takes_string(s: String) { ... }
fn main() -> i32 {
var s = "hello";
takes_string(s); // s is moved
// takes_string(s); // ERROR: use of moved value
0
}
Core Operations
All operations are methods via impl String:
Construction
impl String {
// Empty string (no allocation until first push)
fn new() -> String { ... }
// Pre-allocate capacity
fn with_capacity(cap: u64) -> String { ... }
}
// Usage:
let s = String::new();
let s = String::with_capacity(1024);
Query Methods
impl String {
// Length in bytes
fn len(borrow self) -> u64 { ... }
// Allocated capacity
fn capacity(borrow self) -> u64 { ... }
// Is empty?
fn is_empty(borrow self) -> bool { ... }
}
These borrow self (read-only access) so the string remains valid after the call.
Mutation Methods
impl String {
// Append bytes from another string
fn push_str(inout self, other: String) { ... }
// Append a single byte
fn push(inout self, byte: u8) { ... }
// Clear contents (keep capacity)
fn clear(inout self) { ... }
// Ensure at least `additional` more bytes can be appended
fn reserve(inout self, additional: u64) { ... }
}
// Usage:
var s = String::new();
s.push_str("hello");
s.push_str(" world");
s.push(33); // '!'
// s is now "hello world!"
Clone
impl String {
// Deep copy
fn clone(borrow self) -> String { ... }
}
// Usage:
let a = "hello";
let b = a.clone(); // Deep copy (allocates new heap buffer)
// Both a and b are valid, independent strings
Clone borrows self so the original remains valid. It's explicit (not implicit like @copy) because it allocates.
Destructor
When a String goes out of scope:
fn example() {
var s = "hello";
s.push_str("!"); // Promotes to heap
} // __gruel_drop_String called: frees heap buffer
Destructor logic:
Growth Strategy
When appending exceeds capacity:
- Allocate new buffer (2x current capacity, minimum 16 bytes)
- Copy existing content
- Free old buffer (if heap)
- Update ptr/cap
This amortizes allocation cost over many appends.
Comparison Semantics
String comparison (==, !=) uses an optimized algorithm:
- Pointer equality fast path: If both strings have the same
ptrandlen, they're equal (same memory) - Length check: If lengths differ, strings are not equal
- Byte-by-byte comparison: Otherwise, compare contents
This optimization is significant for literal comparisons:
let s = "hello";
if s == "hello" { // Same rodata pointer - fast path!
// ...
}
The runtime function __gruel_str_eq implements this logic.
Rodata to Heap Promotion
When a mutation method is called on a rodata string (cap == 0, but ptr is non-null):
let s = "hello"; // rodata: cap=0, len=5
var t = s; // Still rodata (no copy needed for move)
t.push_str("!"); // Promotes to heap: allocate, copy, then append
The promotion happens transparently inside mutation methods.
Implementation Phases
Epic: gruel-0hef
Note: As of 2025-12-27, mutable strings are now stable. All phases are complete.
Phase 1: Specification and Feature Gate - gruel-0hef.1 (COMPLETE)
Write the specification first:
- Add
MutableStringstoPreviewFeatureenum ingruel-error - Add spec section 3.10 for mutable strings
- Define String representation (ptr, len, cap)
- Define all method signatures and semantics
- Add spec tests with
preview = "mutable_strings"(these will fail initially)
Testable: Spec tests exist and are ignored (preview feature not yet implemented).
Phase 2: Three-Field String Representation - gruel-0hef.2 (COMPLETE)
Extend String from 2 fields (ptr, len) to 3 fields (ptr, len, cap):
- Update AIR String type to carry capacity
- Update codegen for 24-byte String type
- Literals have cap=0
- Gate new behavior behind preview feature
Testable: Existing string tests still pass with new representation.
Phase 3: Runtime String Functions - gruel-0hef.3 (COMPLETE)
Add string-specific functions to runtime:
-
__gruel_string_alloc(cap: u64) -> *mut u8- allocate buffer (min 16 bytes) -
__gruel_string_realloc(ptr: *mut u8, old_cap: u64, new_cap: u64) -> *mut u8- grow buffer -
__gruel_string_clone(ptr: *const u8, len: u64) -> *mut u8- deep copy -
__gruel_drop_String(ptr: *mut u8, len: u64, cap: u64)- free if heap (cap > 0) - Growth strategy implementation (2x, min 16)
Testable: Unit tests in Rust for allocation/reallocation.
Phase 4: String Destructor Integration - gruel-0hef.4 (COMPLETE)
Wire String type to use destructors:
- Mark String as
needs_dropin type system - Drop elaboration inserts Drop instructions for String
- Codegen calls
__gruel_drop_String
Testable: Valgrind-clean string allocation and deallocation.
Phase 5: Construction Methods - gruel-0hef.5 (COMPLETE)
Add impl String with construction (gated):
-
String::new() -> String- empty string -
String::with_capacity(cap: u64) -> String- pre-allocated
Testable: Create strings, verify cap is set correctly.
Phase 6: Query Methods - gruel-0hef.6 (COMPLETE)
Add query methods (gated):
-
fn len(borrow self) -> u64 -
fn capacity(borrow self) -> u64 -
fn is_empty(borrow self) -> bool
Testable: Query methods return correct values.
Phase 7: Mutation Methods - gruel-0hef.7 (COMPLETE)
Add mutation methods with inout self (gated):
-
fn push_str(inout self, other: String) -
fn push(inout self, byte: u8) -
fn clear(inout self) -
fn reserve(inout self, additional: u64) - Automatic rodata-to-heap promotion
- Automatic grow-on-append
Testable: Build strings through multiple appends.
Phase 8: Clone Method - gruel-0hef.8 (COMPLETE)
Add explicit cloning (gated):
-
fn clone(borrow self) -> String - Deep copy via
String__cloneruntime function
Testable: Clone a string, verify both are independent.
Phase 9: Equality Optimization - gruel-0hef.9 (COMPLETE)
Optimize string comparison:
- If both strings have same ptr and len, return true immediately (pointer equality)
- Otherwise fall back to byte-by-byte comparison
- Update
__gruel_str_eqruntime function
Testable: Comparing a literal to itself uses fast path.
Consequences
Positive
- Dynamic strings: Can finally build and manipulate strings at runtime
- Memory safe: Destructor ensures no leaks
- Predictable: Affine semantics mean clear ownership
- Efficient literals: String literals remain cheap (no allocation)
- No validation overhead: Byte string semantics avoid runtime UTF-8 checks
- Go/bstr compatible: Familiar model for Go developers
Negative
- Allocation cost: Mutations may allocate
- No implicit sharing: Can't cheaply pass string to multiple functions
- Clone required: Must explicitly copy if keeping original
- Larger type: 24 bytes vs 16 bytes for the fat pointer
Neutral
- Conventionally UTF-8: Not enforced, but expected
- No indexing: Can't access individual bytes/chars (for now)
Open Questions
None remaining.
Future Work
- String slices: Non-owning views into strings (requires lifetime-lite or separate type)
- Formatting:
format!()macro or similar - String interpolation:
"hello \(name)"syntax - Byte access: Indexing into string bytes
- UTF-8 iteration: Iterator over codepoints when needed
- Pattern matching:
match s { "foo" => ..., _ => ... }
References
- ADR-0008: Affine Types and MVS - Ownership model
- ADR-0009: Struct Methods - Method syntax
- ADR-0010: Destructors - Cleanup mechanism
- ADR-0011: Runtime Heap - Allocation support
- ADR-0013: Borrowing Modes -
borrow selffor query methods - Rust String - Similar design
- bstr crate - Byte string semantics
- Go strings - Conventionally UTF-8 approach