String / Vec(u8) Relationship
This section documents String as a newtype wrapper over Vec(u8) per ADR-0072.
Newtype Definition
String is a synthetic struct injected by the compiler. Conceptually:
synthetic struct String {
    bytes: Vec(u8) // private
}
The bytes field is private: outside of String's own methods, sema rejects any field-access or assignment that names bytes on a String value with a "private field" diagnostic. Public access goes through the conversion API (§7.4:5–7) and the method surface inherited by composition.
The runtime layout is identical to Vec(u8) — a single { ptr, len, cap } aggregate (24 bytes on 64-bit targets, 8-byte aligned). String is affine; drop runs the contained Vec(u8)'s drop.
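The layout claim can be illustrated with an analogous newtype in Rust. This is a sketch in Rust rather than Gruel: `Utf8Buf` and `layout` are hypothetical names, and the three-word size is what Rust's `Vec<u8>` produces in practice, not something this specification defines.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical Rust stand-in for the synthetic String struct.
/// `repr(transparent)` guarantees the wrapper has the layout of its one field.
#[repr(transparent)]
pub struct Utf8Buf {
    bytes: Vec<u8>, // private: only code in this module can name it
}

impl Utf8Buf {
    pub fn new() -> Self {
        Utf8Buf { bytes: Vec::new() }
    }

    /// O(1) field move, analogous to String::into_bytes.
    pub fn into_bytes(self) -> Vec<u8> {
        self.bytes
    }
}

/// Returns (size of wrapper, size of Vec<u8>, alignment of wrapper).
pub fn layout() -> (usize, usize, usize) {
    (size_of::<Utf8Buf>(), size_of::<Vec<u8>>(), align_of::<Utf8Buf>())
}
```

On a 64-bit target the first two values are both 24 and the alignment is 8, matching the figures quoted above.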
UTF-8 Invariant
Every well-formed String value upholds the invariant:
The bytes in self.bytes[0..self.bytes.len()] form a valid UTF-8 sequence.
The invariant is established at construction time:
- String::new() and String::with_capacity(n) produce empty buffers, which are trivially valid UTF-8.
- String literals are UTF-8 by source-file enforcement (§2.1).
- String::from_utf8(v) validates v's contents at runtime and only yields Ok when validation succeeds.
- String::push(c: char) and String::from_char(c) encode a Unicode scalar value into UTF-8 by construction (§3.x).
- The checked-only constructors and mutators (from_utf8_unchecked, push_byte, from_c_str_unchecked) shift the obligation to the caller.
Methods that read or modify the buffer preserve the invariant. push_str, concat, and push append only valid UTF-8 byte sequences to an already-valid buffer; clone copies a valid buffer; clear leaves no live bytes; reserve leaves the live bytes unchanged. push_byte is the documented exception (see §7.4:8).
Method Surface
String's method surface is defined by composition over the inner Vec(u8):
| Method | Effect |
|---|---|
| String::new() -> String | Empty String. |
| String::with_capacity(n) -> String | Empty String with cap >= n. |
| s.bytes_len() -> usize | Byte count (not codepoint count). |
| s.bytes_capacity() -> usize | Byte capacity. |
| s.is_empty() -> bool | bytes_len() == 0. |
| s.clone() -> String | Deep copy of the inner buffer. |
| s.contains(needle: String) -> bool | Byte-substring search. |
| s.starts_with(prefix: String) -> bool | Byte-prefix check. |
| s.ends_with(suffix: String) -> bool | Byte-suffix check. |
| s.concat(other: String) -> String | Allocate len(self)+len(other) bytes; copy both. |
| s.push_str(other: String) -> Self | Append other's bytes in place. |
| s.clear() -> Self | Set len = 0; cap preserved. |
| s.reserve(n: usize) -> Self | Ensure cap >= len + n. |
Equality (==, !=) and ordering (<, <=, >, >=) on String operate on the inner Vec(u8) lexicographically.
The legacy s.len() and s.capacity() accessors remain available as synonyms for bytes_len and bytes_capacity. A future chars_len method will provide codepoint counting once iterators land.
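The difference between byte counts and codepoint counts, and the effect of byte-wise ordering, can be seen in a short Rust sketch. Rust's standard strings follow the same byte-oriented conventions, so they serve as an analogy here; the helper names `byte_len`, `codepoint_len`, and `bytes_less` are hypothetical and not part of the Gruel surface.

```rust
/// Byte length, the quantity bytes_len() reports in this section.
pub fn byte_len(s: &str) -> usize {
    s.len()
}

/// Codepoint count, which the planned chars_len would report.
pub fn codepoint_len(s: &str) -> usize {
    s.chars().count()
}

/// Lexicographic comparison of the raw bytes.
pub fn bytes_less(a: &str, b: &str) -> bool {
    a.as_bytes() < b.as_bytes()
}
```

For "héllo" the byte length is 6 while the codepoint count is 5. Under byte-wise ordering "Z" sorts before "a", and a proper prefix such as "ab" sorts before "abc".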
Vec(u8) Method Additions
A future revision of Vec(T) (ADR-0066) is expected to add substring/element-search and bulk-append helpers — contains, starts_with, ends_with, concat, extend_from_slice — that String's composition surface can delegate to once that work lands. These are independent Vec(T) improvements; they are not part of this ADR's user-visible surface and do not affect the String-level methods in §7.4:3.
Conversions: String → Vec(u8)
String::into_bytes(self) -> Vec(u8) consumes the String and yields the underlying Vec(u8) in O(1). It is a struct-field move with no allocation, no copy, and no validation cost.
fn main() -> i32 {
    let s = String::from_char('A');
    let v: Vec(u8) = s.into_bytes();
    v.len() as i32
}
Conversions: Vec(u8) → String validated
String::from_utf8(v: Vec(u8)) -> Result(String, Vec(u8)) performs an O(n) UTF-8 scan over v's live [0..len] range. On success it returns Result::Ok(s) with s adopting v's buffer (no copy). On failure it returns Result::Err(v) and the buffer is handed back unchanged so the caller may inspect, retry, or report without a defensive clone.
The Vec(u8).into_string(self) -> Result(String, Vec(u8)) method is a sugar synonym for String::from_utf8(self).
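The "adopt on success, hand back on failure" contract can be sketched with Rust's standard library, whose `String::from_utf8` behaves analogously. This is an illustration in Rust, not Gruel; `from_utf8_or_bytes` and `adopts_buffer` are hypothetical helper names.

```rust
/// Mirrors the Result(String, Vec(u8)) shape described above using Rust's
/// standard library: the rejected buffer is recovered from the error value.
pub fn from_utf8_or_bytes(v: Vec<u8>) -> Result<String, Vec<u8>> {
    String::from_utf8(v).map_err(|e| e.into_bytes())
}

/// Reports whether a successful conversion reused the original allocation.
pub fn adopts_buffer(v: Vec<u8>) -> bool {
    let before = v.as_ptr();
    match from_utf8_or_bytes(v) {
        Ok(s) => s.as_ptr() == before,
        Err(_) => false,
    }
}
```

Because the error arm carries the original bytes, a caller can log or repair them without having cloned the buffer before the call.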
Conversions: Vec(u8) → String trusted
Inside a checked block:
- String::from_utf8_unchecked(v: Vec(u8)) -> String constructs a String with v as the byte buffer in O(1) without validation.
- Vec(u8).into_string_unchecked(self) -> String is the method-call sugar.
The caller is obligated to uphold the UTF-8 invariant. Constructing an ill-formed String via these APIs is undefined behavior — subsequent calls that rely on the invariant (codepoint iteration, slicing) may exhibit arbitrary behavior.
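As a point of comparison, Rust exposes the same trusted conversion behind `unsafe`. The sketch below is Rust, not Gruel, and the wrapper name `trusted` is hypothetical; it shows one way a caller might document and spot-check the obligation it has taken on.

```rust
/// Builds a String without validation. The caller promises `v` is valid UTF-8.
///
/// # Safety
/// Passing bytes that are not valid UTF-8 is undefined behavior.
pub unsafe fn trusted(v: Vec<u8>) -> String {
    // Debug builds re-check the promise; release builds skip the scan.
    debug_assert!(std::str::from_utf8(&v).is_ok());
    unsafe { String::from_utf8_unchecked(v) }
}
```

The debug assertion is a development aid only; it does not make the release-mode call any safer.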
Mutation: push and push_byte
Two mutators write bytes into a String:
- s.push(c: char) -> Self is safe; it encodes c to UTF-8 (1–4 bytes, per §3.x for char) and appends those bytes to the buffer. The invariant is preserved by construction.
- s.push_byte(b: u8) -> Self is only callable inside a checked block. It appends a single raw byte. The caller is obligated to preserve the UTF-8 invariant; the compiler does not validate.
The legacy String::push(byte: u8) is renamed to push_byte and restricted to checked blocks. The new push(c: char) becomes the primary codepoint-aware mutator.
fn main() -> i32 {
    let mut s = String::new();
    s.push('H'); // 1 byte
    s.push('é'); // 2 bytes
    s.push('🦀'); // 4 bytes
    s.bytes_len() as i32 // 7
}
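The 1–4 byte encoding that push performs follows the standard UTF-8 bit layout. The Rust sketch below spells that layout out by hand; `push_char` is a hypothetical helper, and nothing here is claimed to be how the Gruel runtime implements push.

```rust
/// Appends the UTF-8 encoding of `c` to `out` and returns the byte count.
pub fn push_char(out: &mut Vec<u8>, c: char) -> usize {
    let cp = c as u32;
    if cp < 0x80 {
        // 0xxxxxxx
        out.push(cp as u8);
        1
    } else if cp < 0x800 {
        // 110xxxxx 10xxxxxx
        out.push(0xC0 | (cp >> 6) as u8);
        out.push(0x80 | (cp & 0x3F) as u8);
        2
    } else if cp < 0x1_0000 {
        // 1110xxxx 10xxxxxx 10xxxxxx
        out.push(0xE0 | (cp >> 12) as u8);
        out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
        out.push(0x80 | (cp & 0x3F) as u8);
        3
    } else {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push(0xF0 | (cp >> 18) as u8);
        out.push(0x80 | ((cp >> 12) & 0x3F) as u8);
        out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
        out.push(0x80 | (cp & 0x3F) as u8);
        4
    }
}
```

Because a `char` is always a Unicode scalar value, every branch emits a well-formed sequence, which is why the invariant holds by construction.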
C Interop
Inside a checked block:
- s.terminated_ptr() -> Ptr(u8) ensures cap > len, writes a NUL byte at ptr[len], and returns the buffer pointer suitable for passing to a C function expecting a NUL-terminated string. The sentinel sits outside the live [0..len] range and is overwritten by the next mutating call. Delegates to Vec(u8)::terminated_ptr(0u8).
- String::from_c_str(p: Ptr(u8)) -> Result(String, Vec(u8)) computes strlen(p), allocates a Vec(u8) of that size, copies the bytes, then forwards to from_utf8.
- String::from_c_str_unchecked(p: Ptr(u8)) -> String performs the same copy, then forwards to from_utf8_unchecked.
checked {
    fn write_label(s: String) {
        let mut s = s;
        let p = s.terminated_ptr();
        // pass `p` to a C function expecting `const char *`...
    }
}
from_c_str and from_c_str_unchecked always copy; Gruel cannot adopt foreign-allocated buffers because it does not know the allocator.
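The sentinel-outside-len technique and the copy-then-validate import can be sketched in Rust. This is an analogy, not the Gruel implementation: the free functions `terminated_ptr` and `from_c_str` below are hypothetical and operate on Rust's `Vec<u8>` and `CStr`.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Ensures cap > len, writes a NUL just past the live bytes, and returns the
/// buffer pointer. `len` is not changed, so the sentinel is not part of the data.
pub fn terminated_ptr(v: &mut Vec<u8>) -> *const c_char {
    v.reserve(1); // guarantees at least one spare slot
    v.spare_capacity_mut()[0].write(0);
    v.as_ptr() as *const c_char
}

/// Copies a NUL-terminated C string into a fresh buffer, then validates it.
///
/// # Safety
/// `p` must point to a readable, NUL-terminated byte sequence.
pub unsafe fn from_c_str(p: *const c_char) -> Result<String, Vec<u8>> {
    // to_bytes() excludes the terminator; to_vec() performs the copy.
    let copied: Vec<u8> = unsafe { CStr::from_ptr(p) }.to_bytes().to_vec();
    String::from_utf8(copied).map_err(|e| e.into_bytes())
}
```

The returned pointer stays valid only until the vector is next mutated, which mirrors the note above that the sentinel is overwritten by the next mutating call.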
Privacy of Synthetic Fields
Synthetic built-in struct fields may carry a private flag. Access to a private field outside of the type's own methods is a compile-time error with diagnostic kind "private field". The mechanism is deliberately narrow: it exists to hide internal state of synthetic builtins (currently only String::bytes) and will be replaced by the general visibility / module system when that lands. User-defined structs are unaffected; their fields are public per §6.2.
Examples
Round-trip a String through Vec(u8):
fn main() -> i32 {
    let s = String::from_char('Z');
    let v = s.into_bytes();
    match String::from_utf8(v) {
        Result::Ok(s2) => s2.bytes_len() as i32,
        Result::Err(_) => -1,
    }
}
Reject invalid UTF-8:
fn main() -> i32 {
    checked {
        let mut v: Vec(u8) = Vec(u8)::with_capacity(2);
        v.push(0xFFu8);
        v.push(0xFEu8);
        match String::from_utf8(v) {
            Result::Ok(_) => 0,
            Result::Err(_) => 1, // expected
        }
    }
}