ADR-0030: Place Expressions for Memory Locations
Status
Implemented
Summary
Introduce a Place abstraction in Gruel's IR to represent memory locations (lvalues) as first-class values. This eliminates redundant Load instructions for array indexing and field access, fixing an asymmetry where read operations take loaded values while write operations take slots directly.
Context
The Problem
Currently, Gruel's IR conflates memory locations ("places") with values, leading to:
Asymmetric instruction signatures:
IndexGet { base: AirRef, ... }- takes a loaded array valueIndexSet { slot: u32, ... }- takes a slot directly- Same asymmetry exists for
FieldGetvsFieldSet
Redundant Load instructions: For
arr[0] + arr[1] + arr[2], the compiler generates:%1 = load $0 ; Load entire array %2 = index_get %1[0] %3 = load $0 ; Load entire array AGAIN %4 = index_get %3[1] %5 = load $0 ; Load entire array AGAIN %6 = index_get %5[2]Each
arr[i]access requires loading all array elements first.Workarounds for nested access:
ParamFieldSethas aninner_offsetfield to handlep.inner.xbecause there's no way to compose places.Performance impact: The
array_heavybenchmark generates 2x more code than comparable benchmarks due to redundant array loads.
Inspiration from Rust MIR
Rust's MIR uses a Place struct that cleanly separates memory locations from values:
A place like arr[i].field is represented as:
Place { local: arr, projection: [Index(i), Field(0)] }
Operations on places are explicit:
PlaceRead(place)- Load value from placePlaceWrite(place, value)- Store value to placeAddressOf(place)- Get pointer to place (for&)
Decision
Phase 1: Introduce Place Type in CFG
Add a Place type to gruel-cfg that represents memory locations:
/// A memory location that can be read from or written to.
Phase 2: Unify Read/Write Operations
Replace the asymmetric instruction pairs with unified place-based operations:
Before (6 instruction variants):
IndexGet
IndexSet
ParamIndexSet
FieldGet
FieldSet
ParamFieldSet
After (2 instruction variants):
/// Read a value from a memory location
PlaceRead
/// Write a value to a memory location
PlaceWrite
Phase 3: Update Codegen
The codegen trace_index_chain and trace_field_chain functions become trivial - they just walk the Place::projections list to compute the final memory offset.
Example: arr[i][j] codegen
// Place representation:
Place
// Codegen walks projections to compute:
// offset = slot_offset(0) + i * 3 * 8 + j * 8
IR Examples
Simple array read: arr[i]
Before:
%1 = load $0 ; Load all elements of arr
%2 = index_get %1[i] ; Then index
After:
%1 = place_read Place { Local(0), [Index(arr_type, i)] }
Nested field access: point.inner.x
Before:
%1 = load $0 ; Load point
%2 = field_get %1.inner ; Get inner struct (intermediate value)
%3 = field_get %2.x ; Get x field
After:
%1 = place_read Place { Local(0), [Field(Point, 0), Field(Inner, 0)] }
Array element field: arr[i].x
Before:
%1 = load $0 ; Load all array elements
%2 = index_get %1[i] ; Get element (intermediate value)
%3 = field_get %2.x ; Get x field
After:
%1 = place_read Place { Local(0), [Index(arr_type, i), Field(Point, 0)] }
AIR Changes
AIR will also adopt the Place abstraction, but can use a simpler representation since types are already resolved:
/// Reference to a place in AIR - stored as index into places array
;
/// A memory location
Implementation Phases
- Phase 1: Add Place type to CFG - Define
Place,PlaceBase,Projectioningruel-cfg - Phase 2: Add PlaceRead/PlaceWrite instructions - Add new CFG instruction variants alongside existing ones
- Phase 3: Update CFG builder - Generate Place-based instructions for simple cases
- Phase 4: Update x86_64 codegen - Emit efficient code for Place-based instructions
- Phase 5: Update aarch64 codegen - Same for ARM64 backend
- Phase 6: Migrate remaining cases - Handle all array/field operations via places
- Phase 7: Remove old instructions - Delete IndexGet/FieldGet with base values
- Phase 8: Apply same changes to AIR - Propagate place abstraction to AIR level
- Added
AirPlace,AirPlaceBase,AirProjection,AirPlaceReftypes togruel-air - Added
AirInstData::PlaceReadandAirInstData::PlaceWriteinstruction variants - Added
Air::make_place,Air::get_place,Air::get_place_projectionshelper methods - Updated CFG builder to lower
AirInstData::PlaceRead→CfgInstData::PlaceRead - Updated CFG builder to lower
AirInstData::PlaceWrite→CfgInstData::PlaceWrite - Added
PlaceTraceandtry_trace_placeto trace RIR expressions to places - Updated
analyze_field_getandanalyze_index_getto emitPlaceReadfor place expressions - Updated
analyze_field_set_implandanalyze_index_set_implto emitPlaceWrite - Old instructions (
FieldGet,IndexGet, etc.) remain as fallback for computed bases (e.g.,get_struct().fieldwhere base is a function call result)
- Added
Consequences
Positive
- Eliminates redundant loads: Array access
arr[0] + arr[1] + arr[2]generates 3 loads instead of 9 - Unified instruction set: 2 instructions instead of 6 for place operations
- Simpler codegen: No need for
trace_index_chain/trace_field_chain- just walk projections - Smaller IR: Fewer intermediate Load instructions
- Better optimization opportunities: Places can be analyzed for aliasing, CSE, etc.
- Future-proof: Easy to add
Derefprojection for pointers
Negative
- Larger change: Touches AIR, CFG, and both codegen backends
- Migration complexity: Need to handle both old and new instructions during transition
- Place storage: Projections stored in vectors/arenas (memory overhead, but small)
Open Questions
Should RIR also use places? Currently RIR is untyped, so places would need type information added. We should check what other compilers do here - likely construct places during AIR lowering since that's when we have full type information.
How to handle computed array bases? For
get_array()[i], the array isn't in a local.Decision: Spill to temporary local, then use place. This is simple, correct, and matches what optimizing compilers do anyway.
Performance Considerations
Rust's Approach
According to the rustc documentation, Rust's Place is 16 bytes:
local: Local(4 bytes, au32newtype)projection: &'tcx List<PlaceElem>(8 bytes, interned reference)- 4 bytes padding
Projections are interned in rustc, meaning:
- Common projection sequences are deduplicated
- Equality checks are pointer comparisons (O(1))
- Memory is arena-allocated (fast allocation, batch deallocation)
See rustc memory management for details.
Gruel's Approach
Gruel already uses an extra: Vec<CfgValue> pattern for variable-length data (struct fields, array elements). We'll extend this for projections:
/// A place is 12 bytes (fits in 2 cache lines with instruction data)
// Stored in Cfg::projections: Vec<Projection>
Why Not Intern?
For Gruel's current scale, interning adds complexity without clear benefit:
- Most projection chains are short (1-3 elements for
arr[i].field) - Duplication is limited - each function has its own CFG
- Equality checks are rare - we mostly iterate projections, not compare them
If profiling shows projection storage is a bottleneck, we can add interning later.
Comparison: Before vs After
Before (current):
IndexGet { base: CfgValue, array_type: Type, index: CfgValue }
// 24 bytes per instruction, plus separate Load instruction (another 16+ bytes)
// For arr[i][j]: 2 Load + 2 IndexGet = ~80 bytes
After (proposed):
PlaceRead { place: Place }
// 12 bytes for Place + projection storage
// For arr[i][j]: 1 PlaceRead + 2 projections = ~44 bytes
Net effect: Smaller IR, fewer instructions, faster compilation.
Future Work
- Deref projection: When pointers/references are added,
Placecan represent*ptr.field - Slice projections: Subslice operations like
arr[1..3] - Alias analysis: Place representation enables tracking which places may alias
- Place-based optimizations: Dead store elimination, load forwarding
References
- Rust MIR Place struct
- Rust MIR ProjectionElem
- Rust Compiler Dev Guide - MIR
- Benchmark showing array_heavy generating 2x more code than other benchmarks