ADR-0023: Multi-File Compilation
Status
Implemented
Summary
Enable the Gruel compiler to accept multiple source files and compile them into a single executable. This is a foundational capability that unblocks real-world programs that don't fit in a single file, and lays groundwork for a future module system.
Context
The Problem: Single-File Limitation
Today, the Gruel compiler accepts exactly one source file.
This works for learning and small programs, but becomes limiting quickly:
- Large programs become unwieldy in a single file
- No way to share code between programs (copy-paste only)
- Can't incrementally build real projects
- Blocks progress toward a module system and standard library
What We Need
A minimal multi-file compilation model that:
- Accepts multiple .gruel files on the command line
- Compiles each file independently
- Links them together into a single executable
- Handles cross-file function calls
What We Explicitly Defer
This ADR does not address:
- Module syntax (mod, use, pub): future work (TBD)
- Visibility/privacy: all symbols are public for now
- Namespacing — all symbols share a flat global namespace
- Incremental compilation — we rebuild everything each time
- Build system integration: no gruel.toml or package manifest
Why Flat Namespace First?
A flat namespace (all functions globally visible, no mod/use) is simpler to implement and provides immediate value:
- Low implementation cost — no parser changes, minimal sema changes
- Immediately useful — users can split large programs today
- Tests the linker — exercises cross-file symbol resolution
- Foundation for modules — the plumbing we build here (multi-file parsing, symbol merging, cross-file linking) is reused when modules land
The UX is admittedly awkward (gruel a.gruel b.gruel c.gruel -o out), but this is a stepping stone, not the final design.
Decision
CLI Interface
The CLI accepts the following invocation forms:
- Single file (unchanged)
- Multiple files (new)
- Glob patterns via shell expansion
- Explicit output required with multiple inputs
The -o flag becomes required when multiple source files are provided, to avoid ambiguity about which positional argument is the output.
Compilation Model
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ main.gruel  │     │ utils.gruel │     │ math.gruel  │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       ▼                   ▼                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Lexer    │     │    Lexer    │     │    Lexer    │
│   Parser    │     │   Parser    │     │   Parser    │
│   AstGen    │     │   AstGen    │     │   AstGen    │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │   Symbol Merging    │
                │   (global scope)    │
                └──────────┬──────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │  Sema (all files)   │
                │  Cross-file calls   │
                └──────────┬──────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │  CFG Construction   │
                │     (parallel)      │
                └──────────┬──────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │   Codegen + Link    │
                └──────────┬──────────┘
                           │
                           ▼
                     ┌───────────┐
                     │ Executable│
                     └───────────┘
Key insight: Parse files independently (parallelizable), then merge symbols into a unified scope for semantic analysis.
Implementation Details
1. CLI Changes (gruel/src/main.rs)
Argument parsing:
- Collect all non-option arguments as potential source files
- If multiple sources and no -o, error
- If single source and no -o, use a.out (unchanged behavior)
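The argument rules above can be sketched as a small pure function. This is a minimal illustration, not the actual code in main.rs: `CliInputs`, `resolve_inputs`, and the error strings are hypothetical names chosen for the example, and options other than -o are simply ignored.

```rust
/// Resolved command-line inputs. `CliInputs` and `resolve_inputs` are
/// hypothetical names used only for this sketch.
#[derive(Debug, PartialEq)]
struct CliInputs {
    sources: Vec<String>,
    output: String,
}

/// Split raw arguments (program name excluded) into source paths and an
/// output path, applying the `-o` rules described above.
fn resolve_inputs(args: &[&str]) -> Result<CliInputs, String> {
    let mut sources = Vec::new();
    let mut output = None;
    let mut iter = args.iter();

    while let Some(&arg) = iter.next() {
        if arg == "-o" {
            // `-o` consumes the next argument as the output path.
            let path = iter.next().ok_or("-o requires a path")?;
            output = Some(path.to_string());
        } else if !arg.starts_with('-') {
            // Every non-option argument is treated as a source file.
            sources.push(arg.to_string());
        }
        // Other options are ignored in this sketch.
    }

    match (sources.len(), output) {
        (0, _) => Err("no source files given".to_string()),
        (_, Some(output)) => Ok(CliInputs { sources, output }),
        // A single source without -o keeps the existing default.
        (1, None) => Ok(CliInputs { sources, output: "a.out".to_string() }),
        // Multiple sources without -o are rejected.
        (_, None) => Err("-o is required when multiple source files are given".to_string()),
    }
}
```

Keeping the rule in a function that takes a slice of strings makes the argument parsing tests listed in Phase 1 straightforward to write without spawning a process.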
2. CompileOptions Changes (gruel-compiler/src/lib.rs)
The input to the compiler becomes either a single source or multiple files.
3. Frontend Changes
Parallel parsing (one thread per file):
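A minimal sketch of the one-thread-per-file idea follows. It uses std::thread::scope from the standard library rather than Rayon so that the example stays self-contained. The `SourceFile` and `ParsedFile` types here are simplified stand-ins for the types named in Phase 2, and `parse_file` is a toy parser that only records function names. None of this is the real frontend.

```rust
use std::thread;

/// A source file read into memory. Simplified stand-in for `SourceFile`.
struct SourceFile {
    path: String,
    text: String,
}

/// Result of parsing one file. Simplified stand-in for `ParsedFile`;
/// the "AST" here is just the list of function names found.
#[derive(Debug)]
struct ParsedFile {
    path: String,
    functions: Vec<String>,
}

/// Toy parser (assumption): records the identifier following each `fn`.
fn parse_file(file: &SourceFile) -> Result<ParsedFile, String> {
    let mut functions = Vec::new();
    let mut tokens = file.text.split_whitespace();
    while let Some(tok) = tokens.next() {
        if tok == "fn" {
            let name = tokens
                .next()
                .ok_or_else(|| format!("{}: expected name after `fn`", file.path))?;
            // Strip the parameter list, e.g. `add(a:` becomes `add`.
            let name = name.split('(').next().unwrap_or(name);
            functions.push(name.to_string());
        }
    }
    Ok(ParsedFile { path: file.path.clone(), functions })
}

/// Parse every file on its own scoped thread, preserving input order.
/// Returns the first error encountered if any file fails to parse.
fn parse_all_files(files: &[SourceFile]) -> Result<Vec<ParsedFile>, String> {
    thread::scope(|scope| {
        // Spawn all threads first so the files are parsed concurrently.
        let handles: Vec<_> = files
            .iter()
            .map(|file| scope.spawn(move || parse_file(file)))
            .collect();
        // Join in input order; collecting into `Result` stops at the first error.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("parser thread panicked"))
            .collect()
    })
}
```

Scoped threads let each parser borrow its `SourceFile` directly, so no source text has to be cloned or reference-counted. A work-stealing pool such as Rayon would likely be preferable once the file count exceeds the core count.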
Symbol merging after parsing:
Duplicate detection:
- Same function name in two files → error with both locations
- Same struct name in two files → error with both locations
- Same enum name in two files → error with both locations
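Merging and duplicate detection can be sketched with a single map keyed by name. This is an illustration under assumptions: `Location`, `Decl`, `DuplicateError`, and the shape of `merge_symbols` are hypothetical, and the sketch treats every declaration kind uniformly, whereas the real pass would apply the same check to functions, structs, and enums.

```rust
use std::collections::HashMap;

/// Where a declaration appears. All types here are hypothetical sketches.
#[derive(Debug, Clone, PartialEq)]
struct Location {
    file: String,
    line: u32,
}

/// A top-level declaration collected from one file.
#[derive(Debug, Clone)]
struct Decl {
    name: String,
    loc: Location,
}

/// A duplicate definition, carrying both locations for the diagnostic.
#[derive(Debug, PartialEq)]
struct DuplicateError {
    name: String,
    first: Location,
    second: Location,
}

/// Merge per-file declarations into one flat global scope.
/// Collects every duplicate instead of stopping at the first one.
fn merge_symbols(
    files: Vec<Vec<Decl>>,
) -> Result<HashMap<String, Location>, Vec<DuplicateError>> {
    let mut global: HashMap<String, Location> = HashMap::new();
    let mut errors = Vec::new();

    for decl in files.into_iter().flatten() {
        match global.get(&decl.name) {
            // Name already taken: report the earlier and the later site.
            Some(first) => errors.push(DuplicateError {
                name: decl.name,
                first: first.clone(),
                second: decl.loc,
            }),
            None => {
                global.insert(decl.name, decl.loc);
            }
        }
    }

    if errors.is_empty() { Ok(global) } else { Err(errors) }
}
```

Because files are merged in command-line order, the "first defined here" location is deterministic for a given invocation.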
4. Sema Changes
Currently, Sema::new() takes a single &Rir. We need to support merged RIR from multiple files.
5. Error Reporting
Errors must include the source file path:
error[E0001]: type mismatch
  --> utils.gruel:15:12
   |
15 |     return "hello";
   |            ^^^^^^^ expected i32, found String

error[E0002]: duplicate function definition
  --> math.gruel:5:1
   |
 5 | fn add(a: i32, b: i32) -> i32 { a + b }
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: first defined here
  --> utils.gruel:10:1
   |
10 | fn add(x: i32, y: i32) -> i32 { x + y }
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The DiagnosticFormatter already supports source file names; we need to ensure each error carries the correct file context.
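One way to guarantee that every error carries its file context is to store the path on the diagnostic itself rather than looking it up at print time. The sketch below is an assumption about shape only: this `Diagnostic` struct is hypothetical and does not reflect the real DiagnosticFormatter API. It renders just the header and location lines.

```rust
use std::fmt;

/// Hypothetical diagnostic that records which file it came from.
struct Diagnostic {
    code: &'static str,
    message: String,
    file: String,
    line: u32,
    column: u32,
}

impl fmt::Display for Diagnostic {
    /// Render the header line followed by the `file:line:column` location.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "error[{}]: {}", self.code, self.message)?;
        write!(f, "  --> {}:{}:{}", self.file, self.line, self.column)
    }
}
```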
6. Linking
The linker already supports multiple object files — one per function. Multi-file compilation just means more functions come from different source files, which the linker handles transparently.
Backwards Compatibility
Single-file compilation remains unchanged; the new multi-file mode is purely additive.
Entry Point
The main() function must exist in exactly one of the input files:
- No main() in any file → error: "no main function found"
- main() in multiple files → error: "duplicate function definition: main"
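The entry-point rule reduces to counting how many files define main. A minimal sketch, assuming each file is represented as a hypothetical (path, function names) pair and reusing the error strings quoted above:

```rust
/// Return the path of the single file that defines `main`.
/// The `(path, function names)` representation is a hypothetical simplification.
fn find_entry_point<'a>(files: &[(&'a str, Vec<&str>)]) -> Result<&'a str, String> {
    // Lazily yield the path of every file that contains a `main` function.
    let mut owners = files
        .iter()
        .filter(|(_, funcs)| funcs.contains(&"main"))
        .map(|(path, _)| *path);

    // Only the first two matches matter: zero, exactly one, or more than one.
    match (owners.next(), owners.next()) {
        (None, _) => Err("no main function found".to_string()),
        (Some(path), None) => Ok(path),
        (Some(_), Some(_)) => Err("duplicate function definition: main".to_string()),
    }
}
```

In practice the duplicate case would likely fall out of the general duplicate-definition check during symbol merging, leaving only the missing-main case as a separate test.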
Implementation Phases
Phase 1: CLI and Input Handling
Goal: Accept multiple source files, read them all, but still process only the first one.
Tasks:
- Update Options to hold Vec<String> for source paths
- Add -o flag requirement for multiple inputs
- Update argument parsing tests
- Read all source files into memory
Verification: gruel a.gruel b.gruel -o out reads both files but only compiles a.gruel.
Phase 2: Parallel Parsing
Goal: Parse all files in parallel, producing separate ASTs.
Tasks:
- Add SourceFile and ParsedFile types
- Implement parse_all_files() with Rayon
- Merge string interners from all files
- Error if any file fails to parse
Verification: Parsing errors from any file are reported with correct file paths.
Phase 3: Symbol Merging
Goal: Merge declarations from all files into a unified global scope.
Tasks:
- Implement merge_symbols() function
- Detect and report duplicate definitions
- Build merged RIR for semantic analysis
- Update error messages to show both locations for duplicates
Verification: Duplicate function names produce clear errors with both file locations.
Phase 4: Cross-File Semantic Analysis
Goal: Functions in one file can call functions in another.
Tasks:
- Update Sema to work with merged program
- Ensure cross-file function calls resolve correctly
- Struct and enum types visible across files
- Update tests
Verification: main.gruel can call helper() defined in utils.gruel.
Phase 5: Documentation and Polish ✓
Goal: Document the feature and ensure good UX.
Tasks:
- Update CLAUDE.md with multi-file examples
- Add --help text for multiple inputs
- Update --emit modes to label output by source file
- Performance testing with many files (10+, 50+ files)
Consequences
Positive
- Real programs possible: Users can organize code across files
- Foundation for modules: Parsing, merging, linking all exercised
- Parallel parsing: Multiple files parse simultaneously
- Incremental progress: Ship value before full module system
Negative
- Flat namespace: All symbols globally visible (no privacy)
- Manual file listing: Users must list all files explicitly
- No incremental builds: Recompile everything on each change
- Symbol collisions: Easy to accidentally have name conflicts
Neutral
- Stepping stone: This is explicitly a transitional design
- UX will improve: A future module system will provide better ergonomics
- Tests as validation: Spec tests can use multiple files once modules land
Design Decisions
1. Why require -o for multiple files?
Without it, gruel a.gruel b.gruel is ambiguous — is b.gruel the output or a second source file? Requiring -o makes intent explicit.
2. Why not auto-discover files?
Some languages (Go) discover files automatically from a directory. We chose explicit listing because:
- Simpler implementation
- No need for "which files are part of this project?" logic
- Build systems can generate file lists
- Aligns with how C/Rust compilers work
3. Why merge at RIR level?
We could merge at AST level or later. RIR is the right boundary because:
- ASTs are per-file naturally (parser doesn't need changes)
- RIR represents "program items" that can be combined
- Sema expects a program-level view, not file-level
4. How do we handle the string interner?
Each file gets its own interner during parsing. ThreadedRodeo is thread-safe for insertion, but a single shared interner would still make the parser threads contend with each other, so per-file interners keep parallel parsing contention-free. After parsing, we merge them into a single interner that is used for sema and codegen.
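The merge step amounts to re-interning each file's strings into a global table and recording a local-to-global id mapping that can then be used to rewrite symbols in that file's IR. The sketch below uses a toy `Interner` built on the standard library purely for illustration; it is not the ThreadedRodeo API, and `merge_interners` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Toy string interner standing in for a per-file interner (assumption).
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    /// Return the id for `s`, allocating a new one on first sight.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.ids.insert(s.to_string(), id);
        id
    }

    /// Look up the string behind an id.
    fn resolve(&self, id: u32) -> &str {
        &self.strings[id as usize]
    }
}

/// Fold per-file interners into one global interner.
/// Returns the global interner plus, for each file, a table where
/// index = local id and value = global id.
fn merge_interners(locals: &[Interner]) -> (Interner, Vec<Vec<u32>>) {
    let mut global = Interner::default();
    let remaps: Vec<Vec<u32>> = locals
        .iter()
        .map(|local| {
            local
                .strings
                .iter()
                .map(|s| global.intern(s))
                .collect::<Vec<u32>>()
        })
        .collect();
    (global, remaps)
}
```

The cost of this approach is one extra pass over each file's IR to rewrite symbol ids. The benefit is that the same name in two files ends up with the same global id, which cross-file call resolution depends on.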
5. What about --emit modes?
The --emit modes (tokens, ast, air, etc.) work per-file in multi-file mode:
# Outputs:
# === AST (a.gruel) ===
# ...
# === AST (b.gruel) ===
# ...
This is useful for debugging which file contributed what.
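The per-file labeling can be sketched as a small formatting helper. `emit_per_file` is a hypothetical name, and the dump text is passed in as a plain string rather than produced by the real --emit pipeline.

```rust
/// Concatenate per-file dumps, each preceded by a header naming the
/// stage and the source file. `emit_per_file` is a hypothetical helper.
fn emit_per_file(stage: &str, outputs: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (path, dump) in outputs {
        // Header format mirrors the example above: `=== AST (a.gruel) ===`.
        out.push_str(&format!("=== {} ({}) ===\n{}\n", stage, path, dump));
    }
    out
}
```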
Open Questions
None at this time.
Future Work
- Module system: Adds mod, use, pub syntax (future ADR)
- Visibility: Private-by-default, explicit pub for exports
- Incremental compilation: Rebuild only changed files
- Build system: gruel.toml or similar for project definition
- Parallel sema: Currently sema is single-threaded; could parallelize per-function
References
- ADR-0020: Built-in Types as Synthetic Structs — Related type system work
- Current CLI implementation: crates/gruel/src/main.rs
- Current compiler driver: crates/gruel-compiler/src/lib.rs