ADR-0015: Test Suite Optimization

Status

Implemented

Summary

Optimize the Gruel test suite for faster development iteration by: (1) adding parameterized test support to consolidate redundant spec tests, (2) enhancing unit test coverage for development-time feedback, and (3) adding an integration unit test layer that tests the compilation pipeline without execution.

Context

The current test suite is comprehensive but has grown to a size that impacts development velocity:

Test Type	Count	Purpose
Unit tests	646	Fast, focused component tests
Spec tests	1,083	Full integration tests with spec traceability
UI tests	39	Compiler diagnostic behavior

Problems Identified

Spec Test Redundancy: Analysis shows ~60-70% of spec tests could be consolidated:
- integers.toml: 95 cases with 8 nearly identical patterns per type (8 types × N operations)
- arithmetic.toml: 67 cases, 5+ tests covering the same spec paragraph
- let.toml: 54 cases with excessive shadowing/naming variants
- functions.toml: 27 inout tests that are structural duplicates
Development Workflow Gap: Developers must run the full spec suite to verify changes, when faster unit tests could catch most issues.
Unit Test Coverage Gaps: Some areas have limited unit tests:
- RIR generation (13 tests)
- CFG construction (21 tests)
- No "integration unit tests" that test parse→codegen without execution

Current Test Overlap

Many language features are tested at multiple levels:

Lexer: 18 unit tests + ~100 spec tests for tokens
Parser: 38 unit tests + extensive spec coverage
Type system: 143 unit tests + 281 spec type cases
Codegen: 200 unit tests + arithmetic/control flow spec tests

Decision

1. Parameterized Test Support

Phase 1: Multi-Process Expansion (Implemented)

The initial implementation uses a simple template expansion approach where each parameter set generates a separate test case that runs in its own process:

[[case]]
name = "{type}_return"
spec = ["3.1:1"]
params = [
  { type = "i8", value = "42", exit_code = 42, spec_extra = ["3.1:2"] },
  { type = "i16", value = "100", exit_code = 100, spec_extra = ["3.1:3"] },
  { type = "i32", value = "42", exit_code = 42, spec_extra = ["3.1:4"] },
  { type = "u8", value = "42", exit_code = 42, spec_extra = ["3.1:9"] },
]
source = "fn main() -> {type} { {value} }"

Template syntax: {param_name} placeholders are replaced with parameter values.

Field overrides: Parameters can override case fields like exit_code, compile_fail, skip, etc.

Spec merging: spec_extra in params is appended to the base spec array.

This approach:

Reduces test file verbosity significantly (8 cases → 1 definition)
Maintains spec traceability per variant via spec_extra
Works with existing test infrastructure (no changes to test execution)
Each variant still runs as a separate process

Future: Single-Process Execution (Phase 2+)

For maximum performance, a future enhancement could generate a single program that tests all variants internally:

[[case]]
name = "integer_return"
spec = ["3.1:1", "3.1:2", "3.1:3", "3.1:4", "3.1:8", "3.1:9", "3.1:10", "3.1:11"]
parameters = [
    { type = "i8", value = "42", expected = "42" },
    { type = "i16", value = "100", expected = "100" },
    # ...
]
test_fn = """
fn test_${type}() -> ${type} { ${value} }
"""
expected_fn = "${expected}"

This would generate a single program with all test functions and a harness, eliminating per-variant process overhead.

2. Consolidate Spec Tests

Target consolidation in high-duplication files while maintaining 100% spec paragraph coverage:

File	Current	Target	Strategy
`integers.toml`	95	~30	Parameterize type variants
`arithmetic.toml`	67	~25	Parameterize operators, reduce redundant precedence tests
`let.toml`	54	~20	Remove excessive shadowing/naming variants
`functions.toml` (inout)	27	~8	Keep 1 representative per pattern (primitive, struct, array)
Other files	~840	~350	Review for similar patterns

Total target: Reduce from ~1,083 to ~400-500 cases.

3. Add Integration Unit Tests

Create a new test layer in gruel-compiler that tests the pipeline without execution:

// In gruel-compiler/src/lib.rs
#[cfg(test)]
mod integration_tests {
    use super::*;

    #[test]
    fn test_arithmetic_compiles() {
        let result = compile_to_air("fn main() -> i32 { 1 + 2 * 3 }");
        assert!(result.is_ok());
        // Can optionally verify AIR structure
    }

    #[test]
    fn test_type_mismatch_error() {
        let result = compile_to_air("fn main() -> i32 { true }");
        assert!(result.is_err());
        assert!(result.unwrap_err().contains("type mismatch"));
    }
}

Benefits:

Fast: No file I/O, no process spawning, no execution
Comprehensive: Tests full parse→sema→codegen pipeline
Debuggable: Can inspect intermediate IRs in tests

4. Development Workflow

Update CLAUDE.md with recommended workflow:

# During development - fast feedback (unit tests only)
./quick-test.sh                       # Unit tests only (~2-5 seconds)

# Before committing - full verification
./test.sh                             # Full suite including spec tests

# Targeted spec tests
cargo run -p gruel-spec -- "arithmetic"  # Run specific tests

Add a new script ./quick-test.sh that runs only unit tests for faster iteration.

Implementation Phases

Phase 1: Parameterized Test Support - gruel-9jdv.1
- Added ParamSet struct and params field to Case in gruel-test-runner
- Implemented template expansion with {param} syntax
- Implemented expand_case() and expand_test_file() functions
- Added unit tests for expansion logic
- Added example parameterized test to integers.toml
Phase 2: Consolidate Integer Tests - gruel-9jdv.2
- Rewrote integers.toml using parameterized format (95 → 41 case definitions)
- Fixed traceability report to handle parameterized tests with spec_extra
- Verified 100% spec coverage maintained via traceability check
Phase 3: Consolidate Other Spec Tests - gruel-9jdv.3
- Consolidated arithmetic.toml (530→274 lines), let.toml (688→400 lines)
- Consolidated functions.toml (1405→800 lines, heavily reduced inout section)
- Consolidated bitwise.toml (550→325 lines), comparison.toml (439→274 lines)
- All 1021 tests pass with 100% normative spec coverage maintained
Phase 4: Integration Unit Tests - gruel-9jdv.4
- Added compile_to_air() and compile_to_cfg() test helpers to gruel-compiler
- Added 115+ integration unit tests covering major language features
- Tests organized by category: types, arithmetic, comparison, logical, bitwise, control flow, functions, structs, enums, arrays, strings, intrinsics, CFG construction, error messages, warnings, and edge cases
Phase 5: Workflow Documentation - gruel-9jdv.5
- Added Development Workflow section to CLAUDE.md
- Added ./quick-test.sh script for fast unit test iteration
- Added "Choosing the Right Test Type" table documenting when to use each test level

Consequences

Positive

Reduced test file verbosity: 8 similar tests become 1 parameterized definition
Easier maintenance: Change pattern once, affects all variations
Maintained spec coverage: Parameterization preserves traceability via spec_extra
Better test organization: Clear separation between development tests (fast) and verification tests (comprehensive)
Future optimization path: Can evolve to single-process execution for further speedup

Negative

Template syntax complexity: Parameterized tests are slightly more complex to write
Migration effort: Consolidating existing tests requires careful review
Potential coverage gaps: Must verify traceability after each consolidation

Neutral

Test count reduction: Fewer tests is not inherently better or worse; coverage matters
Two test commands: Developers need to remember when to use each

Open Questions

Template syntax: Should we use ${param} or {param} or {{param}}? Resolved: Using {param} for simplicity in Phase 1.
Single-process optimization: Should Phase 2+ implement single-process test generation for performance? (Deferred - current approach may be sufficient)
Nested parameters: Should we support arrays of arrays for testing combinations? (e.g., type × operator combinations)

Future Work

Single-process parameterized tests: Generate one program testing all variants for maximum speed
Test performance metrics: Add timing to test output to identify slow tests
Parallel spec tests: Run spec tests in parallel for faster full-suite runs
Test coverage visualization: Generate coverage reports showing which spec paragraphs are most/least tested

References

ADR-0005: Preview Features - Feature gating mechanism
CLAUDE.md - Development workflow documentation
crates/gruel-test-runner/src/lib.rs - Test infrastructure with parameterized support
Similar concepts: pytest parametrize, JUnit @ParameterizedTest, Go table-driven tests