ADR-0090: Tree-sitter Grammar and Parser Differential Fuzzing

Status

Implemented

Summary

Add a tree-sitter grammar for Gruel (housed in-tree under tree-sitter-gruel/) targeted at editor / IDE integrations (Zed, Neovim, Helix, GitHub highlighting, etc.). Keep gruel-parser (chumsky-based) as the canonical compiler parser. Guard against drift between the two grammars with a parser_differential fuzz target that asserts both parsers agree on whether a given input is syntactically valid. Add a smoke-fuzz CI job that runs every fuzz target (existing + new) for a short, fixed time budget on every PR so regressions are caught at PR time rather than the next nightly fuzz.yml run.

Context

gruel-parser is a chumsky-based parser tightly coupled to the compiler's AST shape, span model, and diagnostic surface. It is well-suited to producing high-quality compiler errors but is not directly usable by editors:

  • Editors expect either an LSP that supplies syntax tokens, or a tree-sitter grammar.
  • Tree-sitter grammars unlock highlighting in Zed/Neovim/Helix/Emacs/GitHub with no per-editor plugin work.
  • Tree-sitter is incremental and error-tolerant, which is what an editor wants while a user is typing.

The standard pattern across production compilers (Rust, Zig, Swift) is to keep a hand-written/canonical parser for the compiler and maintain a tree-sitter grammar separately for editor tooling. The well-known cost is drift: the tree-sitter grammar silently rots whenever the canonical parser learns new syntax. The mitigation, per the Rust project's experience and others, is differential testing: feed the same program to both parsers in CI and assert they agree on acceptance.

Today, fuzz coverage runs only nightly via .github/workflows/fuzz.yml (5 minutes per target). A grammar regression introduced in a PR can sit unnoticed until the next nightly run, and by then the bisect target is a full day of merges. A short smoke-fuzz pass on every PR catches the obvious crashes immediately.

Decision

Part 1: Tree-sitter grammar layout

A new top-level directory tree-sitter-gruel/ (sibling to crates/, not a workspace member, so cargo workflows are unaffected):

tree-sitter-gruel/
├── grammar.js                 # Grammar source
├── package.json               # npm metadata (for tree-sitter CLI / editor consumption)
├── src/                       # Generated by `tree-sitter generate` — committed
│   ├── parser.c
│   ├── grammar.json
│   ├── node-types.json
│   └── tree_sitter/parser.h
├── bindings/
│   └── rust/                  # Rust crate that wraps parser.c
│       ├── Cargo.toml
│       ├── build.rs           # cc-builds parser.c
│       └── lib.rs             # exposes `LANGUAGE: tree_sitter::Language`
├── queries/
│   ├── highlights.scm         # Editor highlighting
│   ├── locals.scm             # Scope / local-variable queries
│   ├── indents.scm
│   └── folds.scm
├── test/corpus/               # tree-sitter's native corpus tests
│   ├── lexical.txt
│   ├── items.txt
│   ├── expressions.txt
│   └── ...
└── README.md

Rationale for in-tree:

  • Keeps grammar and canonical parser in lockstep — the same PR can update both.
  • The differential fuzzer needs the tree-sitter crate as a path dependency; in-tree avoids the chicken-and-egg of versioning a separate repo while the language is still moving.
  • We can split out to a standalone tree-sitter-gruel repository (the conventional location editors look for) once syntax stabilizes; nothing in this layout precludes that.

Generated src/ is committed. Contributors therefore need node + tree-sitter-cli only to regenerate the parser after editing grammar.js, not to build the differential fuzzer or run CI. The make tree-sitter-generate target wraps the regeneration step.

Part 2: Grammar scope

The grammar must cover all syntax that gruel-parser accepts, structurally — i.e., enough that the differential fuzzer is meaningful rather than trivially "tree-sitter rejects everything past keyword X." Tree shape does not need to match the chumsky AST; tree-sitter produces a CST optimized for tooling queries.

Initial coverage targets:

  • Lexical: all keywords, operators, literals (int/float/string/char), comments, doc comments (///)
  • Items: fn, struct, enum, impl, use, @import (ADR-0026), comptime blocks at item level
  • Statements: let, assignment, expression statements, return
  • Expressions: literals, binary (with precedence matching the Pratt parser in chumsky_parser.rs), unary, calls, method calls, field access, indexing, struct literals, blocks, if/while/for/match, intrinsic calls (@name(...)), path expressions, comptime { ... }
  • Types: named, generic params via comptime T: type syntax (there is no user-facing <T> syntax; see [[project_no_user_generics]]), references via Ref(I) / MutRef(I), arrays

What the grammar can omit:

  • Constant evaluation (purely a sema concern)
  • Type inference rules (sema)
  • Anything that requires resolving symbols across files

Part 3: Rust bindings

tree-sitter-gruel/bindings/rust/ is a standalone cargo crate (not a workspace member; consumed only by fuzz/ and any future tooling) that:

  1. Uses cc in build.rs to compile src/parser.c into a static library.
  2. Exposes a single pub const LANGUAGE: tree_sitter::Language for callers to plug into tree_sitter::Parser::set_language.

Callers use it as follows:

// fuzz/fuzz_targets/parser_differential.rs
use tree_sitter::Parser as TsParser;
use tree_sitter_gruel::LANGUAGE;

let mut ts = TsParser::new();
ts.set_language(&LANGUAGE.into()).unwrap();
let tree = ts.parse(source, None).unwrap();
let ts_accepted = !tree.root_node().has_error();

Part 4: Differential fuzzer

A new fuzz target parser_differential:

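#![no_main]

// Crate name assumed from gruel-fuzz/src/lib.rs.
use gruel_fuzz::MaybeInvalidProgram;
use libfuzzer_sys::fuzz_target;
use tree_sitter::Parser as TsParser;
use tree_sitter_gruel::LANGUAGE;

fuzz_target!(|prog: MaybeInvalidProgram| {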
    let source = &prog.0;

    // Path A: chumsky parser
    let chumsky_accepted = match gruel_lexer::Lexer::new(source).tokenize() {
        Ok((tokens, interner)) => gruel_parser::Parser::new(tokens, interner).parse().is_ok(),
        Err(_) => false,
    };

    // Path B: tree-sitter parser
    let mut ts = TsParser::new();
    ts.set_language(&LANGUAGE.into()).unwrap();
    let tree = ts.parse(source.as_bytes(), None).unwrap();
    let ts_accepted = !tree.root_node().has_error();

    assert_eq!(
        chumsky_accepted, ts_accepted,
        "parser disagreement on:\n{}\n(chumsky={}, tree-sitter={})",
        source, chumsky_accepted, ts_accepted,
    );
});

Comparison criterion: acceptance only. Both parsers must agree on whether a program is syntactically valid. Tree shape, error positions, and recovery strategies are explicitly not compared — they will differ legitimately, and forcing parity there is a losing battle.
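
The acceptance-only criterion can be illustrated independently of either parser. The following is a minimal sketch, not the project's implementation: check_acceptance_parity, Disagreement, and the two toy acceptors are hypothetical names introduced here, and the acceptors merely stand in for gruel-parser and tree-sitter-gruel. It shows that the check compares two booleans and nothing else.

```rust
/// Hypothetical record of one acceptance mismatch.
#[derive(Debug, PartialEq, Eq)]
pub struct Disagreement {
    pub source: String,
    pub chumsky_accepted: bool,
    pub ts_accepted: bool,
}

/// Runs both acceptors on `source` and reports a disagreement, if any.
/// Only the accept/reject verdicts are compared, never tree shape.
pub fn check_acceptance_parity<A, B>(
    source: &str,
    chumsky: A,
    tree_sitter: B,
) -> Result<(), Disagreement>
where
    A: Fn(&str) -> bool,
    B: Fn(&str) -> bool,
{
    let chumsky_accepted = chumsky(source);
    let ts_accepted = tree_sitter(source);
    if chumsky_accepted == ts_accepted {
        Ok(())
    } else {
        Err(Disagreement {
            source: source.to_owned(),
            chumsky_accepted,
            ts_accepted,
        })
    }
}

/// Toy stand-in for the canonical parser: accepts balanced braces only.
pub fn balanced_braces(source: &str) -> bool {
    let mut depth: i32 = 0;
    for ch in source.chars() {
        match ch {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth < 0 {
                    return false; // closing brace before its opener
                }
            }
            _ => {}
        }
    }
    depth == 0
}

/// Toy stand-in for a drifted grammar: counts braces but ignores order.
pub fn equal_brace_counts(source: &str) -> bool {
    source.matches('{').count() == source.matches('}').count()
}
```

With these stand-ins, "fn f() {}" and "fn f() {" produce agreement (both accept, then both reject), while "}{" yields a Disagreement because only the order-insensitive acceptor lets it through. That is the kind of boundary input the fuzzer is meant to find.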

Input sources:

  1. MaybeInvalidProgram from gruel-fuzz/src/lib.rs — biases toward programs near the validity boundary, which is where parsers most often disagree.
  2. GruelProgram (valid) — sanity check that the tree-sitter grammar accepts everything chumsky accepts.
  3. Raw &[u8] via UTF-8 conversion — catches lexical edge cases.

We will likely need all three as separate sub-targets or as alternatives within a single Arbitrary enum.
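
One possible shape for the single-enum alternative is sketched below. DiffInput and its methods are hypothetical. The String payloads stand in for MaybeInvalidProgram and GruelProgram, whose real definitions live in gruel-fuzz. In the fuzz crate the enum would presumably also derive arbitrary::Arbitrary, which is left out here so the sketch depends only on std. Skipping raw bytes that are not valid UTF-8 is an assumption; lossy conversion would be another reasonable choice.

```rust
/// Hypothetical wrapper over the three input sources.
#[derive(Debug, Clone)]
pub enum DiffInput {
    /// Near-valid program text (stand-in for `MaybeInvalidProgram`).
    MaybeInvalid(String),
    /// Known-valid program text (stand-in for `GruelProgram`).
    Valid(String),
    /// Raw fuzzer bytes, used to probe lexical edge cases.
    Raw(Vec<u8>),
}

impl DiffInput {
    /// Returns the source text to feed to both parsers, or `None` when
    /// raw bytes are not valid UTF-8 (assumed policy: skip such inputs).
    pub fn source(&self) -> Option<&str> {
        match self {
            DiffInput::MaybeInvalid(s) | DiffInput::Valid(s) => Some(s.as_str()),
            DiffInput::Raw(bytes) => std::str::from_utf8(bytes).ok(),
        }
    }

    /// True for inputs that both parsers are expected to accept, which
    /// lets the target assert acceptance in addition to agreement.
    pub fn must_be_accepted(&self) -> bool {
        matches!(self, DiffInput::Valid(_))
    }
}
```

A single enum keeps one corpus and one binary. Separate sub-targets make it easier to see which input source produced a given crash. The sketch does not settle that trade-off.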

Part 5: Corpus-based differential test (non-fuzz)

In addition to fuzzing, add a deterministic test that runs the differential check over every source = "..." string in crates/gruel-spec/cases/ and crates/gruel-ui-tests/cases/. This:

  • Runs under make test (no nightly Rust required, unlike fuzzing)
  • Catches drift immediately on PR
  • Gives a fixed, reproducible regression suite

Implementation: a new integration test in tree-sitter-gruel/bindings/rust/tests/ (or a small gruel-parser-diff test crate) that walks the TOML cases and runs both parsers.
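
The directory walk for such a test could look roughly like the sketch below. collect_case_files is a hypothetical helper, and the assumption that case files carry a .toml extension comes only from the description above. Extracting each source = "..." value is deliberately left to a real TOML parser and is not shown.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every `.toml` file under `root`.
/// The result is sorted so that test output is reproducible across runs.
pub fn collect_case_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];

    // Iterative depth-first walk; avoids recursion on deep directory trees.
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else if path.extension().map_or(false, |ext| ext == "toml") {
                found.push(path);
            }
        }
    }

    found.sort();
    Ok(found)
}
```

The integration test would call this once per case directory, parse each file, and run the acceptance comparison on every extracted source string, reporting the file path alongside any disagreement.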

Part 6: CI smoke fuzz

Add a smoke-fuzz job to .github/workflows/ci.yml (PR-gating), distinct from the existing nightly fuzz.yml:

smoke-fuzz:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install LLVM 22
      run: ...  # same as other jobs
    - name: Install nightly + cargo-fuzz
      run: |
        rustup toolchain install nightly
        cargo +nightly install cargo-fuzz --locked
    - name: Restore fuzz corpus
      uses: actions/cache/restore@v4
      with:
        path: fuzz/corpus
        key: fuzz-corpus-${{ github.run_id }}
        restore-keys: fuzz-corpus-
    - name: Smoke fuzz each target (60s)
      run: |
        for target in lexer parser compiler structured_compiler structured_invalid comptime_differential parser_differential; do
          cargo +nightly fuzz run "$target" -- -max_total_time=60
        done
    - name: Save corpus
      uses: actions/cache/save@v4
      with:
        path: fuzz/corpus
        key: fuzz-corpus-${{ github.run_id }}

Time budget: 60 seconds per target × 7 targets = 7 minutes of pure fuzzing, plus build / cache overhead. Acceptable for PR CI; not so long that contributors avoid it.

Why on PRs, not just on merge: grammar drift caught at PR time is a 5-line fix; the same drift caught 24 hours later by the nightly fuzz run means bisecting across N merges.

The nightly fuzz.yml is unchanged — it continues to run each target for 5 minutes (and we may extend per-target time there separately).

Implementation Phases

  • Phase 1: Tree-sitter scaffolding + lexical grammar

    • Create tree-sitter-gruel/ with grammar.js, package.json, README.md
    • Define lexical rules: identifiers, keywords, all operator tokens, integer / float / string / char literals, and //, ///, and /* */ comments
    • Add test/corpus/lexical.txt
    • Wire up make tree-sitter-generate and commit generated src/
  • Phase 2: Grammar for items, statements, types

    • fn, struct, enum, impl, use, @import
    • let, assignment, return, expression statements
    • Type expressions (named, Ref(...), MutRef(...), arrays, comptime T: type)
    • Item-level comptime { } blocks
    • Expand test/corpus/
  • Phase 3: Grammar for expressions

    • Pratt-style precedence matching chumsky_parser.rs
    • All operator forms, calls, method calls, field access, indexing
    • if/while/for/match, blocks
    • Intrinsic calls @name(...)
    • Struct literals (with the grammar disambiguation against block-expressions)
    • Full test/corpus/expressions.txt
  • Phase 4: Rust bindings + spec-corpus differential test

    • tree-sitter-gruel/bindings/rust/ crate with build.rs and lib.rs
    • Integration test that walks crates/gruel-spec/cases/ + crates/gruel-ui-tests/cases/ and asserts acceptance parity
    • Wire into make test
    • Fix any genuine grammar gaps surfaced by the spec corpus
  • Phase 5: Differential fuzz target

    • New fuzz/fuzz_targets/parser_differential.rs
    • Add tree-sitter and tree-sitter-gruel to fuzz/Cargo.toml
    • Register the binary, run locally for 5 minutes, fix anything found
    • Document the target in CLAUDE.md fuzz section
  • Phase 6: CI smoke-fuzz job

    • Add smoke-fuzz job to .github/workflows/ci.yml
    • 60s per target × 7 targets
    • Corpus caching across runs
    • Make it a required check via repo settings (manual step, noted in PR description)
  • Phase 7: Editor queries + docs

    • queries/highlights.scm, locals.scm, indents.scm, folds.scm
    • tree-sitter-gruel/README.md with usage from Zed / Neovim / Helix
    • Update CLAUDE.md "Modifying the Language" section with: "If you change syntax, also update tree-sitter-gruel/grammar.js and regenerate"
    • Optional: GitHub language detection PR to github-linguist (deferred, out of scope here)

Consequences

Positive

  • Editors get syntax highlighting and basic structural queries with no per-editor plugin work.
  • Differential fuzzer + spec-corpus differential test catch grammar drift automatically.
  • Smoke fuzz on PRs catches regressions at the time of introduction, not 24 hours later.
  • The tree-sitter grammar is a stepping stone toward an LSP (incremental reparse for free).
  • Generated src/ committed means most contributors never need to install node.

Negative

  • Two grammars to maintain. Mitigated by the differential infrastructure, but adding new syntax now requires touching grammar.js as well. Documented in CLAUDE.md.
  • CI time increases by ~8–10 minutes for the smoke-fuzz job. Acceptable given the value of catching regressions early.
  • node + tree-sitter-cli required to regenerate the parser. Not required for builds or CI fuzzing; only for grammar edits.
  • Acceptance-only differential will miss some real bugs (e.g., chumsky and tree-sitter parse the same program but produce wildly different structures). This is a known limitation and matches industry practice — tree comparison is impractical between a compiler AST and a tree-sitter CST.

Neutral

  • Tree-sitter version pinning: we'll target a specific tree-sitter runtime version (likely 0.24+). Bumping it is a deliberate ADR-worthy decision down the line.
  • In-tree vs separate repo: starts in-tree, can be split out when the language stabilizes. Editors that auto-discover grammars by repo name (tree-sitter-<lang>) won't find ours until then; this is fine for early-stage adoption.

Open Questions

  1. External scanner needed? Some constructs (string interpolation, raw strings, indentation-sensitive blocks) require tree-sitter's external scanner (scanner.c). Gruel currently has none of these — but worth a check during Phase 1.

  2. Should the smoke-fuzz job be required or advisory? Recommendation: required, but with a clear runbook for "the fuzzer found something" to avoid blocking unrelated PRs on flaky finds. Decision deferred to Phase 6.

  3. tree-sitter-gruel/bindings/rust outside the workspace — this avoids dragging tree-sitter into every cargo build. Need to verify fuzz/Cargo.toml can path-depend on a non-workspace crate cleanly. Expected to work; verifying in Phase 5.

  4. GitHub Linguist registration to get GitHub-native syntax highlighting — deferred to future work since it requires a stable grammar and external policy compliance.

Future Work

  • LSP server for Gruel building on tree-sitter-gruel for incremental reparse. Out of scope here; this ADR is the foundation.
  • Split tree-sitter-gruel to its own repository once syntax stabilizes (so editors can discover it conventionally).
  • Publish to crates.io + npm for editor consumption.
  • Tree-shape differential (not just acceptance) — only worth pursuing if a normalized common form proves tractable.
  • Extend nightly fuzz.yml to run longer (e.g., 30 min per target) now that PR smoke-fuzz handles the regression-detection role.

References