tree-sitter-al

One Parser, Six Tools

tree-sitter-al on GitHub · al-corpus on GitHub · code-graph-rag fork

I built a parser. Then I couldn’t stop building things on top of it.

tree-sitter-al parses AL, the language behind Business Central. It now parses 100% of 15,358 production files with zero errors. It already powers al-perf and the AL Language Server integration. But I keep finding new uses for it. Here are six more.

GitHub Code Navigation

GitHub uses tree-sitter to power code navigation (Go to Definition, Find References, symbol search) on github.com. This is separate from syntax highlighting and managed through the github/code-navigation repository.

tree-sitter-al meets all the requirements:

  • AL is already recognized in GitHub Linguist
  • The parser is published to crates.io
  • queries/tags.scm is a 226-line query file that tags definitions and references for all AL object types, procedures, triggers, fields, variables, namespaces, and more
  • AL definitions are lexically nested (procedures inside objects), so GitHub can infer fully-qualified names from tree structure automatically
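To make the nesting point concrete, here is a minimal Python sketch of deriving qualified names from a tree of definitions. The Definition class and the dotted-name format are assumptions made for illustration. This is not GitHub's implementation and does not use tree-sitter-al's actual node types.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    """A tagged definition with its nested definitions (hypothetical shape)."""
    name: str
    kind: str
    children: list["Definition"] = field(default_factory=list)


def qualified_names(node: Definition, prefix: str = "") -> list[str]:
    """Return dotted names for a definition and everything nested inside it."""
    full = f"{prefix}.{node.name}" if prefix else node.name
    names = [full]
    for child in node.children:
        # The parent's full name becomes the prefix for each child.
        names.extend(qualified_names(child, full))
    return names


codeunit = Definition(
    "My Codeunit",
    "codeunit",
    [Definition("GetCustomerName", "procedure")],
)
print(qualified_names(codeunit))
```

Because the procedure sits inside the codeunit in the tree, its qualified name falls out of a plain recursive walk, with no extra annotations needed in the query file.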

I’ve opened the request. Now it’s up to GitHub to review and add AL to their supported languages. GitHub is owned by Microsoft. AL is Microsoft’s language. You’d think this would be a quick yes.

If you happen to be at Days of Knowledge UK right now, maybe mention it to someone from Microsoft. Threaten them with a public hug from a stranger. They’re Danes. The mere suggestion should be enough.

ast-grep: Structural Search and Linting

ast-grep is a structural search and lint tool. Unlike grep, which matches text, ast-grep matches syntax tree patterns. It understands your code’s structure.

For AL, this means you can write rules like “find every property where ObsoleteState is set to Removed” and get precise, zero-false-positive results across an entire codebase:

# rules/find-obsolete-removed.yml
id: find-obsolete-removed
language: al
severity: warning
message: "Found ObsoleteState = Removed property"
rule:
  pattern:
    context: 'table 1 X { ObsoleteState = Removed; }'
    selector: property

$ ast-grep scan --rule rules/find-obsolete-removed.yml ./BaseApp/

warning[find-obsolete-removed]: Found ObsoleteState = Removed property
    ┌─ BaseApp/Source/Warehouse/Setup/WarehouseSetup.Table.al:120:13

120 │             ObsoleteState = Removed;
    │             ^^^^^^^^^^^^^^^^^^^^^^^^

AL files need a full object wrapper, so ast-grep rules use the context + selector pattern: you provide a valid AL wrapper, then select the node you care about.
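The difference between matching text and matching tree nodes can be shown with a toy Python sketch. The NODES list is a hand-written stand-in for a parse tree, not output from tree-sitter-al or ast-grep, and the node kinds are assumptions made for illustration.

```python
import re

SOURCE = '''table 50100 "Demo"
{
    // ObsoleteState = Removed; was considered here
    ObsoleteState = Removed;
}
'''

# Hand-written stand-in for a parse tree: (node kind, line number, text).
NODES = [
    ("comment", 3, "// ObsoleteState = Removed; was considered here"),
    ("property", 4, "ObsoleteState = Removed;"),
]


def text_matches(source):
    """grep-style search: any line whose text matches the pattern."""
    pattern = re.compile(r"ObsoleteState\s*=\s*Removed")
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


def structural_matches(nodes):
    """Tree-style search: only property nodes are considered."""
    return [
        line
        for kind, line, text in nodes
        if kind == "property"
        and text.startswith("ObsoleteState")
        and "Removed" in text
    ]


print(text_matches(SOURCE))       # lines 3 and 4: the comment is a false positive
print(structural_matches(NODES))  # line 4 only
```

The text search flags the comment because it cannot tell a comment from a property. The structural search skips it because the node kind is part of the match.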

Setup:

# sgconfig.yml
customLanguages:
  al:
    libraryPath: path/to/al.dll  # built with: tree-sitter build -o al.dll
    extensions: [al]

Build the library from the tree-sitter-al repo with tree-sitter build -o al.dll (use a .so or .dylib output name on Linux or macOS) and point ast-grep at it. That’s it.

al-corpus: Training Data Extraction

al-corpus is a Rust CLI that walks an AL codebase using tree-sitter-al and extracts structured JSONL suitable for fine-tuning LLMs.

Point it at a codebase and get:

  • Object records. Every table, page, codeunit, report, enum with metadata (fields, procedures, properties)
  • Procedure records. Every procedure/trigger with signature, parameters, variables, call references
  • Prompt/completion pairs. Procedure signature as prompt, body as completion, ready for supervised fine-tuning
  • Anti-pattern detection. 10 detectable AL anti-patterns with severity, location, and fix suggestions

al-corpus extract ./my-al-project -o corpus.jsonl --pairs pairs.jsonl
al-corpus lint ./my-al-project -o lint.jsonl

The parser does the heavy lifting. al-corpus walks the typed AST and extracts exactly what it needs. No regex fragility.
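As a rough illustration of the prompt/completion idea, the Python sketch below serializes one pair as a JSONL line. The field names prompt and completion are assumptions. The actual al-corpus schema may differ, and the real tool is written in Rust.

```python
import json


def make_pair(signature: str, body: str) -> str:
    """Serialize one prompt/completion pair as a single JSONL line.

    The "prompt" and "completion" keys are hypothetical field names.
    """
    record = {"prompt": signature, "completion": body}
    # json.dumps escapes newlines, so each record stays on one line.
    return json.dumps(record, ensure_ascii=False)


signature = "procedure GetCustomerName(CustomerNo: Code[20]): Text[100]"
body = "begin\n    exit('');\nend;"
line = make_pair(signature, body)
print(line)
```

One record per line is what makes JSONL easy to stream: a fine-tuning pipeline can read the file line by line without loading the whole corpus.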

code-graph-rag: AI-Powered Code Understanding

When you ask an AI about your codebase, it needs context. Most tools give it a handful of files and hope for the best. code-graph-rag takes a different approach: it builds a knowledge graph from your code. Procedures, tables, dependencies, call chains, event subscriptions. All connected, all queryable.
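A call chain in such a graph can be pictured with a small Python sketch. The procedure names and the dictionary-based graph are invented for illustration. This is not how code-graph-rag stores or queries its graph.

```python
from collections import deque

# Edges: caller -> callees. All names are invented for illustration.
CALLS = {
    "PostSalesOrder": ["CheckCustomer", "PostLines"],
    "PostLines": ["CheckItem"],
    "CheckCustomer": [],
    "CheckItem": [],
}


def reachable(graph, start):
    """Return every procedure reachable from start via call edges (BFS)."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return sorted(seen)


print(reachable(CALLS, "PostSalesOrder"))
```

A question like "what does posting a sales order end up calling?" becomes a graph traversal instead of a guess based on whichever files happened to be in the context window.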

I forked it at SShadowS/code-graph-rag and added AL support. This is the one I’m most excited about. More to come, it deserves a blog post all by itself.

Editor Support: Neovim, Helix, Zed

tree-sitter-al ships with five query files that editors consume directly:

File            What it powers
highlights.scm  Syntax highlighting
locals.scm      Scope-aware highlighting (local vs global)
tags.scm        Symbol navigation (Go to Definition)
folds.scm       Code folding
indents.scm     Auto-indentation

Any tree-sitter-native editor can use these. Neovim (via nvim-treesitter), Helix, and Zed all support custom tree-sitter grammars with their own configuration steps.

Interactive Playground

I use the WASM build to highlight every AL code block on this site. Same parser that handles production files, running in your browser. No TextMate grammars, no regex approximations.

Try it. Edit the code below and watch the parse tree update in real time:

codeunit 50100 "My Codeunit"
{
    var
        CustomerName: Text[100];

    procedure GetCustomerName(CustomerNo: Code[20]): Text[100]
    var
        Customer: Record Customer;
    begin
        if Customer.Get(CustomerNo) then
            exit(Customer.Name);
        exit('');
    end;
}

What’s Next

Every tool on this list runs on the same parse tree. I built the parser because no one else had. The rest grew from there: a linter, a training pipeline, code navigation, an interactive playground. One parser, and it turns out that’s all you need.

If you’re working with AL and want to build on it, the parser is MIT-licensed: github.com/SShadowS/tree-sitter-al.