AL Train
A fine-tuning pipeline that trains language models to write AL code, using corpus data enhanced with AI-generated descriptions.
Overview
A Python pipeline for fine-tuning LLMs to write AL code. Takes code pairs from al-corpus, enriches them with AI-generated natural language descriptions via Anthropic’s Batch API, formats everything into ChatML training data, and produces a QLoRA adapter using Unsloth.
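The ChatML formatting step can be sketched as below. The field names (`description`, `al_code`) and the system prompt are illustrative assumptions, not the pipeline's actual schema.

```python
import json

# Hypothetical system prompt; the pipeline's real prompt may differ.
SYSTEM_PROMPT = "You are an expert AL (Business Central) developer."

def pair_to_chatml(pair: dict) -> str:
    """Format one enriched code pair as a ChatML training example.

    Assumes each pair carries an AI-generated `description` and the
    original `al_code` from al-corpus (illustrative field names).
    """
    text = (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{pair['description']}<|im_end|>\n"
        f"<|im_start|>assistant\n{pair['al_code']}<|im_end|>\n"
    )
    # One JSON object per line, ready to append to train.jsonl.
    return json.dumps({"text": text})

example = {
    "description": "Create a codeunit that posts a sales invoice.",
    "al_code": 'codeunit 50100 "Post Sales Invoice" { }',
}
line = pair_to_chatml(example)
```

Each serialized line holds the full conversation as a single `text` field, the shape most trainers (including Unsloth's) accept for pre-templated data.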
The evaluation suite covers five metrics, including syntax validation, anti-pattern detection, BLEU score comparison against reference code, and training loss tracking. Designed for 24GB+ VRAM GPUs, with 4-bit quantization keeping resource requirements manageable.
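As a sketch of what anti-pattern detection could look like, the checker below flags two commonly cited AL anti-patterns with regexes: `Commit()` inside a `repeat … until` loop, and `FindFirst()` followed by a loop (where `FindSet()` is preferred). The rule names and rule set are illustrative, not the project's actual detector.

```python
import re

# Illustrative rules only; the project's real rule set is not shown here.
ANTI_PATTERNS = {
    # Commit() between `repeat` and its closing `until`.
    "commit-in-loop": re.compile(
        r"repeat\b(?:(?!until\b).)*\bCommit\s*\(\s*\)",
        re.IGNORECASE | re.DOTALL,
    ),
    # FindFirst() shortly before a repeat loop (FindSet() is preferred).
    "findfirst-then-loop": re.compile(
        r"\bFindFirst\s*\(\s*\).{0,80}?\brepeat\b",
        re.IGNORECASE | re.DOTALL,
    ),
}

def detect_anti_patterns(al_code: str) -> list[str]:
    """Return the names of the illustrative rules the code triggers."""
    return [name for name, rx in ANTI_PATTERNS.items() if rx.search(al_code)]

bad = """
Customer.FindSet();
repeat
    Customer.Modify();
    Commit();
until Customer.Next() = 0;
"""
good = """
Customer.FindSet();
repeat
    Customer.Modify();
until Customer.Next() = 0;
"""
```

A regex pass like this is cheap enough to run over every generated sample during evaluation; a real checker would likely work on a parse tree instead.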
Highlights
Enriches AL code pairs via Anthropic Batch API
ChatML training data, Unsloth QLoRA adapters
4-bit quantization, targets 24GB+ VRAM GPUs
Five-metric eval incl. syntax, anti-patterns, BLEU, loss
Pipeline: pairs.jsonl → train.jsonl + eval.jsonl
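The pairs.jsonl → train.jsonl + eval.jsonl step could be implemented as a seeded shuffle-and-split; the 90/10 ratio, seed, and helper names below are assumptions for illustration.

```python
import json
import random

def split_pairs(pairs: list[dict], eval_fraction: float = 0.1, seed: int = 42):
    """Deterministically shuffle enriched pairs and split them into
    (train, eval) lists. The 90/10 ratio is an assumed default."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)  # fresh RNG => reproducible split
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]

def write_jsonl(path: str, rows: list[dict]) -> None:
    """Write one JSON object per line, the format both output files use."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

pairs = [{"id": i, "al_code": f"codeunit {50000 + i} X {{ }}"} for i in range(20)]
train, eval_set = split_pairs(pairs)
```

Seeding the shuffle keeps the eval set stable across runs, so loss and BLEU numbers stay comparable between training experiments.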