AL Train
A fine-tuning pipeline that trains language models to write AL code, using corpus data enhanced with AI-generated descriptions.
Overview
A Python pipeline for fine-tuning LLMs to write AL code. Takes code pairs from al-corpus, enriches them with AI-generated natural language descriptions via Anthropic’s Batch API, formats everything into ChatML training data, and produces a QLoRA adapter using Unsloth.
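The ChatML formatting step can be sketched as below. The field names (`description`, `al_code`) and the system prompt are illustrative assumptions, not the pipeline's actual schema.

```python
import json

# Hypothetical system prompt; the pipeline's real prompt may differ.
SYSTEM_PROMPT = "You are an expert AL (Business Central) developer."

def pair_to_chatml(pair: dict) -> str:
    """Format one enriched code pair as a ChatML training example.

    Assumes each pair carries an AI-generated `description` and the
    original `al_code` from al-corpus (illustrative field names).
    """
    text = (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{pair['description']}<|im_end|>\n"
        f"<|im_start|>assistant\n{pair['al_code']}<|im_end|>\n"
    )
    # One JSON object per line, ready to append to train.jsonl.
    return json.dumps({"text": text})

example = {
    "description": "Create a codeunit that posts a sales invoice.",
    "al_code": 'codeunit 50100 "Post Sales Invoice" { }',
}
line = pair_to_chatml(example)
```

Each serialized line holds the full conversation as a single `text` field, the shape most trainers (including Unsloth's) accept for pre-templated data.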
The evaluation suite covers five metrics, including syntax validation, anti-pattern detection, BLEU score comparison against reference code, and training loss tracking. Designed for 24GB+ VRAM GPUs, with 4-bit quantization keeping resource requirements manageable.
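As a sketch of what anti-pattern detection could look like, the checker below flags two commonly cited AL anti-patterns with regexes: `Commit()` inside a `repeat … until` loop, and `FindFirst()` followed by a loop (where `FindSet()` is preferred). The rule names and rule set are illustrative, not the project's actual detector.

```python
import re

# Illustrative rules only; the project's real rule set is not shown here.
ANTI_PATTERNS = {
    # Commit() between `repeat` and its closing `until`.
    "commit-in-loop": re.compile(
        r"repeat\b(?:(?!until\b).)*\bCommit\s*\(\s*\)",
        re.IGNORECASE | re.DOTALL,
    ),
    # FindFirst() shortly before a repeat loop (FindSet() is preferred).
    "findfirst-then-loop": re.compile(
        r"\bFindFirst\s*\(\s*\).{0,80}?\brepeat\b",
        re.IGNORECASE | re.DOTALL,
    ),
}

def detect_anti_patterns(al_code: str) -> list[str]:
    """Return the names of the illustrative rules the code triggers."""
    return [name for name, rx in ANTI_PATTERNS.items() if rx.search(al_code)]

bad = """
Customer.FindSet();
repeat
    Customer.Modify();
    Commit();
until Customer.Next() = 0;
"""
good = """
Customer.FindSet();
repeat
    Customer.Modify();
until Customer.Next() = 0;
"""
```

A regex pass like this is cheap enough to run over every generated sample during evaluation; a real checker would likely work on a parse tree instead.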
Highlights
Enriches AL code pairs via Anthropic Batch API
ChatML training data, Unsloth QLoRA adapters
4-bit quantization, targets 24GB+ VRAM GPUs
Five-metric eval incl. syntax, anti-patterns, BLEU, loss
Pipeline: pairs.jsonl → train.jsonl + eval.jsonl
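The pairs.jsonl → train.jsonl + eval.jsonl step could be implemented as a seeded shuffle-and-split; the 90/10 ratio, seed, and helper names below are assumptions for illustration.

```python
import json
import random

def split_pairs(pairs: list[dict], eval_fraction: float = 0.1, seed: int = 42):
    """Deterministically shuffle enriched pairs and split them into
    (train, eval) lists. The 90/10 ratio is an assumed default."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)  # fresh RNG => reproducible split
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]

def write_jsonl(path: str, rows: list[dict]) -> None:
    """Write one JSON object per line, the format both output files use."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

pairs = [{"id": i, "al_code": f"codeunit {50000 + i} X {{ }}"} for i in range(20)]
train, eval_set = split_pairs(pairs)
```

Seeding the shuffle keeps the eval set stable across runs, so loss and BLEU numbers stay comparable between training experiments.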