Tagged: benchmark

2 posts and 1 project

Posts

CentralGauge Benchmark Update: Why the Numbers Changed

A transparency report on significant fixes to the CentralGauge AL code benchmark infrastructure, covering code-extraction bugs, broken tasks, and vague specs, along with updated LLM rankings.

ai, al, benchmark, centralgauge, developer-tools

How I Benchmark LLMs on AL Code

An in-depth look at CentralGauge, an open-source benchmark for evaluating LLM performance on AL code generation for Business Central, covering task design, scoring methodology, and cross-model comparison results.

al, llm, benchmark, business-central, developer-tools

Projects

CentralGauge - AL Code Benchmark for LLMs

Active

An open-source benchmark for evaluating LLM performance on AL code generation for Microsoft Dynamics 365 Business Central, with 56 tasks across three difficulty tiers, real compilation, and test execution.

al, llm, benchmark, business-central, ai