Teaching an AI Bot Your Metrics, Part 2: One Layer for Logic, One Thin Layer for Aggregation

In Part 1 I wrote about how our AI Slack bot was confidently wrong about revenue, and how a hand-maintained corrections file fixed it. That file grew. After three months it was six hundred lines long and still did not stop the bot from producing numbers that disagreed with the dashboards. This post is about why the corrections file could never work, and the week-long migration that finally made it unnecessary.

The short version: we already had a semantic layer. That was the problem. The bot had to reconcile business logic that lived in two places at once, and no amount of prompting was going to make it good at that.

The two-source problem

Our stack, before the migration, looked like this.

The warehouse was in BigQuery, modelled in dbt, with a reporting layer of views called things like rpt_damage_report_metrics. Those views contained business logic — a 1,800-line file with fifteen CASE expressions encoding what counted as revenue, which status buckets mapped to which funnel stage, how conversions were defined, how attribution worked.

On top of BigQuery we ran a BI tool with its own modelling language — AML — where analysts defined dimensions, measures, and ratios. AML also contained business logic. Forty computed dimensions, dozens of measures with nested CASE WHEN in the aggregation, a handful of ratios with custom null handling. Some of it duplicated the reporting views. Some of it contradicted them. None of it was reconcilable by reading either source alone.

The Slack bot sat underneath both. When someone asked “what was revenue last month,” the bot had two choices:

Write a BigQuery SQL query against the warehouse, trying to match what the reporting views would produce.
Call the BI tool’s semantic layer, trying to match what the dashboards would produce.

Neither choice was reliable. Option 1 meant the bot had to replicate all the AML logic in SQL, which it could not keep in sync. Option 2 meant the bot had to generate valid AML, which is finicky, and the bot would often generate something syntactically correct that picked up a subtly different aggregation than the dashboard used. Every question was “hard thinking,” and every other answer was wrong.

The corrections file was my attempt to patch this with English. “Revenue for partner X is bid price plus quote adjustment, but only when status is accepted, and exclude rental cars before January.” Six hundred lines of that, and the bot still had to choose between two code paths that encoded the same rule slightly differently.

The 97.7% to 19.4% bug

The obvious fix, on paper, was “collapse onto one source of truth.” We had a Unified Star Schema waiting to be the winner — a metrics bridge with one row per (damage_report, stage) and pre-computed additive measures. The dashboards were pointing at the reporting views. We just had to swap the table_name in each AML model and we were done.

I tried it on one dashboard. The conversion rate dropped from 97.7% to 19.4% overnight.

That is not a “looks slightly different” bug. That is a “this is completely broken” bug. It cost a day to understand, and the lesson now lives as a permanent guardrail in my agent’s memory:

Never filter on stage in the USS metrics bridge. The measures are NULL on irrelevant stages — filtering drops real rows.

What happened: the rpt_ view was not a table. It was a stage-gated CASE wrapped in a WHERE that pre-filtered rows and pre-computed flags. A row came out only if the damage report had reached the right stage, and dr_is_converted was either 1 or absent. The bridge, by contrast, carries one row per (damage_report, stage), and dr_is_converted returns 0 on non-converted damage reports. Dimension joins fanned out across every stage the DR had touched. Every dashboard that joined a dimension to a measure was now double-counting, or averaging zeros that did not exist in the old view.

The reporting view was not a bad table. It was a function, and we had swapped it for a different function and expected the same return value.

The fix: move all logic down, make the top layer thin

The architecture we converged on has a single load-bearing rule: all business logic lives in one place, and that place is the dbt bizlogic layer. Everything above is aggregation only.

In practice that meant three moves, executed across about fifteen pull requests and two repos:

Strip the reporting views. Every CASE WHEN, every IF, every stage-gated flag in the rpt_ views was extracted into a named bizlogic column in layer 08. The reporting views became thin projections — joins and selects, nothing conditional.
Strip AML. The forty computed dimensions in the BI tool moved down into the same bizlogic layer. Any AML measure that used CASE WHEN was rewritten as a pre-computed counter on the bridge — a dr_is_converted column in bizlogic, surfaced as an additive measure sum(dr_is_converted) in AML.
Bridge as pure UNION ALL. The metrics bridge holds numerics and keys, nothing else. No text, no booleans, no CASE. Every measure is additive by construction, which means dashboard filters and aggregations compose cleanly.

When the migration was done, AML contained exactly four kinds of operation: sum, avg, min/max, and safe_divide. That’s it. Every metric in every dashboard is expressible as one of those four applied to a column the bridge exposes. No CASE, no conditional joins, no null-coalescing tricks. Forty computed dimensions became zero.

What the bot sees now

The practical payoff is specific. The Slack bot used to get a system prompt that explained, in English, what “revenue” meant — plus a corrections file of exceptions, plus a warning to double-check against the BI tool, plus a list of columns it was allowed to read.

Now the bot sees this:

A schema. Here are the measures on the bridge (dr_revenue_total, dr_is_converted, bid_gov_total, and about sixty others). Here are the dimensions (partner_name, country, customer_type, repair_type, week, about twenty others).
Four functions. sum, avg, min/max, safe_divide. Nothing else.
Filters. Any dimension can filter any query.

That’s the whole contract. When someone asks “what was revenue for a specific partner in Germany last month,” the bot writes SELECT SUM(dr_revenue_total) FROM bridge WHERE partner_name='' AND country='DE' AND month='2026-03'. There is no harder version of the question, because there is no harder version of the model. Every rule the bot used to get wrong was a rule that now lives in one bizlogic column with one definition, and the bot does not have to know it exists — it just asks for the column by name.

No hard thinking anymore.

The date field nobody tells you about

One last trap is worth calling out, because it survives the whole migration and almost nobody thinks about it until they get burned.

Ask “what was revenue last month.” The bot picks a date column and filters. A damage report has at least six plausible dates: created_at, submitted_at, scheduled_at, appointment_at, completion_at, revenue_attribution_date. They are not the same. A DR created in February, scheduled in March, completed in April, and attributed to February for invoicing will land in four different buckets depending on which column you filter.

The bot does not know which one is right. “Right” is a business decision, and it is different per metric — revenue typically attributes by revenue_attribution_date, funnel conversion by submitted_at, operational throughput by completion_at. The dashboards encode that choice. The bot, left alone, will pick whichever column sounds most reasonable, get a number that is internally consistent with itself, and disagree with the dashboard by something between 2% and 30% depending on how far off it was.

The fix is the same shape as everything else in this post: bind each measure to its canonical date in the semantic layer, rather than letting the consumer pick. We use role-playing date relationships — six hidden date dimensions on the bridge (date_created, date_submitted, date_scheduled, date_appointment, date_completed, date_revenue), each measure declaring which one it joins to by default. Revenue measures join date_revenue. Funnel counters join date_submitted. Repair-time measures join date_completed. The bot just asks for “revenue by month” and the semantic layer picks the date for it.

That moves the last piece of hidden business logic out of the consumer and into the model. The bot no longer has to “know which date is right” any more than it has to know what revenue means — the column and its relationship to time are both definitions the warehouse owns.

The corrections file is down from six hundred lines to about eighty, and what remains is vocabulary — telling the bot that “deal” is a Pipedrive opportunity and “DR” is a damage report. That is not a correction. That is a glossary.

Six lessons worth writing down

The migration added six permanent rules to the agent’s memory file:

Never filter on stage in the USS bridge. Measures are NULL on irrelevant stages; filtering drops real rows. Fix inflated counts at PBS or bizlogic, not at the filter.
The bridge carries only numerics and keys. No text, no booleans. Text and booleans go on dimension models and are joined for slicing.
Zero CASE WHEN in PBS or bridge. Every conditional lives in bizlogic. If a reviewer finds an IF() in PBS, it’s a bug.
No CASE in AML either. AML only does sum, avg, min/max, safe_divide. Any conditional there is a missing bizlogic column.
No measures on dimensions, no dimensions doing measure work. Mixing them is the fastest way to produce numbers that look right and are wrong.
Bind each measure to a canonical date. The consumer should never have to pick between created_at, submitted_at, completion_at, revenue_attribution_date — the semantic layer picks. Role-playing date relationships in the dataset, one default per measure.

What’s next: Google’s Graph

There is a more ambitious version of this architecture coming. Google recently released Graph — a managed semantic layer for BigQuery, where the model definitions live in the warehouse itself and any consumer (SQL, BI, an AI bot) reads from the same object. Same idea as what we built, but managed, versioned, and queryable from anywhere that can reach BigQuery.

I am planning to try it. The appeal is obvious: the semantic layer stops being a thing I maintain in a BI tool’s proprietary language, and becomes a thing the warehouse owns. The same definition of dr_revenue_total would serve Holistics, the Slack bot, Connected Sheets, and ad-hoc SQL, without any translation layer.

It is very new. At the time of writing it does not yet support ratios — the safe_divide(sum(num), sum(den)) pattern we rely on for conversion rates, average order value, and every percentage metric has to be replicated by hand in the consumer. That is a blocker for us today. But ratios are a natural addition, and when they land, the AML layer shrinks again — from “four functions and a schema” to “a schema,” with even the four functions standardised at the warehouse level.

For now, the bizlogic-down migration is enough. The bot gets the number right. Revenue questions that used to produce wrong answers multiple times a week have produced zero in the last month. When finance asks “why is this different from the dashboard,” the answer is the one I wanted from the start: it isn’t.

Teaching an AI Bot Your Metrics, Part 2: One Layer for Logic, One Thin Layer for Aggregation

The two-source problem

The 97.7% to 19.4% bug

The fix: move all logic down, make the top layer thin

What the bot sees now

The date field nobody tells you about

Six lessons worth writing down

What’s next: Google’s Graph

Four Months Debugging a DRAM-less NVMe on a Raspberry Pi 5

Tricks and Behavioural Patterns I Taught My Coding Agent

You may also like

Leave a Comment Cancel Reply