Here is a jar of jelly beans. How many are in it?
Take a guess and hold onto it. We’ll come back to it — and to the slightly unsettling fact that nobody, including the people who set the puzzle, actually knows the answer.
Here is a jar of jelly beans. How many are in it?
Take a guess and hold onto it. We’ll come back to it — and to the slightly unsettling fact that nobody, including the people who set the puzzle, actually knows the answer.
In Part 1 I wrote about how our AI Slack bot was confidently wrong about revenue, and how a hand-maintained corrections file fixed it. That file grew. After three months it was six hundred lines long and still did not stop the bot from producing numbers that disagreed with the dashboards. This post is about why the corrections file could never work, and the week-long migration that finally made it unnecessary.
The short version: we already had a semantic layer. That was the problem. The bot had to reconcile business logic that lived in two places at once, and no amount of prompting was going to make it good at that.
Our AI-powered Slack bot had a credibility problem. It could query BigQuery, generate syntactically valid SQL, format results in neat tables, and respond conversationally. But when someone asked “what was revenue last month?” the number was wrong. Not because of a bug in the code — the SQL executed perfectly. The problem was semantic: the bot did not understand what “revenue” actually means in our domain.
We had three Cloud Functions handling exchange rate data. One fetched rates from an external API and wrote them to Google Cloud Storage. A second read from GCS and loaded into BigQuery. A third was an abandoned earlier attempt that loaded directly from the API to BigQuery but had never been decommissioned. Three functions, three sets of logs, three potential failure points — for what is fundamentally one job: “get today’s exchange rates into BigQuery.”
Our helpdesk ticket sync to BigQuery was broken. The Cloud Function responsible for it was hitting a 504 gateway timeout after 900 seconds — and the root cause was simple: it was listing every single ticket via the API on every run. With 52,000+ tickets and a page size of 50, that’s over 1,040 API calls before any data lands in BigQuery. Around 35 minutes of sequential HTTP requests, every single time.
We lost ~25 webhook events in 7 minutes because our Cloud Run service was scaling to zero. Here is how I used Cloud Scheduler to toggle min-instances during business hours — and the workaround for Cloud Scheduler not supporting PATCH requests.
We had a managed connector (think Fivetran, Airbyte, or similar) syncing our helpdesk data — tickets, contacts, threads, comments — from a SaaS helpdesk platform into BigQuery. It worked fine until it did not: the connector would miss webhook events, lag behind on incremental syncs, and occasionally produce phantom duplicates that our dbt models would faithfully propagate into dashboards. The cost was also non-trivial for what amounted to six tables.
We had four service accounts for BigQuery, one per GCP project. Each had its own JSON key file, its own set of IAM roles, and its own set of problems. When a local CLI tool needed to query production data, it used one key. When it needed to write to the raw data project, it used a different key. Our Slack bot had two keys mounted as secrets. The monitoring scripts had yet another. Every new tool or integration meant figuring out which key to use, and getting it wrong meant cryptic PERMISSION_DENIED errors that could take 20 minutes to debug.
Revenue numbers that don’t add up are the kind of bug that erodes trust fast. Last month I tracked down a case where a quote adjustment of over two thousand euros was producing zero revenue in our reporting layer. The root cause? Duplicated business logic across two dbt models — logic that had drifted apart over time.
The application was silently rewriting created_at after record creation, making child records appear older than their parents. Here is how I caught it and fixed it using raw CDC history in the dbt staging layer.
