Category: Data Engineering

Death by a Thousand €3s: the EU import tax that’s pushing me off AliExpress

by Marc 10 June 2026

Ten years of cheap electronics, and a new €3 charge that quietly makes the whole habit pointless.

I’ve been buying from AliExpress since November 2015. My first order, for the record, was a €1 bag of replacement screws for a MacBook — the sort of thing no shop near me stocks and no sane person pays shipping for. That pretty much set the tone for the next decade.

I pulled my whole account history the other day and counted 322 orders. Just under €5,000 over ten years, which sounds like a lot until you look at what it actually was: jumper wires, GPIO headers for a Raspberry Pi, solder, a handful of sensor boards, plastic sleeves for my game cartridges, little displays you genuinely can’t buy here in the Netherlands without paying three times the price. Supplies, basically — the raw material for a hundred small weekend projects. There were a couple of real splurges buried in there too — a brand-new iPhone, the odd used MacBook SSD — but mostly it’s a long, boring list of things that cost a euro or two.

It adds up to nearly €5,000 over ten years, yet the median order was about €5. Almost half of everything came in under that, fifty-odd orders were under €2, and the cheapest was a literal one cent. The most expensive single thing in a decade was that new iPhone, at €499: the exception that proves how small the rest of it is. That’s the shape of it: a steady drip of tiny parcels, around thirty a year, with the occasional proper purchase thrown in.

On 1 July 2026 the EU changes the rules in a way that makes that kind of shopping stop adding up. So I’m winding the account down. Here’s the thinking.

Data Engineering

How many jelly beans are in the jar?

by Marc 27 May 2026

Here is a jar of jelly beans. How many are in it?

Take a guess and hold onto it. We’ll come back to it — and to the slightly unsettling fact that nobody, including the people who set the puzzle, actually knows the answer.

Teaching an AI Bot Your Metrics, Part 2: One Layer for Logic, One Thin Layer for Aggregation

by Marc 25 April 2026

In Part 1 I wrote about how our AI Slack bot was confidently wrong about revenue, and how a hand-maintained corrections file fixed it. That file grew. After three months it was six hundred lines long and still did not stop the bot from producing numbers that disagreed with the dashboards. This post is about why the corrections file could never work, and the week-long migration that finally made it unnecessary.

The short version: we already had a semantic layer. That was the problem. The bot had to reconcile business logic that lived in two places at once, and no amount of prompting was going to make it good at that.

Data Engineering Lessons Learned Python

Teaching an AI Bot Your Metrics: The Corrections File Pattern

by Marc 17 March 2026

Our AI-powered Slack bot had a credibility problem. It could query BigQuery, generate syntactically valid SQL, format results in neat tables, and respond conversationally. But when someone asked “what was revenue last month?” the number was wrong. Not because of a bug in the code — the SQL executed perfectly. The problem was semantic: the bot did not understand what “revenue” actually means in our domain.

Data Engineering Tools & Automation

Consolidating Three Cloud Functions into One

by Marc 15 March 2026

We had three Cloud Functions handling exchange rate data. One fetched rates from an external API and wrote them to Google Cloud Storage. A second read from GCS and loaded into BigQuery. A third was an abandoned earlier attempt that loaded directly from the API to BigQuery but had never been decommissioned. Three functions, three sets of logs, three potential failure points — for what is fundamentally one job: “get today’s exchange rates into BigQuery.”
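
As a rough sketch of what "one job" can look like in Python, the three steps collapse into a single function that fetches once, archives, and loads. This is not the post's implementation: the function name and the injected `fetch`, `archive`, and `load` callables are hypothetical stand-ins for the external rates API, Google Cloud Storage, and BigQuery, and whether the consolidated version still writes to GCS is an assumption.

```python
from typing import Callable

Rates = dict[str, float]


def sync_exchange_rates(
    fetch: Callable[[], Rates],
    archive: Callable[[Rates], None],
    load: Callable[[Rates], None],
) -> int:
    """Fetch rates once, archive the raw payload, then load it. Returns the row count."""
    rates = fetch()
    if not rates:
        # Fail loudly so an empty API response never overwrites good data.
        raise ValueError("exchange rate API returned no rates")
    archive(rates)  # hypothetical stand-in for the GCS write
    load(rates)     # hypothetical stand-in for the BigQuery load
    return len(rates)


# In-memory stubs so the sketch runs without any cloud credentials.
archived: list[Rates] = []
loaded: list[Rates] = []

rows = sync_exchange_rates(
    fetch=lambda: {"USD": 1.08, "GBP": 0.85},  # made-up sample values
    archive=archived.append,
    load=loaded.append,
)
print(rows, archived == loaded)
```

Passing the I/O steps in as callables keeps one entry point and one set of logs, and lets the whole flow be exercised locally with stubs.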

Data Engineering Python

Incremental Sync: From 35 Minutes to 5 Seconds

by Marc 10 March 2026

Our helpdesk ticket sync to BigQuery was broken. The Cloud Function responsible for it was hitting a 504 gateway timeout after 900 seconds, and the root cause was simple: it was listing every single ticket via the API on every run. With 52,000+ tickets and a page size of 50, that’s at least 1,040 API calls before any data lands in BigQuery. A full pass needs around 35 minutes of sequential HTTP requests, more than twice the 900 seconds available, every single time.
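
The arithmetic behind those numbers, and the general idea an incremental sync relies on, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from the post: the excerpt does not say how the sync was fixed, so the `updated_at` field, the watermark comparison, and both helper names (`pages_needed`, `changed_since`) are hypothetical.

```python
import math
from datetime import datetime, timezone


def pages_needed(total_items: int, page_size: int) -> int:
    """Number of list calls needed to page through every item."""
    return math.ceil(total_items / page_size)


def changed_since(tickets: list[dict], watermark: datetime) -> list[dict]:
    """Keep only tickets modified after the last successful sync.

    Assumes each ticket carries a timezone-aware `updated_at` (hypothetical field).
    """
    return [t for t in tickets if t["updated_at"] > watermark]


# Full listing: 52,000 tickets at 50 per page.
full_listing_calls = pages_needed(52_000, 50)

# 35 minutes spread over those calls is roughly 2 seconds per request.
seconds_per_call = 35 * 60 / full_listing_calls

# Toy data: only one ticket changed after the previous run.
last_sync = datetime(2026, 3, 9, 12, 0, tzinfo=timezone.utc)
tickets = [
    {"id": 1, "updated_at": datetime(2026, 3, 1, 8, 0, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2026, 3, 9, 12, 30, tzinfo=timezone.utc)},
]
to_sync = changed_since(tickets, last_sync)

print(full_listing_calls, round(seconds_per_call, 1), [t["id"] for t in to_sync])
```

The point of the watermark is that the work scales with what changed since the last run rather than with the size of the whole ticket archive.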

Data Engineering Tools & Automation

Cold Starts Kill Webhooks: Scheduling Cloud Run Min-Instances

by Marc 8 March 2026

We lost ~25 webhook events in 7 minutes because our Cloud Run service was scaling to zero. Here is how I used Cloud Scheduler to toggle min-instances during business hours — and the workaround for Cloud Scheduler not supporting PATCH requests.

Data Engineering Python

Migrating a SaaS Helpdesk Connector to a Custom Cloud Run Sync

by Marc 6 March 2026

We had a managed connector (think Fivetran, Airbyte, or similar) syncing our helpdesk data — tickets, contacts, threads, comments — from a SaaS helpdesk platform into BigQuery. It worked fine until it did not: the connector would miss webhook events, lag behind on incremental syncs, and occasionally produce phantom duplicates that our dbt models would faithfully propagate into dashboards. The cost was also non-trivial for what amounted to six tables.

Data Engineering Lessons Learned

Consolidating GCP Service Accounts: Fewer Keys, Less Chaos

by Marc 5 March 2026

We had four service accounts for BigQuery, one per GCP project. Each had its own JSON key file, its own set of IAM roles, and its own set of problems. When a local CLI tool needed to query production data, it used one key. When it needed to write to the raw data project, it used a different key. Our Slack bot had two keys mounted as secrets. The monitoring scripts had yet another. Every new tool or integration meant figuring out which key to use, and getting it wrong meant cryptic PERMISSION_DENIED errors that could take 20 minutes to debug.

Data Engineering dbt Lessons Learned

DRY Revenue Logic: Extracting a Shared dbt Macro for Partner-Specific Calculations

by Marc 25 February 2026

Revenue numbers that don’t add up are the kind of bug that erodes trust fast. Last month I tracked down a case where a quote adjustment of over two thousand euros was producing zero revenue in our reporting layer. The root cause? Duplicated business logic across two dbt models — logic that had drifted apart over time.
