Marc de Mas

Four Months Debugging a DRAM-less NVMe on a Raspberry Pi 5

by Marc 24 April 2026

In January I mounted a Crucial P3 Plus 4TB NVMe on a Raspberry Pi 5 inside an Argon Neo 5 M.2 case and moved everything onto it — Immich photo library, motionEye camera recordings, Docker volumes, a few databases, the odd SFTP sync target. It worked. Then it froze. A week later, it froze again. Then every few days. Then every few hours.

What followed was four months of hunting a ghost. If you are running a DRAM-less NVMe (P3 Plus, WD Blue SN580, Kingston NV2/NV3, HP EX900, many others) on a Pi 5, this post is written for you.

TL;DR. The Pi 5’s firmware disables Host Memory Buffer by default; DRAM-less drives need HMB to stay alive under load. There is no fix that makes a DRAM-less drive truly stable on a Pi 5 — only mitigations (HMB on, PCIe Gen 1, throttle stack) that reduce the wedge rate without eliminating it. The Practical Checklist below is the mitigation list. The full post is the four-month version of how I got there — and why the only real answer is to replace the drive with one that has on-board DRAM.

Ditching Cloud Secret Managers: Age Encryption in a Git Repo

by Marc 5 April 2026

Google Secret Manager costs money. Not much per secret, but when you run five side projects across four servers, and each has its own set of API keys, database passwords, and service tokens, the bill adds up — and so does the cognitive overhead of managing IAM permissions, service accounts, and secret versions across multiple GCP projects. So I replaced it with a private GitHub repo and age encryption.

Lessons Learned Tools & Automation

Three Months of Developing with an AI Coding Agent

by Marc 18 March 2026

I have been using an AI coding agent as my primary development tool for three months. Not a chatbot I paste code into, but an agent with filesystem access, shell execution, and persistent memory that works inside my terminal. It reads my codebase, runs my tests, pushes my commits, deploys my services, and remembers what it learned yesterday. This post is about the technical patterns I have developed to make that work well — the context files, the memory system, the tool integrations, and the compounding feedback loop that makes it better over time.

Data Engineering Lessons Learned Python

Teaching an AI Bot Your Metrics: The Corrections File Pattern

by Marc 17 March 2026

Our AI-powered Slack bot had a credibility problem. It could query BigQuery, generate syntactically valid SQL, format results in neat tables, and respond conversationally. But when someone asked “what was revenue last month?” the number was wrong. Not because of a bug in the code — the SQL executed perfectly. The problem was semantic: the bot did not understand what “revenue” actually means in our domain.

Data Engineering Tools & Automation

Consolidating Three Cloud Functions into One

by Marc 15 March 2026

We had three Cloud Functions handling exchange rate data. One fetched rates from an external API and wrote them to Google Cloud Storage. A second read from GCS and loaded into BigQuery. A third was an abandoned earlier attempt that loaded directly from the API to BigQuery but had never been decommissioned. Three functions, three sets of logs, three potential failure points — for what is fundamentally one job: “get today’s exchange rates into BigQuery.”

Lessons Learned Tools & Automation

Auto-Deploying Knowledge Changes Across Repos with GitHub Actions

by Marc 12 March 2026

We have an AI-powered Slack bot that answers business data questions. It uses a knowledge base — schema descriptions, business rules, SQL query patterns — to generate accurate queries. The problem: this knowledge base must stay in sync with two upstream repos. When a BI model changes, the bot needs to know about new dimensions and measures. When dbt models change, the bot needs an updated schema index. Keeping this in sync manually was a recipe for stale knowledge and wrong answers.

Data Engineering Python

Incremental Sync: From 35 Minutes to 5 Seconds

by Marc 10 March 2026

Our helpdesk ticket sync to BigQuery was broken. The Cloud Function responsible for it was hitting a 504 gateway timeout after 900 seconds — and the root cause was simple: it was listing every single ticket via the API on every run. With 52,000+ tickets and a page size of 50, that’s over 1,040 API calls before any data lands in BigQuery. Around 35 minutes of sequential HTTP requests, every single time.
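
The arithmetic behind those numbers can be checked with a short sketch. The function names here are hypothetical, and the per-call latency is only what the quoted 35 minutes would imply, not a measured figure:

```python
import math


def full_listing_calls(total_tickets: int, page_size: int) -> int:
    """Number of paginated API calls needed to list every ticket once."""
    return math.ceil(total_tickets / page_size)


def implied_seconds_per_call(total_minutes: float, calls: int) -> float:
    """Average time per call if the whole listing takes total_minutes."""
    return total_minutes * 60 / calls


# Figures taken from the post: 52,000 tickets, 50 per page, ~35 minutes.
calls = full_listing_calls(52_000, 50)
per_call = implied_seconds_per_call(35, calls)

# The function timeout quoted in the post, converted to minutes.
timeout_minutes = 900 / 60

print(calls)               # calls needed for one full listing
print(round(per_call, 2))  # implied seconds per call
print(timeout_minutes)     # timeout budget in minutes
```

At roughly two seconds per request, a full listing needs more than twice the 900-second budget, which is why the run could not finish before the timeout.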

Data Engineering Tools & Automation

Cold Starts Kill Webhooks: Scheduling Cloud Run Min-Instances

by Marc 8 March 2026

We lost ~25 webhook events in 7 minutes because our Cloud Run service was scaling to zero. Here is how I used Cloud Scheduler to toggle min-instances during business hours — and the workaround for Cloud Scheduler not supporting PATCH requests.

Data Engineering Python

Migrating a SaaS Helpdesk Connector to a Custom Cloud Run Sync

by Marc 6 March 2026

We had a managed connector (think Fivetran, Airbyte, or similar) syncing our helpdesk data — tickets, contacts, threads, comments — from a SaaS helpdesk platform into BigQuery. It worked fine until it did not: the connector would miss webhook events, lag behind on incremental syncs, and occasionally produce phantom duplicates that our dbt models would faithfully propagate into dashboards. The cost was also non-trivial for what amounted to six tables.

Data Engineering Lessons Learned

Consolidating GCP Service Accounts: Fewer Keys, Less Chaos

by Marc 5 March 2026

We had four service accounts for BigQuery, one per GCP project. Each had its own JSON key file, its own set of IAM roles, and its own set of problems. When a local CLI tool needed to query production data, it used one key. When it needed to write to the raw data project, it used a different key. Our Slack bot had two keys mounted as secrets. The monitoring scripts had yet another. Every new tool or integration meant figuring out which key to use, and getting it wrong meant cryptic PERMISSION_DENIED errors that could take 20 minutes to debug.
