Consolidating Three Cloud Functions into One

We had three Cloud Functions handling exchange rate data. One fetched rates from an external API and wrote them to Google Cloud Storage. A second read from GCS and loaded into BigQuery. A third was an abandoned earlier attempt that loaded directly from the API to BigQuery but had never been decommissioned. Three functions, three sets of logs, three potential failure points — for what is fundamentally one job: “get today’s exchange rates into BigQuery.”

The Before: Three Functions, Two Hops

Here’s what the architecture looked like:

┌──────────────────────────────────────────────────────────────────┐
│ BEFORE: 3 Cloud Functions                                        │
│                                                                  │
│  ┌─────────────────────┐    ┌──────────────────────────────┐    │
│  │ fn_rates_api_to_gcs │───▶│ GCS: rates/2026-03-15.json   │    │
│  │ (Pub/Sub trigger)   │    └──────────────┬───────────────┘    │
│  └─────────────────────┘                   │                     │
│                                            ▼                     │
│  ┌──────────────────────────┐    ┌─────────────────────┐        │
│  │ fn_rates_gcs_to_bigquery │───▶│ BigQuery: rates table│        │
│  │ (Pub/Sub trigger)        │    └─────────────────────┘        │
│  └──────────────────────────┘                                    │
│                                                                  │
│  ┌──────────────────────────┐                                    │
│  │ fn_rates_direct_load     │  ← abandoned, still deployed       │
│  │ (HTTP trigger, unused)   │                                    │
│  └──────────────────────────┘                                    │
└──────────────────────────────────────────────────────────────────┘

The GCS intermediate step was originally designed as a “staging area” for auditability. In practice, nobody ever looked at the JSON files in GCS. They just accumulated, costing storage and adding latency.

The After: One Function, Direct Load

┌──────────────────────────────────────────────────────────────────┐
│ AFTER: 1 Cloud Function                                          │
│                                                                  │
│  ┌──────────────────────────┐    ┌─────────────────────┐        │
│  │ fn_exchange_rates_sync   │───▶│ BigQuery: rates table│        │
│  │ (Pub/Sub trigger)        │    └─────────────────────┘        │
│  └──────────────────────────┘                                    │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

The Consolidated Function

The new function fetches exchange rates from the API and loads them directly into BigQuery in a single invocation. Key design choice: WRITE_TRUNCATE instead of WRITE_APPEND. The old pipeline used append mode, which occasionally caused duplicate rows when retries happened. Since the exchange rate table is small (one row per currency pair per day, ~180 rows per day), truncating and rewriting the full daily set is cheap and guarantees no duplicates.

import functions_framework
import requests
import logging
from datetime import date
from google.cloud import bigquery

logger = logging.getLogger(__name__)

API_URL = "https://api.exchangerate-provider.com/v1/latest"
PROJECT_ID = "my-data-project"
DATASET = "reference_data"
TABLE = "exchange_rates"
TABLE_ID = f"{PROJECT_ID}.{DATASET}.{TABLE}"

# All currencies we need
CURRENCIES = [
    "EUR", "USD", "GBP", "PLN", "CHF", "SEK", "NOK", "DKK",
    "CZK", "HUF", "RON", "BGN", "HRK", "TRY", "RUB",
]

SCHEMA = [
    bigquery.SchemaField("base_currency", "STRING"),
    bigquery.SchemaField("target_currency", "STRING"),
    bigquery.SchemaField("rate", "FLOAT64"),
    bigquery.SchemaField("rate_date", "DATE"),
    bigquery.SchemaField("loaded_at", "TIMESTAMP"),
]


@functions_framework.cloud_event
def sync_exchange_rates(cloud_event):
    """
    Fetch today's exchange rates and load directly to BigQuery.
    Uses WRITE_TRUNCATE on a date-partitioned table to prevent duplicates.
    """
    today = date.today().isoformat()

    # Step 1: Fetch from API
    logger.info(f"Fetching exchange rates for {today}")
    response = requests.get(
        API_URL,
        params={
            "base": "EUR",
            "symbols": ",".join(CURRENCIES),
        },
        timeout=30,
    )
    response.raise_for_status()
    data = response.json()

    rates = data["rates"]
    logger.info(f"Received {len(rates)} currency pairs")

    # Step 2: Transform into rows
    from datetime import datetime, timezone

    loaded_at = datetime.now(timezone.utc).isoformat()
    rows = [
        {
            "base_currency": "EUR",
            "target_currency": currency,
            "rate": rate,
            "rate_date": today,
            "loaded_at": loaded_at,
        }
        for currency, rate in rates.items()
    ]

    # Step 3: Load to BigQuery with WRITE_TRUNCATE
    bq_client = bigquery.Client(project=PROJECT_ID)

    job_config = bigquery.LoadJobConfig(
        schema=SCHEMA,
        write_disposition="WRITE_TRUNCATE",
        time_partitioning=bigquery.TimePartitioning(
            type_=bigquery.TimePartitioningType.DAY,
            field="rate_date",
        ),
        # Only truncate today's partition, not the entire table
        destination_table_partition=f"{TABLE_ID}${today.replace('-', '')}",
    )

    load_job = bq_client.load_table_from_json(
        rows,
        TABLE_ID,
        job_config=job_config,
    )
    load_job.result()  # Wait for completion

    logger.info(
        f"Loaded {len(rows)} rows to {TABLE_ID} "
        f"(partition: {today})"
    )

    return {"status": "ok", "rows": len(rows), "date": today}

Note the partition-level truncate: by setting destination_table_partition, we only overwrite today’s partition. Historical data is untouched. If the function runs twice on the same day (e.g., due to a retry or manual trigger), the second run cleanly replaces the first — no duplicates.

Deploying the New Function

# Deploy the consolidated function
gcloud functions deploy fn-exchange-rates-sync \
    --gen2 \
    --runtime python312 \
    --region europe-west3 \
    --source . \
    --entry-point sync_exchange_rates \
    --trigger-topic exchange-rates-daily \
    --timeout 120 \
    --memory 256MB \
    --service-account [email protected] \
    --project my-data-project

# Delete the old functions
gcloud functions delete fn-rates-api-to-gcs \
    --region europe-west3 --project my-data-project --quiet

gcloud functions delete fn-rates-gcs-to-bigquery \
    --region europe-west3 --project my-data-project --quiet

gcloud functions delete fn-rates-direct-load \
    --region europe-west3 --project my-data-project --quiet

The Bigger Cleanup: Cloud Scheduler Audit

While consolidating the exchange rate functions, I discovered a broader mess in Cloud Scheduler. We had 15+ scheduled jobs spread across two regions, with inconsistent naming, overlapping schedules, and several that triggered functions that no longer existed. Time for an audit.

Inventory Discovery

First, I listed everything:

# List all scheduler jobs across both regions
for REGION in europe-west1 europe-west3; do
    echo "=== $REGION ==="
    gcloud scheduler jobs list \
        --location=$REGION \
        --project=my-data-project \
        --format="table(name, schedule, state, httpTarget.uri)"
done

This revealed the full picture:

=== europe-west1 ===
NAME                          SCHEDULE        STATE     TARGET
rates-api-fetch               0 6 * * *       ENABLED   fn-rates-api-to-gcs
rates-gcs-load                15 6 * * *      ENABLED   fn-rates-gcs-to-bigquery
rates-direct                  0 7 * * *       PAUSED    fn-rates-direct-load
sheets-sync                   */30 * * * *    ENABLED   fn-sheets-to-bigquery
helpdesk-sync                 0 */2 * * *     ENABLED   fn-helpdesk-sync
old-helpdesk-sync             0 3 * * *       PAUSED    (deleted function)
vehicle-data-daily            0 4 * * 1       ENABLED   fn-vehicle-data
holiday-refresh               0 5 1 1 *       ENABLED   fn-holiday-data

=== europe-west3 ===
NAME                          SCHEDULE        STATE     TARGET
exchange-rates-v2             0 6 * * *       PAUSED    (deleted function)
sheets-sync-backup            */30 * * * *    ENABLED   fn-sheets-to-bigquery
mysql-replication              */15 * * * *   ENABLED   fn-mysql-sync
daily-report-trigger          0 8 * * 1-5     ENABLED   fn-daily-report
weekly-cleanup                0 2 * * 0       ENABLED   fn-cleanup
test-scheduler-job            * * * * *       PAUSED    (test endpoint)

Several problems jumped out:

Orphaned jobs: 3 scheduler jobs pointed to functions that had been deleted months ago
Duplicate jobs: sheets-sync and sheets-sync-backup triggered the same function from different regions
Inconsistent naming: rates-api-fetch, exchange-rates-v2, rates-direct — three names for the same concept
Wrong region: Several jobs in europe-west1 triggered functions in europe-west3, adding unnecessary cross-region latency
Forgotten test job: test-scheduler-job was set to run every minute (paused, thankfully)

The Cleanup

# Delete orphaned and duplicate jobs
ORPHANED_JOBS=(
    "europe-west1/rates-api-fetch"
    "europe-west1/rates-gcs-load"
    "europe-west1/rates-direct"
    "europe-west1/old-helpdesk-sync"
    "europe-west3/exchange-rates-v2"
    "europe-west3/sheets-sync-backup"
    "europe-west3/test-scheduler-job"
)

for JOB in "${ORPHANED_JOBS[@]}"; do
    REGION=$(echo "$JOB" | cut -d/ -f1)
    NAME=$(echo "$JOB" | cut -d/ -f2)
    echo "Deleting $NAME in $REGION"
    gcloud scheduler jobs delete "$NAME" \
        --location="$REGION" \
        --project=my-data-project \
        --quiet
done

Standardised Naming and New Schedule

I created a single new scheduler job for the consolidated exchange rate function, following a naming convention I established for all jobs:

# Naming convention: {domain}-{action}-{frequency}
# All jobs in europe-west3 (same region as functions)

# New exchange rate job
gcloud scheduler jobs create pubsub exchange-rates-daily-sync \
    --location=europe-west3 \
    --project=my-data-project \
    --topic=exchange-rates-daily \
    --message-body='{}' \
    --schedule="0 7 * * 1-5" \
    --time-zone="Europe/Amsterdam" \
    --description="Fetch exchange rates and load to BigQuery (weekdays 7AM)"

# Migrate sheets sync to correct region
gcloud scheduler jobs delete sheets-sync \
    --location=europe-west1 --project=my-data-project --quiet

gcloud scheduler jobs create pubsub sheets-sync-halfhourly \
    --location=europe-west3 \
    --project=my-data-project \
    --topic=sheets-sync \
    --message-body='{}' \
    --schedule="*/30 8-18 * * 1-5" \
    --time-zone="Europe/Amsterdam" \
    --description="Sync Google Sheets to BigQuery (every 30 min, business hours)"

Notice the sheets sync improvement: instead of running every 30 minutes 24/7, it now only runs during business hours on weekdays. Nobody updates Google Sheets at 3 AM on Sunday — no point syncing then.

The Final State

=== europe-west3 (all jobs consolidated here) ===
NAME                              SCHEDULE              STATE     DESCRIPTION
exchange-rates-daily-sync         0 7 * * 1-5           ENABLED   Exchange rates → BQ
sheets-sync-halfhourly            */30 8-18 * * 1-5     ENABLED   Google Sheets → BQ
helpdesk-sync-bihourly            0 */2 * * *           ENABLED   Helpdesk tickets → BQ
mysql-replication-quarterly       */15 * * * *          ENABLED   MySQL CDC → BQ
vehicle-data-weekly               0 4 * * 1             ENABLED   Vehicle registry → BQ
holiday-data-yearly               0 5 1 1 *             ENABLED   Public holidays → BQ
daily-report-weekday              0 8 * * 1-5           ENABLED   Daily report trigger
weekly-cleanup-sunday             0 2 * * 0             ENABLED   Temp data cleanup

Documenting in the Architecture Map

After the cleanup, I updated our architecture map spreadsheet — a living document that tracks every data source, function, scheduler job, and dataset. For the Cloud Functions tab, each row contains:

Function name
Trigger type (Pub/Sub, HTTP, Scheduler)
Schedule (cron expression)
Source (API, GCS, MySQL)
Target (BigQuery table/dataset)
Region
Last modified date
Status (active, deprecated, deleted)

For the Scheduler Jobs tab, I added the same level of detail. This makes it trivial for anyone on the team to answer “what runs when and why” without spelunking through the GCP console.

# Update the architecture map spreadsheet with the new state
# (using our Google Sheets CLI tool)

# Cloud Functions tab — mark old functions as deleted
read-gsheet write "$SHEET_ID" "Cloud Functions" "H3" "deleted (2026-03-15)"
read-gsheet write "$SHEET_ID" "Cloud Functions" "H4" "deleted (2026-03-15)"
read-gsheet write "$SHEET_ID" "Cloud Functions" "H5" "deleted (2026-03-15)"

# Add the new consolidated function
read-gsheet append "$SHEET_ID" "Cloud Functions" \
    "fn-exchange-rates-sync" "Pub/Sub" "exchange-rates-daily" \
    "Exchange Rate API" "reference_data.exchange_rates" \
    "europe-west3" "active" "Consolidated from 3 functions"

# Scheduler Jobs tab — update with final state
read-gsheet write "$SHEET_ID" "Scheduler Jobs" "A2:H9" \
    "exchange-rates-daily-sync" "0 7 * * 1-5" "europe-west3" "..."

Results

Cloud Functions: 3 reduced to 1 (exchange rates). Across the project, went from 12 to 9 active functions total after also removing other abandoned ones discovered during the audit.
Scheduler Jobs: 15 reduced to 8. All in one region. Consistent naming. Documented.
GCS intermediate storage: eliminated for this pipeline. The JSON files in GCS were just waste.
Duplicate prevention: partition-level WRITE_TRUNCATE means retries and manual re-runs are safe.
Cost savings: marginal on compute (these are tiny functions), but meaningful in reduced operational overhead — fewer things to monitor, fewer alerts to investigate, fewer logs to search through.

Lessons Learned

The intermediate GCS hop was premature architecture. It was added “for auditability” but nobody ever audited it. If you need an audit trail, BigQuery’s own query history and table snapshots are better tools. Don’t add infrastructure for hypothetical requirements.
Abandoned infrastructure is technical debt with a carrying cost. The unused function and its scheduler job were still running (and failing silently). Each failure generated an alert that was being ignored. Alert fatigue from known-broken systems is how real problems go unnoticed.
Cross-region scheduler jobs are a code smell. If your function is in europe-west3, your scheduler job should be too. Cross-region invocations add latency and make debugging harder.
WRITE_TRUNCATE on a partition is the sweet spot. You get the idempotency of truncate-and-reload without touching historical data. For small dimension tables that refresh daily, this is almost always the right approach.
Schedule for when data is actually used. Running syncs 24/7 when the data is only consumed during business hours wastes resources and generates unnecessary noise. Match your schedule to your consumers.
An architecture map is only useful if it’s maintained. The act of documenting this cleanup was as valuable as the cleanup itself. Six months from now, someone will look at the map and understand why there’s only one exchange rate function, not wonder where the other two went.

The whole cleanup — consolidating functions, auditing schedulers, deleting dead code, updating documentation — took about half a day. The reduction in cognitive load is worth far more than the time spent.