Home Data EngineeringConsolidating GCP Service Accounts: Fewer Keys, Less Chaos

Consolidating GCP Service Accounts: Fewer Keys, Less Chaos

by Marc

We had four service accounts for BigQuery, one per GCP project. Each had its own JSON key file, its own set of IAM roles, and its own set of problems. When a local CLI tool needed to query production data, it used one key. When it needed to write to the raw data project, it used a different key. Our Slack bot had two keys mounted as secrets. The monitoring scripts had yet another. Every new tool or integration meant figuring out which key to use, and getting it wrong meant cryptic PERMISSION_DENIED errors that could take 20 minutes to debug.

I consolidated everything into a single service account with cross-project roles. Here is how.

Before: The SA Sprawl

| Service Account                      | Project         | Key File                    | Used By                    |
|--------------------------------------|-----------------|-----------------------------|----------------------------|
| claude-code@project-production       | production      | claude-code-production.json | CLI tools, Slack bot (BQ)  |
| claude-code@project-raw              | raw-data        | claude-code.json            | CLI tools, Cloud Functions |
| claude-code@project-dev              | dev-personal    | claude-code-dev.json        | dbt dev, testing           |
| claude-code@project-data             | shared-data     | claude-code-data.json       | Historical queries         |

Four keys on disk. Four SAs to remember. And the fun part: the “shared-data” project SA had intermittent permission issues that required switching to a personal user account instead. So some scripts used service account auth and some used user auth, depending on which project they targeted. It was a mess.

After: One SA, Cross-Project Roles

The consolidated SA lives in the raw-data project (where most writes happen) and has carefully scoped roles granted across all four projects.

| Project         | Roles                                              |
|-----------------|----------------------------------------------------|
| production      | bigquery.dataViewer, bigquery.user                 |
| shared-data     | bigquery.dataViewer, bigquery.user                 |
| dev-personal    | bigquery.dataViewer, bigquery.user, bigquery.dataEditor |
| raw-data        | bigquery.dataViewer, bigquery.user, bigquery.jobUser   |
|                 | + dataset-level WRITER on: usage_logs, crm_sync,      |
|                 |   bot_logs, external_sync                              |

The key principle: read access everywhere, write access scoped to specific datasets in the raw project only. The dev project gets dataEditor because dbt needs to create and drop tables during development. The raw project gets jobUser (to run queries) plus dataset-level WRITER grants on exactly the four datasets that services write to.

The IAM Commands

#!/bin/bash
# consolidate-sa.sh — Grant cross-project roles to the consolidated SA

SA="[email protected]"

# Production project: read-only
gcloud projects add-iam-policy-binding production-project \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.dataViewer" \
    --condition=None --quiet

gcloud projects add-iam-policy-binding production-project \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.user" \
    --condition=None --quiet

# Shared data project: read-only
gcloud projects add-iam-policy-binding shared-data-project \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.dataViewer" \
    --condition=None --quiet

gcloud projects add-iam-policy-binding shared-data-project \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.user" \
    --condition=None --quiet

# Dev project: read + write (dbt needs to create tables)
gcloud projects add-iam-policy-binding dev-project \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.dataViewer" \
    --condition=None --quiet

gcloud projects add-iam-policy-binding dev-project \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.user" \
    --condition=None --quiet

gcloud projects add-iam-policy-binding dev-project \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.dataEditor" \
    --condition=None --quiet

# Raw project: dataset-level write grants (not project-wide)
for DATASET in usage_logs crm_sync bot_logs external_sync; do
    bq update --project_id=raw-project \
        --source <(bq show --format=prettyjson raw-project:${DATASET} | \
        python3 -c "
import sys, json
d = json.load(sys.stdin)
d['access'].append({
    'role': 'WRITER',
    'userByEmail': '${SA}'
})
json.dump(d, sys.stdout)
        ") \
        raw-project:${DATASET}
done

echo "Done. Consolidated SA has cross-project access."

Revoking the Old SAs

I did not delete the old SAs immediately. Instead, I revoked their IAM bindings but left the key files on disk (renamed with a .revoked suffix). This way, if something broke that I hadn't accounted for, the error message would say PERMISSION_DENIED for the old SA, and I could trace exactly which tool was still using the wrong key.

Monitoring for Stragglers

To catch any remaining references to the old SAs, I wrote a monitoring script that checks Cloud Logging for PERMISSION_DENIED errors from the revoked service accounts. It runs daily via cron with a 2-week expiry (after which I'll remove it -- if nothing has broken in two weeks, nothing will).

#!/bin/bash
# monitor-revoked-sas.sh — Check Cloud Logging for PERMISSION_DENIED
# on revoked service accounts. Alerts via Slack DM if any are found.

REVOKED_SAS=(
    "[email protected]"
    "[email protected]"
    "[email protected]"
)

SLACK_WEBHOOK_URL="${SLACK_WEBHOOK_URL}"  # from environment
PROJECTS=("production-project" "shared-data-project" "dev-project" "raw-project")
FOUND_ISSUES=0
REPORT=""

for project in "${PROJECTS[@]}"; do
    for sa in "${REVOKED_SAS[@]}"; do
        # Query Cloud Logging for PERMISSION_DENIED in the last 24 hours
        RESULTS=$(gcloud logging read \
            "protoPayload.authenticationInfo.principalEmail=\"${sa}\" AND severity>=ERROR AND timestamp>=\"$(date -u -v-1d '+%Y-%m-%dT%H:%M:%SZ')\"" \
            --project="${project}" \
            --format="value(timestamp,protoPayload.methodName)" \
            --limit=5 2>/dev/null)

        if [ -n "$RESULTS" ]; then
            FOUND_ISSUES=$((FOUND_ISSUES + 1))
            REPORT="${REPORT}\n*${sa}* on \`${project}\`:\n\`\`\`${RESULTS}\`\`\`\n"
        fi
    done
done

if [ $FOUND_ISSUES -gt 0 ]; then
    # Send Slack alert
    curl -s -X POST "$SLACK_WEBHOOK_URL" \
        -H "Content-Type: application/json" \
        -d "{
            \"text\": \"SA Monitor: ${FOUND_ISSUES} revoked SA(s) still being used\",
            \"blocks\": [{
                \"type\": \"section\",
                \"text\": {
                    \"type\": \"mrkdwn\",
                    \"text\": \"*Revoked SA Activity Detected*\n${REPORT}\nThese tools need to be updated to use the consolidated SA.\"
                }
            }]
        }"
    echo "Alert sent: ${FOUND_ISSUES} issues found"
else
    echo "No revoked SA usage detected"
fi

Updating Downstream Services

The Slack bot was the main consumer that needed updating. It had two separate BigQuery service account keys mounted as Cloud Run secrets -- one for production reads, one for raw-data writes. I updated both to point to the consolidated SA's key in Secret Manager. One secret instead of two, one SA instead of two, same access.

The CLI tools were simpler: I updated the default key path in their config files and tested each one against each project.

The Cron Setup

# Crontab entry: run daily at 9:13am, auto-remove after 2 weeks
# (the script checks its own age and exits if >14 days old)

13 9 * * * /home/user/scripts/monitor-revoked-sas.sh >> /tmp/sa-monitor.log 2>&1

The script includes a self-expiry check at the top:

# Self-expiry: stop running after the grace period
CREATED="2026-03-12"
EXPIRY_DAYS=14
DAYS_OLD=$(( ( $(date +%s) - $(date -j -f "%Y-%m-%d" "$CREATED" +%s) ) / 86400 ))
if [ "$DAYS_OLD" -gt "$EXPIRY_DAYS" ]; then
    echo "$(date): Monitor expired after ${EXPIRY_DAYS} days. Remove cron entry."
    exit 0
fi

Results

  • 4 service accounts consolidated to 1
  • 4 key files reduced to 1 (3 old ones renamed to .revoked)
  • 2 Cloud Run secrets reduced to 1
  • Zero PERMISSION_DENIED alerts in the 2-week monitoring window
  • New tool setup time: zero -- every tool uses the same key, no more "which SA do I need for this project?"

The key takeaway: service account sprawl creeps up on you. Every new integration adds "just one more SA" until you have a drawer full of JSON key files and nobody remembers which does what. Consolidating to one SA with cross-project roles and dataset-level write grants gives you the same security boundaries with far less operational overhead. Just make sure to monitor for stragglers before you delete the old keys.

You may also like

Leave a Comment