Auto-Deploying Knowledge Changes Across Repos with GitHub Actions

We have an AI-powered Slack bot that answers business data questions. It uses a knowledge base — schema descriptions, business rules, SQL query patterns — to generate accurate queries. The problem: this knowledge base must stay in sync with two upstream repos. When a BI model changes, the bot needs to know about new dimensions and measures. When dbt models change, the bot needs an updated schema index. Keeping this in sync manually was a recipe for stale knowledge and wrong answers.

The Architecture

Three repos are involved:

BI Modeling Repo — contains semantic model definitions (dimensions, measures, relationships) in a modeling language
Data Warehouse Repo — contains dbt models (197 tables, 7,075 columns across multiple schemas)
Slack Bot Repo — the bot itself, deployed on Cloud Run, with a knowledge.md file that describes the data landscape

The goal: when someone merges a change to the BI models or dbt models, the bot’s knowledge base updates automatically — no manual copy-paste, no stale docs.

Pipeline 1: BI Model Changes to Bot Knowledge

When BI model definitions change (new measures, renamed dimensions, updated relationships), a GitHub Action in the BI repo generates a knowledge snippet and creates a PR in the bot repo.

The Trigger Workflow (BI Repo)

# .github/workflows/sync-knowledge-to-bot.yml
name: Sync Knowledge to Bot

on:
  push:
    branches: [master]
    paths:
      - '**.model.aml'
      - '**.dataset.aml'

jobs:
  sync-knowledge:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout BI repo
        uses: actions/checkout@v4

      - name: Generate knowledge snippet
        run: |
          python scripts/generate_knowledge_snippet.py \
            --models-dir ./models \
            --output /tmp/bi_knowledge_snippet.md

      - name: Checkout bot repo
        uses: actions/checkout@v4
        with:
          repository: our-org/slack-bot
          token: ${{ secrets.BOT_REPO_PAT }}
          path: slack-bot

      - name: Update knowledge file
        run: |
          # Replace the BI section in knowledge.md
          python scripts/update_knowledge_section.py \
            --knowledge-file slack-bot/knowledge.md \
            --section "BI Models" \
            --content /tmp/bi_knowledge_snippet.md

      - name: Create PR in bot repo
        uses: peter-evans/create-pull-request@v6
        with:
          path: slack-bot
          token: ${{ secrets.BOT_REPO_PAT }}
          branch: auto/sync-bi-knowledge
          delete-branch: true
          title: "chore: sync BI model knowledge"
          body: |
            Automated PR — BI model definitions changed in the modeling repo.

            Updated sections in `knowledge.md`:
            - Available dimensions and measures
            - Dataset relationships
            - Query patterns

            Triggered by: ${{ github.event.head_commit.url }}
          commit-message: "chore: sync BI model knowledge from modeling repo"

Auto-Merging the PR

Since these are automated knowledge updates (not code changes), I set up auto-merge. A second workflow in the bot repo handles this:

# .github/workflows/auto-merge-knowledge-prs.yml (in bot repo)
name: Auto-merge Knowledge PRs

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  auto-merge:
    if: startsWith(github.head_ref, 'auto/sync-')
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Validate knowledge file
        run: |
          # Basic sanity checks
          python -c "
          import sys
          content = open('knowledge.md').read()
          # Must have all required sections
          required = ['## BI Models', '## Schema Index', '## Business Rules']
          missing = [s for s in required if s not in content]
          if missing:
              print(f'Missing sections: {missing}')
              sys.exit(1)
          # Must not be empty/truncated
          if len(content) < 1000:
              print(f'Knowledge file suspiciously small: {len(content)} chars')
              sys.exit(1)
          print(f'Validation passed: {len(content)} chars, all sections present')
          "

      - name: Enable auto-merge
        run: gh pr merge --squash --auto "${{ github.event.pull_request.number }}"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Pipeline 2: dbt Schema Changes to Bot Knowledge

The dbt pipeline is more complex. When models change, we need to regenerate a schema index — a summary of every table and column the bot might need to query. This runs as a post-merge job in the data warehouse repo.

Generating the Schema Index

A Python script parses dbt's manifest.json (generated during CI) and produces a compact schema summary:

# scripts/generate_schema_index.py
import json
import sys

def generate_index(manifest_path, output_path):
    with open(manifest_path) as f:
        manifest = json.load(f)

    tables = {}
    for node_id, node in manifest["nodes"].items():
        if node["resource_type"] != "model":
            continue

        schema = node["schema"]
        table_name = node["name"]
        columns = sorted(node.get("columns", {}).keys())
        description = node.get("description", "")

        key = f"{schema}.{table_name}"
        tables[key] = {
            "description": description,
            "columns": columns,
            "layer": node.get("fqn", [None, None])[1],  # e.g. "08_analytics_bizlogic"
        }

    # Generate markdown summary
    lines = [f"## Schema Index\n"]
    lines.append(f"*{len(tables)} tables, "
                 f"{sum(len(t['columns']) for t in tables.values())} columns*\n")

    for key in sorted(tables.keys()):
        t = tables[key]
        lines.append(f"### {key}")
        if t["description"]:
            lines.append(f"{t['description']}\n")
        lines.append(f"Layer: {t['layer']}")
        lines.append(f"Columns: {', '.join(t['columns'])}\n")

    with open(output_path, "w") as f:
        f.write("\n".join(lines))

    print(f"Generated index: {len(tables)} tables")

if __name__ == "__main__":
    generate_index(sys.argv[1], sys.argv[2])

The Sync Workflow (Data Warehouse Repo)

# .github/workflows/sync-schema-to-bot.yml
name: Sync Schema to Bot

on:
  push:
    branches: [master]
    paths:
      - 'models/**'
      - 'macros/**'

jobs:
  generate-and-sync:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout warehouse repo
        uses: actions/checkout@v4

      - name: Download latest manifest
        run: |
          # Fetch the manifest artifact from the most recent CI run
          # (generated by dbt compile during CI)
          python scripts/fetch_ci_artifact.py \
            --job-id ${{ vars.CI_JOB_ID }} \
            --artifact manifest.json \
            --output /tmp/manifest.json

      - name: Generate schema index
        run: |
          python scripts/generate_schema_index.py \
            /tmp/manifest.json \
            /tmp/schema_index.md

      - name: Checkout bot repo
        uses: actions/checkout@v4
        with:
          repository: our-org/slack-bot
          token: ${{ secrets.BOT_REPO_PAT }}
          path: slack-bot

      - name: Update knowledge file
        run: |
          python scripts/update_knowledge_section.py \
            --knowledge-file slack-bot/knowledge.md \
            --section "Schema Index" \
            --content /tmp/schema_index.md

      - name: Create PR in bot repo
        uses: peter-evans/create-pull-request@v6
        with:
          path: slack-bot
          token: ${{ secrets.BOT_REPO_PAT }}
          branch: auto/sync-schema-index
          delete-branch: true
          title: "chore: sync dbt schema index"
          body: |
            Automated PR — dbt models changed in the data warehouse repo.

            Updated schema index in `knowledge.md`.

            Triggered by: ${{ github.event.head_commit.url }}
          commit-message: "chore: sync dbt schema index from warehouse repo"

Pipeline 3: Auto-Deploy After Merge

Once a knowledge PR is squash-merged into the bot repo's master branch, a deploy workflow kicks off automatically:

# .github/workflows/deploy.yml (in bot repo)
name: Deploy to Cloud Run

on:
  push:
    branches: [master]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Authenticate to GCP
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ vars.WIF_PROVIDER }}
          service_account: ${{ vars.DEPLOY_SA }}

      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@v2

      - name: Build and push container
        run: |
          gcloud builds submit \
            --tag europe-west3-docker.pkg.dev/$PROJECT/bot/slack-bot:${{ github.sha }} \
            --timeout=600 \
            2>&1 | tee /tmp/build.log

          # Cloud Build may exit non-zero due to VPC Service Controls
          # log streaming errors, even when the build succeeds.
          # Check the actual build status instead of trusting the exit code.
          BUILD_ID=$(grep -oP 'BUILD_ID: \K[a-f0-9-]+' /tmp/build.log || true)
          if [ -n "$BUILD_ID" ]; then
            STATUS=$(gcloud builds describe "$BUILD_ID" \
              --format='value(status)' 2>/dev/null || echo "UNKNOWN")
            if [ "$STATUS" = "SUCCESS" ]; then
              echo "Build succeeded (despite non-zero exit from log streaming)"
            else
              echo "Build status: $STATUS"
              exit 1
            fi
          fi

      - name: Deploy to Cloud Run
        uses: google-github-actions/deploy-cloudrun@v2
        with:
          service: slack-bot
          region: europe-west3
          image: europe-west3-docker.pkg.dev/${{ vars.PROJECT }}/bot/slack-bot:${{ github.sha }}

The Cloud Build Exit Code Problem

This one cost me an hour of debugging. gcloud builds submit streams build logs in real time. When VPC Service Controls are enabled on the project, the log streaming connection sometimes gets interrupted — and gcloud translates this into a non-zero exit code, even though the build itself completed successfully.

The fix is to not trust the exit code of gcloud builds submit. Instead, capture the build ID from the output and explicitly check the build status with gcloud builds describe. If the status is SUCCESS, proceed. This pattern is ugly but reliable:

# Don't do this:
gcloud builds submit --tag $IMAGE || exit 1

# Do this instead:
gcloud builds submit --tag $IMAGE 2>&1 | tee /tmp/build.log || true
BUILD_ID=$(grep -oP 'BUILD_ID: \K[a-f0-9-]+' /tmp/build.log)
STATUS=$(gcloud builds describe "$BUILD_ID" --format='value(status)')
if [ "$STATUS" != "SUCCESS" ]; then
    echo "Build failed with status: $STATUS"
    exit 1
fi

The Section Replacement Script

Both pipelines use a shared script to replace a specific section in knowledge.md. This keeps the rest of the file (manually written business rules, query patterns) untouched:

# scripts/update_knowledge_section.py
"""
Replace a named section in a markdown file with new content.
Sections are delimited by ## headings.
"""
import argparse
import re

def update_section(knowledge_path, section_name, content_path):
    with open(knowledge_path) as f:
        original = f.read()

    with open(content_path) as f:
        new_content = f.read().strip()

    # Pattern: from "## Section Name" to the next "## " or end of file
    pattern = re.compile(
        rf'(## {re.escape(section_name)}\n)(.*?)(?=\n## |\Z)',
        re.DOTALL,
    )

    if pattern.search(original):
        updated = pattern.sub(rf'\1{new_content}\n', original)
    else:
        # Section doesn't exist yet — append it
        updated = original.rstrip() + f'\n\n## {section_name}\n{new_content}\n'

    with open(knowledge_path, 'w') as f:
        f.write(updated)

    print(f"Updated section '{section_name}' in {knowledge_path}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--knowledge-file", required=True)
    parser.add_argument("--section", required=True)
    parser.add_argument("--content", required=True)
    args = parser.parse_args()
    update_section(args.knowledge_file, args.section, args.content)

End-to-End Flow

Here's how it all connects:

Developer merges a BI model change (e.g., adds a new measure) to the BI repo's master branch
GitHub Action in BI repo generates a knowledge snippet and creates a PR in the bot repo
Auto-merge workflow in the bot repo validates the knowledge file and squash-merges the PR
The merge triggers the deploy workflow, which builds a new container and deploys to Cloud Run
The bot is now live with updated knowledge — typically within 5-8 minutes of the original merge

The same flow applies to dbt schema changes, just through a different trigger workflow.

Lessons Learned

Cross-repo PRs need a Personal Access Token (PAT) with repo scope. The default GITHUB_TOKEN can only access the current repo. Store the PAT as a repo secret and use it for the actions/checkout and create-pull-request steps.
Use delete-branch: true on auto-PRs. Without it, the branch accumulates stale state. With it, each PR starts fresh from the latest master.
Validate before auto-merging. Even automated changes can break things. The validation step (checking required sections, minimum file size) catches truncation bugs and script errors before they reach production.
Don't trust gcloud builds submit exit codes. VPC Service Controls can cause false failures. Always verify the actual build status via the API.
Squash-merge auto-PRs. This keeps the bot repo's git history clean — one commit per sync, not a chain of automated commits.

The entire pipeline runs hands-free. BI model changes, dbt schema changes, and bot deployments are all connected. The bot's knowledge stays current, and nobody has to remember to copy files between repos.