CI/CD Pipeline 10x Faster: From 62 to 6 Minutes

Continuous Integration and Continuous Deployment pipelines are the backbone of modern software delivery. When your pipeline lags, everything else lags with it — feedback loops, bug fixes, release cadence, team morale. We hit that wall hard. Here's how we cut ours from over an hour to six minutes.

Step 1: Find the real bottleneck

The first thing we did was stop guessing. We instrumented every stage and measured how long each one took. The results surprised us.

If you don't measure, you end up optimizing the wrong stage. We almost spent a week on Docker layer caching before realizing it would have saved us ninety seconds on a 62-minute pipeline, roughly 2% of the total.
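As a rough illustration of stage-level timing, here is a minimal sketch for GitHub Actions. It is not our exact instrumentation: the step name and the summary line format are assumptions made for the example. It wraps one stage in shell timestamps and appends the elapsed time to the job summary.

```yaml
# Sketch only: the step name and summary format are illustrative assumptions.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun install --frozen-lockfile
      - name: Run tests and record duration
        run: |
          start=$(date +%s)
          bun test
          end=$(date +%s)
          # Append the elapsed seconds to this job's summary page
          echo "test stage: $((end - start))s" >> "$GITHUB_STEP_SUMMARY"
```

The same wrapper can go around any stage. Comparing the numbers side by side is what tells you where the hour is actually going.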

Step 2: Parallelize the test suite

Our tests ran serially on a single runner. Sharding the suite across four parallel runners was the single biggest win: it cut testing time by 68%.

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun install --frozen-lockfile
      - run: bun test --shard=${{ matrix.shard }}/4

We also killed a dozen tests that were flaky or testing framework behavior instead of our own code. Fewer tests, faster signal.

Step 3: Cache Docker build layers

Our Docker builds were starting from scratch every time. Adding BuildKit layer caching backed by a remote registry dropped image builds from four minutes to forty seconds:

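# Registry cache export generally needs a Buildx builder rather than the default docker driver.
# Registry login (for example via docker/login-action) is assumed to happen in an earlier step.
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ${{ env.IMAGE }}:${{ github.sha }}
    # Reuse layers from the cache image written by earlier builds
    cache-from: type=registry,ref=${{ env.IMAGE }}:buildcache
    # mode=max also caches layers from intermediate build stages
    cache-to: type=registry,ref=${{ env.IMAGE }}:buildcache,mode=max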

Step 4: Provision infrastructure as code

Environment provisioning used to require a human in the loop. We moved to Terraform so the pipeline could spin up a preview environment for every pull request:

resource "fly_app" "staging" {
  name = "app-staging-${var.pr_number}"
  org  = "personal"
}

resource "fly_machine" "api" {
  app    = fly_app.staging.name
  region = "ams"
  image  = var.image
}

Every PR gets its own preview environment, destroyed automatically when the PR closes. Reviewers stop asking "can you deploy this somewhere I can click?"
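The teardown half could look something like the workflow below. Treat it as a sketch under stated assumptions, not our exact setup: the `infra/preview` directory, the `FLY_API_TOKEN` secret name, and the one-Terraform-workspace-per-PR naming (`pr-<number>`) are all hypothetical.

```yaml
# Hypothetical teardown workflow; paths, secret names, and workspace naming are assumptions.
name: preview-teardown
on:
  pull_request:
    types: [closed]   # fires for merged and unmerged closes alike
jobs:
  destroy:
    runs-on: ubuntu-latest
    env:
      FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
      PR_NUMBER: ${{ github.event.pull_request.number }}
    defaults:
      run:
        working-directory: infra/preview
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -input=false
      # Assumes the preview was created in a workspace named after the PR
      - run: terraform workspace select "pr-$PR_NUMBER"
      # Terraform still needs a value for every required variable on destroy;
      # the image value is a placeholder here.
      - run: >
          terraform destroy -auto-approve -input=false
          -var "pr_number=$PR_NUMBER"
          -var "image=unused"
```

Keeping state separate per PR is the important design choice. Without it, closing one pull request could tear down another's preview.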

Step 5: Monitor so it stays fast

Pipelines rot. We wired Prometheus to scrape pipeline metrics and built a Grafana dashboard that alerts us when any stage creeps past its budget. If tests drift back toward fifteen minutes, we know before it becomes a daily annoyance.
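A stage budget can also be expressed as a Prometheus alerting rule instead of, or alongside, a Grafana alert. The sketch below is illustrative only: the metric name `ci_stage_duration_seconds`, its `stage` label, and the 180-second budget are assumptions, and the rule presumes each pipeline run publishes its stage durations somewhere Prometheus can scrape.

```yaml
# Illustrative rule; metric name, label, and threshold are hypothetical.
groups:
  - name: ci-pipeline-budgets
    rules:
      - alert: TestStageOverBudget
        # Average test-stage duration over the past day exceeds the budget
        expr: avg_over_time(ci_stage_duration_seconds{stage="test"}[1d]) > 180
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Test stage is averaging above its time budget"
```

Averaging over a day keeps one slow run from paging anyone, while still catching the gradual drift that makes pipelines rot.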

The result

From 62 minutes to 6 minutes. A 10x improvement, achieved through four boring changes: measure, parallelize, cache, automate. No magic. No new framework. Just ruthless attention to where the time was actually going.

The lesson: faster pipelines aren't a one-time project. They're a habit. Measure weekly, cut ruthlessly, and don't let your CI slow to a crawl while you're not looking.