CI/CD Pipeline 10x Faster: From 62 to 6 Minutes
Continuous Integration and Continuous Deployment pipelines are the backbone of modern software delivery. When your pipeline lags, everything else lags with it — feedback loops, bug fixes, release cadence, team morale. We hit that wall hard. Here's how we cut ours from over an hour to six minutes.
Step 1: Find the real bottleneck
The first thing we did was stop guessing. We instrumented every stage and measured it. The results surprised us:
- Tests: 72% of total pipeline time
- Build + image push: 19%
- Infra provisioning: 6%
- Everything else: 3%
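On the 62-minute baseline, those shares work out to roughly the following wall-clock times. This is plain arithmetic on the percentages above:

$$
\begin{aligned}
t_{\text{tests}} &= 0.72 \times 62 \approx 44.6\ \text{min}\\
t_{\text{build+push}} &= 0.19 \times 62 \approx 11.8\ \text{min}\\
t_{\text{infra}} &= 0.06 \times 62 \approx 3.7\ \text{min}\\
t_{\text{other}} &= 0.03 \times 62 \approx 1.9\ \text{min}
\end{aligned}
$$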
If you don't measure, you end up optimizing the wrong stage. We almost spent a week on Docker layer caching before realizing it would have saved us ninety seconds on an hour-long pipeline.
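The post doesn't say how the stages were instrumented. One low-effort option, assuming GitHub Actions (the syntax the workflow snippets in this post use), is to run each stage as its own job. GitHub records start and end times for every job and step, shows them in the run summary, and returns them as `started_at`/`completed_at` from the REST API's "list jobs for a workflow run" endpoint. The job names and placeholder commands below are hypothetical:

```yaml
# Sketch: one job per pipeline stage, so each stage gets its own duration
# in the run summary. Job names and echo commands are placeholders.
name: ci
on: [pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "build and push the image"

  test:
    needs: build          # starts only after build succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run the test suite"

  provision:
    needs: test           # starts only after test succeeds
    runs-on: ubuntu-latest
    steps:
      - run: echo "provision the environment"
```

Each job starts on a fresh runner, so splitting stages adds some per-job startup overhead. Treat this layout as a measurement aid first.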
Step 2: Parallelize the test suite
Our tests ran serially on a single runner. Moving to parallel test sharding across four runners was the single biggest win: it cut testing time by 68%.
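jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      # Let every shard finish even if one fails, so a single run reports all failures.
      fail-fast: false
      matrix:
        # Four parallel jobs, one per shard.
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun install --frozen-lockfile
      # Each job runs its own slice of the suite. Check that your test runner
      # version supports a --shard=<index>/<count> flag before copying this.
      - run: bun test --shard=${{ matrix.shard }}/4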
We also killed a dozen tests that were flaky or that tested framework behavior instead of our own code. Fewer tests, faster signal.
Step 3: Cache Docker build layers
Our Docker builds were starting from scratch every time. Adding BuildKit layer caching to a remote registry dropped image builds from four minutes to forty seconds:
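# Registry cache export needs a BuildKit builder (docker-container driver);
# setup-buildx-action creates one. Assumes an earlier step has already
# logged in to the registry behind env.IMAGE (e.g. with docker/login-action).
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ${{ env.IMAGE }}:${{ github.sha }}
    # Reuse layers from the last build pushed to the cache tag...
    cache-from: type=registry,ref=${{ env.IMAGE }}:buildcache
    # ...and export all layers, including intermediate stages (mode=max).
    cache-to: type=registry,ref=${{ env.IMAGE }}:buildcache,mode=max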
Step 4: Provision infrastructure as code
Environment provisioning used to require a human in the loop. We moved to Terraform so the pipeline could spin up a preview environment for every pull request:
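# Inputs supplied by the pipeline for each pull request.
variable "pr_number" { type = string }
variable "image" { type = string }

# One Fly app per PR, so preview environments never collide.
resource "fly_app" "staging" {
  name = "app-staging-${var.pr_number}"
  org  = "personal"
}

# A single machine in that app, running the image built for this PR.
resource "fly_machine" "api" {
  app    = fly_app.staging.name
  region = "ams"
  image  = var.image
}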
Every PR gets its own preview environment, destroyed automatically when the PR closes. Reviewers stop asking "can you deploy this somewhere I can click?"
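The Terraform above covers creation; the teardown half isn't shown. A minimal sketch of the destroy side as a GitHub Actions workflow follows. It assumes the config lives in `infra/`, that each PR's state sits in its own Terraform workspace named `pr-<number>` (a hypothetical convention), and that the Fly provider reads its API token from a `FLY_API_TOKEN` secret. None of these details come from the post:

```yaml
# Sketch: destroy a PR's preview environment when the PR closes.
# HYPOTHETICAL: the infra/ directory, the pr-<number> workspace naming, and the
# FLY_API_TOKEN secret are assumptions, not taken from the post.
name: destroy-preview

on:
  pull_request:
    types: [closed]   # fires on both merge and close-without-merge

jobs:
  destroy:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra
    env:
      FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
      # Select this PR's isolated state so only its resources are destroyed.
      TF_WORKSPACE: pr-${{ github.event.pull_request.number }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -input=false
      # destroy still requires every variable that has no default,
      # so the image variable gets a placeholder value.
      - run: >-
          terraform destroy -auto-approve -input=false
          -var="pr_number=${{ github.event.pull_request.number }}"
          -var="image=placeholder"
```

Keeping state per PR is the important design choice: with one shared state, a destroy triggered by one PR could remove another PR's environment.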
Step 5: Monitor so it stays fast
Pipelines rot. We wired Prometheus to scrape pipeline metrics and built a Grafana dashboard that alerts us when any stage creeps past its budget. If tests drift back toward fifteen minutes, we know before it becomes a daily annoyance.
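The post doesn't show the alert definitions. As a rough sketch, here is a per-stage budget expressed as a Prometheus alerting rule instead of a Grafana alert, swapped in because it fits in a plain config file. The metric `ci_stage_duration_seconds`, its `stage` label, and the 10-minute threshold are all hypothetical; substitute whatever your pipeline exporter actually exposes and the budgets you actually set:

```yaml
# Prometheus alerting rules file (sketch).
# HYPOTHETICAL: ci_stage_duration_seconds is assumed to be a gauge holding the
# latest run's duration per stage; the metric name, label, and budget are placeholders.
groups:
  - name: ci-pipeline-budgets
    rules:
      - alert: CITestStageOverBudget
        # Averaging over a day smooths out one-off slow runs and catches gradual creep.
        expr: avg_over_time(ci_stage_duration_seconds{stage="test"}[1d]) > 600
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: 'CI stage {{ $labels.stage }} averaged over its 10-minute budget in the last day'
```

To cover every stage, add one rule per stage, each with its own threshold.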
The result
From 62 minutes to 6 minutes. A 10x improvement, achieved through four boring changes: measure, parallelize, cache, automate. No magic. No new framework. Just ruthless attention to where the time was actually going.
The lesson: faster pipelines aren't a one-time project. They're a habit. Measure weekly, cut ruthlessly, and don't let your CI slow to a crawl while you're not looking.