Fly.io + Supabase: The Reliability Stack That Scales
Fly.io and Supabase together form a powerful stack for building global applications. But getting reliability right requires understanding their operational characteristics.
Why This Stack?
Fly.io gives you globally distributed compute on Machines, lightweight VMs that typically start in a few hundred milliseconds. Supabase provides managed Postgres with real-time capabilities and built-in auth. Together: fast, simple, and scalable.
Architecture Patterns
Regional Deployment
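Deploy Fly Machines in the same region as your Supabase instance, or the closest one available. Every query pays the network round trip between compute and database, so a request that runs several queries in sequence multiplies that latency.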
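# fly.toml
primary_region = "sjc" # Fly region closest to your Supabase project

[http_service]
  min_machines_running = 1 # Keep one Machine up in the primary region to avoid cold starts
  auto_stop_machines = "off" # Never stop idle Machines automatically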
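To make the latency argument concrete, here is a rough back-of-the-envelope sketch. The round-trip times and the 1 ms query time are illustrative assumptions, not measurements, and db_time_ms is a hypothetical helper.

```python
def db_time_ms(sequential_queries: int, rtt_ms: float, query_ms: float = 1.0) -> float:
    """Time a request spends waiting on the database when queries run one after another."""
    return sequential_queries * (rtt_ms + query_ms)


# Assumed round-trip times: ~1 ms within a region, ~150 ms across continents.
same_region = db_time_ms(10, rtt_ms=1.0)
cross_continent = db_time_ms(10, rtt_ms=150.0)
print(f"same region: {same_region:.0f} ms, cross-continent: {cross_continent:.0f} ms")
```

Under these assumptions, ten sequential queries cost about 20 ms next to the database and about 1.5 seconds from another continent, which is why co-locating compute and database matters more than almost any other tuning.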
Connection Pooling
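Use Supabase's connection pooler in transaction mode. Don't open connections directly to Postgres from every Machine: each direct connection is a separate server process, and the count grows with every Machine you add. Transaction mode does not support prepared statements, so turn them off in your driver or ORM if it uses them by default.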
DATABASE_URL=postgres://...supabase.co:6543/postgres
# Port 6543 = Pooler in transaction mode
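One cheap safeguard is to check at startup that the app is actually pointed at the pooler. The sketch below assumes the pooler listens on port 6543 as noted above; uses_pooler and the example hostnames are hypothetical.

```python
from urllib.parse import urlsplit

POOLER_PORT = 6543  # transaction-mode pooler port, per the note above


def uses_pooler(database_url: str) -> bool:
    """Return True if the URL targets the pooler port rather than Postgres directly."""
    return urlsplit(database_url).port == POOLER_PORT


# Hypothetical URLs for illustration only.
print(uses_pooler("postgres://app:secret@db.example.com:6543/postgres"))
print(uses_pooler("postgres://app:secret@db.example.com:5432/postgres"))
```

Failing fast on a misconfigured URL is usually easier to debug than discovering exhausted connections under load.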
Read Replicas for Scale
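For read-heavy workloads, deploy Fly Machines near users and use Supabase read replicas. Keep writes regional, spread reads globally. Replicas lag slightly behind the primary, so route any read that must see a just-completed write to the primary.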
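A minimal sketch of read/write routing might look like the following. The environment variable DATABASE_REPLICA_URL, the fallback URL, and the pick_database_url helper are assumptions for illustration; a real app would more likely route per operation in its data layer than by inspecting SQL text.

```python
import os

# Hypothetical environment variable names; point them at your own connection strings.
PRIMARY_URL = os.environ.get("DATABASE_URL", "postgres://primary.example/postgres")
REPLICA_URL = os.environ.get("DATABASE_REPLICA_URL", PRIMARY_URL)

READ_PREFIXES = ("select", "show")


def pick_database_url(sql: str, primary: str = PRIMARY_URL, replica: str = REPLICA_URL) -> str:
    """Route plain reads to the replica and everything else to the primary."""
    statement = sql.lstrip().lower()
    # Locking reads must run on the primary, so treat them as writes.
    is_read = statement.startswith(READ_PREFIXES) and "for update" not in statement
    return replica if is_read else primary
```

Anything the function doesn't recognize as a plain read goes to the primary, which is the safe default: a misrouted read costs latency, while a misrouted write fails.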
Reliability Checklist
- ✅ Health checks configured (Fly monitors /health endpoint)
- ✅ Grace period for shutdown (drain connections before kill)
- ✅ Database connection retries with exponential backoff
- ✅ Request timeouts (prevent hanging connections)
- ✅ Structured logging (ship to Axiom or Logflare)
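The retry item is the one most often implemented incorrectly, so here is a minimal sketch of exponential backoff with jitter. The function name, defaults, and the choice of retryable exceptions are assumptions; substitute the transient errors your database driver actually raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retryable: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying retryable errors with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the ceiling on each attempt, cap it, then sleep a random
            # amount below it ("full jitter") so Machines don't retry in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("max_attempts must be at least 1")
```

Wrap only idempotent operations, such as opening a connection or running a read, unless you know a repeated write is safe. The injectable sleep parameter exists so the behavior can be tested without waiting.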
Common Pitfalls
1. Auto-stop Machines = Cold Start Latency
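Fly can stop idle Machines automatically, and the fly.toml generated by fly launch typically turns this on. The first request after a stop then has to wait for a Machine to start. For user-facing apps, keep at least one Machine running: the cost is small, and users never see a cold start.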
2. Missing Connection Limits
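Supabase's free tier allows about 60 direct Postgres connections, and larger compute sizes raise the limit. If your Machines don't pool properly, you'll hit it fast. Monitor pg_stat_activity to see how many connections are open and which clients hold them.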
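The arithmetic is worth doing before you scale out. This sketch budgets a per-Machine pool size against a connection limit; the reserved headroom of 10 (for migrations, dashboards, and platform services) is an assumption, and pool_size_per_machine is a hypothetical helper.

```python
def pool_size_per_machine(connection_limit: int, machines: int, reserved: int = 10) -> int:
    """Largest per-Machine pool size that keeps the total under the limit."""
    if machines < 1:
        raise ValueError("machines must be at least 1")
    usable = connection_limit - reserved
    return max(usable // machines, 0)


# With the 60-connection figure above: 4 Machines get 12 each, 10 Machines get 5 each.
print(pool_size_per_machine(60, machines=4))
print(pool_size_per_machine(60, machines=10))
```

Remember to count the maximum number of Machines your app can scale to, not the number running right now.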
3. Not Using Fly Volumes for State
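Machines are ephemeral: the root filesystem is rebuilt from the image whenever a Machine restarts, so anything written there is lost. Use Fly Volumes for data that must persist, and back them up regularly, because a volume lives on a single host and isn't replicated.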
Monitoring
Key metrics to track:
- Fly Machine response time (p50, p95, p99)
- Supabase active connections
- Database query duration
- Error rates and types
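If your metrics tool doesn't compute percentiles for you, the nearest-rank method is easy to apply to raw timings. This is a minimal sketch with made-up sample data; percentile is a hypothetical helper, and other percentile definitions (for example, ones that interpolate) give slightly different values.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up response times in milliseconds: mostly fast, with two slow outliers.
timings_ms = [12, 15, 11, 250, 14, 13, 16, 18, 17, 900]
for pct in (50, 95, 99):
    print(f"p{pct}: {percentile(timings_ms, pct)} ms")
```

In this sample the median is 15 ms while p95 and p99 are both 900 ms, which is exactly the kind of tail an average would hide.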
This stack is production-ready out of the box, but reliability comes from understanding the operational details.