Production Is a Different Game
Everything changes when real users depend on your system. The code that worked perfectly in development fails under load. The deployment that took 30 seconds now requires a rollback plan. The bug that was mildly annoying in staging is costing a customer $10,000 per hour in production.
This final part of our SaaS Cookbook covers the operational infrastructure that separates hobby projects from production-grade SaaS.
CI/CD: Ship Fast, Ship Safe
Your deployment pipeline should enable deploying to production multiple times per day with confidence:
Pipeline Stages:
1. Lint + Type Check (30s) — Catch obvious errors
2. Unit Tests (60s) — Business logic validation
3. Integration Tests (90s) — API and database tests
4. Build (60s) — Production build
5. Preview Deploy (auto) — Every PR gets a preview URL
6. E2E Tests on Preview (3m) — Critical path validation
7. Production Deploy (60s) — After merge to main
8. Post-deploy Checks (30s) — Health checks + smoke tests

Total pipeline time target: under 8 minutes from push to production. Anything longer and developers stop deploying frequently, which paradoxically makes each deployment riskier.
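The post-deploy checks in step 8 can be sketched as a small smoke-test runner that executes each check and fails the pipeline if any check throws. This is a minimal sketch: the check names and the health URL are illustrative placeholders, not part of any specific pipeline.

```typescript
// Minimal post-deploy smoke-test runner: runs each check in order,
// collects failures, and reports an overall pass/fail for the pipeline.
type Check = { name: string; run: () => Promise<void> };

async function runSmokeTests(
  checks: Check[],
): Promise<{ passed: boolean; failures: string[] }> {
  const failures: string[] = [];
  for (const check of checks) {
    try {
      await check.run();
    } catch (err) {
      failures.push(`${check.name}: ${(err as Error).message}`);
    }
  }
  return { passed: failures.length === 0, failures };
}

// Example checks (the endpoint is illustrative):
const postDeployChecks: Check[] = [
  {
    name: "health endpoint",
    run: async () => {
      const res = await fetch("https://example.com/api/health");
      if (!res.ok) throw new Error(`status ${res.status}`);
    },
  },
];
```

In CI, exit non-zero when `passed` is false so a bad deploy is caught before users notice.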
Monitoring: Know Before Your Users Do
The monitoring stack for production SaaS:
- Uptime monitoring: External checks every 60 seconds from multiple regions. We use Better Uptime or Checkly. Alert within 2 minutes of downtime
- Application Performance Monitoring (APM): Track response times, error rates, and throughput. Sentry for error tracking, Vercel Analytics or Datadog for APM
- Business metrics dashboard: Real-time display of signups, active users, revenue, and API usage. This is the dashboard you check every morning
- Log aggregation: Centralized, searchable logs. Essential for debugging production issues. Axiom or Betterstack Logs
Alerting rules
Configure alerts for these conditions:
- Error rate exceeds 1% of requests (warning) or 5% (critical)
- P95 response time exceeds 2 seconds
- Database connection pool utilization exceeds 80%
- Disk usage exceeds 85%
- Payment webhook processing failures
- Zero signups for 24 hours (something's broken or traffic dropped)
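The 1% / 5% error-rate thresholds above can be encoded directly as a severity classifier. A sketch — the function name is mine, not from any monitoring SDK:

```typescript
type Severity = "ok" | "warning" | "critical";

// Classify a window of requests against the error-rate alert thresholds:
// exceeding 1% is a warning, exceeding 5% is critical.
function classifyErrorRate(errors: number, total: number): Severity {
  if (total === 0) return "ok"; // no traffic in the window: nothing to rate
  const rate = errors / total;
  if (rate > 0.05) return "critical";
  if (rate > 0.01) return "warning";
  return "ok";
}
```

Evaluate this over a sliding window (e.g. the last 5 minutes) rather than single requests, so one failed request on low traffic doesn't page anyone.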
Database Scaling
PostgreSQL handles more load than most people think, but you need to prepare:
Before you need to scale
- Connection pooling: Use PgBouncer or Supabase's built-in pooler. Without pooling, you'll exhaust connections at ~100 concurrent users
- Read replicas: Route read-heavy queries (dashboards, reports, search) to replicas. This alone can roughly 5x your read capacity; writes still go to the primary
- Query optimization: Monitor slow queries weekly. Add indexes for queries that scan more than 10% of a table. Use EXPLAIN ANALYZE, not guesswork
- Archival strategy: Move historical data (logs, events, old audit trails) to cold storage. Keep your hot tables lean
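Read-replica routing often starts as a simple statement-based router in the data layer. A rough sketch under stated assumptions — real setups must also pin reads-after-writes to the primary to avoid serving stale data from replication lag:

```typescript
// Decide which pool a SQL statement should go to.
// Plain SELECTs can be served by a read replica; everything else
// (writes, DDL, locking reads) must go to the primary.
function routeStatement(sql: string): "primary" | "replica" {
  const head = sql.trim().split(/\s+/)[0]?.toUpperCase() ?? "";
  // SELECT ... FOR UPDATE / FOR SHARE takes row locks on the primary.
  if (head === "SELECT" && !/FOR\s+(UPDATE|SHARE)/i.test(sql)) {
    return "replica";
  }
  return "primary";
}
```

Each branch then maps to a separate connection pool pointed at the replica or primary endpoint.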
When you actually need to scale
If you've optimized queries and added read replicas and you're still hitting limits, consider:
- Vertical scaling: Bigger database instance. Simple, effective, and often sufficient up to millions of rows
- Table partitioning: Split large tables by date or tenant. PostgreSQL native partitioning handles this well
- Caching layer: Redis for frequently accessed data (session data, feature flags, usage counters). Don't cache until you've measured that the database is actually the bottleneck
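The caching layer described above usually follows the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache with a TTL. This sketch uses an in-memory Map to stand in for Redis so the pattern is visible; in production you'd swap in a Redis client.

```typescript
// Cache-aside with TTL: check the cache first, fall back to the
// loader (the database), then populate the cache for later reads.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  async getOrLoad(key: string, loader: () => Promise<V>): Promise<V> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit
    const value = await loader(); // cache miss: go to the database
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}
```

Keep TTLs short for data that must stay fresh (usage counters) and longer for near-static data (feature flags).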
Incident Response
When things go wrong (and they will), have a playbook:
- Detect: Automated alerts trigger within 2 minutes
- Acknowledge: On-call engineer acknowledges within 15 minutes
- Communicate: Status page updated. Affected customers notified if impact exceeds 5 minutes
- Mitigate: Rollback, feature flag, or hotfix. Prioritize restoring service over finding root cause
- Resolve: Root cause identified and permanent fix deployed
- Review: Blameless post-mortem within 48 hours. Document what happened, why, and what changes prevent recurrence
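For the mitigate step, a feature-flag kill switch is often the fastest path back to a healthy state: flip a flag instead of redeploying. A minimal in-process sketch — the flag names are hypothetical, and real systems read flags from a shared store so one flip applies to every instance at once:

```typescript
// In-process kill switch: risky code paths check a flag before running,
// so an incident can be mitigated by flipping the flag, not shipping code.
const killSwitches = new Map<string, boolean>();

function disable(feature: string): void {
  killSwitches.set(feature, true);
}

function isEnabled(feature: string): boolean {
  return killSwitches.get(feature) !== true; // enabled unless explicitly killed
}

// At the call site, the risky path degrades gracefully when disabled:
function renderDashboard(): string {
  if (!isEnabled("new-analytics-widget")) {
    return "dashboard (widget hidden)"; // safe fallback
  }
  return "dashboard (with analytics widget)";
}
```

The key design choice: every flag needs a defined fallback behavior, decided before the incident, not during it.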
Security Checklist for Launch
Before going to production, verify:
- All API endpoints require authentication (except public routes)
- Rate limiting on all public endpoints (login, signup, API)
- HTTPS everywhere, HSTS headers enabled
- Database credentials rotated from development defaults
- Secrets in environment variables, never in code
- Dependency audit (npm audit, Snyk) with zero critical vulnerabilities
- CORS configured to allow only your domains
- Webhook signature verification enabled
- Automated database backups with tested restore procedure
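Webhook signature verification from the checklist typically means a constant-time HMAC comparison. A sketch using Node's crypto module — the exact header format and signing scheme vary by provider, so follow your payment provider's documentation for the real layout:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify an HMAC-SHA256 webhook signature (hex-encoded) in constant time.
function verifyWebhookSignature(
  payload: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(payload).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

Using `timingSafeEqual` instead of `===` matters here: a plain string comparison leaks timing information an attacker can use to forge signatures byte by byte.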
The Launch Checklist
This is the checklist we run through with every SaaS product before the first paying customer:
- Authentication flows tested (signup, login, reset, MFA)
- Billing flows tested (subscribe, upgrade, downgrade, cancel, failed payment)
- Onboarding checklist functional and tracked
- Email notifications working (transactional + triggered sequences)
- Monitoring and alerting configured and verified
- Status page live
- Terms of service and privacy policy published
- Support channel established (email at minimum, live chat for premium)
- First 24-hour on-call rotation scheduled
Building a SaaS product is a marathon, not a sprint. The architecture, auth, billing, growth, and operational foundations covered in this series give you the best possible starting position. The rest is iteration, customer feedback, and relentless focus on delivering value.
If you're building a SaaS product and want a team that's done this dozens of times, we'd love to hear about your project.