Production Is a Different Game
Everything changes when real users depend on your system. The code that worked perfectly in development fails under load. The deployment that took 30 seconds now requires a rollback plan. The bug that was mildly annoying in staging is costing a customer $10,000 per hour in production.
This final part of our SaaS Cookbook covers the operational infrastructure that separates hobby projects from production-grade SaaS.
CI/CD: Ship Fast, Ship Safe
Your deployment pipeline should enable deploying to production multiple times per day with confidence:
Pipeline Stages:
1. Lint + Type Check (30s) — Catch obvious errors
2. Unit Tests (60s) — Business logic validation
3. Integration Tests (90s) — API and database tests
4. Build (60s) — Production build
5. Preview Deploy (auto) — Every PR gets a preview URL
6. E2E Tests on Preview (3m) — Critical path validation
7. Production Deploy (60s) — After merge to main
8. Post-deploy Checks (30s) — Health checks + smoke tests

Total pipeline time target: under 8 minutes from push to production. Anything longer and developers stop deploying frequently, which paradoxically makes each deployment riskier.
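The post-deploy checks in step 8 can be sketched as a small smoke-test runner that executes each check and fails the pipeline if any check throws. This is a minimal sketch: the check names and the health URL are illustrative placeholders, not part of any specific pipeline.

```typescript
// Minimal post-deploy smoke-test runner: runs each check in order,
// collects failures, and reports an overall pass/fail for the pipeline.
type Check = { name: string; run: () => Promise<void> };

async function runSmokeTests(
  checks: Check[],
): Promise<{ passed: boolean; failures: string[] }> {
  const failures: string[] = [];
  for (const check of checks) {
    try {
      await check.run();
    } catch (err) {
      failures.push(`${check.name}: ${(err as Error).message}`);
    }
  }
  return { passed: failures.length === 0, failures };
}

// Example checks (the endpoint is illustrative):
const postDeployChecks: Check[] = [
  {
    name: "health endpoint",
    run: async () => {
      const res = await fetch("https://example.com/api/health");
      if (!res.ok) throw new Error(`status ${res.status}`);
    },
  },
];
```

In CI, exit non-zero when `passed` is false so a bad deploy is caught before users notice.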
Monitoring: Know Before Your Users Do
The monitoring stack for production SaaS:
- Uptime monitoring: External checks every 60 seconds from multiple regions. We use Better Uptime or Checkly. Alert within 2 minutes of downtime
- Application Performance Monitoring (APM): Track response times, error rates, and throughput. Sentry for error tracking, Vercel Analytics or Datadog for APM
- Business metrics dashboard: Real-time display of signups, active users, revenue, and API usage. This is the dashboard you check every morning
- Log aggregation: Centralized, searchable logs. Essential for debugging production issues. Axiom or Betterstack Logs
Alerting rules
Configure alerts for these conditions:
- Error rate exceeds 1% of requests (warning) or 5% (critical)
- P95 response time exceeds 2 seconds
- Database connection pool utilization exceeds 80%
- Disk usage exceeds 85%
- Payment webhook processing failures
- Zero signups for 24 hours (something's broken or traffic dropped)
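The 1% / 5% error-rate thresholds above can be encoded directly as a severity classifier. A sketch — the function name is mine, not from any monitoring SDK:

```typescript
type Severity = "ok" | "warning" | "critical";

// Classify a window of requests against the error-rate alert thresholds:
// exceeding 1% is a warning, exceeding 5% is critical.
function classifyErrorRate(errors: number, total: number): Severity {
  if (total === 0) return "ok"; // no traffic in the window: nothing to rate
  const rate = errors / total;
  if (rate > 0.05) return "critical";
  if (rate > 0.01) return "warning";
  return "ok";
}
```

Evaluate this over a sliding window (e.g. the last 5 minutes) rather than single requests, so one failed request on low traffic doesn't page anyone.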
Database Scaling
PostgreSQL handles more load than most people think, but you need to prepare:
Before you need to scale
- Connection pooling: Use PgBouncer or Supabase's built-in pooler. Without pooling, you'll exhaust connections at ~100 concurrent users
- Read replicas: Route read-heavy queries (dashboards, reports, search) to replicas. This alone can roughly 5x your read capacity; writes still go to the primary
- Query optimization: Monitor slow queries weekly. Add indexes for queries that scan more than 10% of a table. Use EXPLAIN ANALYZE, not guesswork
- Archival strategy: Move historical data (logs, events, old audit trails) to cold storage. Keep your hot tables lean
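Read-replica routing often starts as a simple statement-based router in the data layer. A rough sketch under stated assumptions — real setups must also pin reads-after-writes to the primary to avoid serving stale data from replication lag:

```typescript
// Decide which pool a SQL statement should go to.
// Plain SELECTs can be served by a read replica; everything else
// (writes, DDL, locking reads) must go to the primary.
function routeStatement(sql: string): "primary" | "replica" {
  const head = sql.trim().split(/\s+/)[0]?.toUpperCase() ?? "";
  // SELECT ... FOR UPDATE / FOR SHARE takes row locks on the primary.
  if (head === "SELECT" && !/FOR\s+(UPDATE|SHARE)/i.test(sql)) {
    return "replica";
  }
  return "primary";
}
```

Each branch then maps to a separate connection pool pointed at the replica or primary endpoint.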
When you actually need to scale
If you've optimized queries and added read replicas and you're still hitting limits, consider:
- Vertical scaling: Bigger database instance. Simple, effective, and often sufficient up to millions of rows
- Table partitioning: Split large tables by date or tenant. PostgreSQL native partitioning handles this well
- Caching layer: Redis for frequently accessed data (session data, feature flags, usage counters). Don't cache until you've measured that the database is actually the bottleneck
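The caching layer described above usually follows the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache with a TTL. This sketch uses an in-memory Map to stand in for Redis so the pattern is visible; in production you'd swap in a Redis client.

```typescript
// Cache-aside with TTL: check the cache first, fall back to the
// loader (the database), then populate the cache for later reads.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  async getOrLoad(key: string, loader: () => Promise<V>): Promise<V> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit
    const value = await loader(); // cache miss: go to the database
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}
```

Keep TTLs short for data that must stay fresh (usage counters) and longer for near-static data (feature flags).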
Incident Response
When things go wrong (and they will), have a playbook:
- Detect: Automated alerts trigger within 2 minutes
- Acknowledge: On-call engineer acknowledges within 15 minutes
- Communicate: Status page updated. Affected customers notified if impact exceeds 5 minutes
- Mitigate: Rollback, feature flag, or hotfix. Prioritize restoring service over finding root cause
- Resolve: Root cause identified and permanent fix deployed
- Review: Blameless post-mortem within 48 hours. Document what happened, why, and what changes prevent recurrence
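For the mitigate step, a feature-flag kill switch is often the fastest path back to a healthy state: flip a flag instead of redeploying. A minimal in-process sketch — the flag names are hypothetical, and real systems read flags from a shared store so one flip applies to every instance at once:

```typescript
// In-process kill switch: risky code paths check a flag before running,
// so an incident can be mitigated by flipping the flag, not shipping code.
const killSwitches = new Map<string, boolean>();

function disable(feature: string): void {
  killSwitches.set(feature, true);
}

function isEnabled(feature: string): boolean {
  return killSwitches.get(feature) !== true; // enabled unless explicitly killed
}

// At the call site, the risky path degrades gracefully when disabled:
function renderDashboard(): string {
  if (!isEnabled("new-analytics-widget")) {
    return "dashboard (widget hidden)"; // safe fallback
  }
  return "dashboard (with analytics widget)";
}
```

The key design choice: every flag needs a defined fallback behavior, decided before the incident, not during it.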
Security Checklist for Launch
Before going to production, verify:
- All API endpoints require authentication (except public routes)
- Rate limiting on all public endpoints (login, signup, API)
- HTTPS everywhere, HSTS headers enabled
- Database credentials rotated from development defaults
- Secrets in environment variables, never in code
- Dependency audit (npm audit, Snyk) with zero critical vulnerabilities
- CORS configured to allow only your domains
- Webhook signature verification enabled
- Automated database backups with tested restore procedure
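Webhook signature verification from the checklist typically means a constant-time HMAC comparison. A sketch using Node's crypto module — the exact header format and signing scheme vary by provider, so follow your payment provider's documentation for the real layout:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify an HMAC-SHA256 webhook signature (hex-encoded) in constant time.
function verifyWebhookSignature(
  payload: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(payload).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

Using `timingSafeEqual` instead of `===` matters here: a plain string comparison leaks timing information an attacker can use to forge signatures byte by byte.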
The Launch Checklist
This is the checklist we run through with every SaaS product before the first paying customer:
- Authentication flows tested (signup, login, reset, MFA)
- Billing flows tested (subscribe, upgrade, downgrade, cancel, failed payment)
- Onboarding checklist functional and tracked
- Email notifications working (transactional + triggered sequences)
- Monitoring and alerting configured and verified
- Status page live
- Terms of service and privacy policy published
- Support channel established (email at minimum, live chat for premium)
- First 24-hour on-call rotation scheduled
Building a SaaS product is a marathon, not a sprint. The architecture, auth, billing, growth, and operational foundations covered in this series give you the best possible starting position. The rest is iteration, customer feedback, and relentless focus on delivering value.
If you're building a SaaS product and want a team that's done this dozens of times, we'd love to hear about your project.