You shipped your AI-built SaaS to production. Users are signing up. Revenue is starting to flow. Then, at 2 AM on a Tuesday, everything crashes.
Your Slack is on fire. Users are angry. Your Stripe dashboard shows failed payments. And you're frantically SSHing into a server you barely understand, trying to find logs in a directory structure Cursor generated three weeks ago.
This isn't a hypothetical. It's the reality for 80% of AI-built SaaS apps in their first 30 days of production.
I've seen this pattern repeat with dozens of indie hackers: Cursor/Claude writes beautiful code that works perfectly on localhost. You deploy with confidence. Then production tears it apart.
The problem isn't the AI. It's that AI writes code optimized for "working," not for "surviving production at scale."
This post breaks down the 7 deadly sins that kill AI-built SaaS apps in production—and the exact fixes that keep you alive.
The 7 Deadly Sins of AI SaaS in Production
AI code generators optimize for localhost happiness. Production demands paranoia, redundancy, and graceful failure. Here are the gaps that kill apps:
Sin #1: Missing Error Boundaries
// ❌ What AI generates (works on localhost)
export default function Dashboard() {
  const { data } = useQuery('/api/metrics')
  return <MetricsChart data={data} />
}

// ✅ What production needs (survives real users)
export default function Dashboard() {
  const { data, error, isLoading, refetch } = useQuery('/api/metrics')

  if (error) {
    return <ErrorState message="Failed to load metrics" retry={() => refetch()} />
  }
  if (isLoading) {
    return <SkeletonLoader />
  }
  if (!data || data.length === 0) {
    return <EmptyState />
  }
  return <MetricsChart data={data} />
}
The rule: Every user-facing component needs 4 states: loading, error, empty, success.
Sin #2: Unvalidated User Input
AI assumes inputs are valid. Production users will send you: empty strings, SQL injection attempts, 10MB JSON payloads, emoji-only fields, null values wrapped in strings ("null"), and Unicode edge cases that break your database.
Fix: Validate everything at the API boundary with Zod/Joi/Yup before touching your database:
import { z } from 'zod'

const createUserSchema = z.object({
  email: z.string().email().max(255),
  name: z.string().min(2).max(100).trim(),
  password: z.string().min(8).max(128),
  metadata: z.record(z.unknown()).optional()
})

export async function POST(req: Request) {
  try {
    const body = await req.json()
    const validated = createUserSchema.parse(body)
    // NOW it's safe to use validated data
    const user = await db.users.create(validated)
    return Response.json(user)
  } catch (error) {
    if (error instanceof z.ZodError) {
      return Response.json({ error: error.errors }, { status: 400 })
    }
    return Response.json({ error: 'Internal error' }, { status: 500 })
  }
}
Zod catches 90% of production input bugs before they hit your database.
Sin #3: Infinite Loops in Async Code
This is the #1 killer of AI-generated backends. A useEffect with missing dependencies, a webhook that retries itself, or a background job that never exits.
Real example from an indie hacker: Cursor generated a Stripe webhook handler that re-processed failed payments by calling itself recursively. Within 2 hours, it made 47,000 API calls and hit the rate limit, blocking all legitimate payments.
The Fix: Bounded Retries and Circuit Breakers
// ❌ Dangerous: infinite retry
async function processPayment(userId: string) {
  try {
    await stripe.charges.create({ ... })
  } catch (error) {
    await processPayment(userId) // INFINITE LOOP
  }
}

// ✅ Safe: exponential backoff with max retries
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

async function processPayment(
  userId: string,
  retryCount = 0,
  maxRetries = 3
) {
  try {
    await stripe.charges.create({ ... })
  } catch (error) {
    if (retryCount >= maxRetries) {
      await logFailure(userId, error)
      await sendAdminAlert(error)
      throw error // Stop retrying
    }
    const delay = Math.pow(2, retryCount) * 1000 // 1s, 2s, 4s
    await sleep(delay)
    return processPayment(userId, retryCount + 1, maxRetries)
  }
}
Always add retry limits and exponential backoff to async operations.
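Bounded retries stop a single payment from looping forever. A circuit breaker goes one step further: after repeated failures it fails fast instead of hammering a struggling downstream service. Here's a minimal illustrative sketch (the `CircuitBreaker` class, thresholds, and state names are our own, not from any specific library):

```typescript
type BreakerState = 'closed' | 'open' | 'half-open'

class CircuitBreaker {
  private state: BreakerState = 'closed'
  private failures = 0
  private openedAt = 0

  constructor(
    private failureThreshold = 5,   // consecutive failures before opening
    private resetTimeoutMs = 30_000 // how long to fail fast before retrying
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt >= this.resetTimeoutMs) {
        this.state = 'half-open' // cooldown over: let one trial request through
      } else {
        throw new Error('Circuit open: failing fast')
      }
    }
    try {
      const result = await fn()
      this.failures = 0
      this.state = 'closed' // success resets the breaker
      return result
    } catch (err) {
      this.failures++
      if (this.failures >= this.failureThreshold) {
        this.state = 'open'
        this.openedAt = Date.now()
      }
      throw err
    }
  }

  get currentState() { return this.state }
}
```

Wrap flaky external calls (Stripe, OpenAI, third-party APIs) in `breaker.call(...)` so a dependency outage degrades gracefully instead of cascading.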
Sin #4: Database Connection Leaks
AI generates database queries without connection pooling or proper cleanup. After 100 requests, you run out of connections and your app hangs.
Example: Proper Connection Pooling
// ❌ AI-generated (leaks connections)
import { Pool } from 'pg'

export async function getUser(id: string) {
  const pool = new Pool() // NEW POOL EVERY REQUEST
  const result = await pool.query('SELECT * FROM users WHERE id = $1', [id])
  return result.rows[0]
}

// ✅ Production-ready (reuses connections)
import { Pool } from 'pg'

const pool = new Pool({
  max: 20, // Connection limit
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
})

export async function getUser(id: string) {
  const client = await pool.connect()
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [id])
    return result.rows[0]
  } finally {
    client.release() // CRITICAL: return the client to the pool
  }
}

// Graceful shutdown
process.on('SIGTERM', async () => {
  await pool.end()
  process.exit(0)
})
One pool per app, not per request. Always release clients back to the pool.
Sin #5: Missing Rate Limiting
AI doesn't add rate limiting by default. One malicious user (or bug) can drain your OpenAI credits, overwhelm your database, or trigger a $10,000 Vercel bill overnight.
Minimal Rate Limiting (Works Everywhere)
import rateLimit from 'express-rate-limit'

// Per-IP rate limiting
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // 100 requests per window
  message: 'Too many requests, please try again later',
  standardHeaders: true,
  legacyHeaders: false,
})

app.use('/api/', limiter)

// Per-user rate limiting for expensive operations
const expensiveLimiter = rateLimit({
  windowMs: 60 * 60 * 1000, // 1 hour
  max: 10, // 10 AI generations per hour
  keyGenerator: (req) => req.user?.id || req.ip,
})

app.post('/api/generate', expensiveLimiter, async (req, res) => {
  // OpenAI call here
})
Add rate limiting before you need it. You'll thank yourself at 3 AM.
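If you're not on Express (serverless handlers, a different framework), the core idea is small enough to sketch yourself. Here's an illustrative in-memory fixed-window counter; the `isAllowed` helper is our own name, and a single-process `Map` only works for one instance (use a shared store like Redis or Upstash when you scale out):

```typescript
// Illustrative fixed-window rate limiter. The `now` parameter
// exists so the logic is deterministic and testable.
const windows = new Map<string, { count: number; resetAt: number }>()

function isAllowed(
  key: string,        // e.g. IP address or user ID
  limit: number,      // max requests per window
  windowMs: number,   // window length in milliseconds
  now = Date.now()
): boolean {
  const entry = windows.get(key)
  if (!entry || now >= entry.resetAt) {
    // Start a fresh window for this key
    windows.set(key, { count: 1, resetAt: now + windowMs })
    return true
  }
  if (entry.count >= limit) return false // over the limit: reject
  entry.count++
  return true
}
```

In a handler, a rejected request should return HTTP 429 so well-behaved clients back off.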
Sin #6: Hardcoded Secrets in Code
Cursor autocompletes API keys directly into your code. Then you commit them to GitHub. Within minutes, bots find them and drain your accounts.
Real story: An indie hacker accidentally committed their OpenAI key. Within 6 hours, bots racked up $2,400 in API charges running crypto mining prompts.
The Fix: Environment Variables
// ❌ NEVER do this
const openai = new OpenAI({
  apiKey: 'sk-proj-abc123...' // EXPOSED IN GIT HISTORY
})

// ✅ Always use environment variables
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
})

// Validate secrets at startup
const requiredEnvVars = [
  'OPENAI_API_KEY',
  'DATABASE_URL',
  'STRIPE_SECRET_KEY'
]

for (const varName of requiredEnvVars) {
  if (!process.env[varName]) {
    throw new Error(`Missing required env var: ${varName}`)
  }
}
If you accidentally commit a secret, rotate it immediately. GitHub history is forever.
Sin #7: No Monitoring or Alerts
Your app crashes at 2 AM. Users are angry on Twitter. You wake up at 9 AM and discover you've been down for 7 hours.
Without monitoring, you're flying blind. You need to know when things break before your users do.
Minimal Monitoring Setup (10 minutes)
// 1. Add Sentry for error tracking
import * as Sentry from "@sentry/node"
Sentry.init({
dsn: process.env.SENTRY_DSN,
environment: process.env.NODE_ENV,
tracesSampleRate: 0.1, // 10% of requests
})
// 2. Add health check endpoint
app.get('/health', async (req, res) => {
try {
await db.query('SELECT 1') // Database check
await redis.ping() // Cache check
res.json({ status: 'healthy' })
} catch (error) {
res.status(503).json({ status: 'unhealthy', error: error.message })
}
})
// 3. Set up UptimeRobot (free) to ping /health every 5 minutes
// If it fails, you get an email/SMS immediately
This simple setup catches 95% of production issues before users complain.
The 15-Minute Production Survival Audit
Run through this checklist right now. If you answer "no" to any of these, you're at risk:
Survival Checklist
- Every user-facing component has error boundaries and loading states
- All API endpoints validate input with a schema library (Zod/Joi)
- Database uses connection pooling with max limits
- Rate limiting enabled on all public endpoints
- No API keys or secrets in code (all in environment variables)
- Error monitoring configured (Sentry/Rollbar/LogRocket)
- Health check endpoint exists and is monitored
- Async operations have retry limits and timeouts
- Background jobs have dead letter queues
- You can roll back a deployment in under 5 minutes
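The timeout half of "retry limits and timeouts" is a few lines with Promise.race. An illustrative sketch (the `withTimeout` name is our own; note it rejects your await but does not cancel the underlying work, so pair it with AbortController for APIs that support cancellation):

```typescript
// Illustrative timeout wrapper: rejects if the operation takes too long.
function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  label = 'operation'
): Promise<T> {
  let timer!: ReturnType<typeof setTimeout>
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    )
  })
  // Whichever settles first wins; always clear the timer afterwards
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}
```

Usage: `await withTimeout(stripe.charges.create({...}), 10_000, 'stripe charge')` turns a silently hung request into a loggable, retryable error.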
When Things Break: The Debug Workflow
Production debugging is different from localhost debugging. Here's the workflow that actually works:
Step 1: Triage (2 minutes)
- Check monitoring dashboard: Is this affecting all users or just one?
- Check health endpoint: Are dependencies (DB, Redis, APIs) responding?
- Check recent deployments: Did you ship something in the last hour?
- Check error logs: What's the most frequent error message?
Step 2: Stop the Bleeding (5 minutes)
- If it's a bad deployment: Roll back immediately (don't try to fix forward)
- If it's a dependency outage: Enable fallback mode or circuit breaker
- If it's one user: Rate limit or block that user temporarily
- If it's database overload: Scale up or enable read replicas
Step 3: Root Cause Analysis (After Stability)
Only after the app is stable, dig into the root cause. Check Sentry traces, database slow query logs, and network timing. Most production bugs are timing issues, race conditions, or resource exhaustion—things that never happen on localhost.
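A lost-update race is easy to reproduce once two requests overlap. This illustrative sketch (the `deductNaive` function and in-memory `balance` are hypothetical stand-ins for a real read-modify-write against a database) shows why the bug never appears with one user:

```typescript
// Two concurrent "handlers" read a value, await a simulated round trip,
// then write back -- one update is silently lost.
let balance = 100
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms))

async function deductNaive(amount: number) {
  const current = balance    // 1. read
  await sleep(10)            // 2. simulated DB/API round trip
  balance = current - amount // 3. write back a now-stale value
}
```

Running `deductNaive(30)` twice concurrently leaves the balance at 70 instead of 40: both calls read 100 before either writes. The fix is an atomic update (`UPDATE ... SET balance = balance - $1`) or row-level locking, not application-side arithmetic.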
Real Recovery Stories
Three indie hackers who survived production disasters:
Story 1: The Infinite Loop
Sarah's AI chat app had a useEffect that fetched chat history on every render. With 2 users, it was fine. With 50 concurrent users, it made 10,000 API calls in 3 minutes and crashed the database.
The fix: Added a ref to track if the fetch was already in progress, preventing duplicate calls. Also added request deduplication at the API layer.
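The deduplication half of that fix is framework-agnostic: concurrent callers asking for the same key share one pending promise instead of firing duplicate requests. An illustrative sketch (the `dedupe` helper and `inFlight` map are our own names):

```typescript
// In-flight request deduplication.
const inFlight = new Map<string, Promise<unknown>>()

function dedupe<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key)
  if (existing) return existing as Promise<T> // reuse the pending request

  const promise = fn().finally(() => inFlight.delete(key)) // clear once settled
  inFlight.set(key, promise)
  return promise
}
```

Libraries like React Query and SWR do this for you on the client; the same pattern at the API layer protects your database from render-loop storms.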
Story 2: The Memory Leak
James's AI image generator cached generated images in-memory. After 200 generations, the Node process ran out of RAM and crashed.
The fix: Switched from in-memory cache to Redis with TTL (time-to-live). Old images automatically expire after 24 hours, freeing up memory.
Story 3: The API Key Leak
Mike committed his OpenAI key to GitHub. Bots found it and racked up $1,800 in charges before he noticed.
The fix: Rotated the key immediately, added git-secrets pre-commit hook to prevent future leaks, and set up billing alerts in OpenAI dashboard (alert at $50, hard cap at $100).
Your Production Survival Kit
These tools catch 90% of production issues before they become disasters:
Essential Tools
- Error Tracking: Sentry (free tier) - Catches exceptions and performance issues
- Uptime Monitoring: UptimeRobot (free) - Alerts you within 5 minutes of downtime
- Rate Limiting: express-rate-limit or upstash/ratelimit - Prevents abuse
- Input Validation: Zod - Type-safe validation at API boundaries
- Database Pooling: pg-pool or Prisma - Prevents connection leaks
- Secrets Management: Doppler or Infisical - Never commit secrets again
- Log Aggregation: Axiom or Logtail (free tier) - Searchable logs across all servers
Summary
AI code generators are incredible for building fast. But they optimize for "works on my machine" not "survives real users at 2 AM."
The good news: You don't need a DevOps team to production-harden your AI-built SaaS. You need 3 things:
- Error boundaries everywhere (handle loading, error, empty, success states)
- Input validation at API boundaries (never trust user data)
- Basic monitoring (know when things break before your users do)
These three practices alone will prevent 80% of production disasters.
The other 20%? You'll learn those the hard way. But with error monitoring, you'll catch them fast, fix them once, and move on.
Your app doesn't need to be perfect. It needs to fail gracefully, recover automatically, and alert you when manual intervention is required.
That's the difference between a side project and a real SaaS business.
Drowning in production fires? Get a free async vibe audit—we'll watch your Loom walkthrough, review your repo, and send back a personalized 8–12 minute video + PDF showing exactly where your app will break under real load.