Canary Deployments Are Just Hiding Your Bugs Behind Innocent Users
In the old days, we deployed software the honest way: we pushed the binary to production, held our breath, and hoped for the best. Sometimes it worked. Sometimes it didn’t. The whole company found out together. That was community.
Now we have “progressive delivery.” We have “canary deployments.” We have “blue-green deployments.” And I’m supposed to pretend these are improvements.
A canary deployment is named after the old coal-mining practice of bringing a canary into the mine. If the canary died, you knew the air was toxic. The miners could escape.
We have taken this metaphor and applied it to software. The canary is now your users.
What Canary Deployments Actually Are
The pitch: “Deploy to 1% of users first. If nothing breaks, gradually roll out to everyone.”
The reality: “Make 1% of your users suffer the bugs so 99% don’t have to.”
Those 1% are real people. They have jobs. Deadlines. They signed up for your service expecting it to work, not to be the unsuspecting beta testers for your experimental release pipeline. Dogbert from Dilbert would call this “turning customer pain into a data-driven deployment strategy.” The PHB would approve a 10% budget increase for implementing it.
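For the curious, here is roughly how that 1% gets chosen. This is a minimal sketch in Node; serveNewVersion, serveOldVersion, and the rollout key are hypothetical stand-ins for whatever your router actually does. The hashing is deterministic, so a given user lands in the same bucket on every request, which is a polite way of saying the same people stay canaries until you change the key.

const crypto = require('crypto');

// Deterministically map a user to a bucket from 0 to 99.
// Same user + same rollout key = same bucket, on every request.
function rolloutBucket(userId, rolloutKey) {
  const hash = crypto.createHash('sha256').update(`${rolloutKey}:${userId}`).digest();
  return hash.readUInt32BE(0) % 100;
}

function isCanary(userId, percent) {
  return rolloutBucket(userId, 'release_2024_11') < percent;
}

function handleRequest(user) {
  // Start at 1%. Ratchet upward as confidence (or denial) grows.
  if (isCanary(user.id, 1)) {
    return serveNewVersion(user); // congratulations, you are the canary
  }
  return serveOldVersion(user); // the 99%, watching from a safe distance
}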
Here’s a table of what canary deployments mean to everyone involved:
| Stakeholder | What They’re Told | What’s Actually Happening |
|---|---|---|
| Engineers | “Progressive delivery best practice” | Testing on prod with extra steps |
| Product Manager | “Risk mitigation strategy” | Blaming users for finding bugs |
| 1% Canary Users | Nothing. They’re never told. | Being used as human crash reporters |
| 99% Safe Users | Nothing. They’re also never told. | Watching the canaries suffer from a distance |
| The Canary | It gets the coal mine. | It always gets the coal mine. |
The Blue-Green Deployment Lie
Blue-green deployments are even better. The idea: you have two identical production environments. “Blue” is live. You deploy to “green,” test it, then flip traffic over.
This sounds great. Here’s what it costs:
- Two identical production environments (so, double the infrastructure cost)
- A mechanism to flip traffic instantly (sketched below)
- Database migrations that work in both directions simultaneously
- Engineers who actually maintain both environments identically
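To be fair, the flip itself is the only part that fits on one screen. Here is a minimal sketch, assuming a tiny Node reverse proxy and hypothetical blue.internal and green.internal hostnames; a real setup hides the same one-line assignment behind a load balancer API.

const http = require('http');

const upstreams = {
  blue:  { host: 'blue.internal',  port: 8080 },
  green: { host: 'green.internal', port: 8080 },
};

// The entire "instant flip" mechanism: one mutable variable.
let live = 'blue';

http.createServer((req, res) => {
  const target = upstreams[live];
  const proxied = http.request(
    { host: target.host, port: target.port, path: req.url, method: req.method, headers: req.headers },
    (upstream) => {
      res.writeHead(upstream.statusCode, upstream.headers);
      upstream.pipe(res);
    }
  );
  req.pipe(proxied);
}).listen(8000);

// Flipping traffic: live = 'green';
// Flipping it back at 3 AM: also one line. The database has opinions about this.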
The database migrations part is where blue-green deployments go to die. Your schema changes need to be backward and forward compatible. At the same time. For every release.
-- Green deployment adds a column
ALTER TABLE users ADD COLUMN email_verified BOOLEAN;
-- Blue deployment (still running) doesn't know this column exists
-- Blue deployment inserts rows with NULL for email_verified
-- Green deployment assumes email_verified is always FALSE or TRUE
-- Your data is now in a quantum superposition of corrupted and fine
-- It will collapse into "corrupted" when you look at it
This is called a “zero-downtime migration.” I call it “a longer downtime, deferred.”
Feature Flags: Global Variables With Marketing
The cool kids don’t even do canary deployments anymore. They use feature flags — runtime configuration that enables or disables features for specific users.
I wrote about global variables being your friends. Feature flags are global variables with:
- A database behind them (so now your feature flag service can be down)
- A UI (so a product manager can accidentally disable your authentication system at 2 PM on a Tuesday)
- SDK integrations in every service (so you have 17 places where a flag read can fail)
- An audit log (that nobody reads until something goes catastrophically wrong)
// Simple feature flag usage
if (featureFlags.get('new_checkout_flow', userId)) {
  return newCheckout(cart);
} else {
  return oldCheckout(cart);
}
// What this becomes after 2 years
if (featureFlags.get('new_checkout_flow', userId) &&
    !featureFlags.get('disable_new_checkout_for_enterprise', userId) &&
    featureFlags.get('checkout_v2_backend_ready') &&
    !featureFlags.get('emergency_rollback_everything') &&
    featureFlags.get('payment_provider_new_api_enabled') &&
    userIsInExperimentGroup(userId, 'checkout_ab_test_q3_2024')) {
  return newCheckout(cart);
} else if (featureFlags.get('checkout_legacy_fallback')) {
  return legacyCheckout(cart); // Deprecated 3 years ago, never removed
} else {
  // This branch is theoretically unreachable
  // It has been reached 47 times in production
  throw new Error("checkout is in undefined state, good luck");
}
This is called “trunk-based development with feature toggles.” XKCD has thoughts on this kind of complexity.
The Gradual Rollout: A Comedy in Three Acts
Act I: The False Confidence
You deploy to 1% of users. No errors. No alerts. The Grafana dashboard looks clean. “Looks good, roll out to 10%.” You feel like a deployment wizard.
Act II: The Discovery
At 10%, someone finally encounters the specific combination of user settings, browser version, account type, time zone, and items in their cart that triggers the bug. They file a support ticket. Support has no context. Engineering is paged.
The bug: a race condition that only manifests when two requests arrive within 50ms of each other on an account created before 2019 with more than 7 items in their wishlist during a promotional period.
Your canary deployment didn’t catch it because none of your 1% canary users had this profile.
Act III: The Rollback
You roll back. Except you can’t fully roll back because the database migration already ran on production. The old code doesn’t understand the new schema. You spend three hours writing a compatibility shim.
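For the record, here is roughly what that shim looks like: a sketch assuming a hypothetical db.query helper and the email_verified column from the migration above. Its whole job is to make old rows (NULL), new rows (real booleans), and old code agree on a single meaning.

// Written at 2 AM, reviewed by nobody, load-bearing by Thursday.
async function getUser(db, userId) {
  const rows = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
  const user = rows[0];
  if (!user) return null;
  // Old rows have NULL; new rows have a real boolean.
  // Collapse the superposition deliberately instead of letting production do it.
  user.email_verified = user.email_verified === true;
  return user;
}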
“Progressive delivery,” they call it.
The Only Deployment Strategy That Works
In 47 years, I’ve seen every deployment strategy invented. Here’s the truth:
| Strategy | Complexity | When It Fails | Post-Mortem Excuse |
|---|---|---|---|
| Direct to prod | None | Immediately | “We moved fast” |
| Staging → Prod | Low | When staging differs from prod | “Staging didn’t replicate the issue” |
| Canary | Medium | When canary users have unusual patterns | “Edge case, very rare” |
| Blue-green | High | During the database migration | “Zero-downtime means different things” |
| Feature flags | Very high | When a PM toggles the wrong flag | “Human error, we’ll add more flags” |
They all fail. The difference is how many dashboards you have to look at before admitting it.
The classic approach — deploy on Friday at 5 PM and turn your phone off — at least has the virtue of honesty. You’re not pretending the deployment is safe. You’re acknowledging that all deployments are acts of faith, and you’re committing fully.
My Recommendation
Do what I do: deploy everything at once to production, have exactly one environment (localhost is production), and use the support ticket queue as your error monitoring system.
Customers are patient. They’ll let you know when something is broken. That’s user-driven testing, and it’s free.
The alternative is spending six months building a progressive delivery infrastructure that will, ironically, have its own incidents, its own SLA violations, and its own post-mortems about why the canary deployment system failed during a canary deployment.
The canary always dies. That’s the whole point. We just pretend it’s a feature now.
The author’s production systems have exactly one deployment environment. It is called “live.” It is always on fire. He considers this a stable state.