It was a dark and stormy night. At least, that’s how it can feel when a deployment to production goes horribly wrong. Problems don’t just affect the support staff; they can affect the company’s bottom line. Putting new code into production is inherently risky: any time you change the state of the production system, there’s a chance that something will go wrong.
So how do you reduce this risk? Here are three ways to make production deployments safer:
Feature Toggles
One of the easiest ways to reduce the risk of deploying new code is to release it dark, that is, to put the new code behind a feature toggle. A feature toggle is a switch that routes execution through either the new functionality or the existing code path, depending on the toggle’s state.
By releasing new code dark, you can confirm that the existing system is stable after the deployment before introducing any new variables into the mix. Only when you’re satisfied that things look good do you turn on the new code. You might even wait to flip the toggle until plenty of people are in the office to monitor things, rather than doing it late at night when everyone is tired.
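To make the idea concrete, here is a minimal sketch of a feature toggle in Python. It assumes a simple in-memory flag store; the class `FeatureToggles`, the flag name `new_checkout`, and the checkout functions are all hypothetical stand-ins. In a real system the flags would more likely live in configuration or a dedicated flag service so they can be flipped without a redeploy.

```python
from typing import Dict, Optional


class FeatureToggles:
    """Hypothetical in-memory flag store; a real one would read config or a flag service."""

    def __init__(self, flags: Optional[Dict[str, bool]] = None) -> None:
        self._flags = dict(flags or {})

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so newly deployed code stays dark.
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def legacy_checkout(total: float) -> str:
    # Existing, known-good code path (hypothetical).
    return f"legacy:{total:.2f}"


def new_checkout(total: float) -> str:
    # New functionality being released dark (hypothetical).
    return f"new:{total:.2f}"


def checkout(total: float, toggles: FeatureToggles) -> str:
    # The toggle decides which code path handles the request.
    if toggles.is_enabled("new_checkout"):
        return new_checkout(total)
    return legacy_checkout(total)


if __name__ == "__main__":
    toggles = FeatureToggles()
    print(checkout(10.0, toggles))  # new code is deployed but dark
    toggles.set("new_checkout", True)
    print(checkout(10.0, toggles))  # toggle flipped once things look stable
```

Because the new path ships turned off, deploying it changes nothing for users; turning it on (or back off) is a separate, reversible step.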
Keep in mind that a toggle doesn’t have to be a binary on/off switch. An A/B testing framework can just as easily be used to turn on new functionality. You start with 100% of traffic going to the control. When things look good, you turn the dial to send a percentage of traffic to the new code and watch what happens.
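One common way to "turn the dial" is to hash each user into a stable bucket from 0 to 99 and send a user to the new code only if their bucket is below the rollout percentage. The sketch below assumes that approach; `in_rollout` and its parameters are hypothetical, not the API of any particular A/B testing framework.

```python
import hashlib


def bucket(user_id: str, flag: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given flag.

    Hashing the flag name together with the user id keeps each user's
    assignment consistent across requests, while different flags get
    independent assignments.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    """Return True if this user should see the new code at `percent`% rollout."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return bucket(user_id, flag) < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for pct in (0, 10, 50, 100):
        exposed = sum(in_rollout(u, "new_checkout", pct) for u in users)
        print(f"{pct:>3}% rollout -> {exposed} of {len(users)} users")
```

A useful property of this design is that raising the percentage only adds users: anyone already seeing the new code at 10% still sees it at 50%, so users don't flip back and forth between versions as the rollout grows.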
One Thing at a Time
A common practice is to accumulate a whole bunch of features together and release them at once, usually during a regularly scheduled release window. While this seems like a reasonable practice, it actually increases the risk of breaking something in production. If something goes wrong, which of the dozens (hundreds?) of features were responsible?
If you want to reduce the risk of your deployments, release one thing at a time rather than a huge package of changes. That way, if something goes sideways, you know exactly which change caused it and can roll it back immediately.
Unfortunately, deploying code this way really requires a continuous deployment model. If you can push code to production as soon as it’s deemed ready, great! For those of you who are still stuck with a regularly scheduled release window, there is still hope.
Refactoring the system into smaller modules or microservices that can be deployed independently can also reduce the overall risk. Even if you’re deploying lots of functionality at once, you improve your ability to identify problems and respond to them. If you see that something is wrong with, say, billing, and you know there was a change to the billing component, you can roll back just that piece.
Planning and More Planning
By failing to prepare, you are preparing to fail. - often attributed to Benjamin Franklin
Reducing the risk of a release starts at the very beginning of feature development. Throughout the development cycle, ask yourself the following questions:
- What could go wrong?
- How will I know if it does?
- How will I respond if it does?
If you have a standard release process, many of these questions may already have answers. There may be standard dashboards, monitors and alarms already set up that will alert when things go wrong.
For more complicated deployments, such as upgrading a critical piece of infrastructure, you’ll need to spend more time up front. Figure out the sequence of events and the timing for each step. Automate as much as possible. Work out contingency plans. And when all else fails, know ahead of time when it will be time to cut your losses and roll back to the last known good state.
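Deciding "when to cut your losses" is easier if the rollback criteria are written down before the release rather than argued about at 2 a.m. As a minimal sketch, assuming you can observe a request count, an error count, and a p99 latency for the new release, the criteria might be encoded like this. The class names, field names, and thresholds are hypothetical examples; your own limits should come from your system's normal baseline.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RollbackCriteria:
    """Limits agreed on before the release (hypothetical example values)."""
    max_error_rate: float       # fraction of failed requests, e.g. 0.02 == 2%
    max_p99_latency_ms: float   # 99th-percentile latency ceiling


@dataclass(frozen=True)
class Observation:
    """Metrics gathered for the new release over an observation window."""
    requests: int
    errors: int
    p99_latency_ms: float


def should_roll_back(obs: Observation, criteria: RollbackCriteria) -> Tuple[bool, str]:
    """Return (decision, reason) by comparing observed metrics to the criteria."""
    if obs.requests == 0:
        # Nothing to judge yet; keep watching rather than guessing.
        return False, "no traffic observed yet"
    error_rate = obs.errors / obs.requests
    if error_rate > criteria.max_error_rate:
        return True, f"error rate {error_rate:.1%} exceeds {criteria.max_error_rate:.1%}"
    if obs.p99_latency_ms > criteria.max_p99_latency_ms:
        return True, (f"p99 latency {obs.p99_latency_ms:.0f} ms exceeds "
                      f"{criteria.max_p99_latency_ms:.0f} ms")
    return False, "within limits"


if __name__ == "__main__":
    criteria = RollbackCriteria(max_error_rate=0.02, max_p99_latency_ms=500)
    print(should_roll_back(Observation(requests=1000, errors=5, p99_latency_ms=300), criteria))
    print(should_roll_back(Observation(requests=1000, errors=50, p99_latency_ms=300), criteria))
```

The point is less the specific metrics than the fact that the decision rule exists before the deploy starts, so rolling back is a matter of following the plan.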
Anything you can do to prepare for a release reduces the risk and makes that release all the smoother.
Risk is Inevitable
Every time you touch production, there’s a risk that something could go wrong. You can’t eliminate all of it. But by paying attention to how you build and deploy your software, you can improve the odds of a successful deploy. After all, your pillow is calling…sleep well!