Staging environments are costly. Bugs in production are costlier.
The long-standing advice to avoid testing in production has been challenged lately. One of the key arguments behind that challenge is that staging environments might not be worth it: we pay a lot to set up and maintain an environment that is as close to production as possible, but it might not bring enough value for the cost.
Proponents suggest that detecting issues with automated local tests, delivering features cautiously through feature flags and A/B testing, and even simply accepting that a few bugs will slip through to production are good enough alternatives to investing in a staging environment.
However, thinking that a staging environment is unnecessary might stem from working on smaller-scale systems that can tolerate outages and issues or from working with a bad staging environment.
Working without staging environments
Localized testing can only get us so far when working on a complex system. We need to know that all the pieces fall into place to ensure that things keep working as expected. We don’t truly validate our code without booting up the many other (micro)services and doing proper end-to-end system testing.
Feature flags are a great solution, but they introduce some complexity, especially if they pile up and get left in the code for too long. Plus, they might get misconfigured if our feature touches on many parts of the system.
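One way to keep flags from piling up is to record when each one was introduced and regularly list the ones that have outlived their welcome. The following is a minimal sketch of that idea, not a recommendation of any particular flag library. The `FeatureFlag` class, the flag names, and the 90-day threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FeatureFlag:
    name: str       # hypothetical flag name
    created: date   # when the flag was introduced
    enabled: bool


def stale_flags(flags, today, max_age_days=90):
    """Return the names of flags older than max_age_days, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    old = [flag for flag in flags if flag.created < cutoff]
    return [flag.name for flag in sorted(old, key=lambda flag: flag.created)]


# Illustrative data only; a real project would load this from its flag store.
FLAGS = [
    FeatureFlag("new_checkout", date(2024, 1, 10), True),
    FeatureFlag("beta_search", date(2024, 5, 2), False),
    FeatureFlag("dark_mode", date(2023, 11, 20), True),
]

print(stale_flags(FLAGS, today=date(2024, 6, 1)))
# prints ['dark_mode', 'new_checkout']
```

Running a check like this in a scheduled job gives the team a list of flags to remove or make permanent before they become a source of misconfiguration.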
If we interact with plenty of third-party services, connecting to all of them from a local testing setup is difficult. We also have to consider the security concerns some clients might have: securing access from a single point is much easier than securing it from a whole network of machines.
Bad staging environments
A poorly designed staging environment doesn’t test our product in its real ecosystem, where it interacts with both our own services and any third-party systems it relies on. If the staging environment is designed that way, we might as well rely only on local testing.
It’s not much better when a staging environment only has simple emulations of other systems. If we simplify the results we expect to receive whenever we call another service, we lose a valuable chance to test compatibility and real-world interactions before we hit production.
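To illustrate what a simplified emulation can hide, here is a small sketch. It assumes a hypothetical third-party service that returns an account balance; the payload shapes, field names, and the `parse_balance` function are invented for the example and do not describe any real API.

```python
def parse_balance(response: dict) -> float:
    """Naive client code: assumes the payload always has a numeric 'balance'."""
    return float(response["balance"])


# A simplified emulation that only ever returns the happy path.
SIMPLE_STUB = {"balance": "10.50"}

# Hypothetical shapes a real service might also return.
REALISTIC_RESPONSES = [
    {"balance": "10.50"},
    {"balance": None, "status": "pending"},
    {"error": "rate_limited"},
]


def failing_cases(parser, responses):
    """Return the responses that the parser cannot handle."""
    failures = []
    for response in responses:
        try:
            parser(response)
        except (KeyError, TypeError, ValueError):
            failures.append(response)
    return failures


print(len(failing_cases(parse_balance, [SIMPLE_STUB])))       # prints 0
print(len(failing_cases(parse_balance, REALISTIC_RESPONSES)))  # prints 2
```

Against the simple stub, the client code looks correct. The two extra response shapes are the kind of detail that only surfaces when the environment talks to something closer to the real service.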
If we don’t automate testing of key business processes and rely only on developers to manually check the areas they changed, we risk introducing breaking changes into our critical and high-traffic systems.
A bad staging environment contains only small samples of test data. If we don’t work with production-scale volumes of data, we may not realize how our changes affect usability. If we only test loading a few entries, we can miss issues that appear when millions of entries need to be loaded.
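A rough sketch of why small data sets hide problems: the duplicate check below is a made-up example of code that feels instant with a handful of entries, yet its work grows with the square of the number of entries. The function name and the scenario are assumptions chosen for illustration.

```python
def find_duplicates_naive(entries):
    """Compare every pair of entries; return (duplicates, comparisons made)."""
    comparisons = 0
    duplicates = []
    for i, first in enumerate(entries):
        for second in entries[i + 1:]:
            comparisons += 1
            if first == second:
                duplicates.append(first)
    return duplicates, comparisons


def expected_comparisons(n):
    """Number of pairs among n entries: n * (n - 1) / 2."""
    return n * (n - 1) // 2


_, small = find_duplicates_naive(list(range(10)))
print(small)                            # prints 45
print(expected_comparisons(1_000_000))  # prints 499999500000
```

With ten entries the function makes 45 comparisons and nobody notices. With a million entries the same code needs roughly half a trillion, which is the sort of issue a staging environment with realistic data volumes would expose before production does.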
Potential staging environment problems
Developers should not become more lenient and trust that their testing environment (and their testers, of course) will catch every error. It would be truly impressive if we could create a staging environment so good that developers could trust it to that extent, but even a good staging environment doesn’t mean developers can be less vigilant.
Another issue could be developers queuing up for their turn to test things. Good use of separate instances and load balancers can ensure people don’t queue for too long and that other people can continue to use stable releases while someone tests their new changes.
Even with all this, a bug will make it to production one day. But falling short of perfection is not a good enough reason to reject staging environments. We can’t create a perfect staging environment, but we should still aim for one that is as good as possible.
Staging environment benefits
A good and appropriately used staging environment can save much more time and money than it takes to create and maintain.
The value of a good staging environment comes from:
- Being able to test any use case at any time without touching production.
- Demonstrating functionality at any time, which serves product managers, designers, and clients alike.
- Having a playground that everyone can learn from. Developers take in information much better when they can see existing code running.
What works for some doesn’t work for all, and products have different requirements and levels of responsibility.
Sometimes, it’s better to keep it simple. Other times, keeping it simple could result in costly outages. And a good staging environment can prevent them.