If you were fortunate enough to dodge the recent four-hour outage of Amazon Web Services (AWS), you at least saw how it affected some of the websites you visited that day. The glitch has probably already taught you that warnings of "impending disasters" stemming from over-dependence on cloud services have some merit, but as the owner of a startup, you should walk away with more than that.
If you're a business leader who depends on Amazon for your web services, here are four important lessons to keep in mind.
Identify Points of Failure
The AWS outage was traced back to a line of code that triggered multiple server shutdowns, enough to affect the entire AWS network. As you reflect on how to safeguard yourself from similar occurrences, start digging into the "single points of failure" that could cripple your business. These are not necessarily limited to the information you keep on a cloud or server, so look a little deeper into other aspects of your company. Perhaps a point of failure comes in the form of a key person you depend on for information. Consider how their absence on a sick day, or their leaving the business altogether, might affect your entire operation. Maybe it is a bit more complicated and lies in a piece of software that you depend on to manage a large part of your business. You might not think your business has a weak point now, but remember, neither did the other companies relying on AWS. Examine the infrastructure of your company so that you can identify any potential defects in your framework.
Start Differentiating Between Core and Non-Core Systems
Look at the ways in which you depend on the cloud and separate the systems you can afford to lose for a while from those whose loss would be too costly. If businesses learned anything from the AWS outage, it's that the cloud is a great tool until it collapses. To decide which aspects of your business should go on the cloud and which shouldn't, consider which of the information you have stored on AWS, or any other cloud server, you can and cannot afford to lose. What pieces of information would immediately cripple your ability to serve your customers if they were lost? Losing your clients' contact information might not derail your business and cost you money the way losing a line of code might. As you sort through what makes sense to store on the cloud and what does not, make sure you have your own controlled backup procedure ready for the next time things go down.
Always Expect The Unexpected
Nothing is fail-proof, simply because everything will fail at some point or another. Whether you're using the highest-quality software or more than enough memory, failure is a given. The AWS meltdown made this concept even more apparent. To make sure you're effectively preparing for any failures coming your startup's way, you have to build systems that embrace failure as a natural occurrence. Even if you're unsure of what those failures might be, it is important to be able to manage the pieces that are affected without having to overhaul your entire system. Even if your "house is on fire," you have to make sure the rest of your operation can still run.
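For the more technical members of your team, here is a minimal sketch of what "embracing failure" can look like in code: when a cloud call fails, the system falls back to a copy you control instead of taking everything down with it. The function names (`call_with_fallback`, `fetch_from_cloud`, `fetch_from_local_cache`) and the data are hypothetical placeholders, not part of any real AWS API.

```python
from typing import Callable, Tuple


def call_with_fallback(
    primary: Callable[[], str], fallback: Callable[[], str]
) -> Tuple[str, bool]:
    """Return (result, degraded). The fallback runs only if the primary raises."""
    try:
        return primary(), False
    except Exception:
        # The primary dependency failed; keep serving from the fallback.
        return fallback(), True


def fetch_from_cloud() -> str:
    # Hypothetical stand-in for a cloud provider call during an outage.
    raise ConnectionError("cloud storage unavailable")


def fetch_from_local_cache() -> str:
    # Hypothetical stand-in for a backup copy that you control.
    return "cached product catalog"


result, degraded = call_with_fallback(fetch_from_cloud, fetch_from_local_cache)
```

The `degraded` flag lets the rest of the operation know it is running on backup data, so you can alert someone without interrupting service to customers.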
Always Have A Fresh Pair of Eyes On Deck
Efficient deployment of high-potential employees has to be one of the most underrated tools available to any business. In other words, never underestimate the power of moving your team members into different departments so they can learn. Having your best employees gain experience in other parts of your organization gives them a well-rounded understanding of how your business works as a whole. What's more, it gives you an extra pair of eyes looking out for problems and challenges that might come along the way.
AJ, you make good points. Holger Mueller from Constellation Research has written a very good breakdown of Amazon's analysis of the outage.
down report human error takes aws s3 down
The single line of code that was entered by the poor employee was only the trigger. The root cause was Amazon not practicing what it preaches.
One more important thing that one can learn is the avoidance of single points of failure, which is where cloud services can really excel.
Thomas
@twieberneit