Quora is down and Reddit is in emergency read-only mode. So this is quite severe!
According to the first investigation (from the AWS health dashboard), the reason for the outage was a networking event which caused a large number of EBS volumes to be re-mirrored. This caused capacity problems in the affected region. There were also problems with one control plane, which made it difficult to create new EBS volumes and instances. In case you did not know, a control plane is the part of a router architecture that is responsible for drawing the network map; here it presumably means the part of the service that handles management requests such as creating volumes. I certainly did not know before.
Of course, plenty of other services are impacted by the outage, and I guess this is a great time to see how different services have been designed to withstand degradation of their underlying components. Quora is totally dead (well, there is a notification to users) and Reddit is in read-only mode. I give my points to Reddit, as they have managed to fail gracefully to a cached read-only mode.
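To make "failing gracefully to a cached read-only mode" a bit more concrete, here is a minimal sketch in Python. I have no idea how Reddit actually implements this; every name below (FlakyStore, DegradingStore, BackendUnavailable, ReadOnlyMode) is made up for illustration, and the "backend" is just an in-memory dictionary that can be switched off.

```python
class BackendUnavailable(Exception):
    """Raised by the (hypothetical) primary store when it cannot serve a request."""


class ReadOnlyMode(Exception):
    """Raised when a write is refused because the site is degraded."""


class FlakyStore:
    """In-memory stand-in for a real backend; set `up = False` to simulate an outage."""

    def __init__(self, data):
        self.data = dict(data)
        self.up = True

    def read(self, key):
        if not self.up:
            raise BackendUnavailable(key)
        return self.data[key]

    def write(self, key, value):
        if not self.up:
            raise BackendUnavailable(key)
        self.data[key] = value


class DegradingStore:
    """Wraps a primary store, remembers what it has read, and degrades to read-only."""

    def __init__(self, primary):
        self.primary = primary
        self.cache = {}
        self.read_only = False  # stays set until someone clears it by hand

    def read(self, key):
        try:
            value = self.primary.read(key)
        except BackendUnavailable:
            # Backend is gone: flip to read-only and serve the cached copy if we have one.
            self.read_only = True
            if key in self.cache:
                return self.cache[key]
            raise
        self.cache[key] = value
        return value

    def write(self, key, value):
        if self.read_only:
            raise ReadOnlyMode("site is in read-only mode")
        try:
            self.primary.write(key, value)
        except BackendUnavailable:
            self.read_only = True
            raise ReadOnlyMode("site is in read-only mode") from None
        self.cache[key] = value
```

The point of the sketch is only the shape of the behaviour: content that was already cached keeps being served when the backend disappears, while writes are refused with a clear error instead of taking the whole site down.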
Funny thing: just today I was reading a text by James Hamilton which is spot on for this situation. I have to say I am surprised Quora did not have a failover to a different location, as the other location in the US seems to be OK.
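A failover between locations can be sketched in a few lines of Python as well. This is only an illustration under my own assumptions: the function name fetch_with_failover and the idea of passing one fetch callable per location are hypothetical, and a real setup would also need the data to be replicated to the other location in the first place.

```python
def fetch_with_failover(fetchers):
    """Try each location in order of preference and return the first answer.

    `fetchers` is a list of (location_name, callable) pairs; each callable
    either returns a result or raises ConnectionError if its location is down.
    """
    failed = []
    for name, fetch in fetchers:
        try:
            return name, fetch()
        except ConnectionError:
            failed.append(name)  # remember the failure and try the next location
    raise RuntimeError(f"all locations failed: {failed}")
```

Called with a primary location that raises ConnectionError and a secondary one that answers, it returns the secondary location's name together with its result.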
Tags: Amazon, James Hamilton, AWS, outage, Quora, Reddit, US-EAST-1

