Amazon has explained the Amazon Web Services (AWS) outage that knocked parts of the internet offline for several hours on December 7, and has promised more clarity if this happens again. As CNBC reports, Amazon revealed that an automated capacity scaling feature led to “unexpected behavior” from internal network clients. The devices connecting that internal network to AWS were overwhelmed, stalling communications.
The nature of the failure made it hard for teams to pinpoint and fix the problem, Amazon added. They had to rely on logs to find out what happened, because internal tools were also affected. Engineers were “extremely deliberate” in restoring service to avoid disrupting workloads that were still functioning, and had to contend with a “latent problem” that prevented network clients from backing off and giving systems a chance to recover.
AWS has temporarily disabled the scaling that triggered the problem and will not re-enable it until fixes are in place. A fix for the latent problem should arrive within two weeks, Amazon said. There is also additional network configuration to protect the affected devices if the failure recurs.
You may find it easier to understand what is going on the next time there is a crisis. A new version of AWS's service status dashboard is due in early 2022 to provide a clearer view of any outages, and a multi-region support system will help Amazon get in touch with customers that much sooner. These changes won't bring AWS back any faster during an incident, but they may remove some of the mystery when services go dark, which matters when the victims include everything from Disney+ to Roomba vacuum cleaners.
All products recommended by Engadget are selected by our editorial team, independent of our parent company. Some of our stories include affiliate links. If you buy something through one of these links, we may earn an affiliate commission.