Elevated API errors

Incident Report for Permutive

Resolved

Our API experienced a large spike in traffic at 13:05 UTC. The cluster autoscaled to its maximum capacity, but this wasn't enough to handle the additional load. All of our API instances were overloaded simultaneously, causing the cluster to become unresponsive, and we weren't able to rebuild it until 14:32 UTC.

We have now over-provisioned our API cluster by 2x to reduce the risk of this happening again, and we will be replicating our API in more geographical regions to increase availability. We are currently investigating the time taken to rebuild the cluster and will be taking steps to speed up this process in the future.

Posted Jul 11, 2017 - 15:26 UTC

Monitoring

Service has now been fully restored. Our engineering team will continue to monitor over the next few hours.

Posted Jul 11, 2017 - 15:09 UTC

Investigating

Our EU API servers experienced an increase in 5xx errors. We are currently investigating the issue and will post an update shortly.

Posted Jul 11, 2017 - 13:47 UTC