Outage in instance creation in EU compute shard.

Resolved · Full outage

No impact has been observed for the last 90 minutes; the issue is fully resolved.

Wed, Nov 13, 2024, 04:55 PM


Affected components

Nov 13, 2024, 12:51 PM to 04:55 PM

Updates

Resolved

No impact has been observed for the last 90 minutes; the issue is fully resolved.

Wed, Nov 13, 2024, 04:55 PM

Monitoring

We continue to observe the recovery; queuing time has been back to normal levels for 1 hour now. The team continues to monitor the situation.

Wed, Nov 13, 2024, 04:18 PM (37 minutes earlier)

Identified

We have confirmed that functionality has recovered across all regions, and queuing time has been back at expected levels for the last 30 minutes. We continue to make changes and monitor the situation.

Wed, Nov 13, 2024, 03:41 PM (36 minutes earlier)

Identified

We have added more capacity to our EU shard and are observing recovery of the region. Some workloads may still see slightly longer queuing times, but wait times are improving. The team continues to monitor the situation and is still adding more capacity.

Wed, Nov 13, 2024, 03:28 PM (13 minutes earlier)

Identified

The team is currently working on adding compute capacity to the EU shard. Queue times for starting new jobs are still degraded.

Wed, Nov 13, 2024, 02:51 PM (36 minutes earlier)

Identified

We continue to use US compute capacity to reduce pressure on the EU compute shard, and we see a slow recovery. In the meantime, we are preparing mitigations to recover EU capacity. AMD64 and ARM64 capacity are still experiencing degraded creation times. macOS capacity is operational.

Wed, Nov 13, 2024, 02:31 PM (20 minutes earlier)

Identified

We are mitigating the impact of the unavailability of our EU compute shard by using capacity from our US shard. We still see high queue times for all jobs and a full outage for ARM64 instances.

Wed, Nov 13, 2024, 01:49 PM (42 minutes earlier)

Identified

The team has identified a scheduling component that is overloaded by requests. The whole team is working on reducing the load on this component.

Wed, Nov 13, 2024, 01:01 PM (47 minutes earlier)

Investigating

We are investigating an outage affecting the creation of new instances in our EU compute shard.

Wed, Nov 13, 2024, 12:51 PM (10 minutes earlier)