API V2 service disruption

Incident Report for Mindee

Postmortem

Summary

Today, our service experienced a major degradation caused by an unexpected outage of a third-party provider that we rely on for the Polygon extraction option. The upstream failure propagated through our system and led to significant inference delays and failures for all customers.

What Happened

  • 13:40 UTC: Requests using the Polygon option began failing immediately.
  • Workers processing Polygon-enabled documents became blocked, waiting indefinitely for a response.
  • As these requests accumulated, all workers were progressively saturated, eventually preventing any inference from being processed.

Impact on Customers

  • 13:40–14:00 UTC:

    • Polygon requests returned errors.
    • Worker saturation began, reducing system capacity.
  • 14:00–14:50 UTC:

    • All inference requests (including those not using Polygon) were affected.
    • Customers experienced either failed inferences or delays of 10 to 20 minutes.

The service recovered once we switched the upstream provider and our worker pool cleared the blocked tasks.

Root Cause

The initial root cause was the outage of the third-party provider supporting the Polygon extraction option.
The problem was amplified internally because this external call did not include a timeout, allowing requests to hang and consume worker capacity indefinitely.
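
To illustrate the failure mode described above, here is a minimal sketch of an outbound call that gives up instead of hanging. It is not our production code: the function name `call_polygon_provider`, the exception `ProviderUnavailable`, and the 5-second budget are assumptions made for the example, and only the Python standard library is used.

```python
import http.client
from urllib.parse import urlsplit

# Hypothetical budget for the upstream call; the report does not state a value.
PROVIDER_TIMEOUT_SECONDS = 5.0


class ProviderUnavailable(Exception):
    """Raised when the upstream provider fails or does not answer in time."""


def call_polygon_provider(url: str, payload: bytes,
                          timeout: float = PROVIDER_TIMEOUT_SECONDS) -> bytes:
    """POST `payload` to the provider and return the raw response body.

    The timeout applies to each blocking socket operation (connect, send,
    receive). It is not a total deadline for the whole exchange.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("POST", parts.path or "/", body=payload)
        response = conn.getresponse()
        return response.read()
    except (OSError, http.client.HTTPException) as exc:
        # Socket timeouts and connection errors are OSError subclasses.
        raise ProviderUnavailable(f"provider call failed: {exc}") from exc
    finally:
        # Always release the socket so the worker can pick up the next task.
        conn.close()
```

With a bound like this, a worker facing an unresponsive provider fails the single Polygon request after a few seconds and returns to the pool, rather than staying blocked.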

Resolution

  • The service recovered automatically after we changed the upstream provider.
  • Blocked workers resumed normal activity as their tasks were released.

Preventive Actions

We are implementing the following measures to prevent similar incidents:

  1. Add strict timeouts to all calls to external providers.
  2. Enhance monitoring systems to detect blocked workers and abnormal processing times earlier.
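
As a rough illustration of the second measure, the sketch below tracks in-flight tasks and flags workers whose current task has run longer than a threshold. The class `WorkerWatchdog`, its method names, and the 30-second threshold in the usage note are hypothetical and do not describe our actual monitoring stack.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WorkerWatchdog:
    """Tracks in-flight tasks and reports workers busy for too long."""

    max_task_seconds: float
    # Injectable clock so the logic can be exercised without real waiting.
    clock: Callable[[], float] = time.monotonic
    _started: Dict[str, float] = field(default_factory=dict)

    def task_started(self, worker_id: str) -> None:
        """Record the moment a worker picks up a task."""
        self._started[worker_id] = self.clock()

    def task_finished(self, worker_id: str) -> None:
        """Forget the worker once its task completes or fails."""
        self._started.pop(worker_id, None)

    def stuck_workers(self) -> List[str]:
        """Return the workers whose current task exceeds the threshold."""
        now = self.clock()
        return sorted(
            worker_id
            for worker_id, started_at in self._started.items()
            if now - started_at > self.max_task_seconds
        )

    def saturation(self, pool_size: int) -> float:
        """Fraction of the pool currently held by stuck workers."""
        return len(self.stuck_workers()) / pool_size
```

A periodic check could, for example, create `WorkerWatchdog(max_task_seconds=30.0)` and alert when `saturation(pool_size)` crosses a chosen ratio, which would surface a growing backlog of blocked workers before the whole pool is consumed.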

Closing Note

We sincerely apologize for today’s disruption. Reliability is essential to your operations, and we are taking concrete steps to ensure this type of issue does not happen again.
If you have any questions, our support team is here to help.

Posted Dec 05, 2025 - 17:04 CET

Resolved

This incident has been resolved.
Posted Dec 05, 2025 - 16:30 CET

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Dec 05, 2025 - 16:02 CET

Update

We are continuing to investigate this issue.
Posted Dec 05, 2025 - 15:46 CET

Investigating

We are currently investigating this issue.
Posted Dec 05, 2025 - 15:46 CET
This incident affected: Mindee V2 (API V2 (api-v2.mindee.net)).