The underlying issue has been resolved: Kafka infrastructure is now fully operational and all services have been restored. The Routing backlog has decreased by over 95% and continues to shrink. We expect a full catch-up within the next few hours.
For customers concerned about potential data gaps during this incident, we will share more details about the backfill solution during the coming week.
We appreciate your patience throughout this incident and apologise for any disruption to your operations. Our team will conduct a thorough post-mortem analysis to prevent similar issues in the future.
Posted Feb 22, 2025 - 18:50 UTC
Monitoring
Our engineering team is making good progress on multiple infrastructure migrations, with significant results expected by tomorrow morning (22nd February). There is still a substantial backlog in Routing to process, but we anticipate improvements in data throughput by then.
We are implementing the migration process gradually to minimise data loss and impact on other systems. Our long-term solution includes migrating to managed Kafka by the end of March, which will enable faster reaction times and scaling capabilities.
The team is developing a solution to backfill any data gaps that may have occurred, and we'll provide more details early next week.
We understand this situation continues to impact business operations, particularly for teams relying on real-time data dashboards, and we appreciate your patience as we work to resolve this issue.
Posted Feb 21, 2025 - 18:41 UTC
Identified
We are experiencing an infrastructure performance issue affecting our Routing product. Some customers may be experiencing delayed or missing data. This issue began on February 18th and is related to increased data processing loads. Our engineering team is actively investigating and working to resolve the issue as quickly as possible. We apologise for any inconvenience this may cause and will provide updates as they become available.