The Application Insights team has identified an issue in the alerting service that may have impacted up to one third of our customers. The issue was caused by a configuration change and has been fixed by a hotfix deployment. All of our services are now healthy and running as expected. However, during the impact window (2/8/2016 18:00 to 2/8/2016 23:30 UTC), customers may not have received alerts for configured metrics.
- Root Cause: The failure was due to a configuration change.
- Lessons Learned: Our team is investigating additional improvements to internal telemetry and the change management process to avoid recurrence of similar issues.
- Incident Timeline: 5 hours and 30 minutes, from 2/8 18:00 UTC through 2/8 23:30 UTC
-Application Insights Service Delivery Team