Exclaimer Cloud: Messages failing to be processed with 'Negotiation Failed'
Incident Report for Exclaimer Cloud
Postmortem

Issue: TLS Negotiation failed, Certification Invalid for US subscriptions.

Incident Length: 6 hours and 43 minutes

Incident Date: 04/12/2024, 00:52 – 07:35 UTC

Incident Status: Resolved

Summary

Customers reported the error ‘TLS Negotiation failed, Certification Invalid’ when routing messages through Exclaimer’s server-side system. The issue affected only subscriptions located in US regions.
Not all servers were affected, so only some of the messages sent through Exclaimer during the incident were impacted.

Going forward, a new process for reviewing and confirming certificate updates has been introduced to prevent similar issues.

Root Cause

Due to an oversight, the renewed certificate for US relays was not applied to all routing servers. As a result, the previous certificate remained in use on the missed servers until it expired, at which point it was no longer valid for mail routing.
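The failure mode described above (a certificate that quietly expires on some servers) can be caught by checking each server's certificate expiry date ahead of time. The following is a minimal sketch of such a check, not Exclaimer's actual tooling. The server name US1, the expiry timestamps, and the 14-day warning window are hypothetical values chosen for illustration; only US2 is named in this report. The `notAfter` strings use the format that Python's `ssl` module reports for a peer certificate.

```python
import ssl
from datetime import datetime, timedelta, timezone


def expiry_status(not_after: str, now: datetime,
                  warn_within: timedelta = timedelta(days=14)) -> str:
    """Classify a certificate by its notAfter string.

    not_after uses the ssl module's format, e.g. 'Dec  4 00:00:00 2024 GMT'.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    if expires <= now:
        return "expired"
    if expires - now <= warn_within:
        return "expiring-soon"
    return "ok"


def audit(fleet: dict[str, str], now: datetime) -> dict[str, str]:
    """Map each server name to the status of the certificate it holds."""
    return {server: expiry_status(not_after, now)
            for server, not_after in fleet.items()}


if __name__ == "__main__":
    # Hypothetical inventory: server name -> notAfter of the installed certificate.
    fleet = {
        "US1": "Dec  4 00:00:00 2025 GMT",  # assumed: renewed certificate
        "US2": "Dec  4 00:00:00 2024 GMT",  # assumed: previous certificate
    }
    now = datetime(2024, 12, 4, 0, 52, tzinfo=timezone.utc)
    for server, status in audit(fleet, now).items():
        print(server, status)
```

Running a check like this on a schedule, with an alert on anything other than "ok", would flag a missed server before the old certificate expires rather than after.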

Mitigation

Once the expired certificate had been identified as not having been correctly updated and applied, it was replaced with the renewed certificate to resolve the issue. All other instances of the certificate were then reviewed and verified to have been correctly updated.
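The review of all other certificate instances could be automated by comparing the fingerprint each server presents against the fingerprint of the renewed certificate. The sketch below is one possible approach under stated assumptions, not a description of the process Exclaimer introduced. The function names are hypothetical, and `fetch_fingerprint` assumes each server exposes an implicit-TLS port; a relay that only offers STARTTLS over SMTP would need a different fetch step.

```python
import hashlib
import ssl


def fingerprint(pem_cert: str) -> str:
    """Return the SHA-256 fingerprint (lowercase hex) of a PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest()


def stale_servers(served: dict[str, str], expected: str) -> list[str]:
    """Return the servers whose presented fingerprint differs from the expected one."""
    return sorted(name for name, fp in served.items() if fp != expected)


def fetch_fingerprint(host: str, port: int = 443) -> str:
    """Fetch the certificate a server presents and fingerprint it.

    Assumption: the port speaks implicit TLS. Without ca_certs,
    ssl.get_server_certificate does not validate the certificate,
    so an expired certificate can still be retrieved and inspected.
    """
    return fingerprint(ssl.get_server_certificate((host, port)))
```

Given a mapping of server names to fetched fingerprints, `stale_servers(served, fingerprint(renewed_pem))` lists every server still presenting something other than the renewed certificate. An empty list means the rollout reached every server.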

Incident Timeline

00:52 – Alerting reported intermittent failures to obtain a message response within the US region.

01:29 – Initial investigation indicated that all endpoints were responding to requests and were accessible to engineering staff, suggesting full operation of the Exclaimer system.

01:45 – Continued investigation confirmed that traffic routing also showed no sign of an issue. However, traffic flow to other US servers remained higher.

02:13 – Another full review indicated that the system was operating as expected, with no errors reported in the run-up to the alert being generated.

02:25 – Alert was documented and prepared for pickup during main operational hours.

03:40 – Support alerted the team to customer-facing reports of messages being rejected with ‘Certification Invalid’.

03:52 – Identified that the active certificate on US2 had recently expired.

04:06 – Engineering confirmed that no other services were attempting to use the expired certificate.

04:21 – Completed a full review of all servers and updated US2 so that only the latest certificate was applied.

04:31 – Engineering confirmed that the original alert had recovered and that traffic flow between all servers had improved. Incident moved into Monitoring status.

07:35 – Support confirmed that new reports of the issue had ceased, and customers with existing reports confirmed the issue was no longer occurring.

Posted Dec 09, 2024 - 12:10 UTC

Resolved
This incident has been resolved.
Posted Dec 04, 2024 - 07:21 UTC
Update
We are continuing to investigate this issue.
Posted Dec 04, 2024 - 06:41 UTC
Investigating
We are investigating a possible service alert.

Issue: TLS Negotiation failed, certificate invalid: SSL [Leaf certificate is expired]

Next Update In: TBA
Posted Dec 04, 2024 - 01:39 UTC
This incident affected: Exclaimer Cloud: US (Mail Routing - G Suite).