Incident Summary:
On July 1, 2024, users reported difficulty searching for applicants and viewing candidate resumes. The issue affected applications received on June 29, 2024, and involved failures in search indexing, resume parsing, and other background updates.
Customer Impact:
Users had trouble searching for applicants who applied on June 29, 2024. Updates to applicant profiles processed on that date were also affected, including resume parsing and other updates that normally run in the background. As a workaround, users could open applicant profiles and download resumes directly from the file section in the application viewer.
Root Cause:
The incident was caused by scheduled system maintenance that overwhelmed our background processing system. As a result, background jobs failed, including search indexing and resume processing. The system's capacity degraded over the course of the day, culminating in a total outage from 8:55 pm to 10:12 pm PT on June 29, 2024. The failed background jobs went undetected during that outage and were only identified afterward.
Resolution Steps:
Background jobs that failed on June 29, 2024 were identified and reprocessed on July 1, 2024.
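For readers who want a concrete picture of this kind of recovery, the following is a minimal sketch, not our actual tooling. The `Job` record, its fields, and the `handler` callback are all hypothetical names introduced for illustration. It selects jobs that failed inside a time window and runs each one again, returning any that still fail.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass
class Job:
    """Hypothetical background job record (illustrative only)."""
    job_id: str
    kind: str            # e.g. "search_index" or "resume_parse"
    status: str          # "failed", "succeeded", ...
    created_at: datetime


def select_failed_jobs(jobs: Iterable[Job], start: datetime, end: datetime) -> List[Job]:
    """Return jobs marked failed whose creation time is in [start, end)."""
    return [j for j in jobs if j.status == "failed" and start <= j.created_at < end]


def reprocess(jobs: Iterable[Job], handler: Callable[[Job], None]) -> List[Job]:
    """Run handler on each job; return the jobs that failed again."""
    still_failed = []
    for job in jobs:
        try:
            handler(job)
            job.status = "succeeded"
        except Exception:
            # Keep the job marked as failed so it can be retried or inspected.
            still_failed.append(job)
    return still_failed
```

In this sketch the window would be June 29, 2024 (start of day to start of the next day), and the returned list makes any job that could not be recovered visible rather than silently dropped.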
Preventative Measures:
We have implemented several improvements to our system's performance and reliability. New alerts monitor background job completions and detect errors sooner. An additional monitor catches issues during low-traffic periods, and we have increased system capacity to handle higher loads. An alert now triggers when background jobs exceed a defined threshold, so that we can intervene promptly. We have also improved the background processing system itself and reduced its dependency on critical components, so that a single failure is less likely to cause a total outage.
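As a rough illustration of the alerting described above, the sketch below evaluates a snapshot of a job queue against a few checks. It is an assumption-laden example rather than our production monitoring: `QueueSnapshot`, `alerts_for`, and the default threshold values are hypothetical. The "stalled" check shows one way a monitor can stay useful during low traffic, by flagging pending work that is not completing even when the backlog is small.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueueSnapshot:
    """Hypothetical point-in-time view of the background job queue."""
    pending: int              # jobs waiting to run
    failed_last_hour: int     # jobs that failed in the past hour
    completed_last_hour: int  # jobs that finished in the past hour


def alerts_for(snapshot: QueueSnapshot,
               max_pending: int = 1000,   # assumed threshold, illustrative
               max_failed: int = 10       # assumed threshold, illustrative
               ) -> List[str]:
    """Return the names of alerts that should fire for this snapshot."""
    alerts = []
    if snapshot.pending > max_pending:
        alerts.append("backlog")
    if snapshot.failed_last_hour > max_failed:
        alerts.append("failures")
    # Low-traffic check: work is waiting but nothing has completed.
    if snapshot.pending > 0 and snapshot.completed_last_hour == 0:
        alerts.append("stalled")
    return alerts
```

A volume-only threshold would miss a quiet-hours failure like a handful of jobs that never finish, which is why the sketch checks completions separately from backlog size.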
If you have any further questions or concerns, please do not hesitate to reach out to our support team.