This Is Why Our 3000 Apache Servers Went Down On The First Day of 2022
Imagine a beautiful Saturday morning on the first day of the new year: you wake up and see a notification saying that your whole infrastructure is down!
This happened to a colleague of mine. Real story.
Early Morning
The most important action was to recover all services and minimize the outage. We restarted all the Apache servers, and they seemed to be functioning without any problems. Now it was time to answer the question of why: why on earth would all the servers suddenly go down on the first day of the year? It couldn’t be a coincidence, could it?
This error log was all we had on all our servers:
AH00171: Graceful restart requested, doing restart
libgomp: could not create thread pool destructor.
libgomp!? In keeping with programmer tradition, the first thing we did was google the error. We found the same issue reported on Server Fault, but with no answer, or at least nothing we could use. One detail in the report struck us as odd, though: the reporter mentioned that the crash occurred every 24–36 hours!