Apache Hung, But Why?

Posting this here just in case others find themselves in the same situation.

Since migrating this site from my own silverorange-colocated server to Amazon AWS, I’ve been experiencing seemingly random Apache server outages. The usual suspects – mismatched Apache and MySQL settings, lack of memory, etc. – didn’t seem to apply, and the symptoms were confoundingly simple: all of Apache’s processes – up to the MaxClients limit – would be “hung,” and the server would be unreachable. But MySQL was fine, the system load was fine, and, as near as I could tell, everything was beautiful and sunny. Except that Apache was, effectively, hung.

After several weeks of Googling away for “Apache hung processes” and “Apache process limit” and “Apache crashed” and “mysterious and confounding Apache issue,” I launched into some more serious Apache debugging and eventually traced the issue to some variation of APC and this PHP issue. The debugging that got me there was running strace on any of the child Apache processes and seeing:

futex(0x2b19a0b01070, FUTEX_WAIT, 2, NULL)
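
For anyone who wants to repeat that check across several workers, here is a minimal Python sketch of the idea. It assumes strace and the coreutils timeout command are installed and that you have permission to attach to the process (usually root). The function names is_stuck_on_futex and sample_strace are made up for illustration, and the regular expressions only cover the output shape shown above.

```python
import re
import subprocess

# A futex wait with no timeout, as strace prints it, e.g.
#   futex(0x2b19a0b01070, FUTEX_WAIT, 2, NULL)
FUTEX_WAIT_RE = re.compile(r"^futex\(0x[0-9a-f]+, FUTEX_WAIT(?:_PRIVATE)?, \d+, NULL")
# Any line that looks like a syscall, e.g. "accept4(3, ..."
SYSCALL_RE = re.compile(r"^[a-z_0-9]+\(")


def is_stuck_on_futex(strace_output: str) -> bool:
    """True if the sample contains syscalls and every one is a futex wait."""
    syscalls = [ln for ln in strace_output.splitlines() if SYSCALL_RE.match(ln)]
    return bool(syscalls) and all(FUTEX_WAIT_RE.match(ln) for ln in syscalls)


def sample_strace(pid: int, seconds: int = 2) -> str:
    """Attach strace to pid for a few seconds and return what it printed.

    strace writes its trace to stderr; `timeout` ends the otherwise
    open-ended attach after `seconds`.
    """
    result = subprocess.run(
        ["timeout", str(seconds), "strace", "-p", str(pid)],
        capture_output=True,
        text=True,
    )
    return result.stderr
```

Calling is_stuck_on_futex(sample_strace(pid)) for each Apache child PID gives a rough yes/no per worker. A healthy worker can also sit briefly in a futex call, so treat a single positive as a hint rather than proof.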

Rather than trying to solve the APC issue itself, I solved the problem – fingers crossed – by upgrading from Apache 2.2 to Apache 2.4 and from PHP 5.3.29 to PHP 5.5.20. Among other things, this let me jettison APC and use PHP OPcache instead, which appears to have addressed the larger issue and given me access to a nice OPcache visualization to boot:

[Screenshot: PHP OPcache visualization]