Update on the debugging: We’ve traced it to a spontaneous reboot. Not a process dying, but a reboot. That’s different. The damn machines are rebooting at random intervals once we get them fully loaded. They die suddenly, with no message in the log files. When they come back, the only message to indicate why they rebooted is some crap about a “PMU signal -93”. Turns out that the PMU is the Power Management Unit, and the code is actually the result of some whack bit masking. -93 translates as “a watchdog process told us to shut the hell down, so we did.”
I got in touch with the person at Apple who knows such things, and asked his advice. Basically, we (1) set some debug flags in the kernel (2) set up a method to capture the death debugging screams as it goes down (to determine which watchdog requested the reboot, and why) and (3) wait for it to die again.
That was several hours ago. The reboot has gone away. Entirely possible that the debugging flags “fix” this particular behavior.
Oh, I hate computers.
Leave a Reply