So I spent the better part of my evening trying to track down a bug, which turns out to be a “feature” of python.
I had just installed the GitPlugin for trac but I started to experience problems. When browsing the source inside trac it was taking over 30seconds to load the page and sometimes it would fail completely. A lot of searching didn’t help much, so I attempted to debug the problem myself. The first thing I noticed was Apache was taking 100% of the processor for a good 30seconds. I attached strace to it and saw something like this:
[pid 22682] close(43029) = -1 EBADF (Bad file descriptor) [pid 22682] close(43030) = -1 EBADF (Bad file descriptor) [pid 22682] close(43031) = -1 EBADF (Bad file descriptor) [pid 22682] close(43032) = -1 EBADF (Bad file descriptor) [pid 22682] close(43033) = -1 EBADF (Bad file descriptor) [pid 22682] close(43034) = -1 EBADF (Bad file descriptor) [pid 22682] close(43035) = -1 EBADF (Bad file descriptor) [pid 22682] close(43036) = -1 EBADF (Bad file descriptor)
This obviously didn’t look good! After some tinkering I found the problem went away when I ran trac standalone, instead of using mod_python or fcgi. This turned out to be a bit of a red herring because I spent my time trying to figure out what was different between a standalone executable and one being run inside Apache.
After playing around with environment variables, I gave up and attempted to printf debug the trac git plugin. I found that the actual call to git was taking on the order of seconds, whereas calling it myself from the command took milliseconds. The line of code (in PyGIT.py) looked a bit like this:
p = Popen(self.__build_git_cmd(git_cmd, *cmd_args), stdin=None, stdout=PIPE, stderr=PIPE, close_fds=True)
Now, when I removed the close_fds argument the problems went away! After some more digging I found this bug report which describes the behaviour of close_fds. Python will spin in a tight loop calling close for all possible valid fd number. WTF! You can see the python code here:
def _close_fds(self, but): os.closerange(3, but) os.closerange(but + 1, MAXFD)
So the simple fix to this was to remove the close_fds, so that Python doesn’t stupidly spin calling close(). I suspect the reason I only noticed this when running inside Apache, is because Apache must have a larger MAXFD. Hopefully in the future Python will change this behaviour and find a more sensible way to close all file descriptors, especially when I read this bug report which advises changing close_fds default to true.