Last Monday (6th Nov 2006) my server suddenly refused to allow postfix to start. I spent the remainder of Monday, Tuesday, Wednesday and some of Thursday trying to fix it, to no avail, at which point I raised an emergency support ticket with Tektonic (who host the system; it’s one of their VPS plans). By Wednesday this week the problem was resolved. I spent Wednesday evening (pre-LUG) in the Computer Science department uploading my local backup to the server, and Thursday morning putting the files back in the right place and fixing file permissions.
This morning I checked to see how it was doing. A lack of SpamAssassin processes using processor time indicated that all of the mail should have been processed, and mutt’s ~700 unread emails would seem to agree that I now have all of the mail from the last week delivered. A quick glance at top, however, showed that of the 35 processes normally running (apache, spamd and cron among others) only 11 were actually running at that moment (fortunately one of them was sshd). Grrr. I restarted the VPS and everything was back to normal, but within 15 minutes the 35 processes had again been reduced to 11. The logs indicate that the processes are being killed (that is, they receive a kill signal and exit), which is very peculiar as I am certainly not killing them. I restarted for the second time and again things seemed back to normal, but when I pointed a web browser at my blog it said “wordpress is not installed”. A quick look at mysql showed that one of the tables, “wp_options”, had been corrupted and was unreadable by mysql. I restored this table from backup as well, and my blog seems to be working (more or less) again.
I have never had this VPS working for more than 24 hours without processes being killed, apparently at random. Sometimes it’s just one or two processes, sometimes all of them. Tektonic provide an interface to show if you hit a resource limit (e.g. running out of RAM) but this shows no alerts at all for me. The only time it has ever shown anything is when I ran out of disk space (due to an accident involving dd and a factor of 10), so I can only conclude that either it is broken for memory on my VPS, or I’m not hitting any limits and something else is going on. I would happily move away from Tektonic if it weren’t for the fact that they are so cheap.
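Rather than spotting the missing processes by glancing at top, a small script could report which daemons have vanished. What follows is only a minimal sketch, not something I actually ran: the list of expected names is illustrative, and it assumes a Linux kernel new enough to expose /proc/<pid>/comm.

```python
import os

# Hypothetical list of daemons that should always be alive; adjust to taste.
EXPECTED = ["apache2", "spamd", "cron", "sshd", "mysqld"]


def running_process_names(proc_dir="/proc"):
    """Return the set of command names of live processes (Linux only)."""
    names = set()
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_dir, entry, "comm")) as f:
                names.add(f.read().strip())
        except OSError:
            continue  # the process exited while we were looking
    return names


def missing_processes(expected, running):
    """Return the expected names that are not currently running, sorted."""
    return sorted(set(expected) - set(running))


if __name__ == "__main__":
    gone = missing_processes(EXPECTED, running_process_names())
    if gone:
        print("missing:", ", ".join(gone))
    else:
        print("all expected processes are running")
```

Run from cron every few minutes with the output appended to a log, this would at least show when things die, which could then be compared against the system logs.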
Aside from my email, I’ve also been without the svn repositories on this server, which has been a right pain.