MySQL Table Crashing = Disk Controller + 4GB RAM

Since we migrated Yankee’s web infrastructure to Peer1/Dedicated in November, our MySQL server has been experiencing table “crashing” problems and binary log corruption problems. These problems manifested themselves in a variety of ways:

  • Doing a mysqlbinlog dump on the binary logs resulted in an ‘Event too small’ error at some point in the dump.
  • Because of the binary log corruption, attempts to use MySQL replication to mirror data to a slave MySQL server failed as well.
  • Selected tables — almost exclusively those with both very frequent SELECTs and very frequent INSERTs — were regularly being marked as “crashed” by the MySQL server, and needed to have REPAIR TABLE run on them to correct the problem.
  • MySQL “error 127” messages in the server logs.
  • MySQL “error 134” messages in the server logs.

Although we didn’t suffer any actual data loss as a result, and although repairing the tables always corrected the problem, there was obviously something afoot that needed solving.
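
For anyone debugging the same symptoms, here is a minimal sketch of the commands involved; the binary log, database, and table names below are placeholders, not our actual ones.

    # Scan a binary log for corruption; a damaged log aborts the dump
    # with an "Event too small" error (the log file name is a placeholder).
    mysqlbinlog /var/lib/mysql/mysql-bin.000042 > /dev/null

    # Check a suspect MyISAM table and repair it if it is marked as
    # crashed ("mydb" and "visits" are placeholder names).
    mysql -e "CHECK TABLE visits;" mydb
    mysql -e "REPAIR TABLE visits;" mydb

    # Translate the numeric errors from the server log into descriptions;
    # 127 and 134 both point at a corrupt table or index file.
    perror 127 134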

And so over the last month I’ve been methodically working through possible solutions to this: tweaking the my.cnf configuration file, upgrading the version of MySQL (we started with 5.0.22 Community and ended up at 5.0.54 Enterprise), and changing queries that used the affected tables. Nothing worked.
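
The my.cnf tweaking was the sort of thing you’d expect for a MyISAM-heavy server; a representative sketch (illustrative values only, not what we actually ran) looks like this:

    [mysqld]
    key_buffer_size         = 256M   # MyISAM index cache
    table_cache             = 512    # open-table cache (the 5.0-era name)
    myisam_sort_buffer_size = 64M    # used by REPAIR TABLE and ALTER TABLE
    # Automatically check and repair MyISAM tables that weren't closed
    # properly, keeping a backup copy of anything that gets changed.
    myisam-recover          = BACKUP,FORCE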

Finally we opted to seek help from MySQL itself, purchasing the entry-level “Enterprise Basic” package and opening a support ticket. Over the course of a few days of going back and forth with their (very helpful) technicians, they concluded that it might be a problem with our disk controller and suggested we talk to our server managers about it.

So we got back in touch with Peer1 support (also very helpful) and they immediately discovered an incompatibility between the 3ware 8000 disk controller and the 4GB of RAM in the server running MySQL. On Wednesday night we had them downgrade the RAM to 3GB, and since that time it’s been clear sailing: no binary log corruption, no table crashing.
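
If you suspect the same combination on your own server, a few quick checks (assuming a Linux box and, for the last one, that the 3ware tw_cli utility is installed) are:

    # Confirm how much RAM the kernel actually sees after the change.
    free -m

    # Look for 3ware driver complaints in the kernel log
    # (the driver modules of that era are named 3w-xxxx and 3w-9xxx).
    dmesg | grep -i 3w-

    # Show controller and unit status; /c0 assumes the first controller.
    tw_cli /c0 show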

I write all this mostly to add the information to Google so that others with similar problems will see this as a possible (and not all that obvious) solution.

Comments

til on January 23, 2008 - 09:51

Which storage engine did you use? Table corruption sounds like a ghost from MyISAM past to me. Or was this problem not related to a specific storage engine?

Peter Rukavina on January 23, 2008 - 14:35

We’re using MyISAM almost exclusively, but the problem was related to the disk controller’s incompatibility with 4GB of RAM, which was causing disk issues.

Jeff Sidlosky on February 20, 2008 - 11:02

I’m having a very similar problem, but with my InnoDB tables. We have an 8006-2 3ware RAID controller. After about two hours of heavy load, we get massive, unrecoverable InnoDB corruption of our ibdata data file and ib_logfile0 log file.

We did a full backup with mysqldump the night before; after a few more hours of running, massive corruption again.

The only change I made was to increase innodb_buffer_pool_size from 768M to 2048M, using more of the 4 GB of RAM we have.

Your scenario is spookily familiar… I just contacted Silicon Mechanics to find out what they think. Hopefully, if they feel it may be a problem with the RAID controller, they’ll find a fix, or maybe have me use a different RAID controller.