Over the weekend, I ran into a strange issue (though it's not a new one): InnoDB tablespace (ibdata) corruption. Normally, when InnoDB crashes, it recovers automatically on the next start by rolling forward committed changes that were not yet flushed and rolling back whatever was uncommitted at the time of the crash.
But on one of our servers, we ran out of disk space (yeah, no alerts) on the data directory, where we store everything (tablespace, logs and data). The server kept running for a few hours in this disk-full state, and after a while it became unavailable and stopped responding. The only option left was to kill the server process (and its PID) and clean things up to get the space back. When I (re)started the server, it failed to start with the following error:
InnoDB: Error: trying to access page number 1098759810 in space 0,
InnoDB: space name /data/ibdata1,
InnoDB: which is outside the tablespace bounds.
InnoDB: Byte offset 0, len 16384, i/o type 10.
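A quick bit of arithmetic shows why that page number cannot be valid. With 16384-byte pages (the "len 16384" in the message), page 1098759810 would start roughly 18 TB into the file. The sketch below is only my own illustration of that check, not InnoDB's actual code; the function names are made up, and it treats space 0 as the single file /data/ibdata1 named in the error, which is an assumption.

```python
import os

PAGE_SIZE = 16384  # bytes; taken from "len 16384" in the error message


def page_byte_offset(page_no, page_size=PAGE_SIZE):
    """Byte offset at which page `page_no` would start in the file."""
    return page_no * page_size


def page_in_bounds(page_no, file_size, page_size=PAGE_SIZE):
    """True if the whole page fits inside a file of `file_size` bytes."""
    if page_no < 0:
        return False
    return page_byte_offset(page_no, page_size) + page_size <= file_size


if __name__ == "__main__":
    bad_page = 1098759810  # page number reported in the error
    print("offset in bytes:", page_byte_offset(bad_page))

    # Path from the error message; only checked if it exists on this host.
    path = "/data/ibdata1"
    if os.path.exists(path):
        print("in bounds:", page_in_bounds(bad_page, os.path.getsize(path)))
```

Any ibdata1 smaller than about 18 TB fails this check for that page, so the page number itself is almost certainly garbage read from a damaged page.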
That means the tablespace is corrupted. After enabling the monitor, I noticed the following:
InnoDB: Error: data dictionary entry for table DBXX/tableXX is corrupt!
InnoDB: Index field 0 is delete marked.
091117 20:32:47InnoDB: Assertion failure in thread 1148791104 in file dict0load.c line 503
InnoDB: Failing assertion: ut_memcmp(buf, field, len) == 0
It looks like a modification to the primary key on that table was never persisted, since nothing could be written to disk just before the crash. I'm still wondering why the engine did not trap this earlier: a change should not be marked as persistent unless the write to the log returns success (even if the modified entries are still only in memory, the change is written to the log, and everything else is rolled back).
Anyway, I recovered using innodb_force_recovery=4, then dumped and reloaded the data, because the server kept failing on subsequent restarts even after the first recovery (well, it should not start, as the dictionary is wrong)…
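The missing disk alert is what started all of this, and even a tiny check run from cron would have caught it. Here is a minimal sketch of such a check. The 10% threshold and the function names are my own assumptions, and it is a rough illustration rather than a replacement for proper monitoring.

```python
import shutil
import sys


def free_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # treat an unreadable or empty filesystem as full
    return usage.free / usage.total


def is_low_on_space(path, min_free_fraction=0.10):
    """True if free space is below the threshold (10% is an arbitrary pick)."""
    return free_fraction(path) < min_free_fraction


if __name__ == "__main__":
    # Pass the data directory as the first argument; defaults to the cwd.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    if is_low_on_space(data_dir):
        print("WARNING: low disk space on", data_dir)
        sys.exit(1)  # non-zero exit so cron or a wrapper can raise an alert
    print("disk space OK on", data_dir)
```

Pointing it at the data directory and mailing on a non-zero exit would have given a few hours of warning before the server hung.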
Time for me to simulate this scenario on Drizzle and Maria for fun in the coming days and see how it works out (I just got the Maria source a couple of days back and should start contributing code).