Over the weekend, I ran into a strange issue (though not a new one): InnoDB tablespace (ibdata) corruption. In general, when InnoDB crashes, it recovers automatically on the next start by rolling forward committed changes that were not yet flushed and rolling back uncommitted ones.
But on one of our servers we ran out of disk space (yeah, no alerts) on the data directory, where we store everything (tablespace, logs and data). The server kept running for a few hours with the disk full, and after a while it became unavailable and stopped responding. The only option left was to kill the server process and clean things up to get the space back. When I restarted the server, it failed to start with the following error:
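The missing alert is the part that is easiest to fix. Here is a minimal sketch of a free-space check in Python, assuming the data directory is /data (as in the error below) and a 10% free-space threshold; the function names and the threshold are my own illustration, not something we had in place.

```python
import shutil


def is_low_on_space(free_bytes, total_bytes, min_free_fraction=0.10):
    """Return True when the free fraction is below the threshold."""
    if total_bytes <= 0:
        # No usable size information; treat it as a problem worth alerting on.
        return True
    return (free_bytes / total_bytes) < min_free_fraction


def check_data_dir(path, min_free_fraction=0.10):
    """Return a warning string if `path` is low on space, otherwise None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if is_low_on_space(usage.free, usage.total, min_free_fraction):
        percent_free = 100.0 * usage.free / usage.total if usage.total else 0.0
        return "LOW DISK SPACE on %s: %.1f%% free" % (path, percent_free)
    return None


if __name__ == "__main__":
    # "/data" is an assumption taken from the path in the InnoDB error message.
    warning = check_data_dir("/data")
    if warning:
        print(warning)
```

Run from cron every few minutes and wired to mail or a pager, something like this would have flagged the problem hours before the server stopped responding.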
InnoDB: Error: trying to access page number 1098759810 in space 0,
InnoDB: space name /data/ibdata1,
InnoDB: which is outside the tablespace bounds.
InnoDB: Byte offset 0, len 16384, i/o type 10.
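To see why that page number cannot be valid, multiply it by the page size from the message (len 16384): page 1098759810 would start roughly 18 TB into the tablespace. A small sketch of that arithmetic in Python follows; the helper names are hypothetical, and it assumes the system tablespace is the single file ibdata1 with 16 KB pages.

```python
import re

PAGE_SIZE = 16384  # bytes; matches "len 16384" in the error message

# Pulls the page number and space id out of the InnoDB error text.
PAGE_ERROR_RE = re.compile(r"page number (\d+) in space (\d+)")


def parse_page_error(message):
    """Return (page_number, space_id) from the error text, or None if absent."""
    match = PAGE_ERROR_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def page_byte_offset(page_number, page_size=PAGE_SIZE):
    """Byte offset at which the given page would start in the tablespace."""
    return page_number * page_size


def is_within_tablespace(page_number, tablespace_bytes, page_size=PAGE_SIZE):
    """True if the whole page fits inside a tablespace of the given size."""
    return page_byte_offset(page_number, page_size) + page_size <= tablespace_bytes


if __name__ == "__main__":
    line = "InnoDB: Error: trying to access page number 1098759810 in space 0,"
    page_number, space_id = parse_page_error(line)
    print(page_number, space_id, page_byte_offset(page_number))
```

In practice the tablespace size would come from something like os.path.getsize("/data/ibdata1"). A page pointer that far beyond the file is most likely garbage read from a damaged page rather than a real location.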
This means the tablespace is corrupted. After enabling the monitor, I noticed the following:
InnoDB: Error: data dictionary entry for table DBXX/tableXX is corrupt!
InnoDB: Index field 0 is delete marked.
091117 20:32:47InnoDB: Assertion failure in thread 1148791104 in file dict0load.c line 503
InnoDB: Failing assertion: ut_memcmp(buf, field, len) == 0
It looks like a modification to the primary key of that table was never persisted, because nothing could be written to disk just before the crash. I wonder why the engine did not trap this earlier: a change should not be marked as persistent unless the write to the log returns success (even if the modified entries are still in memory, the change is written to the log, and everything else is rolled back).
Anyway, I recovered by starting with innodb_force_recovery=4, then dumping and reloading the data, because the server failed to start on subsequent restarts even after the first recovery (well, it should not start, as the dictionary is wrong).
Time for me to simulate this scenario on Drizzle and Maria for fun in the coming days to see how they handle it (I just got the Maria source a couple of days back and should start contributing code).


Last week this happened at our company, though not because of a full disk: one of the disks in the RAID went bad, and the server crashed.
Are you using the InnoDB plugin?
No, 5.0.77
You should check whether your disk system respects fsyncs. One way to get out of sync like this is out-of-order writes: the disk loses a change that the server thinks is safely on disk.
The fact that terabytes of data could be so easily corrupted is kind of unsettling. The incremental backup for just one day’s worth of data takes us hours. Restoring all of the backups would take weeks, if not months. I guess this is why those with money go with Oracle.