I was setting up a new test cluster the other day using the latest development branch (1.0.4 tag) of Hadoop, to test a new patch that extends the balancer code to balance a single node across multiple disks (JBOD) without considering the whole cluster.
During the initial cluster setup (I did not copy my regular configs, otherwise this might have gone unnoticed), the NameNode failed to start and threw the following error:
2012-10-27 00:49:50,067 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /hadoop/dfs/name/1 does not exist.
2012-10-27 00:49:50,069 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hadoop/dfs/name/1 is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
2012-10-27 00:49:50,070 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hadoop/dfs/name/1 is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
I had all the directories set up properly with the right privileges. Even the NameNode format ran without any errors:
hadoop@hadoop1:~$ hadoop namenode -format
...
...
Re-format filesystem in /hadoop/dfs/name/1 ? (Y or N) Y
...
..
12/10/27 00:56:50 INFO common.Storage: Storage directory /hadoop/dfs/name/1 has been successfully formatted.
It took me a while to realize that I had a space in the dfs.name.dir config parameter value (hdfs-site.xml), which was causing the NameNode to fail to bootstrap.
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/dfs/name/1</value>
</property>
Basically, the problem happens whenever a NameNode directory path in the config value has a space before or after it. The following path combinations, each containing a space, will fail:
/hadoop/dfs/name/1
/hadoop/dfs/name/1
/hadoop/dfs/name/1, /hadoop/dfs/name/2
/hadoop/dfs/name/1,/hadoop/dfs/name/2
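Since the stray space is nearly invisible in an editor, a small check over the config file can help. The following is only a sketch using the JDK's built-in XML parser: the class name PaddedValueCheck and the method findPadded are made up for this post and are not part of Hadoop. It reports every property whose comma-separated value has an entry with leading or trailing whitespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: scans a Hadoop-style
// <configuration> document for values padded with whitespace.
final class PaddedValueCheck {

    // Returns the names of all properties that have at least one
    // comma-separated entry with leading or trailing whitespace.
    static List<String> findPadded(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }

        List<String> padded = new ArrayList<>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String name = childText(prop, "name");
            String value = childText(prop, "value");
            if (name == null || value == null) {
                continue;
            }
            // Limit -1 keeps trailing empty entries so nothing is silently dropped.
            for (String entry : value.split(",", -1)) {
                if (!entry.equals(entry.trim())) {
                    padded.add(name.trim());
                    break;
                }
            }
        }
        return padded;
    }

    // Text of the first child element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String sample =
              "<configuration>"
            + "<property><name>dfs.name.dir</name><value> /hadoop/dfs/name/1</value></property>"
            + "<property><name>dfs.data.dir</name><value>/hadoop/dfs/data/1</value></property>"
            + "</configuration>";
        System.out.println("Padded properties: " + findPadded(sample));
    }
}
```

The sample XML in main is inlined only to keep the sketch self-contained; for a real check you would read the contents of hdfs-site.xml into the string instead.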
A leading space in front of a directory is the most confusing case: the format works, but the node fails to start, and you can’t even find the formatted directory with the ls command (tried on Ubuntu).
hadoop@hadoop1:~$ hadoop namenode -format
...
Re-format filesystem in /hadoop/dfs/name/1 ? (Y or N) Y
..
12/10/27 01:06:55 INFO common.Storage: Storage directory /hadoop/dfs/name/1 has been successfully formatted.
...
hadoop@hadoop1:~$ ls -al /hadoop/dfs/name/
total 8
drwxrwxr-x 2 hadoop hadoop 4096 Oct 27 01:04 .
drwxr-xr-x 4 hadoop hadoop 4096 Oct 22 21:04 ..
A successfully formatted NameNode directory should look like this:
hadoop@hadoop1:~$ ls -al /hadoop/dfs/name/1
total 16
drwxrwxr-x 4 hadoop hadoop 4096 Oct 27 01:10 .
drwxrwxr-x 3 hadoop hadoop 4096 Oct 27 01:10 ..
drwxrwxr-x 2 hadoop hadoop 4096 Oct 27 01:10 current
drwxrwxr-x 2 hadoop hadoop 4096 Oct 27 01:10 image
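A possible explanation for the missing directory in the leading-space case, not confirmed against the Hadoop source: on a Unix-like system, a path string that starts with a space does not start with "/", so Java treats it as a relative path and resolves it against the current working directory. The sketch below only demonstrates that java.io.File behaviour. The class name LeadingSpacePath is made up for illustration, and whether the NameNode handles the value exactly this way is an assumption.

```java
import java.io.File;

// Illustration only: shows how java.io.File treats a path string with a
// leading space on a Unix-like system. Not taken from the Hadoop source.
final class LeadingSpacePath {

    // True if the configured string is an absolute path as far as Java is concerned.
    static boolean isAbsolute(String configured) {
        return new File(configured).isAbsolute();
    }

    // Where the configured string would actually point on disk.
    static String resolved(String configured) {
        return new File(configured).getAbsolutePath();
    }

    public static void main(String[] args) {
        String clean = "/hadoop/dfs/name/1";
        String padded = " /hadoop/dfs/name/1"; // note the leading space

        System.out.println("clean is absolute:  " + isAbsolute(clean));
        System.out.println("padded is absolute: " + isAbsolute(padded));

        // The padded value resolves to a directory literally named " "
        // underneath the current working directory.
        System.out.println("padded resolves to: " + resolved(padded));
    }
}
```

If that is what happens, the formatted directory ends up under whatever directory the format command was run from, which would explain why nothing shows up under /hadoop/dfs/name/.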
This does not happen for the DataNode path (dfs.data.dir), so it should be easy to patch by trimming the whitespace around each directory path, unless the path is quoted in the config file.
Update: It looks like this is fixed in the latest 2.0 branch. In FSNamesystem.java, getStorageDirs() now uses conf.getTrimmedStringCollection(propertyName) instead of conf.getStringCollection(propertyName).
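To make the difference between the two calls concrete, here is a minimal plain-Java sketch. The methods splitRaw and splitTrimmed are hypothetical stand-ins written for this post. They only mimic the untrimmed and trimmed behaviour that the method names suggest and are not Hadoop's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for illustration only, not Hadoop code.
final class DirListParsing {

    // Untrimmed parse: split on commas and keep each entry exactly as written.
    static List<String> splitRaw(String value) {
        List<String> dirs = new ArrayList<>();
        for (String part : value.split(",")) {
            if (!part.isEmpty()) {
                dirs.add(part);
            }
        }
        return dirs;
    }

    // Trimmed parse: split on commas and strip whitespace around each entry.
    static List<String> splitTrimmed(String value) {
        List<String> dirs = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        String configured = "/hadoop/dfs/name/1, /hadoop/dfs/name/2";

        // Brackets make any surrounding whitespace visible in the output.
        for (String dir : splitRaw(configured)) {
            System.out.println("raw:     [" + dir + "]");
        }
        for (String dir : splitTrimmed(configured)) {
            System.out.println("trimmed: [" + dir + "]");
        }
    }
}
```

With the untrimmed parse, the second entry keeps its leading space and so names a different path from the one intended. The trimmed parse yields the two directories as written.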

