I was setting up a new test cluster the other day using the latest development branch (the 1.0.4 tag) of Hadoop, to test a new patch that extends the balancer code to support balancing the disks within a single node (JBOD) without considering the whole cluster.
During the initial cluster setup (I did not copy my regular configs, otherwise it might have gone unnoticed), the NameNode failed to start, throwing the following error:
2012-10-27 00:49:50,067 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /hadoop/dfs/name/1 does not exist.
2012-10-27 00:49:50,069 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hadoop/dfs/name/1 is in an inconsistent state: storage directory does not exist or is not accessible.
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
2012-10-27 00:49:50,070 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hadoop/dfs/name/1 is in an inconsistent state: storage directory does not exist or is not accessible.
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
I had all the directories set up properly with the right permissions. Even the NameNode format worked without any errors:
hadoop@hadoop1:~$ hadoop namenode -format
...
Re-format filesystem in /hadoop/dfs/name/1 ? (Y or N) Y
...
12/10/27 00:56:50 INFO common.Storage: Storage directory /hadoop/dfs/name/1 has been successfully formatted.
It took me a while to realize that I had a space in the dfs.name.dir config parameter value (hdfs-site.xml), which was causing the NameNode to fail to bootstrap.
<property>
  <name>dfs.name.dir</name>
  <value> /hadoop/dfs/name/1</value>   <!-- note the stray leading space -->
</property>
Basically, the problem occurs whenever there is a space in a NameNode directory path. The following combinations, each containing a stray space, will fail:
" /hadoop/dfs/name/1"
"/hadoop/dfs/name/1 "
"/hadoop/dfs/name/1, /hadoop/dfs/name/2"
" /hadoop/dfs/name/1,/hadoop/dfs/name/2"
(values quoted here so the stray spaces are visible)
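One quick way to catch this before starting the NameNode is to grep the config for whitespace touching the value element boundaries. A minimal sketch, with a made-up file path and contents (adjust for your own hdfs-site.xml):

```shell
# Sketch: catch stray whitespace in config values before starting the NameNode.
# The file path and contents here are illustrative, not from a real cluster.
cat > /tmp/hdfs-site-demo.xml <<'EOF'
<property>
  <name>dfs.name.dir</name>
  <value> /hadoop/dfs/name/1</value>
</property>
EOF

# Print any <value> line whose content begins or ends with whitespace
grep -nE '<value>[[:space:]]|[[:space:]]</value>' /tmp/hdfs-site-demo.xml
```

This only flags leading/trailing whitespace on single-line values; a space after a comma inside a multi-directory value still needs a manual look.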
It is even more confusing if the space is in front of a directory: the format works, but the node fails to start, and you can't even find the directory with the ls command (tried on Ubuntu).
hadoop@hadoop1:~$ hadoop namenode -format
...
Re-format filesystem in /hadoop/dfs/name/1 ? (Y or N) Y
...
12/10/27 01:06:55 INFO common.Storage: Storage directory /hadoop/dfs/name/1 has been successfully formatted.
...
hadoop@hadoop1:~$ ls -al /hadoop/dfs/name/
total 8
drwxrwxr-x 2 hadoop hadoop 4096 Oct 27 01:04 .
drwxr-xr-x 4 hadoop hadoop 4096 Oct 22 21:04 ..
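A likely explanation for the vanishing directory: a value like " /hadoop/dfs/name/1" does not start with "/", so Java treats it as a relative path and creates everything under the process's working directory, inside a directory whose name is a single space. A small shell demo of how such an entry hides from a casual ls (all paths here are made up for the demo):

```shell
# Demo (in a scratch dir) of why a leading space hides the directory:
# " /hadoop/dfs/name/1" is a RELATIVE path, so everything lands under a
# directory whose name is a single space.
demo=$(mktemp -d)
mkdir -p "$demo/ /hadoop/dfs/name/1"   # note the quoted leading space

ls "$demo"                # the entry is easy to miss: its name is just a space
ls "$demo" | cat -A       # cat -A marks line ends, making the space visible
find "$demo" -name ' *'   # match entries whose names start with a space
```

`cat -A` (or quoting entries with `ls -Q` where available) is generally the quickest way to spot such invisible names.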
A successfully formatted NameNode directory should look like this:
hadoop@hadoop1:~$ ls -al /hadoop/dfs/name/1
total 16
drwxrwxr-x 4 hadoop hadoop 4096 Oct 27 01:10 .
drwxrwxr-x 3 hadoop hadoop 4096 Oct 27 01:10 ..
drwxrwxr-x 2 hadoop hadoop 4096 Oct 27 01:10 current
drwxrwxr-x 2 hadoop hadoop 4096 Oct 27 01:10 image
This does not happen for the DataNode path (dfs.data.dir), so it should be easy to patch by trimming the whitespace around each directory path, unless the path is quoted in the config file.
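The trimming such a patch would apply can be sketched in shell. This is only an illustration of the effect on a comma-separated value, not Hadoop's actual code:

```shell
# Rough shell equivalent of the trimming fix: strip whitespace around
# commas and at both ends of a dfs.name.dir-style value (illustrative only).
dirs=" /hadoop/dfs/name/1, /hadoop/dfs/name/2 "
echo "$dirs" | sed -e 's/[[:space:]]*,[[:space:]]*/,/g' \
                   -e 's/^[[:space:]]*//' \
                   -e 's/[[:space:]]*$//'
# -> /hadoop/dfs/name/1,/hadoop/dfs/name/2
```

Note the deliberate choice to trim only around separators and at the ends; a space *inside* a path component is left alone, since it could be intentional.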
Update: Looks like this is fixed in the latest 2.0 branch (in FSNamesystem.java, getStorageDirs() now uses conf.getTrimmedStringCollection(propertyName) instead of conf.getStringCollection(propertyName)).
Hi Venu
Thanks for posting the issue you had; it's quite useful. I am working on something similar: I have a Hadoop setup already up and running, and I am modifying the dfs.name.dir property to add new folder locations where the fsimage and edit logs will be saved. However, after adding the locations and restarting the Hadoop filesystem, it doesn't seem to replicate the fsimage and edit logs. I can't reformat the NameNode since I am doing this on a prod setup. Let me know what I am missing after altering the dfs.name.dir property. Here is what it was initially set to when the cluster was built:
dfs.name.dir
/hadoop/tmp/folder1
Here is what I am modifying it to be:
dfs.name.dir
/hadoop/tmp/folder1,/hadoop/tmp/folder2,/hadoop/tmp/folder3
Aparna, shut down the cluster, copy /hadoop/tmp/folder1 to /hadoop/tmp/folder2 and /hadoop/tmp/folder3 (a plain copy and paste) and set the appropriate permissions for whatever user/group the cluster runs as (on all nodes); then update the dfs.name.dir property in hdfs-site.xml with all 3 folders (comma separated); restarting the cluster should then work without any issues.
Here is what I did to test this on my test cluster on a Mac; it worked as expected.
venu@ ~/hadoop/data/hdfs 16:31:04# grep dfs.name ~/hadoop/hadoop/conf/hdfs-site.xml
dfs.name.dir
/Users/venu/hadoop/data/hdfs/name
venu@ ~/hadoop/data/hdfs 16:31:07# cp -r name name1
venu@ ~/hadoop/data/hdfs 16:31:14# rm -rf name1
venu@ ~/hadoop/data/hdfs 16:31:17# cp -r name name1
venu@ ~/hadoop/data/hdfs 16:31:20# sed "s/hdfs\/name/hdfs\/name,\/Users\/venu\/hadoop\/data\/hdfs\/name1/g" ~/hadoop/hadoop/conf/hdfs-site.xml > x && mv x ~/hadoop/hadoop/conf/hdfs-site.xml
venu@ ~/hadoop/data/hdfs 16:31:27# grep dfs.name ~/hadoop/hadoop/conf/hdfs-site.xml
dfs.name.dir
/Users/venu/hadoop/data/hdfs/name,/Users/venu/hadoop/data/hdfs/name1
venu@ ~/hadoop/data/hdfs 16:31:29# mac_hadoop_start
---- starting hadoop..
localhost : namenode : OK
localhost : secondarynamenode : OK
localhost : jobtracker : OK
localhost : datanode : OK
localhost : tasktracker : OK