Interesting Hadoop issue
Background
We have been running a proof of concept (PoC) CDH3u3 cluster of 9 data nodes running CentOS 6.1.
Now we want to go to production and don't have the time to upgrade to CDH4.2 ;-)
We were adding new data nodes running CentOS 6.3 with the same version of CDH3u3.
At the same time we took the time to update the topology of the cluster for these new nodes.
We added the rack they were in into the topology file.
We left the PoC nodes out of the topology file, so they kept their default rack location.
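For readers who have not set up rack awareness before: in Hadoop 1.x (which CDH3 is based on), the property topology.script.file.name points at an executable. Hadoop calls it with one or more IPs or hostnames as arguments and reads one rack path per argument from stdout. A node the script does not know about is normally given /default-rack. The following is only a minimal sketch of such a script, not the one we used; the addresses and rack names in RACKS are made up for illustration.

```python
#!/usr/bin/env python
import sys

# Rack that Hadoop uses for nodes without an explicit mapping.
DEFAULT_RACK = "/default-rack"

# Hypothetical mapping for illustration only; a real script would
# usually load this from a topology data file.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}


def resolve(hosts, racks=RACKS, default=DEFAULT_RACK):
    """Return one rack path per host, in the same order as the input."""
    return [racks.get(host, default) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the nodes as arguments and expects the rack
    # paths on stdout, separated by whitespace.
    print(" ".join(resolve(sys.argv[1:])))
```

With this sketch, a node listed in RACKS resolves to its rack, and any other node falls back to /default-rack, which is the mixed state our cluster was in.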
Symptoms
We could load data into the cluster no problem.
But when we ran queries, they would run for a while and then stop.
The number of occupied Map Slots would be 0 but there were plenty of free slots.
We saw some errors/warnings in the logs, but they were far from obvious.
Hadoop jobtracker log warnings/errors
Example warnings/errors in namenode jobtracker logfile (maybe related)
2013-05-01 10:11:11,984 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_201305010958_0001_m_000174_1' of TIP 'task_201305010958_0001_m_000174'oldTT=tracker_
2013-05-01 10:11:11,984 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201305010958_0001_m_000220_1' to tip task_201305010958_0001_m_000220, for tracker 'tracker_
2013-05-01 10:11:11,984 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: java.lang.NullPointerException
2013-05-01 10:11:11,984 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8021, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@17b0b765, false, false, true, 2320) from 10.173.226.117:53514: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
Example odd entries in namenode tasktracker logfile (maybe related)
2013-04-30 00:19:22,760 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '
2013-04-30 00:19:22,764 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '
2013-04-30 00:19:22,767 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '
2013-04-30 00:19:22,770 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '
2013-04-30 00:19:22,773 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '
2013-04-30 00:19:22,777 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '
Solution
Adding the rack locations of all the cluster nodes in the topology file did the trick.
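A quick check before restarting the cluster can catch nodes that were left out of the topology file. The sketch below assumes a simple "host rack" line format and made-up hostnames; your topology file and node list may look different, so treat the parsing as an assumption rather than a description of our setup.

```python
def parse_topology(text):
    """Parse 'host rack' lines; blank lines and # comments are skipped."""
    racks = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        host, rack = line.split()
        racks[host] = rack
    return racks


def unmapped_hosts(cluster_hosts, racks):
    """Return the hosts that would fall back to the default rack."""
    return sorted(host for host in cluster_hosts if host not in racks)


# Hypothetical topology file content: only the new nodes are listed.
topology_text = """
# host      rack
newnode1    /dc1/rack7
newnode2    /dc1/rack7
"""

# Hypothetical node list: two PoC nodes plus the two new nodes.
cluster = ["pocnode1", "pocnode2", "newnode1", "newnode2"]

missing = unmapped_hosts(cluster, parse_topology(topology_text))
print(missing)  # ['pocnode1', 'pocnode2']
```

An empty list means every node has an explicit rack, which is the state that made our jobs run again.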