Hadoop Tools Ecosystem
Useful link re the Hadoop tools ecosystem
http://nosql.mypopescu.com/post/1541593207/quick-reference-hadoop-tools-ecosystem
Hadoop standalone rebuild
See useful link
# WARNING - this wipes all HDFS data; only for a throwaway standalone instance
rm -rf /data/hdfs
rm -rf /data/tmpd_hdfs
hadoop namenode -format # re-initialise the name node metadata
start-all.sh # restart the hdfs and mapreduce daemons
Hadoop install on CentOS - JDK + Cloudera distribution
# copy CDH tarfiles and jdk somewhere - say /tmp/downloads
cd /opt # or wherever you decide to install hadoop
# install the jdk (self-extracting binary)
/tmp/jdk-6u25-linux-x64.bin
# install hadoop apps
for f in *cdh3*gz
do
tar -xvzf $f
done
# build soft links to current CDH version
for f in `ls -d *cdh3u3`; do g=`echo $f| cut -d'-' -f 1`; ln -s $f $g; done
# check permissions and chown -R hadoop:hadoop if reqd
# edit /etc/profile and add the necessary entries to the environment, e.g.
export JAVA_HOME="/opt/jdk"
PATH="$PATH:$JAVA_HOME/bin:/opt/hadoop/bin:/opt/hive/bin:/opt/pig/bin"
export HIVE_HOME=/opt/hive
export HADOOP_HOME=/opt/hadoop
export PIG_HOME=/opt/pig
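To sanity-check the install, a quick check assuming the paths above:
. /etc/profile
java -version # should report jdk 1.6.0_25
hadoop version # should report the cdh3 build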
Good Installation/config notes
Great notes re setting up a cluster
To be aware of
Small files in HDFS problem
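One common mitigation is to pack small files into a Hadoop Archive (HAR). A minimal sketch, with hypothetical paths:
# archive the contents of /data/logs into a single har file
hadoop archive -archiveName logs.har -p /data/logs /data/archives
# the files stay readable through the har:// scheme
hadoop fs -ls har:///data/archives/logs.har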
Configuring
Read this article re configuration notes as a starter
Jobtracker hanging - memory issues - read here
Also read misconfiguration article
Transparent HugePages - see the Linux reference and Greg Rahn's exposé of the THP issue on Hadoop (a sketch for disabling THP follows the notes below)
architectural notes here
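A minimal sketch for disabling THP at runtime; note the sysfs path varies by kernel (RHEL/CentOS 6 kernels use /sys/kernel/mm/redhat_transparent_hugepage instead):
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# add the same two lines to /etc/rc.local to persist across reboots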
Troubleshooting
$ hadoop fsck / 2>&1 | grep -v '\.\.\.'
FSCK started by hadoop (auth:SIMPLE) from /10.1.2.5 for path / at Tue Jun 19 08:16:59 BST 2012
Total size: 14476540835550 B
Total dirs: 6780
Total files: 1040334 (Files currently being written: 3678)
Total blocks (validated): 1207343 (avg. block size 11990412 B)
Minimally replicated blocks: 1207343 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0023208
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 9
Number of racks: 1
FSCK ended at Tue Jun 19 08:17:15 BST 2012 in 15878 milliseconds
The filesystem under path '/' is HEALTHY
$ hadoop dfsadmin -report
Configured Capacity: 65158503579648 (59.26 TB)
Present Capacity: 61841246261419 (56.24 TB)
DFS Remaining: 18049941311488 (16.42 TB)
DFS Used: 43791304949931 (39.83 TB)
DFS Used%: 70.81%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Decommissioning nodes from cluster
Decommissioning a node
- Add the node to the exclude file
- Run hadoop dfsadmin -refreshNodes on the name node
This causes the data node listed in the exclude file to replicate all of its blocks to the other nodes in the cluster.
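A minimal sketch, assuming hdfs-site.xml already points dfs.hosts.exclude at /opt/hadoop/conf/excludes (a hypothetical path):
<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/conf/excludes</value>
</property>
# on the name node:
echo mydecommnode >> /opt/hadoop/conf/excludes
hadoop dfsadmin -refreshNodes
Progress can then be watched with hadoop dfsadmin -report, as in the output below.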
[hadoop@mynamenode conf]$ hadoop dfsadmin -report | more
Configured Capacity: 86201191694336 (78.4 TB)
Present Capacity: 81797299645664 (74.39 TB)
DFS Remaining: 32416639721472 (29.48 TB)
DFS Used: 49380659924192 (44.91 TB)
DFS Used%: 60.37%
Under replicated blocks: 315421
Blocks with corrupt replicas: 0
Missing blocks: 0
Name: mydecommnode:50010
Rack: /dc1/r16
Decommission Status : Decommission in progress
Configured Capacity: 7239833731072 (6.58 TB)
DFS Used: 6869802995712 (6.25 TB)
Non DFS Used: 369725145088 (344.33 GB)
DFS Remaining: 305590272 (291.43 MB)
DFS Used%: 94.89%
DFS Remaining%: 0%
Last contact: Wed May 08 11:37:15 BST 2013
Removing the decommissioned datanode
- Kill any hadoop java processes still running on the node
- Remove the node from the slaves file on the name node
- Leave the node in the exclude file until a cluster restart (this may have changed in newer versions of Hadoop - TBC)
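A minimal sketch of the cleanup, assuming the daemons run as the hadoop user and the conf paths used above:
# on the decommissioned node - find and stop any remaining hadoop daemons
ps -ef | grep hadoop | grep java
kill <pid> # for each remaining process
# on the name node - drop the host from the slaves file
sed -i '/mydecommnode/d' /opt/hadoop/conf/slaves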
Fsck of HDFS shows replication violations/issues
The solution is to toggle the replication factor down by one and then back to its target value; dropping a replica and re-adding it forces the NameNode to place the new replica according to policy.
See the example below.
$ ./fsck_all.sh | head -20
fsck'ing fs [/data]
FSCK started by hadoop (auth:SIMPLE) from / for path /data at Tue May 14 08:33:37 BST 2013
/data/myfeed/load_dt=20130430/batch=202/myfile1_20130430110747+0100_20130430110848+0100_src1_0224180.dat.lzo: Replica placement policy is violated for blk_-1157106589514956189_3885959. Block should be additionally replicated on 1 more rack(s)
.......................Status: HEALTHY
So run the following:
$ hadoop fs -setrep 2 /data/myfeed/load_dt=20130430/batch=202/myfile1_20130430110747+0100_20130430110848+0100_src1_0224180.dat.lzo
Replication 2 set: hdfs://mynamenode/data/myfeed/load_dt=20130430/batch=202/myfile1_20130430110747+0100_20130430110848+0100_src1_0224180.dat.lzo
$ hadoop fs -setrep 3 /data/myfeed/load_dt=20130430/batch=202/myfile1_20130430110747+0100_20130430110848+0100_src1_0224180.dat.lzo
Replication 3 set: hdfs://mynamenode/data/myfeed/load_dt=20130430/batch=202/myfile1_20130430110747+0100_20130430110848+0100_src1_0224180.dat.lzo
And the replica violations disappear.
The same approach can be used to fix under-replication issues; to handle many files at once, see the sketch below.
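A sketch that pulls the offending paths out of the fsck output and toggles each one (assumes a target replication factor of 3; the file-list path is hypothetical):
hadoop fsck /data | grep 'Replica placement policy is violated' | cut -d: -f1 | sort -u > /tmp/bad_files
for f in `cat /tmp/bad_files`
do
  hadoop fs -setrep 2 $f
  hadoop fs -setrep 3 $f
done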
Oozie
Good overview of Oozie
http://blog.cloudera.com/blog/2014/03/inside-apache-oozie-ha/
http://www.thecloudavenue.com/2013/10/installation-and-configuration-of.html