Saturday, 1 June 2013

Abortive (came close but ultimately failed :() installation of Cloudera Manager and Cloudera software without internet access (i.e. the Cloudera Path C approach)

Note - this was an abortive attempt using Path C.
Read here for some notes re Path B installation approach.

Old abortive Path C approach below ...

Up to now, we have not been using Cloudera Manager.
Then again, we have a really simple system running a small HDFS cluster.
Ganglia and simple scripting have done the job for us.

Now we have bought Cloudera support, and it makes sense to add their Cloudera Manager layer.
Why? It seems to simplify working with their support (will give examples soon).

Our system lives in a DMZ and has no direct internet access. It has servers with the following specs:
  • 2 x quad core Xeon 2.1GHz CPUs, 48GB RAM, 4 x 2TB hard disks and a pair of 1G NICs
  • CentOS 6.3
  • Latest stable Cloudera release at the time: CDH 4.2.1 (Cloudera Manager 4.5.3)
Problem.
The free edition of Cloudera Manager found here is not much use if you don't have a connection to the internet, and installing Cloudera Manager without internet access is a pain. I spoke to a consultant from Cloudera who strongly urged me to get our firewalls opened up to allow access to archive.cloudera.com from at least one server in the walled-off environment; he said this would help if we needed to patch the environment quickly. Anyway, we have this constraint for now, so that is the starting point for this article.

Note - I have not yet completed this exercise but will do a write up as soon as I have got this working.

Here are some places to start ...

I started here in cmeeig_topic_6_1 but there wasn't enough information in this article.
From that document, I found myself going round in circles in the Cloudera documentation and not getting anywhere.

For a repository of tarballs including Cloudera Manager, use this link.
The instructions I've followed below are based on Appendix C in this article (might also cross-reference this guide in cmeeig_topic_21_5 for installing parcels).
The Cloudera documentation moves around a bit - so this is following Installation Path C, i.e. installing from tarballs. This route (Path C) is not the preferred one: it makes it difficult to get updates quickly and efficiently (chasing package dependencies is painful). So consider carefully before going this route rather than a direct connection from the Cloudera Manager server to archive.cloudera.com.

Installation steps used:
  1. Download
    • Java - downloaded
      •  Latest JDK 6 - use the rpm version, because if you move back to the Cloudera Manager installer with an internet connection, it will check the rpm repository to determine whether Java is installed (or so it seems). So look here for Java JDK versions. I downloaded this one: http://download.oracle.com/otn-pub/java/jdk/6u45-b06/jdk-6u45-linux-x64-rpm.bin. If you want the version certified by Cloudera, you'll need to look for the correct older version, but according to the documentation newer versions should be fine.
    • MySQL (or PostgreSQL or Oracle) - downloaded the following:
      • MySQL-server-5.5.22-1.linux2.6.x86_64.rpm
      • MySQL-client-5.5.22-1.linux2.6.x86_64.rpm
    • Download the latest Cloudera Manager tarball and CDH parcel
  2. Install Java JDK - don't install this manually, as Cloudera Manager will install it. For reference, the system already had OpenJDK:
    • java -showversion
    • java version "1.6.0_24"
    • OpenJDK Runtime Environment (IcedTea6 1.11.1) (rhel-1.45.1.11.1.el6-x86_64)
  3. Set up database
    • Got a choice of databases - PostgreSQL, MySQL or Oracle. We had already set up MySQL for Hive, so we decided to create another database for monitoring and reporting using MySQL, but all three were options for us. Interestingly, PostgreSQL is listed first in the documentation; not sure if this indicates a preference.
    • Run the downloaded MySQL rpms - here's what I did on my system
      • rpm -i ./MySQL-server-5.5.22-1.linux2.6.x86_64.rpm --replacefiles
      • rpm -i ./MySQL-client-5.5.22-1.linux2.6.x86_64.rpm
      • including the following
        • mkdir -p /mysql_mon_rep_data
        • chown -R mysql:mysql /mysql_mon_rep_data
        • chmod 755 /mysql_mon_rep_data
        • mkdir -p /var/log/mysql/logs/binary/mysql_binary_log
        • chown -R mysql:mysql /var/log/mysql
        • mkdir -p /usr/share/java/
        • cp -ip mysql-connector-java-5.1.18-bin.jar /usr/share/java/mysql-connector-java.jar  # copied from earlier download I had for Hive repository. This can be found on the Oracle mySQL downloads site
        • chown mysql:mysql /usr/share/java/mysql-connector-java.jar
    • Moved the old version of MySQL out of the way (i.e. the one that was in /var/lib/mysql_data)
    • Change the /etc/my.cnf MySQL config file to include the configuration settings as per the documentation. I foolishly changed bind_address to the server name rather than leaving it out and having it default to localhost (need to verify this, but it led me on a wild goose chase - the Cloudera consultant was very good at keeping to the script in their documentation).
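To make the bind_address point concrete, here is an illustrative /etc/my.cnf fragment (paths are the directories created above; the settings shown are examples, not our verified production config):

```ini
# /etc/my.cnf - illustrative fragment only
[mysqld]
datadir = /mysql_mon_rep_data
log_bin = /var/log/mysql/logs/binary/mysql_binary_log
# bind_address deliberately left unset, per the documentation - explicitly
# setting it to the server name is what tripped me up
```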
    • Stopped and restarted the mysql daemon (note - it looks like CentOS 6.3 uses mysql and not mysqld as the daemon name)
    • This builds a new set of database files (if you are upgrading your database, do something different). For me it failed with an error because I had moved the files from the default location, so I had to run the following to get it to work.
    • # /usr/bin/mysql_secure_installation
      
      NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MySQL
            SERVERS IN PRODUCTION USE!  PLEASE READ EACH STEP CAREFULLY!
      
      In order to log into MySQL to secure it, we'll need the current
      password for the root user.  If you've just installed MySQL, and
      you haven't set the root password yet, the password will be blank,
      so you should just press enter here.
      
      Enter current password for root (enter for none): 
      OK, successfully used password, moving on...
      
      Setting the root password ensures that nobody can log into the MySQL
      root user without the proper authorisation.
      
      Set root password? [Y/n] Y
      New password: 
      Re-enter new password: 
      Password updated successfully!
      Reloading privilege tables..
       ... Success!
      
      
      By default, a MySQL installation has an anonymous user, allowing anyone
      to log into MySQL without having to have a user account created for
      them.  This is intended only for testing, and to make the installation
      go a bit smoother.  You should remove them before moving into a
      production environment.
      
      Remove anonymous users? [Y/n] Y
       ... Success!
      
      Normally, root should only be allowed to connect from 'localhost'.  This
      ensures that someone cannot guess at the root password from the network.
      
      Disallow root login remotely? [Y/n] n
       ... skipping.
      
      By default, MySQL comes with a database named 'test' that anyone can
      access.  This is also intended only for testing, and should be removed
      before moving into a production environment.
      
      Remove test database and access to it? [Y/n] Y
       - Dropping test database...
       ... Success!
       - Removing privileges on test database...
       ... Success!
      
      Reloading the privilege tables will ensure that all changes made so far
      will take effect immediately.
      
      Reload privilege tables now? [Y/n] Y
       ... Success!
      
      Cleaning up...
      
      All done!  If you've completed all of the above steps, your MySQL
      installation should now be secure.
      
      Thanks for using MySQL!
      
    • Set the various mySQL account passwords
    • amon_password=insertpassword
      smon_password=insertpassword
      rman_password=insertpassword
      hmon_password=insertpassword
      hive_password=insertpassword
      nav_password=insertpassword
      mysql -u root <<EOF
      create database amon DEFAULT CHARACTER SET utf8;
      grant all on amon.* TO 'amon'@'%' IDENTIFIED BY '$amon_password';
      create database smon DEFAULT CHARACTER SET utf8;
      grant all on smon.* TO 'smon'@'%' IDENTIFIED BY '$smon_password';
      create database rman DEFAULT CHARACTER SET utf8;
      grant all on rman.* TO 'rman'@'%' IDENTIFIED BY '$rman_password';
      create database hmon DEFAULT CHARACTER SET utf8;
      grant all on hmon.* TO 'hmon'@'%' IDENTIFIED BY '$hmon_password';
      create database hive DEFAULT CHARACTER SET utf8;
      grant all on hive.* TO 'hive'@'%' IDENTIFIED BY '$hive_password';
      create database nav DEFAULT CHARACTER SET utf8;
      grant all on nav.* TO 'nav'@'%' IDENTIFIED BY '$nav_password';
      EOF
  4. Install cloudera manager server 
    • mkdir /opt/cloudera-manager
    • Extract the tarball into this directory, /opt/cloudera-manager. It creates cm-4.6.0/...
    • Create a soft link to the latest Cloudera Manager tarball installation: ln -s cm-4.6.0 cm. I removed this step because I ran into difficulties and thought it might be caused by this, but it does make sense to have a non-versioned path.
    • Create a Cloudera SCM group and user account (note - explicit uid and gid not required):
      • groupadd cloudera-scm
      • useradd --system --home=/opt/cloudera-manager/cm-4.6.0/run/cloudera-scm-server --gid cloudera-scm --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
    • Cloudera manager requires these directories (can be changed if required)
    • mkdir -p /var/log/cloudera-scm-headlamp
      chown cloudera-scm:cloudera-scm /var/log/cloudera-scm-headlamp
      mkdir -p /var/log/cloudera-scm-firehose
      chown cloudera-scm:cloudera-scm /var/log/cloudera-scm-firehose
      mkdir -p /var/log/cloudera-scm-alertpublisher
      chown cloudera-scm:cloudera-scm /var/log/cloudera-scm-alertpublisher
      mkdir -p /var/log/cloudera-scm-eventserver
      chown cloudera-scm:cloudera-scm /var/log/cloudera-scm-eventserver
      mkdir -p /var/lib/cloudera-scm-headlamp
      chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-headlamp
      mkdir -p /var/lib/cloudera-scm-firehose
      chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-firehose
      mkdir -p /var/lib/cloudera-scm-alertpublisher
      chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-alertpublisher
      mkdir -p /var/lib/cloudera-scm-eventserver
      chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-eventserver
    • Configure the /opt/cloudera-manager/cm-4.6.0/etc/cloudera-scm-agent/config.ini file. Note that initially you only need to change the server_host variable to the name or IP address of the server running your Cloudera Manager server. You will copy this file out to the Cloudera Manager agents once you have installed the agent software there.
    • Create the scm database in MySQL (note - the Cloudera documentation assumes you are on a remote server and does a proxy login using a temp user, as root can't log in remotely, I think). But on the Cloudera Manager server, where my MySQL database is running, I used: /opt/cloudera-manager/cmf/schema/scm_prepare_database.sh mysql -u root -p scm scm scm (this defaults the -h option to localhost)
    • Start the cloudera manager server  /opt/cloudera-manager/cm-4.6.0/etc/init.d/cloudera-scm-server start
  5. Prepare the cluster
    • Do this at server build time for least admin overhead ...
    • Generate a ssh rsa key pair on the cloudera manager server
    • On each server running the agent, as root (or sudo user):
      • cd; mkdir .ssh; cd .ssh; vi authorized_keys # add ssh public key created above; chmod 700 . ; chmod 600 authorized_keys
    • On the cloudera manager server, create a file called host_list with the names of the servers in the cluster
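The key-generation side of this can be sketched as follows (using a throwaway directory purely for illustration - in practice the key pair would live in ~/.ssh on the Cloudera Manager server):

```shell
# Sketch: generate a passphrase-less RSA key pair on the Cloudera Manager
# server (temp dir used here only so the example is self-contained)
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$KEYDIR/id_rsa" -q

# the contents of id_rsa.pub are what you paste into each agent host's
# ~root/.ssh/authorized_keys in the step above
cat "$KEYDIR/id_rsa.pub"
```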
  6. Install the cloudera manager agents (my servers had a bit of nfs on each server, but you could equally scp the tarball to the server and then install it) and copy the changed agent config.ini (see step 4 where we prep'd this)
    • Build the cloudera manager software (note - # hashed host_list entries are excluded from the operation by the grep -v '^#')
      • for h in `cat host_list | grep -v '^#' | awk ' { print $1 } '`; do   ssh -q $h 'hostname; [ ! -d /mnt/nfs/vol1/packages ] && echo No_NFS && exit; mkdir -p /opt/cloudera-manager; cd /opt/cloudera-manager; tar xvfz /mnt/nfs/vol1/packages/cloudera-manager-el6-cm4.6.0_x86_64.tar.gz; cd /opt; chown -R root:root cloudera-manager '; scp /opt/cloudera-manager/cm-4.6.0/etc/cloudera-scm-agent/config.ini root@$h:/opt/cloudera-manager/cm-4.6.0/etc/cloudera-scm-agent ; done
    • Start and check the status of the agents in the cluster
      • for h in `cat host_list | grep -v '^#' | awk ' { print $1 } '`; do   ssh -q $h 'hostname; /opt/cloudera-manager/cm-4.6.0/etc/init.d/cloudera-scm-agent start; ' ; done
      • for h in `cat host_list | grep -v '^#' | awk ' { print $1 } '`; do   ssh -q $h 'hostname; /opt/cloudera-manager/cm-4.6.0/etc/init.d/cloudera-scm-agent status; ' ; done
  7. Try and connect to the cloudera server manager 
    • I had to use port forwarding to get round firewalling 
      • ssh -L 7180:localhost:7180
    • Using browser http://:7180
  8. Hit a problem installing the Cloudera packages, so changed tack and opted to install a local Cloudera repository ... download the latest Cloudera repository from archive.cloudera.com/cm4/repo-as-tarball/4.6.0/cm4.6.0-centos6.tar.gz
  9. Needed CentOS packages to install the Apache httpd web server on a server (I put mine on the Cloudera Manager server) to allow the servers to install from this repository. Change the DocumentRoot in the httpd config settings in /etc/httpd/conf/httpd.conf to DocumentRoot "/opt/cloudera/yum-repo".
    • # rpm -i httpd-2.2.15-9.el6.centos.x86_64.rpm
    • warning: httpd-2.2.15-9.el6.centos.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
    • error: Failed dependencies:
    • /etc/mime.types is needed by httpd-2.2.15-9.el6.centos.x86_64
    • apr-util-ldap is needed by httpd-2.2.15-9.el6.centos.x86_64
    • httpd-tools = 2.2.15-9.el6.centos is needed by httpd-2.2.15-9.el6.centos.x86_64
    • libaprutil-1.so.0()(64bit) is needed by httpd-2.2.15-9.el6.centos.x86_64
    • # rpm -i httpd-tools-2.2.15-9.el6.centos.x86_64.rpm 
    • warning: httpd-tools-2.2.15-9.el6.centos.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
    • error: Failed dependencies:
    • libaprutil-1.so.0()(64bit) is needed by httpd-tools-2.2.15-9.el6.centos.x86_64
    • # rpm -i apr-util-1.3.9-3.el6_0.1.x86_64.rpm 
    • warning: apr-util-1.3.9-3.el6_0.1.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID c105b9de: NOKEY
    • # rpm -i httpd-tools-2.2.15-9.el6.centos.x86_64.rpm 
    • warning: httpd-tools-2.2.15-9.el6.centos.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
    • # rpm -i apr-util-ldap-1.3.9-3.el6_0.1.x86_64.rpm 
    • warning: apr-util-ldap-1.3.9-3.el6_0.1.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID c105b9de: NOKEY
    • # rpm -i mailcap-2.1.31-2.el6.noarch.rpm
    • warning: mailcap-2.1.31-2.el6.noarch.rpm: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
    • # rpm -i httpd-2.2.15-9.el6.centos.x86_64.rpm
    • warning: httpd-2.2.15-9.el6.centos.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
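With httpd and its dependencies installed, the config change from step 9 amounts to a couple of lines; a sketch for httpd 2.2 (the Directory stanza shown is an assumption based on the stock 2.2 config - adjust to whatever your httpd.conf already has):

```apache
# /etc/httpd/conf/httpd.conf - illustrative fragment
DocumentRoot "/opt/cloudera/yum-repo"
<Directory "/opt/cloudera/yum-repo">
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
```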
  10. Download and install createrepo rpm
    • Download from vault.centos.org in my case vault.centos.org/6.3/os/x86_64/Packages/
      • # rpm -i createrepo-0.9.8-5.el6.noarch.rpm 
      • warning: createrepo-0.9.8-5.el6.noarch.rpm: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
      • error: Failed dependencies:
      • deltarpm is needed by createrepo-0.9.8-5.el6.noarch
      • python-deltarpm is needed by createrepo-0.9.8-5.el6.noarch
      • # rpm -i deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm 
      • warning: deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID c105b9de: NOKEY
      • # rpm -i python-deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm 
      • warning: python-deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID c105b9de: NOKEY
      • # rpm -i createrepo-0.9.8-5.el6.noarch.rpm 
      • warning: createrepo-0.9.8-5.el6.noarch.rpm: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
    • Install a cloudera yum repository
      • # cd /opt/cloudera
      • # mkdir yum-repo
      • # cd yum-repo
      • # tar xvfz /cm4.6.0-centos6.tar.gz
      • # cd cm
      • # createrepo .
      • 14/14 - 4.6.0/RPMS/x86_64/enterprise-debuginfo-4.6.0-1.cm460.p0.141.x86_64.rpm  
      • Saving Primary metadata
      • Saving file lists metadata
      • Saving other metadata
      • Make the following repo file:
      • # cat /etc/yum.repos.d/clouderarepo.repo 
      • [clouderarepo]
      • name=clouderarepo
      • baseurl=http:///cm
      • enabled=1
      • gpgcheck=0

Bit disappointing not to get Path C to work.
We tried hard but ran into hassles we never resolved.
We were close ... not sure whether I'll have the opportunity to return to this and retry.

So we ended up using Path B documented here.

Thursday, 30 May 2013

Overview of an Oracle DB

Quick overview of an Oracle database (these are from a few years back)

-- Oracle files details
select * from dba_data_files
select * from v$logfile
select * from v$log
select * from v$tempfile
select * from v$loghist
-- check the amount of redo logging

-- Oracle database & instance details & parameter settings
select * from v$database
select * from v$instance
select * from v$parameter

-- #objects x owner & object_type
select owner, object_type, count(*)
from dba_objects
group by owner, object_type

-- #segments (count) & size (GB) x schema x segment type
select owner, segment_type, count(*), sum(bytes)/1024/1024/1024
from dba_segments
group by owner, segment_type
order by owner, segment_type

-- #segments (count) & size (GB) x schema x segment type, filtered by segment name pattern
select owner, segment_type, count(*), sum(bytes)/1024/1024/1024
from dba_segments
where segment_name like '%%'
group by owner, segment_type
order by 4 desc

-- Listing of schema, segment type & bytes (per table commented out)
select owner, segment_type, segment_name, bytes
from dba_segments
--where segment_name like '%%'
order by 4 desc

-- other useful views
dba_jobs -- scheduled jobs
dba_mviews -- see all 9i mview dictionary views here
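In the same vein, a quick free-space check per tablespace using the standard dba_free_space dictionary view (nothing site-specific here):

```sql
-- free space (GB) per tablespace
select tablespace_name, sum(bytes)/1024/1024/1024
from dba_free_space
group by tablespace_name
order by 2 desc
```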

Thursday, 23 May 2013

Graph Analysis using MapReduce

Graph Analysis - Social Network Analysis

Was at a Teradata CTO day today and learnt about their new graph analysis capability in Asterdata, which had passed me by.

I wonder whether this is a development based on the same Yahoo seed that grew Giraph.

Time to play with this ...

Friday, 3 May 2013

Hadoop MapReduce jobs hanging (after adding new nodes to cluster)

Interesting Hadoop issue

Background

We have been running a proof of concept (PoC) CDH3u3 cluster of 9 data nodes running CentOS 6.1.
Now we want to go production and don't have the time to upgrade to CDH4.2 ;-)
We were adding new data nodes running CentOS 6.3 with the same version of CDH3u3.
At the same time we updated the topology of the cluster for these new nodes.
We added the rack they were in to the topology file.
We left the PoC cluster with their default rack locations.

Symptoms

We could load data into the cluster no problem.
But when we ran queries, they would run for a while and then stop.
The number of occupied Map Slots would be 0 but there were plenty of free slots.

We saw some errors/warnings in the logs, but they were certainly not obvious.

Hadoop jobtracker log warnings/errors

Example warnings/errors in namenode jobtracker logfile (maybe related)


2013-05-01 10:11:11,984 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_201305010958_0001_m_000174_1' of TIP 'task_201305010958_0001_m_000174'oldTT=tracker_:localhost/127.0.0.1:54000 while newTT=tracker_:localhost/127.0.0.1:54000
2013-05-01 10:11:11,984 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201305010958_0001_m_000220_1' to tip task_201305010958_0001_m_000220, for tracker 'tracker_:localhost/127.0.0.1:54000'
2013-05-01 10:11:11,984 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: java.lang.NullPointerException
2013-05-01 10:11:11,984 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8021, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@17b0b765, false, false, true, 2320) from 10.173.226.117:53514: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException

Example odd  entries in namenode tasktracker logfile (maybe related)


2013-04-30 00:19:22,760 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,764 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,767 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,770 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,773 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,777 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549


Solution

Adding the rack locations of all the cluster nodes in the topology file did the trick.
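For illustration, the kind of rack-resolution script that Hadoop's topology.script.file.name property points at might look like this (the rack names and IP pattern below are made up, not our actual layout):

```shell
# Hypothetical rack-resolution helper: maps a host/IP to a rack path,
# defaulting unknown hosts to /default-rack (names below are invented)
resolve_rack() {
  case "$1" in
    10.173.226.*) echo "/rack-new" ;;
    *)            echo "/default-rack" ;;
  esac
}

# a topology script receives hostnames/IPs as arguments and must print
# one rack path per line, in order
for host in "$@"; do
  resolve_rack "$host"
done
```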

Friday, 22 February 2013

Cheap archival NAS storage using BackBlaze design

BackBlaze have a great low-cost storage product.
The best part is they have open sourced their design.

They have just announced a new 3rd generation storage design as reported by TheRegister costing $2000 for the chassis holding 45 disks (disks not included in the price).

Interestingly Netflix were influenced by the BackBlaze design for their 100TB (36 x 3TB) design.


Thursday, 21 February 2013

BI, DW DBMS, Big Data articles

Articles covering the BI, DW, DBMSs, NoSQL and Big Data



13 Big Data Vendors to watch in 2013 - including AWS, 10gen, Cloudera, Hortonworks, 

Random entry - Graph DB Neo4j overview, but the first 5 mins gives an interesting overview of Key-Value Pair vs ColumnStore vs Document vs Graph databases


Big Data Architectures patterns by Eddie Satterley

Wednesday, 20 February 2013

Balancing an HDFS cluster (including java LeaseChecker OutOfMemoryError - still unresolved)

HDFS Balancer

Read the following articles for starters:

Yahoo tutorial module on Hadoop rebalancing 
Rebalancer Design PDF

Architecture of Open Source Applications, HDFS chapter - see the rebalancing paragraph, but take care: it talks about the threshold as a value between 0 and 1 rather than a percentage

Log on as the hadoop user (the user that runs our cluster is called hadoop).
Change to ${HADOOP_HOME}/bin, where the hadoop scripts reside.
Then run start-balancer.sh.
The default balancing threshold is 10%, so choose something a little lower.
I chose 5%.
I should have started closer to 10%, like 9% or 8%.
Why? Because start-balancer.sh TAKES FOREVER!
Use hadoop dfsadmin -report to check the redistribution of the space.

[hadoop@mynode hadoop]$ cd $HADOOP_HOME/bin


[hadoop@mynode bin]$ ./start-balancer.sh -threshold 5
starting balancer, logging to /opt/hadoop-0.20.2-cdh3u3/bin/../logs/hadoop-hadoop-balancer-mynode.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Feb 19, 2013 6:44:27 PM           0                 0 KB           516.65 GB              20 GB
[hadoop@mynode bin]$ hadoop dfsadmin -report


[hadoop@mynode bin]$ cat /opt/hadoop-0.20.2-cdh3u3/bin/../logs/hadoop-hadoop-balancer-mynode.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Feb 19, 2013 6:44:27 PM           0                 0 KB           516.65 GB              20 GB
Feb 19, 2013 7:05:57 PM           1              2.39 GB           514.07 GB              20 GB
Feb 19, 2013 7:28:28 PM           2              4.89 GB           511.59 GB              20 GB
Feb 19, 2013 7:50:29 PM           3              7.32 GB            509.2 GB              20 GB
Feb 19, 2013 8:12:29 PM           4              9.74 GB           506.67 GB              20 GB
Feb 19, 2013 8:34:30 PM           5             12.18 GB           504.51 GB              20 GB
Feb 19, 2013 8:56:30 PM           6             14.66 GB           502.14 GB              20 GB
Exception in thread "LeaseChecker" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:657)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:78)
at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:754)
at org.apache.hadoop.ipc.Client.call(Client.java:1080)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy1.renewLease(Unknown Source)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.renew(DFSClient.java:1282)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1294)
at java.lang.Thread.run(Thread.java:662)


[hadoop@mynode bin]$ ./stop-balancer.sh 
./stop-balancer.sh: fork: retry: Resource temporarily unavailable
./stop-balancer.sh: fork: retry: Resource temporarily unavailable
./stop-balancer.sh: fork: retry: Resource temporarily unavailable
./stop-balancer.sh: fork: retry: Resource temporarily unavailable
./stop-balancer.sh: fork: Resource temporarily unavailable
[hadoop@mynode bin]$ w
 21:19:18 up 231 days, 11:44,  2 users,  load average: 0.03, 0.01, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT

[hadoop@mynode bin]$ hadoop job -list
/opt/hadoop/bin/hadoop: fork: retry: Resource temporarily unavailable
/opt/hadoop/bin/hadoop: fork: retry: Resource temporarily unavailable
/opt/hadoop/bin/hadoop: fork: retry: Resource temporarily unavailable
/opt/hadoop/bin/hadoop: fork: retry: Resource temporarily unavailable
/opt/hadoop/bin/hadoop: fork: Resource temporarily unavailable
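The "unable to create new native thread" error together with the fork failures above points at resource exhaustion, commonly the per-user process/thread limit (nproc). Still unresolved in our case, so this is only a pointer to the first thing worth checking:

```shell
# Check the shell's per-user process limit - "fork: Resource temporarily
# unavailable" typically means this limit (or memory) has been exhausted
NPROC_LIMIT=$(ulimit -u)
echo "max user processes: $NPROC_LIMIT"
```

On CentOS 6 this limit is commonly set in /etc/security/limits.conf or /etc/security/limits.d/.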


[hadoop@mynode bin]$ cd ../pids
[hadoop@mynode pids]$ ls -atlr
total 20
drwxr-xr-x 17 hadoop hadoop 4096 Mar  8  2012 ..
-rw-rw-r--  1 hadoop hadoop    5 Feb 13 12:20 hadoop-hadoop-namenode.pid
-rw-rw-r--  1 hadoop hadoop    5 Feb 13 12:21 hadoop-hadoop-jobtracker.pid
-rw-rw-r--  1 hadoop hadoop    5 Feb 19 18:44 hadoop-hadoop-balancer.pid
drwxr-xr-x  2 hadoop hadoop 4096 Feb 19 18:44 .


[hadoop@mynode bin]$ kill -0 2329
[hadoop@mynode bin]$ echo $?
0
[hadoop@mynode bin]$ kill 2329
[hadoop@mynode bin]$ echo $?
0
[hadoop@mynode bin]$ ps -ef | grep 2329 | grep -v grep
[hadoop@mynode bin]$ 


Sometime later ... restarted the balancer with start-balancer.sh using a 9% then an 8% threshold ...


[hadoop@mynode bin]$ ./start-balancer.sh -threshold 9
starting balancer, logging to /opt/hadoop-0.20.2-cdh3u3/bin/../logs/hadoop-hadoop-balancer-mynode.out
[hadoop@mynode bin]$ tail -10f /opt/hadoop-0.20.2-cdh3u3/bin/../logs/hadoop-hadoop-balancer-mynode.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
The cluster is balanced. Exiting...
Balancing took 629.0 milliseconds

[hadoop@mynode bin]$ ./start-balancer.sh -threshold 8
starting balancer, logging to /opt/hadoop-0.20.2-cdh3u3/bin/../logs/hadoop-hadoop-balancer-mynode.out

Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Mar 15, 2013 6:21:37 PM           0                 0 KB            63.46 GB              10 GB
Mar 15, 2013 6:42:37 PM           1              1.22 GB            62.13 GB              10 GB
...