Wednesday, 11 July 2012

Installing HA - notes to self

c/o Graham H

Checking the HA


[root@dmmlw-r410-12 ~]# crm_mon


============
Last updated: Tue Jul 10 14:12:10 2012
Stack: openais
Current DC: myserver2 - partition with quorum
Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============


Online: [ myserver1 myserver2 ]


shared_ip_one   (ocf::heartbeat:IPaddr):        Started myserver1
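
crm_mon runs as a live, refreshing display. For a one-shot check (e.g. from a script or cron), the following should work with this pacemaker version (-1 prints the status once and exits, -f adds fail counts):

crm_mon -1
crm_mon -1 -f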

Configuration


Install these packages:
cifs-utils-4.8.1-2.el6.x86_64.rpm
cluster-glue-1.0.5-2.el6.x86_64.rpm
cluster-glue-libs-1.0.5-2.el6.x86_64.rpm
corosync-1.2.3-36.el6.x86_64.rpm
corosynclib-1.2.3-36.el6.x86_64.rpm
corosynclib-devel-1.2.3-36.el6.x86_64.rpm
heartbeat-3.0.4-1.el6.x86_64.rpm  #from epel repo
heartbeat-libs-3.0.4-1.el6.x86_64.rpm  #from epel repo
keyutils-1.4-1.el6.x86_64.rpm
libibverbs-1.1.4-2.el6.x86_64.rpm
libmlx4-1.0.1-7.el6.x86_64.rpm
librdmacm-1.0.10-2.el6.x86_64.rpm
libtalloc-2.0.1-1.1.el6.x86_64.rpm
libtool-ltdl-2.2.6-15.5.el6.x86_64.rpm
lm_sensors-libs-3.1.1-10.el6.x86_64.rpm
net-snmp-libs-5.5-31.el6.x86_64.rpm
pacemaker-1.1.5-5.el6.x86_64.rpm
pacemaker-cts-1.1.5-5.el6.x86_64.rpm
pacemaker-libs-1.1.5-5.el6.x86_64.rpm
perl-TimeDate-1.16-11.1.el6.noarch.rpm
PyXML-0.8.4-19.el6.x86_64.rpm
resource-agents-3.0.12-22.el6.x86_64.rpm
net-snmp-5.5-31.el6.x86_64.rpm

sudo rpm -i --nodeps \
    libvirt-0.8.7-18.el6.x86_64.rpm libvirt-client-0.8.7-18.el6.x86_64.rpm \
    numactl-2.0.3-9.el6.x86_64.rpm gnutls-utils-2.8.5-4.el6.x86_64.rpm \
    nc-1.84-22.el6.x86_64.rpm libxslt-1.1.26-2.el6.x86_64.rpm \
    netcf-libs-0.1.7-1.el6.x86_64.rpm augeas-libs-0.7.2-6.el6.x86_64.rpm \
    cyrus-sasl-md5-2.1.23-8.el6.x86_64.rpm qpid-cpp-client-0.10-3.el6.x86_64.rpm \
    boost-1.41.0-11.el6.x86_64.rpm boost-date-time-1.41.0-11.el6.x86_64.rpm \
    boost-python-1.41.0-11.el6.x86_64.rpm boost-test-1.41.0-11.el6.x86_64.rpm \
    boost-regex-1.41.0-11.el6.x86_64.rpm boost-graph-1.41.0-11.el6.x86_64.rpm \
    boost-serialization-1.41.0-11.el6.x86_64.rpm boost-wave-1.41.0-11.el6.x86_64.rpm \
    boost-iostreams-1.41.0-11.el6.x86_64.rpm boost-signals-1.41.0-11.el6.x86_64.rpm \
    ebtables-2.0.9-6.el6.x86_64.rpm iscsi-initiator-utils-6.2.0.872-21.el6.x86_64.rpm \
    libicu-4.2.1-9.el6.x86_64.rpm dnsmasq-2.48-4.el6.x86_64.rpm \
    radvd-1.6-1.el6.x86_64.rpm qemu-img-0.12.1.2-2.160.el6.x86_64.rpm \
    yajl-1.0.7-3.el6.x86_64.rpm libcgroup-0.37-2.el6.x86_64.rpm \
    libpciaccess-0.10.9-4.el6.x86_64.rpm
sudo rpm -i fence-virtd-libvirt-0.2.1-8.el6.x86_64.rpm fence-virtd-0.2.1-8.el6.x86_64.rpm
sudo rpm -i libesmtp-1.0.4-15.el6.x86_64.rpm
sudo rpm -i clusterlib-3.0.12-41.el6.x86_64.rpm
sudo rpm -i openais-1.1.1-7.el6.x86_64.rpm openaislib-1.1.1-7.el6.x86_64.rpm
sudo rpm -i pexpect-2.3-6.el6.noarch.rpm
sudo rpm -i perl-Net-Telnet-3.03-11.el6.noarch.rpm
sudo rpm -i cman-3.0.12-41.el6.x86_64.rpm fence-virt-0.2.1-8.el6.x86_64.rpm \
    fence-agents-3.0.12-23.el6.x86_64.rpm net-snmp-utils-5.5-31.el6.x86_64.rpm \
    ricci-0.16.2-35.el6.x86_64.rpm sg3_utils-1.28-3.el6.x86_64.rpm \
    sg3_utils-libs-1.28-3.el6.x86_64.rpm oddjob-0.30-5.el6.x86_64.rpm \
    nss-tools-3.12.9-9.el6.x86_64.rpm modcluster-0.16.2-10.el6.x86_64.rpm
sudo rpm -i pacemaker-1.1.5-5.el6.x86_64.rpm pacemaker-cts-1.1.5-5.el6.x86_64.rpm pacemaker-libs-1.1.5-5.el6.x86_64.rpm
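
After the installs, a quick sanity check that the key packages registered:

rpm -q corosync pacemaker heartbeat cluster-glue resource-agents openais cman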

Create /etc/corosync/corosync.conf:

# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 10.x.x.x
        #mcastaddr: 226.94.1.1
        broadcast: yes
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: on
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

amf {
    mode: disabled
}
#end of file
############################
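Copy the file to both nodes and start corosync before configuring resources. A minimal sketch using the stock EL6 init scripts (how pacemaker itself is started can vary with how it is plugged into corosync):

# on each node
service corosync start
chkconfig corosync on
# confirm the ring is up
corosync-cfgtool -s
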
run: 
crm configure
paste the following into the crm shell:
primitive shared_ip_one ocf:heartbeat:IPaddr params ip="10.x.x.0" cidr_netmask="255.255.254.0" nic="bond0"
property stonith-enabled="false"
location share_ip_one_master shared_ip_one 100: myserver1
monitor shared_ip_one 20s:10s
commit
exit
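
Then sanity-check the configuration against the live cluster:

crm_verify -L
crm configure show
crm_mon -1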


Monday, 9 July 2012

Pentaho PDI (Kettle) - notes to self

Starters


The Pentaho download page majors on the commercial versions
(it is not clear whether the Community Edition (CE) is bundled with them).
Scroll down to Community Projects to find the open source version.

Spoon - the GUI where one designs, develops and tests ETL graphs.
Remember to fire up the environment by running spoon.sh (spoon.bat on Windows) rather than simply clicking the "Data Integration 64-bit" application; clicking the application meant the JDBC libraries under libext were not picked up, which produced the errors below.
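
For example, assuming PDI is installed under /usr/local/pentaho, as in the driver path below:

$ cd /usr/local/pentaho/data-integration
$ ./spoon.sh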

http://wiki.pentaho.com/display/EAI/02.+Spoon+Introduction

Problem with MySQL connection and PDI v.4.3

Error connecting to database [mytest_mysql] : org.pentaho.di.core.exception.KettleDatabaseException: 
Error occured while trying to connect to the database
Exception while loading class
org.gjt.mm.mysql.Driver
...
Caused by: java.lang.ClassNotFoundException: org.gjt.mm.mysql.Driver

To resolve this problem, read the issue log here, which requires you to download Connector/J from here:
$ tar xvzf mysql-connector-java-5.1.21.tar.gz mysql-connector-java-5.1.21/mysql-connector-java-5.1.21-bin.jar
$ cp -ip /downloads/mysql-connector-java-5.1.21/mysql-connector-java-5.1.21-bin.jar /usr/local/pentaho/data-integration/libext/JDBC/
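
A quick check that the driver landed where Spoon looks for it; then restart Spoon (spoon.sh) so the new driver is picked up:

$ ls -l /usr/local/pentaho/data-integration/libext/JDBC/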


mysql misc - notes to self

installed mysql on mac
lazily running as root
needed to mkdir /var/run/mysqld and chmod 777 /var/run/mysqld
(obviously missing something)
Came across this useful document for installing mysql on mac after following my nose.
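
For the record, the workaround was simply (a proper install would run mysqld as the mysql user with tighter permissions):

mkdir /var/run/mysqld
chmod 777 /var/run/mysqld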

To connect to mysql using perl DBI/DBD

To load data into mysql

how to run an SQL command in a file from within mysql
mysql> source mysqlcmds.sql

how to run an SQL command in a file from the command line
mysql < mysqlcmds.sql
(note: the database name can be omitted, as here, if the first line of mysqlcmds.sql is a use statement)
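
Both forms side by side (user and database names as per the examples below):

mysql --user=myuser --password mytest < mysqlcmds.sql   # database named on the command line
mysql --user=myuser --password < mysqlcmds.sql          # mysqlcmds.sql starts with: use mytest;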

Self-consuming mysql SQL script in a shell script
cat load_myfile.sh

#!/bin/bash

MYFILE=/mypath/myfile.dat


mysql --user=myuser --password=xyz <<EOF
use mytest;

load data local infile '${MYFILE}'
replace
into table mytest.mytable
character set UTF8
fields terminated by '|';

EOF

If you are getting the following error, it could be that you are missing the "local" keyword (even if you are providing a full path to the file). With "local" the client reads the file, so the path only needs to exist on the client host; without it the server tries to stat the path itself, hence the permission error:

$ ./load_myfile.sh 
ERROR 13 (HY000) at line 3: Can't get stat of '/mypath/myfile.dat' (Errcode: 13)






Tuesday, 19 June 2012

Pig notes to self


Some commands


# Note SUBSTRING is like a python slice (the end index is exclusive)
# so suppose field x has "abcdefgh"
# SUBSTRING(x,3,4) => "d"
# SUBSTRING(x,2,5) => "cde"


Note this code is there for syntax purposes only - it does nothing meaningful ...


comments 


/* .... over multiple lines ...*/


-- use -param arg1='abcd' on the command line
-- use -param myvar='xyz' on the command line
%default arg1 'default value'
%default myvar 'default value'


REGISTER myudf.jar;
REGISTER piggybank.jar;


DEFINE SUBSTRING org.apache.pig.piggybank.evaluation.string.SUBSTRING();
DEFINE LENGTH  org.apache.pig.piggybank.evaluation.string.LENGTH();


my_file = LOAD '$myfile' USING PigStorage('|') AS (col1:chararray, col2:double, col3:long);
my_file = DISTINCT my_file; -- remove duplicates


my_recs = FOREACH my_file GENERATE SUBSTRING(col1,0,14) AS mycol, null AS col4:chararray, (LENGTH(col1) < 3 ? col1 : SUBSTRING(REPLACE(col1,' ',''), 0,LENGTH(REPLACE(col1,' ',''))-2)) AS col5:chararray, col2, col3;


-- CONCAT(myudf.ZeroPad6Left(col1), myudf.ZeroPad6Left(col1)) AS col6:chararray


my_joined = JOIN my_recs by (col1, col2), my_recs by (col1,col2);


my_joined = FILTER my_joined BY (col3 < 1000);


my_joined2 = JOIN my_joined by col1 LEFT OUTER, my_recs by col1;


my_fin_rec = FOREACH my_joined2 GENERATE *; -- project whichever columns you need here


STORE my_fin_rec INTO '$OUTPUTfile' USING PigStorage('|');
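
To run the script with the parameters referenced above (the script and file names are placeholders):

pig -param myfile=/path/to/input.dat -param OUTPUTfile=/path/to/output -param arg1='abcd' -param myvar='xyz' myscript.pig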

Saturday, 9 June 2012

Transferring Data via SSH - notes to self

Notes re transferring Data via SSH

ssh -c arcfour

If using SSH (scp/sftp/rsync over ssh), you can achieve speed enhancements using "-c arcfour", at the cost of a little security (which might be OK in-house, for example). See the notes on SSH from Charles Martin Reid's wiki.
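
For example (hosts and paths as per the scripts below):

scp -c arcfour bigfile.gz remuser@remsvr:/remdestdir/
rsync -av -e "ssh -c arcfour" /mysrcdir/ remuser@remsvr:/remdestdir/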

Example using rsync

rsync can sync entire directory structures but this script needed data positioned in a certain way. rsync can do loads and is a good starting point ...
This script could/should be rewritten to make more use of rsync features.


#!/bin/ksh


eval $@


PUBKEY=${HOME}/.ssh/mykey.pub
svrname=`uname -n | cut -c1-8`
srcdir=/mysrcdir
sftpUsr=remuser
prisftpserver=remsvr
remdir=/remdestdir


cd ${srcdir}


START_DAY=${START_DAY:-`date --date="1 days ago" +%Y%m%d`}
END_DAY=${END_DAY:-`date --date="1 days ago" +%Y%m%d`}


DAY=${START_DAY}
while [ $DAY -le $END_DAY ]
do


echo "Starting DAY=$DAY ..."


echo "`date +'%Y/%m/%d %H:%M:%S'`|Start|${DAY}"


# Try and create the directory - it may already have been created
ssh -i ${PUBKEY} -q ${sftpUsr}@${prisftpserver} "mkdir ${remdir}/${DAY}; chmod 777 ${remdir}/${DAY}"


# replace ./* below with the pattern matching the files you want rsync'd
rsync -av --rsync-path=/opt/sfw/bin/rsync --rsh="ssh -i ${PUBKEY}" ./* ${sftpUsr}@${prisftpserver}:${remdir}/${DAY}/${svrname}


echo "`date +'%Y/%m/%d %H:%M:%S'`|Complete|${DAY}"


DAY=`date --date="${DAY} + 1 day" +%Y%m%d`  # step by calendar day; plain +1 arithmetic breaks at month ends


done
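
Because of the eval $@ at the top, the date range can be overridden with NAME=value arguments (the script name here is a placeholder); with no arguments it defaults to yesterday:

./sync_logs.sh START_DAY=20120601 END_DAY=20120607
./sync_logs.sh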


Example not using rsync


#!/bin/ksh
# script built by several hence slightly different formatting stds used :(


eval $@


PUBKEY=${HOME}/.ssh/mykey.pub
svrname=`uname -n | cut -c1-8`   # local server
srcdir=/src_logs        # replace with location of source data files
sftpUsr=remuser          # replace with remote user
prisftpserver=remserver  # replace with remote server
remdir=/rem_logs         # replace with location of destination directory


cd ${srcdir}


# this example caters for daily logfiles
START_DAY=${START_DAY:-`date --date="1 days ago" +%Y%m%d`}
END_DAY=${END_DAY:-`date --date="1 days ago" +%Y%m%d`}


DAY=${START_DAY}
while [ $DAY -le $END_DAY ]
do


echo "Starting DAY=$DAY ..."


# Try and create the directory - it may already have been created
ssh -i ${PUBKEY} -q ${sftpUsr}@${prisftpserver} "mkdir ${remdir}/${DAY}; chmod 777 ${remdir}/${DAY}"


for filename in `ls -1 *.gz`  # replace with the pattern matching the files to send
do


base_filename=`basename ${filename} .gz`
dir_filename=`dirname ${filename}`


scp_count=0
scp_error=1


while [ $scp_error -ne 0 ] && [ $scp_count -le 2 ] # give up after 3 scp attempts
do


scp_count=$(($scp_count+1))
echo "`date +'%Y/%m/%d %H:%M:%S'`|Started (${scp_count})|$filename|${base_filename}.gz"


# throttle bandwidth (-l is in Kbit/s, so -l100000 is roughly 100 Mbit/s) with 120sec connect timeout to handle hanging scp's
scp -i ${PUBKEY} -l100000 -o ConnectTimeout=120 -q ${filename} ${sftpUsr}@${prisftpserver}:${remdir}/${DAY}/${svrname}_${dir_filename}_${base_filename}.gz
# use arcfour cipher which is faster but less secure with 120sec timeout to handle hanging scp's
#scp -i ${PUBKEY} -c arcfour -o ConnectTimeout=120 -q ${filename} ${sftpUsr}@${prisftpserver}:${remdir}/${DAY}/${svrname}_${dir_filename}_${base_filename}.gz
scp_error=$?


done

echo "`date +'%Y/%m/%d %H:%M:%S'`|Complete|${filename}|${base_filename}.gz"


done


DAY=`date --date="${DAY} + 1 day" +%Y%m%d`  # step by calendar day; plain +1 arithmetic breaks at month ends


done


Streaming data


Flume 
Scribe
Storm
S4


TBC