Friday, 30 December 2011

Korn shell wrapper script for a Teradata FastExport


Thanks to Euan Murray for this improvement on my slow bteq extract.
Uses Teradata FastExport fexp and reduced the extract time from hours to minutes.
Posted here as example wrapper shell script for a Teradata FastExport.
N.B. Caution: remember to coalesce any fields that can yield null. Also the comma can cause problems


#!/bin/ksh
# set -x
# Program: td_fexp_shell_wrapper_eg.ksh
# Description: Example shell wrapper script to fast export lots of data from Teradata
# Version: 0.1 glour    15-Jul-2011 Initial version
# Version: 0.2 emurray1 14-Oct-2011 Altered to be fexp
# -------------------------------------------------------------------------


PRG=`basename $0 .ksh` # i.e. PRG="td_fexp_shell_wrapper_eg"
eval $@ # evaluate any command line args


LOGON_STR="prodtd/user,passwd;" # the TD username and passwd
LOCATION=/output_path/ # the location of the output file
DATAFILE=${LOCATION}/${PRG}.dat # the output file
DATA_DATE=$(date +%Y-%m-%d -d "$(date +%d) days ago") # gets the last day of previous month
echo ${DATA_DATE}
DEBUG=0 # for debugging set to 1


>$DATAFILE # empty the output file


fexp > /dev/null 2>&1 <
.logon ${LOGON_STR}
.LOGTABLE   DATABASE.FExp_Log;
.BEGIN EXPORT  SESSIONS 48;
.EXPORT OUTFILE ${DATAFILE};
SELECT ',' ||
a.aaaa1 || '|' ||
trim(a.aaaa2) ||'|'||
coalesce(b.bbbb1,'??') ||'|'||
coalesce(c.cccc1,'??') ||'|'||
        ... more fields here ...
cast (c.cccc2_dt as date  format 'YYYYMMDD') (TITLE '')
FROM db1.table_a a, db2.table_b b, db3.table_c c
WHERE a.aaaa1 = b.bbbb1
AND b.bbbb4 = c.cccc3
AND     '${DATA_DATE}' between a.aaaa_from_Dt and st.aaaa_to_Dt
AND     ... other clauses ...;
.END EXPORT;
.LOGOFF;
EOF
RET_CODE=$?
if [[ ${RET_CODE} -ne 0 ]]
then
        echo fast export failed investigate before proceeding
        return
fi


# clean up the extra control chars in fast export output file
echo removing fast load extra chars
sed -r 's@^[[:cntrl:][:print:]]*,@@g' ${DATAFILE} > ${LOCATION}/${PRG}.${DATA_DATE}.dat
RET_CODE=$?
if [[ ${RET_CODE} -ne 0 ]]
then
        echo cleanse failed investigate before proceeding
        exit
else
        echo Cleanse complete removing original file
        rm ${DATAFILE}
fi

Tuesday, 13 December 2011

Perl TCP Socket Programs


3 small pieces of perl code for the syntax

  1. listener.pl - server socket script to listen and consume a stream of ASCII lines from a socket
  2. producer.pl - client socket script to produce/write a stream of XML (ASCII lines) supplied in a file onto a socket
  3. consumer.pl - client socket script to consume a stream of XML ASCII lines from a socket 





listener.pl - simple server



#!/usr/bin/perl
use IO::Socket;
use POSIX qw/strftime/;


$|=1;


my $sock = new IO::Socket::INET (
                                 LocalHost => 'localhost',  # rem to change to server ip addr if not running on same machine as the client
                                 LocalPort => '5555',
                                 Proto => 'tcp',
                                 Listen => 1,
                                 Reuse => 1,
                                );
die "Could not create socket: $!\n" unless $sock;


$count = 0;
$num_recs = 0;
$num_files = 1;
$max_recs = 30000;
$path = "/data/";
$file_root = "test";
$datetime = strftime('%Y%m%d%H%M%S',localtime);
open OUT, "|gzip -c > ${path}${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";


my $new_sock = $sock->accept();


while(<$new_sock>) {
        $line = $_;
        #print "line: $count: $num_recs : $line";
        $count++;
        if (m#
        {
                print OUT $line;
                $num_recs++;
                if (($num_recs % $max_recs) == 0)
                {
                        #print "in reset : $count : $num_recs \n";
                        close(OUT);
                        $num_files++;
                        $datetime = strftime('%Y%m%d%H%M%S',localtime);
                        open OUT, "|gzip -c > ${path}${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";
                }
        }
        else
        {
                #print "$count : $num_recs : in else\n";
                print OUT $line;
        }
}
close (OUT);
close($sock);




producer.pl - simple producer client



#!/usr/bin/perl


use IO::Socket;


my $sock = new IO::Socket::INET (
                                 PeerAddr => 'localhost',   # rem to chg to server ip addr if not running socket client and server on same server
                                 PeerPort => '5577',
                                 Proto => 'tcp',
                                );
die "Could not create socket: $!\n" unless $sock;


open IN, "xml.dump" or die "unable to open IN";


$i = 0;
print "before while\n";
while ()
{
        $i++;
        #print "in loop: $i \n";
        print $sock "$_";
}


close(IN);
close($sock);




consumer.pl - client consumer/reader of socket



#!/usr/bin/perl


use IO::Socket;
use POSIX qw/strftime/;
use File::Path;


my $sock = new IO::Socket::INET (
                                 PeerAddr => 'localhost', # rem to replace with svr ip if not on same machine as socket server
                                 PeerPort => '1099',
                                 Proto => 'tcp',
                                );
die "Could not create socket: $!\n" unless $sock;




$|=1;


$count = 0;
$num_recs = 0;
$num_files = 1;
$max_recs = 60000;
$path = "/data";
$file_root = "test";
$datetime = strftime('%Y%m%d%H%M%S',localtime);
$yyyymmdd = strftime('%Y%m%d',localtime);
unless (-d "${path}${yyyymmdd}")
{
        mkpath("${path}/${yyyymmdd}") or die "Unable to mkpath(${path}${yyyymmdd}) ($!)\n";
}
open OUT, "|gzip -c > ${path}${yyyymmdd}/${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";


# client read from socket
while(<$sock>) {
        $line = $_;
        #print "line: $count: $num_recs : $line";
        $count++;
        if (m#
        {
                print OUT $line;
                $num_recs++;
                if (($num_recs % $max_recs) == 0)
                {
                        #print "in reset : $count : $num_recs \n";
                        close(OUT);
                        $num_files++;
                        $datetime = strftime('%Y%m%d%H%M%S',localtime);
                        $yyyymmdd = strftime('%Y%m%d',localtime);
                        unless (-d "${path}${yyyymmdd}")
                        {
                                mkpath("${path}${yyyymmdd}") or die "Unable to mkpath(${path}${yyyymmdd}) ($!)\n";
                        }
                        open OUT, "|gzip -c > ${path}${yyyymmdd}/${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";
                }
        }
        else
        {
                #print "$count : $num_recs : in else\n";
                print OUT $line;
        }
}
close (OUT);
close($sock);

Friday, 2 December 2011

Useful links etc

TBC - just a bunch of useful links

Port Forwarding

ssh -L localport:localhost:remoteport remotehost

Hadoop rebuild


rm -rf /data/hdfs
rm -rf /data/tmpd_hdfs
hadoop namenode -format
start-all.sh


Emailing attachment on Linux (CentOS 6.1)


mailx -s "Example send PDF file" -a mypdf.pdf myemailaddress@mydomain.com <
pdf test mail
EOF


Other info

Solaris

Check for FC

# fcinfo hba-port
No Adapters Found
or


# fcinfo hba-port|grep -i wwn
HBA Port WWN: 2100001b321c25ba
        Node WWN: 2000001b321c25ba
HBA Port WWN: 2101001b323c25ba
        Node WWN: 2001001b323c25ba
HBA Port WWN: 2100001b321c08b9
        Node WWN: 2000001b321c08b9
HBA Port WWN: 2101001b323c08b9
        Node WWN: 2001001b323c08b9


pgp (Network Associates Freeware version)

To view keys on keyring
/opt/PGP/pgp -kv


To add key to keyring
/opt/PGP/pgp -ka


To edit the trust level of the 's key
/opt/PGP/pgp -ke [keyring]

To pgp encrypt a bunch of files (in this example a directory full of *.gz files
userid=xxxxxx    # the userid associated with the recipient's public key
for f in `ls *.gz`
do
  echo $f
  if [ ! -f ${f}.pgp ]
  then
    /opt/PGP/pgp -e $f $userid
    if [ $? -ne 0 ]
    then
      echo "ERROR: Unable to pgp encrypt file: $f"
      exit 1
    fi
  fi
done

Stop/start Solaris service - e.g. httpd

svcadm -v enable /network/http:apache2
svcadm -v disable /network/http:apache2

Swappiness in Linux

See Scott Alan Miller's (SAM's) article on swappiness
He says ...
"On a latency sensitive system or a system where disk IO is at a premium, lowering this number is generally advisable".
So for hadoop which is typically disk IO centric, you want to lower this - even set it to 0.
On Linux system run:
sysctl vm.swappiness
or
grep vm.swappiness /etc/sysctl.conf 
To set to 0:
sysctl vm.swappiness=0
or

echo "vm.swappiness = 0" >> /etc/sysctl.conf

For virtualised system he recommends setting to 10.
And to profile performance before and after the change.