Friday, 30 December 2011

Korn shell wrapper script for a Teradata FastExport


Thanks to Euan Murray for this improvement on my slow bteq extract.
Uses Teradata FastExport fexp and reduced the extract time from hours to minutes.
Posted here as example wrapper shell script for a Teradata FastExport.
N.B. Caution: remember to coalesce any fields that can yield null. Also the comma can cause problems


#!/bin/ksh
# set -x
# Program: td_fexp_shell_wrapper_eg.ksh
# Description: Example shell wrapper script to fast export lots of data from Teradata
# Version: 0.1 glour    15-Jul-2011 Initial version
# Version: 0.2 emurray1 14-Oct-2011 Altered to be fexp
# -------------------------------------------------------------------------


PRG=`basename $0 .ksh` # i.e. PRG="td_fexp_shell_wrapper_eg"
eval $@ # evaluate any command line args


LOGON_STR="prodtd/user,passwd;" # the TD username and passwd
LOCATION=/output_path/ # the location of the output file
DATAFILE=${LOCATION}/${PRG}.dat # the output file
DATA_DATE=$(date +%Y-%m-%d -d "$(date +%d) days ago") # gets the last day of previous month
echo ${DATA_DATE}
DEBUG=0 # for debugging set to 1


>$DATAFILE # empty the output file


fexp > /dev/null 2>&1 <
.logon ${LOGON_STR}
.LOGTABLE   DATABASE.FExp_Log;
.BEGIN EXPORT  SESSIONS 48;
.EXPORT OUTFILE ${DATAFILE};
SELECT ',' ||
a.aaaa1 || '|' ||
trim(a.aaaa2) ||'|'||
coalesce(b.bbbb1,'??') ||'|'||
coalesce(c.cccc1,'??') ||'|'||
        ... more fields here ...
cast (c.cccc2_dt as date  format 'YYYYMMDD') (TITLE '')
FROM db1.table_a a, db2.table_b b, db3.table_c c
WHERE a.aaaa1 = b.bbbb1
AND b.bbbb4 = c.cccc3
AND     '${DATA_DATE}' between a.aaaa_from_Dt and st.aaaa_to_Dt
AND     ... other clauses ...;
.END EXPORT;
.LOGOFF;
EOF
RET_CODE=$?
if [[ ${RET_CODE} -ne 0 ]]
then
        echo fast export failed investigate before proceeding
        return
fi


# clean up the extra control chars in fast export output file
echo removing fast load extra chars
sed -r 's@^[[:cntrl:][:print:]]*,@@g' ${DATAFILE} > ${LOCATION}/${PRG}.${DATA_DATE}.dat
RET_CODE=$?
if [[ ${RET_CODE} -ne 0 ]]
then
        echo cleanse failed investigate before proceeding
        exit
else
        echo Cleanse complete removing original file
        rm ${DATAFILE}
fi

Tuesday, 13 December 2011

Perl TCP Socket Programs


3 small pieces of perl code for the syntax

  1. listener.pl - server socket script to listen and consume a stream of ASCII lines from a socket
  2. producer.pl - client socket script to produce/write a stream of XML (ASCII lines) supplied in a file onto a socket
  3. consumer.pl - client socket script to consume a stream of XML ASCII lines from a socket 





listener.pl - simple server



#!/usr/bin/perl
use IO::Socket;
use POSIX qw/strftime/;


$|=1;


my $sock = new IO::Socket::INET (
                                 LocalHost => 'localhost',  # rem to change to server ip addr if not running on same machine as the client
                                 LocalPort => '5555',
                                 Proto => 'tcp',
                                 Listen => 1,
                                 Reuse => 1,
                                );
die "Could not create socket: $!\n" unless $sock;


$count = 0;
$num_recs = 0;
$num_files = 1;
$max_recs = 30000;
$path = "/data/";
$file_root = "test";
$datetime = strftime('%Y%m%d%H%M%S',localtime);
open OUT, "|gzip -c > ${path}${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";


my $new_sock = $sock->accept();


while(<$new_sock>) {
        $line = $_;
        #print "line: $count: $num_recs : $line";
        $count++;
        if (m#
        {
                print OUT $line;
                $num_recs++;
                if (($num_recs % $max_recs) == 0)
                {
                        #print "in reset : $count : $num_recs \n";
                        close(OUT);
                        $num_files++;
                        $datetime = strftime('%Y%m%d%H%M%S',localtime);
                        open OUT, "|gzip -c > ${path}${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";
                }
        }
        else
        {
                #print "$count : $num_recs : in else\n";
                print OUT $line;
        }
}
close (OUT);
close($sock);




producer.pl - simple producer client



#!/usr/bin/perl


use IO::Socket;


my $sock = new IO::Socket::INET (
                                 PeerAddr => 'localhost',   # rem to chg to server ip addr if not running socket client and server on same server
                                 PeerPort => '5577',
                                 Proto => 'tcp',
                                );
die "Could not create socket: $!\n" unless $sock;


open IN, "xml.dump" or die "unable to open IN";


$i = 0;
print "before while\n";
while ()
{
        $i++;
        #print "in loop: $i \n";
        print $sock "$_";
}


close(IN);
close($sock);




consumer.pl - client consumer/reader of socket



#!/usr/bin/perl


use IO::Socket;
use POSIX qw/strftime/;
use File::Path;


my $sock = new IO::Socket::INET (
                                 PeerAddr => 'localhost', # rem to replace with svr ip if not on same machine as socket server
                                 PeerPort => '1099',
                                 Proto => 'tcp',
                                );
die "Could not create socket: $!\n" unless $sock;




$|=1;


$count = 0;
$num_recs = 0;
$num_files = 1;
$max_recs = 60000;
$path = "/data";
$file_root = "test";
$datetime = strftime('%Y%m%d%H%M%S',localtime);
$yyyymmdd = strftime('%Y%m%d',localtime);
unless (-d "${path}${yyyymmdd}")
{
        mkpath("${path}/${yyyymmdd}") or die "Unable to mkpath(${path}${yyyymmdd}) ($!)\n";
}
open OUT, "|gzip -c > ${path}${yyyymmdd}/${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";


# client read from socket
while(<$sock>) {
        $line = $_;
        #print "line: $count: $num_recs : $line";
        $count++;
        if (m#
        {
                print OUT $line;
                $num_recs++;
                if (($num_recs % $max_recs) == 0)
                {
                        #print "in reset : $count : $num_recs \n";
                        close(OUT);
                        $num_files++;
                        $datetime = strftime('%Y%m%d%H%M%S',localtime);
                        $yyyymmdd = strftime('%Y%m%d',localtime);
                        unless (-d "${path}${yyyymmdd}")
                        {
                                mkpath("${path}${yyyymmdd}") or die "Unable to mkpath(${path}${yyyymmdd}) ($!)\n";
                        }
                        open OUT, "|gzip -c > ${path}${yyyymmdd}/${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";
                }
        }
        else
        {
                #print "$count : $num_recs : in else\n";
                print OUT $line;
        }
}
close (OUT);
close($sock);

Friday, 2 December 2011

Useful links etc

TBC - just a bunch of useful links

Port Forwarding

ssh -L localport:localhost:remoteport remotehost

Hadoop rebuild


rm -rf /data/hdfs
rm -rf /data/tmpd_hdfs
hadoop namenode -format
start-all.sh


Emailing attachment on Linux (CentOS 6.1)


mailx -s "Example send PDF file" -a mypdf.pdf myemailaddress@mydomain.com <
pdf test mail
EOF


Other info

Solaris

Check for FC

# fcinfo hba-port
No Adapters Found
or


# fcinfo hba-port|grep -i wwn
HBA Port WWN: 2100001b321c25ba
        Node WWN: 2000001b321c25ba
HBA Port WWN: 2101001b323c25ba
        Node WWN: 2001001b323c25ba
HBA Port WWN: 2100001b321c08b9
        Node WWN: 2000001b321c08b9
HBA Port WWN: 2101001b323c08b9
        Node WWN: 2001001b323c08b9


pgp (Network Associates Freeware version)

To view keys on keyring
/opt/PGP/pgp -kv


To add key to keyring
/opt/PGP/pgp -ka


To edit the trust level of the 's key
/opt/PGP/pgp -ke [keyring]

To pgp encrypt a bunch of files (in this example a directory full of *.gz files
userid=xxxxxx    # the userid associated with the recipient's public key
for f in `ls *.gz`
do
  echo $f
  if [ ! -f ${f}.pgp ]
  then
    /opt/PGP/pgp -e $f $userid
    if [ $? -ne 0 ]
    then
      echo "ERROR: Unable to pgp encrypt file: $f"
      exit 1
    fi
  fi
done

Stop/start Solaris service - e.g. httpd

svcadm -v enable /network/http:apache2
svcadm -v disable /network/http:apache2

Swappiness in Linux

See Scott Alan Miller's (SAM's) article on swappiness
He says ...
"On a latency sensitive system or a system where disk IO is at a premium, lowering this number is generally advisable".
So for hadoop which is typically disk IO centric, you want to lower this - even set it to 0.
On Linux system run:
sysctl vm.swappiness
or
grep vm.swappiness /etc/sysctl.conf 
To set to 0:
sysctl vm.swappiness=0
or

echo "vm.swappiness = 0" >> /etc/sysctl.conf

For virtualised system he recommends setting to 10.
And to profile performance before and after the change.


Tuesday, 19 July 2011

Simple Tara Teradata Netbackup backup shell script - handles multiple Jobs

Here is a simple script to run Tara Teradata Netbackup backups.


In the example below, I am backing up 3 jobs in 2 batches as follows:

  • On its own
    • 2650_FullBackup_6Stream_4week_ret.arc
  • The following two in parallel
    • 2650_FullBackup_2Stream_4week_ret.arc
    • TD_5550_FULLBACKUP_4STR.arc

Run this from the command line like this:
/usr/local/bin/run_tara_job.ksh DEBUG=0 JOB_NAME=2650_FullBackup_6Stream_4week_ret.arc#:2650_FullBackup_2Stream_4week_ret.arc:TD_5550_FULLBACKUP_4STR.arc# >/var/misc/log/run_tara_job.log 2>&1


Or set up a cron job like this:
0 19 19 7 * /usr/local/bin/run_tara_job.ksh DEBUG=0 JOB_NAME=2650_FullBackup_6Stream_4week_ret.arc#:2650_FullBackup_2Stream_4week_ret.arc:TD_5550_FULLBACKUP_4STR.arc# >/var/misc/log/run_tara_job.log 2>&1

Note - you will need the Job Names from the Tara scripts (typically find these in /opt/teradata/tara/server/sr directory) or from the Tara GUI.


#!/bin/ksh -x
# Program: run_tara_job.ksh
# Description: Run Tara backup job.
# Parameters: JOB_NAME=
# Version: 0.1 gml 19-Jul-2011 Initial version


SEP=":"
SUBSEP="#"              # take care not to pick a delimiter that interferes with greps
DEBUG=0
eval $@


BKP_ROOT_DIR=/opt/teradata/tara/server/bin
TARA_BKP_PASSWD_FILE=/opt/teradata/tara/server/bin/tara.pwd


if [ -z "$JOB_NAME" ]
then
  echo "ERROR: No JOB_NAME arg. It is mandatory"
  exit 1
fi


JOB_NAME=`echo $JOB_NAME | tr "$SEP" " "`




cd $BKP_ROOT_DIR


#exit 0


for JOB in `echo $JOB_NAME`
do
  WAIT_FLAG=`echo $JOB | grep -c "\${SUBSEP}"`
  if [ $WAIT_FLAG -gt 0 ]
  then
    JOB=`echo $JOB | tr -d "$SUBSEP"`
    echo $JOB
  fi
  echo "Starting job $JOB at `date`"
  # note -w will wait for tara job to complete fully
  if [ $DEBUG -eq 0 ]
  then
    ./taralaunch -u administrator -e $TARA_BKP_PASSWD_FILE -j $JOB -w &
  else
    echo "./taralaunch -u administrator -e $TARA_BKP_PASSWD_FILE -j $JOB -w" &
  fi
  if [ $WAIT_FLAG -gt 0 ]
  then
    wait
  fi
done

How to extract (CSV) data from Teradata via bteq script wrapped in a Unix shell script

There is a better, far simpler way to extract data from Teradata than my previous blog entry.
Ignore it. Use this one.

The example example below extracts data from a Teradata table or view into a pipe delimited file via bteq in a Unix shell script.

I had a serious hassle eliminating the first two lines of output - i.e. the column heading and the minuses underlining the heading. Initially a colleague Arturo told me to replace the ".export report file ..." row with ".export data file ..." but this caused the output to have hidden control characters in it, like these marked in red below ... I assume for Teradata-to-Teradata data transfers.

^P^@^N^@-1||Null Value
^W^@^U^@125|BUSINESS|Business
^[^@^Y^@149|INDIVIDUAL|Individual



He then found that the TITLE could be removed using the (TITLE '') phrase highlighted in the script in bold red below.



#!/bin/ksh -x
# Program: td_bteq_extract_better.ksh
# Description: To extract data from TD via bteq without any headings or "report formatting"
# Version: 1.0 gml 19-Jul-2011 Initial version from Arturo Gonzalez
# -----------------------------------------------------------------------------------------


LOGON_STR="prodtd/glourei1,xxxxxx;"     # Teradata logon acct
TMP=${HOME}/tmp                         # extract directory
FILENAME=test2                          # output will go in ${FILENAME}2.dat
SEP='|'                                 # pipe separator
DATAFILE=${TMP}/${FILENAME}.dat        # extract file


> $DATAFILE                             # Otherwise if run more than once, it will append records to $DATAFILE


bteq >/dev/null 2>&1 <<EOF
.logon $LOGON_STR
.export report file = $DATAFILE         -- .export report file = $DATAFILE will result in column header and header underlining - 2 unwanted rows
.set recordmode off                     -- not sure what this does - leaving it out seems to make no difference
-- an arbitrary example SQL query
SELECT TRIM(COALESCE(Acct_Type_Id,-99)) || '${SEP}' ||   -- Without the TRIM, the numerics will be right justified and have leading blanks
       TRIM(COALESCE(Acct_Type_Cd,'??')) || '${SEP}' ||
       TRIM(COALESCE(Acct_Type_Name,'??')) (TITLE '')
FROM nuc_user_view.acct_type
ORDER BY Acct_Type_id;
.export reset
.logoff
.quit
EOF