Friday 30 December 2011

Korn shell wrapper script for a Teradata FastExport


Thanks to Euan Murray for this improvement on my slow bteq extract.
Using Teradata FastExport (fexp) reduced the extract time from hours to minutes.
Posted here as an example wrapper shell script for a Teradata FastExport.
N.B. Caution: remember to COALESCE any fields that can yield NULL. Also, embedded commas in the data can confuse the sed cleanup step.


#!/bin/ksh
# set -x
# Program: td_fexp_shell_wrapper_eg.ksh
# Description: Example shell wrapper script to fast export lots of data from Teradata
# Version: 0.1 glour    15-Jul-2011 Initial version
# Version: 0.2 emurray1 14-Oct-2011 Altered to be fexp
# -------------------------------------------------------------------------


PRG=`basename $0 .ksh` # i.e. PRG="td_fexp_shell_wrapper_eg"
eval $@ # evaluate any command line args


LOGON_STR="prodtd/user,passwd;" # the TD username and passwd
LOCATION=/output_path # the location of the output file
DATAFILE=${LOCATION}/${PRG}.dat # the output file
DATA_DATE=$(date +%Y-%m-%d -d "$(date +%d) days ago") # gets the last day of previous month
echo ${DATA_DATE}
DEBUG=0 # for debugging set to 1


>$DATAFILE # empty the output file


fexp > /dev/null 2>&1 <<EOF
.logon ${LOGON_STR}
.LOGTABLE   DATABASE.FExp_Log;
.BEGIN EXPORT  SESSIONS 48;
.EXPORT OUTFILE ${DATAFILE};
SELECT ',' ||
a.aaaa1 || '|' ||
trim(a.aaaa2) ||'|'||
coalesce(b.bbbb1,'??') ||'|'||
coalesce(c.cccc1,'??') ||'|'||
        ... more fields here ...
cast (c.cccc2_dt as date  format 'YYYYMMDD') (TITLE '')
FROM db1.table_a a, db2.table_b b, db3.table_c c
WHERE a.aaaa1 = b.bbbb1
AND b.bbbb4 = c.cccc3
AND     '${DATA_DATE}' between a.aaaa_from_Dt and a.aaaa_to_Dt
AND     ... other clauses ...;
.END EXPORT;
.LOGOFF;
EOF
RET_CODE=$?
if [[ ${RET_CODE} -ne 0 ]]
then
        echo "fast export failed - investigate before proceeding"
        exit 1
fi


# clean up the extra control chars in fast export output file
echo "removing fast export extra chars"
sed -r 's@^[[:cntrl:][:print:]]*,@@g' ${DATAFILE} > ${LOCATION}/${PRG}.${DATA_DATE}.dat
RET_CODE=$?
if [[ ${RET_CODE} -ne 0 ]]
then
        echo "cleanse failed - investigate before proceeding"
        exit 1
else
        echo "Cleanse complete - removing original file"
        rm ${DATAFILE}
fi
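Two pieces of the wrapper are worth sketching in isolation. This is a minimal sketch (the sample record is invented for illustration, and it assumes GNU date and GNU sed): it shows how the date arithmetic lands on the last day of the previous month, and how the sed cleanup strips the FastExport record prefix up to the ',' sentinel that the SELECT prepends.

```shell
#!/bin/sh
# DATA_DATE trick: subtracting today's day-of-month in days always lands
# on the last day of the previous month. Shown with a fixed reference date:
REF="2011-12-30"
DAY=$(date -d "$REF" +%d)                        # 30
DATA_DATE=$(date -d "$REF -$DAY days" +%Y-%m-%d)
echo "$DATA_DATE"                                # 2011-11-30

# Cleanup: FastExport DATA-mode records start with binary length bytes, so
# the SELECT prepends a ',' sentinel and sed strips everything up to it.
# Note the pattern is greedy: an embedded comma in the data would be eaten too.
printf '\020\016junk,125|BUSINESS|Business\n' |
    sed -r 's@^[[:cntrl:][:print:]]*,@@g'        # 125|BUSINESS|Business
```

The greedy match is why the caution above about commas matters: sed deletes up to the last comma on the line, not the first.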

Tuesday 13 December 2011

Perl TCP Socket Programs


Three small pieces of Perl code to show the syntax:

  1. listener.pl - server socket script to listen and consume a stream of ASCII lines from a socket
  2. producer.pl - client socket script to produce/write a stream of XML (ASCII lines) supplied in a file onto a socket
  3. consumer.pl - client socket script to consume a stream of XML ASCII lines from a socket 





listener.pl - simple server



#!/usr/bin/perl
use IO::Socket;
use POSIX qw/strftime/;


$|=1;


my $sock = new IO::Socket::INET (
                                 LocalHost => 'localhost',  # rem to change to server ip addr if not running on same machine as the client
                                 LocalPort => '5555',
                                 Proto => 'tcp',
                                 Listen => 1,
                                 Reuse => 1,
                                );
die "Could not create socket: $!\n" unless $sock;


$count = 0;
$num_recs = 0;
$num_files = 1;
$max_recs = 30000;
$path = "/data/";
$file_root = "test";
$datetime = strftime('%Y%m%d%H%M%S',localtime);
open OUT, "|gzip -c > ${path}${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";


my $new_sock = $sock->accept();


while(<$new_sock>) {
        $line = $_;
        #print "line: $count: $num_recs : $line";
        $count++;
        if (m#...#)     # NB: the original match pattern was lost in posting - substitute your record-boundary regex for ...
        {
                print OUT $line;
                $num_recs++;
                if (($num_recs % $max_recs) == 0)
                {
                        #print "in reset : $count : $num_recs \n";
                        close(OUT);
                        $num_files++;
                        $datetime = strftime('%Y%m%d%H%M%S',localtime);
                        open OUT, "|gzip -c > ${path}${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";
                }
        }
        else
        {
                #print "$count : $num_recs : in else\n";
                print OUT $line;
        }
}
close (OUT);
close($sock);




producer.pl - simple producer client



#!/usr/bin/perl


use IO::Socket;


my $sock = new IO::Socket::INET (
                                 PeerAddr => 'localhost',   # rem to chg to server ip addr if not running socket client and server on same server
                                 PeerPort => '5577',
                                 Proto => 'tcp',
                                );
die "Could not create socket: $!\n" unless $sock;


open IN, "xml.dump" or die "unable to open IN";


$i = 0;
print "before while\n";
while (<IN>)
{
        $i++;
        #print "in loop: $i \n";
        print $sock "$_";
}


close(IN);
close($sock);




consumer.pl - client consumer/reader of socket



#!/usr/bin/perl


use IO::Socket;
use POSIX qw/strftime/;
use File::Path;


my $sock = new IO::Socket::INET (
                                 PeerAddr => 'localhost', # rem to replace with svr ip if not on same machine as socket server
                                 PeerPort => '1099',
                                 Proto => 'tcp',
                                );
die "Could not create socket: $!\n" unless $sock;




$|=1;


$count = 0;
$num_recs = 0;
$num_files = 1;
$max_recs = 60000;
$path = "/data";
$file_root = "test";
$datetime = strftime('%Y%m%d%H%M%S',localtime);
$yyyymmdd = strftime('%Y%m%d',localtime);
unless (-d "${path}/${yyyymmdd}")
{
        mkpath("${path}/${yyyymmdd}") or die "Unable to mkpath(${path}/${yyyymmdd}) ($!)\n";
}
open OUT, "|gzip -c > ${path}/${yyyymmdd}/${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";


# client read from socket
while(<$sock>) {
        $line = $_;
        #print "line: $count: $num_recs : $line";
        $count++;
        if (m#...#)     # NB: the original match pattern was lost in posting - substitute your record-boundary regex for ...
        {
                print OUT $line;
                $num_recs++;
                if (($num_recs % $max_recs) == 0)
                {
                        #print "in reset : $count : $num_recs \n";
                        close(OUT);
                        $num_files++;
                        $datetime = strftime('%Y%m%d%H%M%S',localtime);
                        $yyyymmdd = strftime('%Y%m%d',localtime);
                        unless (-d "${path}/${yyyymmdd}")
                        {
                                mkpath("${path}/${yyyymmdd}") or die "Unable to mkpath(${path}/${yyyymmdd}) ($!)\n";
                        }
                        open OUT, "|gzip -c > ${path}/${yyyymmdd}/${file_root}_${datetime}_${num_files}.gz" or die "unable to create OUT";
                }
        }
        else
        {
                #print "$count : $num_recs : in else\n";
                print OUT $line;
        }
}
close (OUT);
close($sock);

Friday 2 December 2011

Useful links etc

TBC - just a bunch of useful links

Port Forwarding

ssh -L localport:localhost:remoteport remotehost

Hadoop rebuild


rm -rf /data/hdfs
rm -rf /data/tmpd_hdfs
hadoop namenode -format
start-all.sh


Emailing attachment on Linux (CentOS 6.1)


mailx -s "Example send PDF file" -a mypdf.pdf myemailaddress@mydomain.com <<EOF
pdf test mail
EOF


Other info

Solaris

Check for FC

# fcinfo hba-port
No Adapters Found
or


# fcinfo hba-port|grep -i wwn
HBA Port WWN: 2100001b321c25ba
        Node WWN: 2000001b321c25ba
HBA Port WWN: 2101001b323c25ba
        Node WWN: 2001001b323c25ba
HBA Port WWN: 2100001b321c08b9
        Node WWN: 2000001b321c08b9
HBA Port WWN: 2101001b323c08b9
        Node WWN: 2001001b323c08b9


pgp (Network Associates Freeware version)

To view keys on keyring
/opt/PGP/pgp -kv


To add key to keyring
/opt/PGP/pgp -ka [keyfile]


To edit the trust level of a key
/opt/PGP/pgp -ke [keyring]

To pgp encrypt a bunch of files (in this example a directory full of *.gz files):
userid=xxxxxx    # the userid associated with the recipient's public key
for f in *.gz
do
  echo $f
  if [ ! -f ${f}.pgp ]
  then
    /opt/PGP/pgp -e $f $userid
    if [ $? -ne 0 ]
    then
      echo "ERROR: Unable to pgp encrypt file: $f"
      exit 1
    fi
  fi
done

Stop/start Solaris service - e.g. httpd

svcadm -v enable /network/http:apache2
svcadm -v disable /network/http:apache2

Swappiness in Linux

See Scott Alan Miller's (SAM's) article on swappiness
He says ...
"On a latency sensitive system or a system where disk IO is at a premium, lowering this number is generally advisable".
So for Hadoop, which is typically disk-IO centric, you want to lower this - even set it to 0.
On a Linux system, check the current value:
sysctl vm.swappiness
or
grep vm.swappiness /etc/sysctl.conf 
To set to 0:
sysctl vm.swappiness=0
or

echo "vm.swappiness = 0" >> /etc/sysctl.conf

For a virtualised system he recommends setting it to 10.
And he advises profiling performance before and after the change.


Tuesday 19 July 2011

Simple Tara Teradata NetBackup backup shell script - handles multiple jobs

Here is a simple script to run Tara Teradata NetBackup backups.


In the example below, I am backing up 3 jobs in 2 batches as follows:

  • On its own
    • 2650_FullBackup_6Stream_4week_ret.arc
  • The following two in parallel
    • 2650_FullBackup_2Stream_4week_ret.arc
    • TD_5550_FULLBACKUP_4STR.arc

Run this from the command line like this:
/usr/local/bin/run_tara_job.ksh DEBUG=0 JOB_NAME=2650_FullBackup_6Stream_4week_ret.arc#:2650_FullBackup_2Stream_4week_ret.arc:TD_5550_FULLBACKUP_4STR.arc# >/var/misc/log/run_tara_job.log 2>&1


Or set up a cron job like this:
0 19 19 7 * /usr/local/bin/run_tara_job.ksh DEBUG=0 JOB_NAME=2650_FullBackup_6Stream_4week_ret.arc#:2650_FullBackup_2Stream_4week_ret.arc:TD_5550_FULLBACKUP_4STR.arc# >/var/misc/log/run_tara_job.log 2>&1

Note - you will need the Job Names from the Tara scripts (typically found in the /opt/teradata/tara/server/sr directory) or from the Tara GUI.


#!/bin/ksh -x
# Program: run_tara_job.ksh
# Description: Run Tara backup job.
# Parameters: JOB_NAME=
# Version: 0.1 gml 19-Jul-2011 Initial version


SEP=":"
SUBSEP="#"              # take care not to pick a delimiter that interferes with greps
DEBUG=0
eval $@


BKP_ROOT_DIR=/opt/teradata/tara/server/bin
TARA_BKP_PASSWD_FILE=/opt/teradata/tara/server/bin/tara.pwd


if [ -z "$JOB_NAME" ]
then
  echo "ERROR: No JOB_NAME arg. It is mandatory"
  exit 1
fi


JOB_NAME=`echo $JOB_NAME | tr "$SEP" " "`




cd $BKP_ROOT_DIR


#exit 0


for JOB in `echo $JOB_NAME`
do
  WAIT_FLAG=`echo $JOB | grep -c "${SUBSEP}"`   # no backslash before $ - we want SUBSEP expanded
  if [ $WAIT_FLAG -gt 0 ]
  then
    JOB=`echo $JOB | tr -d "$SUBSEP"`
    echo $JOB
  fi
  echo "Starting job $JOB at `date`"
  # note -w will wait for tara job to complete fully
  if [ $DEBUG -eq 0 ]
  then
    ./taralaunch -u administrator -e $TARA_BKP_PASSWD_FILE -j $JOB -w &
  else
    echo "./taralaunch -u administrator -e $TARA_BKP_PASSWD_FILE -j $JOB -w" &
  fi
  if [ $WAIT_FLAG -gt 0 ]
  then
    wait
  fi
done
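The JOB_NAME batching convention is easy to misread, so here is a minimal dry-run sketch of the parsing (the job names are made up for illustration, and POSIX sh is used here): `:` separates jobs, and a trailing `#` on a job means "then wait for all backgrounded jobs to finish".

```shell
#!/bin/sh
SEP=":"
SUBSEP="#"
JOB_NAME="jobA#:jobB:jobC#"     # hypothetical: jobA alone, then jobB and jobC in parallel

# Same tokenising as the real script: turn ':' into spaces, then loop
JOB_NAME=$(echo "$JOB_NAME" | tr "$SEP" " ")
for JOB in $JOB_NAME
do
  WAIT_FLAG=$(echo "$JOB" | grep -c "$SUBSEP")  # 1 if this job ends a batch
  JOB=$(echo "$JOB" | tr -d "$SUBSEP")          # strip the marker for launching
  if [ "$WAIT_FLAG" -gt 0 ]
  then
    echo "launch $JOB, then wait for the batch to finish"
  else
    echo "launch $JOB in the background"
  fi
done
```

Running this prints one line per job: jobA launches and waits, jobB goes to the background, and jobC launches and waits for the batch, which matches the "one alone, two in parallel" example above.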

How to extract (CSV) data from Teradata via bteq script wrapped in a Unix shell script

There is a better, far simpler way to extract data from Teradata than my previous blog entry.
Ignore it. Use this one.

The example below extracts data from a Teradata table or view into a pipe delimited file via bteq in a Unix shell script.

I had a serious hassle eliminating the first two lines of output - i.e. the column heading and the minuses underlining it. Initially a colleague, Arturo, told me to replace the ".export report file ..." row with ".export data file ...", but that caused the output to contain hidden control characters, like those shown below ... I assume for Teradata-to-Teradata data transfers.

^P^@^N^@-1||Null Value
^W^@^U^@125|BUSINESS|Business
^[^@^Y^@149|INDIVIDUAL|Individual
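A hedged guess at what those bytes are: if the leading bytes pair up as little-endian 16-bit words, each record seems to carry two length prefixes. In `^W^@^U^@125|BUSINESS|Business`, `^U^@` is 0x0015 = 21, the length of the visible record, and `^W^@` is 0x0017 = 23, i.e. that length plus the two bytes of the second word. A quick character count bears this out:

```shell
#!/bin/sh
# The visible record lengths match the second 16-bit word on each line:
printf '%s' "-1||Null Value" | wc -c              # 14 = ^N^@ = 0x000E
printf '%s' "125|BUSINESS|Business" | wc -c       # 21 = ^U^@ = 0x0015
printf '%s' "149|INDIVIDUAL|Individual" | wc -c   # 25 = ^Y^@ = 0x0019
```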



He then found that the heading could be suppressed using the (TITLE '') phrase shown in the script below.



#!/bin/ksh -x
# Program: td_bteq_extract_better.ksh
# Description: To extract data from TD via bteq without any headings or "report formatting"
# Version: 1.0 gml 19-Jul-2011 Initial version from Arturo Gonzalez
# -----------------------------------------------------------------------------------------


LOGON_STR="prodtd/glourei1,xxxxxx;"     # Teradata logon acct
TMP=${HOME}/tmp                         # extract directory
FILENAME=test2                          # output will go in ${FILENAME}.dat
SEP='|'                                 # pipe separator
DATAFILE=${TMP}/${FILENAME}.dat        # extract file


> $DATAFILE                             # Otherwise if run more than once, it will append records to $DATAFILE


bteq >/dev/null 2>&1 <<EOF
.logon $LOGON_STR
.export report file = $DATAFILE         -- report mode alone emits a column header plus underlining - the (TITLE '') below suppresses them
.set recordmode off                     -- not sure what this does - leaving it out seems to make no difference
-- an arbitrary example SQL query
SELECT TRIM(COALESCE(Acct_Type_Id,-99)) || '${SEP}' ||   -- Without the TRIM, the numerics will be right justified and have leading blanks
       TRIM(COALESCE(Acct_Type_Cd,'??')) || '${SEP}' ||
       TRIM(COALESCE(Acct_Type_Name,'??')) (TITLE '')
FROM nuc_user_view.acct_type
ORDER BY Acct_Type_id;
.export reset
.logoff
.quit
EOF