ssh -c arcfour
If you transfer over SSH (scp, sftp, or rsync over ssh), the "-c arcfour" cipher option can speed things up at the cost of weaker encryption, which may be acceptable in-house. Both ends must support arcfour, and newer OpenSSH releases have dropped it. See the notes on SSH in Charles Martin Reid's wiki.
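As a rough sketch of how a script might use a faster cipher only when it is actually available: the helper below (`pick_cipher`, a hypothetical name that is not part of the scripts on this page) takes the list of ciphers the client supports and prints the first preferred one it finds. `ssh -Q cipher` lists the supported ciphers on OpenSSH 6.3 and later; the file name in the usage comment is a placeholder.

```shell
#!/bin/sh
# pick_cipher SUPPORTED PREFERRED...
# Hypothetical helper: print the first PREFERRED cipher that appears in the
# whitespace-separated SUPPORTED list; return 1 if none of them match.
pick_cipher() {
    supported=$1
    shift
    for want in "$@"; do
        for have in $supported; do
            if [ "$have" = "$want" ]; then
                echo "$want"
                return 0
            fi
        done
    done
    return 1
}

# Usage (bigfile.gz is a placeholder):
#   cipher=$(pick_cipher "$(ssh -Q cipher)" arcfour aes128-ctr) &&
#       scp -c "$cipher" bigfile.gz remuser@remsvr:/remdestdir/
```

If no preferred cipher is supported the function returns non-zero, so the caller can fall back to a plain scp without the -c option.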
Example using rsync
rsync can sync entire directory structures, but this script needed the data laid out in a particular way on the remote side. rsync has many more features than are used here and is a good starting point ...
This script could, and probably should, be rewritten to make more use of rsync's features.
#!/bin/ksh
# allow variables such as START_DAY and END_DAY to be overridden on the command line
eval "$@"
PUBKEY=${HOME}/.ssh/mykey.pub
svrname=`uname -n | cut -c1-8`
srcdir=/mysrcdir
sftpUsr=remuser
prisftpserver=remsvr
remdir=/remdestdir
cd "${srcdir}" || exit 1
START_DAY=${START_DAY:-`date --date="1 days ago" +%Y%m%d`}
END_DAY=${END_DAY:-`date --date="1 days ago" +%Y%m%d`}
DAY=${START_DAY}
while [ $DAY -le $END_DAY ]
do
echo "Starting DAY=$DAY ..."
echo "`date +'%Y/%m/%d %H:%M:%S'`|Start|${DAY}"
# Try to create the directory - it may already have been created (mkdir -p does not fail if it exists)
ssh -i ${PUBKEY} -q ${sftpUsr}@${prisftpserver} "mkdir -p ${remdir}/${DAY}; chmod 777 ${remdir}/${DAY}"
# replace: append your source path(s) and the remote destination to the command below
rsync -av --rsync-path=/opt/sfw/bin/rsync --rsh="ssh -i ${PUBKEY}"
echo "`date +'%Y/%m/%d %H:%M:%S'`|Complete|${DAY}"
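# advance by one calendar day (a plain +1 breaks at month ends, e.g. 20240131 -> 20240132)
DAY=$(date --date="${DAY} +1 day" +%Y%m%d)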
done
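One possible direction for the rewrite mentioned above, as a sketch only: let rsync deal with partial transfers, stalls and bandwidth itself. The function name `sync_day`, the `RSYNC` override variable and the specific limits (120-second I/O timeout, 1000 KB/s cap) are assumptions for illustration; `--partial`, `--timeout` and `--bwlimit` are standard rsync options. rsync creates the final directory of the destination path if it is missing, so the separate mkdir is not strictly needed, though the permissions would still have to be set separately if the 777 mode matters.

```shell
#!/bin/sh
# Sketch: push one day's files with rsync, assuming the same variables as the
# script above (PUBKEY, sftpUsr, prisftpserver, remdir) are already set.
RSYNC=${RSYNC:-rsync}   # assumption: overridable so the command can be previewed

# sync_day DAY FILE...   (hypothetical helper)
sync_day() {
    day=$1
    shift
    "$RSYNC" -av --partial --timeout=120 --bwlimit=1000 \
        --rsync-path=/opt/sfw/bin/rsync \
        --rsh="ssh -i ${PUBKEY}" \
        "$@" "${sftpUsr}@${prisftpserver}:${remdir}/${day}/"
}

# Usage (the file pattern is an assumption about how the source files are named):
#   sync_day "$DAY" /mysrcdir/*"$DAY"*
```

Setting `RSYNC=echo` before calling `sync_day` prints the arguments instead of transferring anything, which is a cheap way to check the command line.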
Example not using rsync
#!/bin/ksh
# script built by several people, hence the slightly different formatting standards :(
# allow variables such as START_DAY and END_DAY to be overridden on the command line
eval "$@"
PUBKEY=${HOME}/.ssh/mykey.pub
svrname=`uname -n | cut -c1-8` # local server
srcdir=/src_logs # replace with location of source data files
sftpUsr=remuser # replace with remote user
prisftpserver=remserver # replace with remote server
remdir=/rem_logs # replace with location of destination directory
cd "${srcdir}" || exit 1
# this example caters for daily logfiles
START_DAY=${START_DAY:-`date --date="1 days ago" +%Y%m%d`}
END_DAY=${END_DAY:-`date --date="1 days ago" +%Y%m%d`}
DAY=${START_DAY}
while [ $DAY -le $END_DAY ]
do
echo "Starting DAY=$DAY ..."
# Try to create the directory - it may already have been created (mkdir -p does not fail if it exists)
ssh -i ${PUBKEY} -q ${sftpUsr}@${prisftpserver} "mkdir -p ${remdir}/${DAY}; chmod 777 ${remdir}/${DAY}"
for filename in `ls -1
do
base_filename=`basename ${filename} .gz`
dir_filename=`dirname ${filename}`
scp_count=0
scp_error=1
while [ $scp_error -ne 0 ] && [ $scp_count -le 2 ] # give up after 3 scp attempts
do
scp_count=$(($scp_count+1))
echo "`date +'%Y/%m/%d %H:%M:%S'`|Started (${scp_count})|$filename|${base_filename}.gz"
# limit bandwidth with -l (the value is in Kbit/s, so 100000 is about 100 Mbit/s); ConnectTimeout=120 gives up on a connection attempt after 120 seconds
scp -i ${PUBKEY} -l100000 -o ConnectTimeout=120 -q ${filename} ${sftpUsr}@${prisftpserver}:${remdir}/${DAY}/${svrname}_${dir_filename}_${base_filename}.gz
# alternative: use the arcfour cipher, which is faster but less secure, with the same 120-second connection timeout
#scp -i ${PUBKEY} -c arcfour -o ConnectTimeout=120 -q ${filename} ${sftpUsr}@${prisftpserver}:${remdir}/${DAY}/${svrname}_${dir_filename}_${base_filename}.gz
scp_error=$?
done
if [ $scp_error -eq 0 ]; then scp_status=Complete; else scp_status=Failed; fi
echo "`date +'%Y/%m/%d %H:%M:%S'`|${scp_status}|${filename}|${base_filename}.gz"
done
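# advance by one calendar day (a plain +1 breaks at month ends, e.g. 20240131 -> 20240132)
DAY=$(date --date="${DAY} +1 day" +%Y%m%d)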
done
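The retry loop in the script above can be factored into a small reusable function. This is a minimal sketch: `retry` is a hypothetical helper name, and the attempt count is passed in as an argument rather than fixed at three.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns the exit status of the last attempt.
retry() {
    max=$1
    shift
    attempt=0
    status=1
    while [ "$status" -ne 0 ] && [ "$attempt" -lt "$max" ]; do
        attempt=$((attempt + 1))
        if "$@"; then
            status=0
        else
            status=$?
        fi
    done
    return "$status"
}

# Usage, mirroring the scp call above (destination shortened for illustration):
#   retry 3 scp -i "${PUBKEY}" -o ConnectTimeout=120 -q "${filename}" \
#       "${sftpUsr}@${prisftpserver}:${remdir}/${DAY}/" || echo "giving up on ${filename}"
```

Because the function returns the last exit status, the caller can tell a transfer that eventually succeeded from one that failed on every attempt.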
Streaming data
Flume
Scribe
Storm
S4
TBC