Monday, 20 June 2011

Oracle database backup, restore and recovery - notes to self

Backup, Restore and Recovery with Oracle 
(Based on notes from Ganesh's friend largely based on Oracle docs)


Backup
A database backup is is a copy of the data stored in a database. There are two types of backups: physical backups and logical backups. 
A physical backup is a copy of the files that belong to a database at a specific point in type. It can either be a cold backup or hot backup. 
A logical backup is does not copy the files, instead, it extracts the data into a specific file format (for example with exp.) 
A backup is needed to restore the data in a database if it gets damaged. 
Some structural changes as well as direct path load require a backup. 


Consistent and inconsistent backups


Consistent backup
A consistent backup exhibits the following three properties: 
All headers of datafiles that belong to a writable tablespaces have the same checkpoint SCN. 
These datafiles don't have any changes past this checkpoint SCN. 
Lastly, The SCNs of the datafile headers match the checkpoint information in the controlfiles. 


Inconsistent backup
An inconsistent backup is where some files contain changes that were made after the files were checkpointed. A recovery is needed in order to make the backup consistent. 
An inconsistent backup is created by a hot backup. Also, if the database crashed (or was shutdown abort) and then the backup was made. 


Restore
A restore is the copying of backed up data to their original location in order to recover the database. 
Usually, a file is restored after a media failure. However, there are other situations where a file is restored, for example a point in time recovery. 


Recovery
Recovery is the process of bringing the database "up to date" since the last backup (whose data files would have been restored). All database changes are applied to the restored database files by "rolling forward" on the archived redolog files. 
It is also possible to recover (incomplete) if the current redo log is lost or if the controlfiles are lost. 
In rare cases consider applying the unarchived redolog file/s in the recovery process if it makes sense.


Types of recovery
There are four types of recovery: 
• Media/Datafile media recovery 
• Crash recovery 
Oracle performs an automatic crash recovery when the instance is started after a shutdown abort or after a crash. 
A crash recovery 'only' uses the online redo log to recover online datafiles. 
Crash recovery is the recovery of a database in a single-instance configuration or an Oracle Real Application Clusters configuration in which all instances have crashed. In contrast, instance recovery is the recovery of one failed instance by a live instance in an Oracle Real Application Clusters configuration.
• Instance recovery 
Instance Recovery is only used in a RAC environment. It is the application of online redo log by one instance that detected the crash of another instance.
• Disaster recovery 


There are at least five options to be considered for disaster recovery when running Oracle: 
• OPS (Oracle Parallel Server) 
• Replication 
• Standby databasess 
• OS oriented solutions 
• Architectural solutions 


A recovery (if it is not a crash recovery) can either be a 
• Complete recovery, or an 
• Incomplete recovery (Also known as Point in time recovery (PITR), or database point in time recovery (DBPITR))
Not all redo generated since the last backup is applied in a incomplete recovery. It is also called database point in time recovery (DBPITR). 
Flashback can sometimes be used as an alternative to point in time recovery


Performing backup and recovery


Backups and recovery can be performed either 
• with RMAN (the recovery manager) or 
• using SQL*Plus and operating system file copy commands (such as cp on Unix and copy on Windows). 


recover syntax


recover database until cancel
recover database until time '2004-03-21:22:59:04'
recover database until change 123456


recover datafile 'filename' until cancel
recover datafile 'filename' until time '2004-03-21:22:59:04'
recover datafile 'filename' until change 123456


recover tablespace ts_name until cancel
recover tablespace ts_name until time '2004-03-21:22:59:04'
recover tablespace ts_name until change 123456


recover database using backup controlfile


Until time
Performs a incomplete recovery (=Point in time recovery). 
The format of the time is 'YYYY-MM-DD:HH24:MI:SS' 


Until change
Performs a incomplete recovery. 
The number that follows until change is an SCN. In order to recover up to and including SCN 999, use recover until change 1000. 


Standby database
Recovers a standby database. 


Using backup controlfile
Don't use the current controlfiles, use backed up control files instead. 


Useful SQL to run in a recovery scenario


select substr(name, 1, 50), status from v$datafile;
select substr(name,1,40), recover, fuzzy, checkpoint_change#
select name, open_mode, checkpoint_change#, ARCHIVE_CHANGE# from v$database;
select GROUP#,THREAD#,SEQUENCE#,MEMBERS,ARCHIVED,STATUS,FIRST_CHANGE# from v$log;
select GROUP#,substr(member,1,60) from v$logfile;
select * from v$recover_file;
select distinct status from v$backup;
select  hxfil FILENUMBER,fhsta STATUS,fhscn SCN,fhrba_Seq SEQUENCE from x$kcvfh;
select fhsta, count(*) from X$KCVFH group by fhsta;
select * from v$log;
select * from v$logfile;

Thursday, 9 June 2011

SQL syntax for banded count of counts

This is a simple example of SQL syntax for banded count of counts (works on Teradata 12.0)


select case when a.num_occ > 10000 then 'a. > 10000'
                      when a.num_occ > 5000    then 'b. > 5000'
                      when a.num_occ > 2000    then 'c. > 2000'
                      when a.num_occ > 1000    then 'd. > 1000'
                      when a.num_occ > 500     then 'e. > 500'
                      when a.num_occ > 250     then 'f. > 250'
                      when a.num_occ > 100     then 'g. > 100'
                      when a.num_occ > 50      then 'h. > 50'
                      when a.num_occ > 10      then 'i. > 10'
                      when a.num_occ > 1       then 'j. >1'
                      else 'k. = 1'
                      end
            , count(*), sum(a.num_occ)
from (select id, count(*) as num_occ
          from sandbox.events
          group by id) a
group by 1
order by 1 desc

Friday, 3 June 2011

Teradata data migration from one Teradata system to another using arcmain

Problem

Need to migrate data from Teradata system SRC (i.e. SouRCe) to Teradata system TGT (i.e. TarGeT).

Approach

Create a pair of named pipes called $SRC_FIFO and $TGT_FIFO.
Link these together by cat'ing $SRC_FIFO and outputting into $TGT_FIFO and running in the background like this:

cat $SRC_FIFO > $TGT_FIFO &

The Producer for $SRC_FIFO is the arcmain process extracting data from SRC.
The Consumer for $SRC_FIFO is the cat command above i.e. cat $SRC_FIFO

The Producer for $TGT_FIFO is the input to $TGT_FIFO from the cat command above i.e. > $TGT_FIFO.
The Consumer for $TGT_FIFO is the arcmain process copying data into TGT.

Supporting Files

A single file is required with a list of tables and their associated databases in the SRC system with the following format:

databaseX.tableA
databaseY.tableB
...
databaseZ.tableC

Note in this version - there should not be no whitespace on any line in the file.
In the code, this file is referenced as $ARC_OBJECT_FILE.

Command line

To run the code, use something like:

./mig_test2dev.ksh ARC_OBJ_LIST=/tmp/example_archive_file_list 2>&1 | tee mig_test2dev.log

where the list of objects to be migrated is in the file /tmp/example_archive_file

Code

The code can be found at the end of this post.

Potential Problems

There is a reasonable desire to improve the throughput of both arcmain processes by increasing the SESSIONS=NN argument. I have had "general communication" errors when setting this argument to greater values.

Future improvements

This script was coded quickly and there are parts that need improvements - esp to make it generic for running any environment.
  • Allow whitespace in the file with the list of database tables (see example below)
  • Add break-point to a staging area to support migration across remote sites
    • Add compression support to arcmain procedures
    • Add chunking of arcmain archives (breaking into xGB chunks)
    • Adding copy across a network via SSH
  • Add check to determine whether the db.table already exists in the TGT system
    • If so, allow option of suffixing with date YYYYMMDD or NNN version number
  • GUI - there is already a product to do this - so probably not
Use and Support

Happy for the code to be used (at your own risk).
There is no support for this code - though would be happy to assist if able.

Code Listing


$ cat mig_test2dev.ksh
#!/bin/ksh
# Program: mig_test2dev.ksh
# Description: Teradata data migration script.
# Moves data between TD system SRC (ie SouRCe) and TD system TGT (ie TarGeT)
# Version: 1.0 gml 31-May-2011 Initial version
# ---------------------------------------------------------------------------------------

err_msg() {

echo "ERROR: in $PRG - $1"
echo "usage: mig_test2dev.ksh ARC_OBJ_LIST="
echo "example: ./mig_test2dev.ksh ARC_OBJ_LIST=/tmp/example_archive_file_list 2>&1 | tee mig_test2dev.log"
exit 1

}

# Remove remnant FIFOs
cleanup_fifo() {

if [ -n "$DEBUG" ]
then
set -x
fi

SRC_OR_TGT=$1
# clean up existing ARC_FIFO
if [ -p "${ARC_FIFO}_${SRC_OR_TGT}" ]
then
rm ${ARC_FIFO}_${SRC_OR_TGT}
fi

}

# create the arcmain scripts
create_arcmain_script() {

if [ -n "$DEBUG" ]
then
set -x
fi

SRC_OR_TGT=$1
THE_ARCMAIN_SCRIPT=${ARCMAIN_SCRIPT}_${NOW}_${SRC_OR_TGT}.arc
TMP_ARC_SCRIPT=${ARC_OBJ_LIST}.$$ # To create a temp script file
THE_ARC_FIFO=${ARC_FIFO}_${SRC_OR_TGT} # The FIFO is named in the arcmain script (instead of a filename)

if [ "$SRC_OR_TGT" = "SRC" ]
then
LOGON_STR="${TD_SRC}/${TD_SRC_LOGON}"
ARC_CMD="ARCHIVE"
cat ${ARC_OBJ_LIST} | awk -F'.' ' { printf("(%c%s%c.%c%s%c),\n",34,$1,34,34,$2,34) } ' > ${TMP_ARC_SCRIPT}
else # [ "$SRC_OR_TGT" = "TGT" ]
LOGON_STR="${TD_TGT}/${TD_TGT_LOGON}"
ARC_CMD="COPY"
cat ${ARC_OBJ_LIST} | awk -F'.' -v v_tgt_db=$TGT_DB -v v_tgt_tab_suffix=$TGT_TAB_SUFFIX ' { printf("(%c%s%c.%c%s%s%c) (FROM(%c%s%c.%c%s%c)),\n",34,v_tgt_db,34,34,$2,v_tgt_tab_suffix,34,34,$1,34,34,$2,34) } ' > ${TMP_ARC_SCRIPT}
fi


# build the arcmain_script from the arcmain_script_master
if [ -n "${TMP_ARC_SCRIPT}" ]
then
cat <${THE_ARCMAIN_SCRIPT}
logon ${LOGON_STR};
${ARC_CMD} DATA TABLES
EOF
cat ${TMP_ARC_SCRIPT} >> ${THE_ARCMAIN_SCRIPT}
cat <>${THE_ARCMAIN_SCRIPT}
RELEASE LOCK,
FILE=${THE_ARC_FIFO};
LOGOFF;
EOF

rm ${TMP_ARC_SCRIPT}

else
err_msg "No TMP_ARC_SCRIPT (${TMP_ARC_SCRIPT}) produced"
fi

}

# Main Program

# evaluate any arguments like X=A Y=B ... Z=C
eval $@

# if the DEBUG flag is set, then show the result/impact of every line
if [ -n "$DEBUG" ]
then
set -x
fi

# set up variables
PRG=`basename $0`
NOW=`date +"%Y%m%d%H%M%S"` # used (with $$) to allow multiple executions of this
TD_DEV=10.172.6.41 # Dev TD system
TD_TEST=10.172.6.71 # Test TD system
TD_SRC=${TD_SRC:-${TD_TEST}}
TD_TGT=${TD_TGT:-${TD_DEV}}
TD_SRC_LOGON=${TD_SRC_LOGON:-"ARCUSER1,xxxx"}
TD_TGT_LOGON=${TD_TGT_LOGON:-"ARCUSER1,yyyy"}
TD_SRC_LOGON_STR="${TD_SRC}/${TD_SRC_LOGON}"
TD_TGT_LOGON_STR="${TD_TGT}/${TD_TGT_LOGON}"
TGT_DB=${TGT_DB:-"ST3NF03_DB01"}
TGT_TAB_SUFFIX=${TGT_TAB_SUFFIX:-"001"}  # use suffix to avoid overwriting - needs to be more sophisticated

if [ -z "${ARC_OBJ_LIST}" ]
then
err_msg "The ARC_OBJ_LIST argument is missing"
fi

if [ ! -f "${ARC_OBJ_LIST}" ]
then
err_msg "ARC_OBJ_LIST (${ARC_OBJ_LIST}) file does not exist"
fi

HOME_DIR=/home/glourei1 # my home dir
SCRIPT_DIR=${HOME_DIR}/bin # script dir
BASE_ARC_OBJ_LIST=`basename ${ARC_OBJ_LIST}`
LOCK_FILE=$SCRIPT_DIR/${PRG}_${BASE_ARC_OBJ_LIST}.lck # lock file to ensure only one of these scripts runs at a time

# add - SRC_/TGT_ARC_FIFO_DIR
ARC_FIFO=FIFO
SRC_ARC_FIFO=${ARC_FIFO}_SRC
TGT_ARC_FIFO=${ARC_FIFO}_TGT
ARCMAIN_SCRIPT=$SCRIPT_DIR/arcmain_script
SRC_ARCMAIN_SCRIPT=${ARCMAIN_SCRIPT}_${NOW}_SRC.arc
TGT_ARCMAIN_SCRIPT=${ARCMAIN_SCRIPT}_${NOW}_TGT.arc
#let Teradata decide - was getting errors using high numbers
#SRC_NUM_SESSIONS="SESSIONS=8"
#TGT_NUM_SESSIONS="SESSIONS=4"

# This ensures only one of this processes runs at a time
if [ -f $LOCK_FILE ]
then
echo "ERROR: Already running $PRG"
echo " Protected by the lock file $LOCK_FILE"
exit 1
else
touch $LOCK_FILE
fi

# cleanup legacy fifos if required
cleanup_fifo SRC
cleanup_fifo TGT

# create the SRC and TGT arcmain scripts
create_arcmain_script SRC
create_arcmain_script TGT

# Set up the FIFOs
# arcmain ARCHIVE script writes as the Producer part of the SRC_ARC_FIFO
# which is Consumed by the cat below and written as the Producer to the
# TGT_ARC_FIFO which is Consumed by the arcmain COPY into the Target (TGT) database
mkfifo $SRC_ARC_FIFO
mkfifo $TGT_ARC_FIFO
cat ${SRC_ARC_FIFO} > ${TGT_ARC_FIFO} &

# run the arcmain archive
echo "Starting source arcmain ..."
arcmain $SRC_NUM_SESSIONS <$SRC_ARCMAIN_SCRIPT && SRC_RC=$? && [ $SRC_RC -ne 0 ] && err_msg "arcmain archive failed [SRC_RC=${SRC_RC}]" &
sleep 30

# run the arcmain copy/restore
echo "Starting target arcmain ..."
arcmain $TGT_NUM_SESSIONS <$TGT_ARCMAIN_SCRIPT && TGT_RC=$? && [ $TGT_RC -ne 0 ] && err_msg "arcmain copy/restore failed [TGT_RC=${TGT_RC}]" &

# need to wait till all the background processes have completed
wait

# clean up
rm $LOCK_FILE

# all's well - fin
exit 0

Thursday, 2 June 2011

Nested SQL Query syntax for Teradata (Teradata syntax error 3707)

In Oracle, a query like this will work fine:

SELECT num_occurrences, COUNT(*)
FROM (SELECT xxxx, COUNT(*) num_occurrences
      FROM mytable
      GROUP BY xxxx)
GROUP BY num_occurrences


In Teradata (v 12 as least), one must alias the nested query like this:


SELECT num_occurrences, COUNT(*)
FROM (SELECT xxxx, COUNT(*) num_occurrences
      FROM mytable
      GROUP BY xxxx) a
GROUP BY num_occurrences

If not you get a message like this ...

3707: Syntax error, expected something like a name or a Unicode delimited identifier or an 'UDFCALLNAME' keyword between ')' and the 'group' keyword


To answer Christi/Christophe's question in the comment below ...

select count(*) from
(select ctm_job, count(*)
from BCRTEC.x_ctm_job
group by ctm_job
having count(*) > 1);


I think  this should just work.
Try this ... (note column alias for count(*) mycount in the inner query and the subquery alias a highlighted in red below) ...

select count(*) from
(select ctm_job, count(*)
mycount
from BCRTEC.x_ctm_job
group by ctm_job
having count(*) > 1)
a