Thursday 30 May 2013

Overview of an Oracle DB

Quick overview of an Oracle database (these are from a few years back)

-- Oracle files details
select * from dba_data_files
select * from v$logfile
select * from v$log
select * from v$tempfile
select * from v$loghist
-- check the amt of redologging

-- Oracle database & instance details & parameter settings
select * from v$database
select * from v$instance
select * from v$parameter

-- #objects x owner & object_type
select owner, object_type, count(*)
from dba_objects
group by owner, object_type

-- #segments (count) & size (GB) x schema x segment type
select owner, segment_type, count(*), sum(bytes)/1024/1024/1024
from dba_segments
group by owner, segment_type
order by owner, segment_type

-- #segments (count) & size (GB) x schema x segment type per table
select owner, segment_type, count(*), sum(bytes)/1024/1024/1024
from dba_segments
where segment_name like '%%'
group by owner, segment_type
order by 4 desc

-- Listing of schema, segment type & bytes (per table commented out)
select owner, segment_type, segment_name, bytes
from dba_segments
--where segment_name like '%%'
order by 4 desc

-- other useful views
dba_jobs -- scheduled jobs
dba_mviews -- see all 9i mview dictionary views here

Thursday 23 May 2013

Graph Analysis using MapReduce

Graph Analysis - Social Network Analysis

Was at a Teradata CTO day today and learnt about their new graph analysis capability in Asterdata which had skipped me by.

I wonder whether this is not development based on the same Yahoo seed that has grown Giraph.

Time to play with this ...

Friday 3 May 2013

Hadoop MapReduce jobs hanging (after adding new nodes to cluster)

Interesting Hadoop issue

Background

We have been running a proof of concept (PoC) CDH3u3 cluster of 9 data nodes running CentOS 6.1.
Now we want to go production and don't have the time to upgrade to CDH4.2 ;-)
We were adding new data nodes running CentOS 6.3 with the same version of CDH3u3.
At the same time we took the time to update the topology of the cluster for these new nodes.
We added the rack they were in into the topology file.
We left the PoC cluster with their default rack locations.

Symptoms

We could load data into the cluster no problem.
But when we ran queries, they would run for a while and then stop.
The number of occupied Map Slots would be 0 but there were plenty of free slots.

We saw some errors/warning in the logs and they were certainly not obvious.

Hadoop jobtracker log warnings/errors

Example warnings/errors in namenode jobtracker logfile (maybe related)


2013-05-01 10:11:11,984 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_201305010958_0001_m_000174_1' of TIP 'task_201305010958_0001_m_000174'oldTT=tracker_:localhost/127.0.0.1:54000 while newTT=tracker_:localhost/127.0.0.1:54000
2013-05-01 10:11:11,984 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201305010958_0001_m_000220_1' to tip task_201305010958_0001_m_000220, for tracker 'tracker_:localhost/127.0.0.1:54000'
2013-05-01 10:11:11,984 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: java.lang.NullPointerException
2013-05-01 10:11:11,984 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8021, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@17b0b765, false, false, true, 2320) from 10.173.226.117:53514: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException

Example odd  entries in namenode tasktracker logfile (maybe related)


2013-04-30 00:19:22,760 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,764 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,767 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,770 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,773 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,777 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549


Solution

Adding the rack locations of all the cluster nodes in the topology file did the trick.