GML blog: 2013/05

Thursday, 30 May 2013

Overview of an Oracle DB

Quick overview of an Oracle database (these are from a few years back)

-- Oracle files details

select * from dba_data_files

select * from v$logfile

select * from v$log

select * from v$tempfile

select * from v$loghist

-- check the amt of redologging

-- Oracle database & instance details & parameter settings

select * from v$database

select * from v$instance

select * from v$parameter

-- #objects x owner & object_type

select owner, object_type, count(*)

from dba_objects

group by owner, object_type

-- #segments (count) & size (GB) x schema x segment type

select owner, segment_type, count(*), sum(bytes)/1024/1024/1024

from dba_segments

group by owner, segment_type

order by owner, segment_type

-- #segments (count) & size (GB) x schema x segment type per table

select owner, segment_type, count(*), sum(bytes)/1024/1024/1024

from dba_segments

where segment_name like '%%'

group by owner, segment_type

order by 4 desc

-- Listing of schema, segment type & bytes (per table commented out)

select owner, segment_type, segment_name, bytes

from dba_segments

--where segment_name like '%%'

order by 4 desc

-- other useful views

dba_jobs -- scheduled jobs

dba_mviews -- see all 9i mview dictionary views here

Thursday, 23 May 2013

Graph Analysis using MapReduce

Graph Analysis - Social Network Analysis

Was at a Teradata CTO day today and learnt about their new graph analysis capability in Asterdata which had skipped me by.

I wonder whether this is not development based on the same Yahoo seed that has grown Giraph.

Time to play with this ...

Friday, 3 May 2013

Hadoop MapReduce jobs hanging (after adding new nodes to cluster)

Interesting Hadoop issue

Background

We have been running a proof of concept (PoC) CDH3u3 cluster of 9 data nodes running CentOS 6.1.
Now we want to go production and don't have the time to upgrade to CDH4.2 ;-)
We were adding new data nodes running CentOS 6.3 with the same version of CDH3u3.
At the same time we took the time to update the topology of the cluster for these new nodes.
We added the rack they were in into the topology file.
We left the PoC cluster with their default rack locations.

Symptoms

We could load data into the cluster no problem.
But when we ran queries, they would run for a while and then stop.
The number of occupied Map Slots would be 0 but there were plenty of free slots.

We saw some errors/warning in the logs and they were certainly not obvious.

Hadoop jobtracker log warnings/errors

Example warnings/errors in namenode jobtracker logfile (maybe related)

2013-05-01 10:11:11,984 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_201305010958_0001_m_000174_1' of TIP 'task_201305010958_0001_m_000174'oldTT=tracker_:localhost/127.0.0.1:54000 while newTT=tracker_:localhost/127.0.0.1:54000
2013-05-01 10:11:11,984 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201305010958_0001_m_000220_1' to tip task_201305010958_0001_m_000220, for tracker 'tracker_:localhost/127.0.0.1:54000'
2013-05-01 10:11:11,984 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: java.lang.NullPointerException
2013-05-01 10:11:11,984 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8021, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@17b0b765, false, false, true, 2320) from 10.173.226.117:53514: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException

Example odd entries in namenode tasktracker logfile (maybe related)

2013-04-30 00:19:22,760 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,764 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,767 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,770 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,773 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549
2013-04-30 00:19:22,777 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to '' with reponseId '6549

Solution

Adding the rack locations of all the cluster nodes in the topology file did the trick.

GML blog

Thursday, 30 May 2013

Overview of an Oracle DB

Thursday, 23 May 2013

Graph Analysis using MapReduce

Friday, 3 May 2013

Hadoop MapReduce jobs hanging (after adding new nodes to cluster)

Interesting Hadoop issue

Blog Archive

About Me