Thursday, January 22, 2009

Oracle Rac Tunning Tips

1-Take AWR or statspack snaps on each instance (at the same time) at regular intervals. If there are severe inter-instance performance issues or hung sessions, you may also want to run the racdiag.sql script from the following note to collect additional RAC specific data: (Note 135714.1 Script to Collect RAC Diagnostic Information (racdiag.sql)
)



2-Monitoring RAC Cluster Interconnect Performance
The most important aspects of RAC tuning are the monitoring and tuning of the global services directory processes. The processes in the Global Service Daemon (GSD) communicate through the cluster interconnects

2-1- (GLOBAL CACHE CR PERFORMANCE)

-- This shows the average latency of a consistent block request.
-- AVG CR BLOCK RECEIVE TIME should typically be about 15 milliseconds depending
--If your CPU has limited idle time and your system typically processes long-running queries, then the latency may be higher. However, it is possible to have an average latency of less than one millisecond with User-mode IPC.
--Latency can be influenced by a high value for the DB_MULTI_BLOCK_READ_COUNT parameter.
--Also check interconnect badwidth, OS tcp settings, and OS udp settings if AVG CR BLOCK RECEIVE TIME is high.
---If some problems exist then Also refer Doc ID: 181489.1 Tuning Inter-Instance Performance in RAC and OPS

set numwidth 20
column "AVG CR BLOCK RECEIVE TIME (ms)" format 9999999.9
prompt '****Interconnect Latency should be lower than 15ms****'
select
b1.inst_id,
b2.value "GCS CR BLOCKS RECEIVED",
b1.value "GCS CR BLOCK RECEIVE TIME",
((b1.value/b2.value) * 10) "AVG CR BLOCK RECEIVE TIME (ms)"
from gv$sysstat b1,
gv$sysstat b2
where
(b1.name = 'global cache cr block receive time' and
b2.name = 'global cache cr blocks received' and
b1.inst_id = b2.inst_id
)
or
(b1.name = 'gc cr block receive time' and
b2.name = 'gc cr blocks received' and
b1.inst_id = b2.inst_id
)
/
2-2-The level of cluster interconnect performance

GCS waits that show how well data is being transferred. The waits that need to be monitored are shown in v$session_wait, v$obj_stats, and v$enqueues_stats. The major waits to be concerned with for RAC are:
* global cache busy
* buffer busy global cache
* buffer busy global cr


SELECT
INST_ID, EVENT, P1 FILE_NUMBER, P2 BLOCK_NUMBER, WAIT_TIME
FROM GV$SESSION_WAIT WHERE
EVENT IN ('buffer busy global cr', 'global cache busy', 'buffer busy global cache')
/

In order to find the object that corresponds to a particular file and block, the following query can be issued for the first combination on the above list:

SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE
FROM DBA_EXTENTS
WHERE FILE_ID = &file_id AND &block
BETWEEN BLOCK_ID AND BLOCK_ID+BLOCKS-1;

Once the objects causing the contention are determined, they should be modified by:
* Reducing the rows per block.
* Adjusting the block size.
* Modifying initrans and freelists.
All of these object modifications reduce the chances of application contention for the blocks in the object. Index leaf blocks are usually the most contended objects in the RAC environment; therefore, using a smaller block size for index objects can decrease intra-instance contention for index leaf blocks.

2-3 -The following wait events indicate that the remotely cached blocks were shipped to the local instance without having been busy, pinned or requiring a log flush and can safely be ignored:
* gc current block 2-way
* gc current block 3-way
* gc cr block 2-way
* gc cr block 3-way


3-GLOBAL CACHE LOCK PERFORMANCE

-- GLOBAL CACHE LOCK PERFORMANCE
-- This shows the average global enqueue get time.
-- Typically AVG GLOBAL LOCK GET TIME should be 20-30 milliseconds. the elapsed
-- time for a get includes the allocation and initialization of a new global enqueue. If the average global enqueue get (global cache get time) or average global enqueue conversion times are excessive, then your system may be experiencing timeouts.
--See the 'WAITING SESSIONS', 'GES LOCK BLOCKERS', 'GES LOCK WAITERS', and 'TOP 10 WAIT EVENTS ON SYSTEM' sections if the AVG GLOBAL LOCK GET TIME is high.

set numwidth 20
column "AVG GLOBAL LOCK GET TIME (ms)" format 9999999.9
prompt '****Average Lock Time should be 20-30 ms ****'
select
b1.inst_id, (b1.value + b2.value) "GLOBAL LOCK GETS",
b3.value "GLOBAL LOCK GET TIME",
(b3.value / (b1.value + b2.value) * 10) "AVG GLOBAL LOCK GET TIME (ms)"
from gv$sysstat b1, gv$sysstat b2, gv$sysstat b3
where
(
b1.name = 'global lock sync gets' and
b2.name = 'global lock async gets' and
b3.name = 'global lock get time' and
b1.inst_id = b2.inst_id and b2.inst_id = b3.inst_id
)
or
(
b1.name = 'global enqueue gets sync' and
b2.name = 'global enqueue gets async' and
b3.name = 'global enqueue get time' and
b1.inst_id = b2.inst_id and b2.inst_id = b3.inst_id
)
/

4-GES LOCK BLOCKERS and GES_LOCK_WAITERS


-- GES LOCK BLOCKERS:
-- This section will show us any sessions that are holding locks that
-- are blocking other users. The inst_id will show us the instance that
-- the session resides on while the sid will be a unique identifier for
-- the session. The grant_level will show us how the GES lock is granted to
-- the user. The request_level will show us what status we are trying to obtain.
-- The lockstate column will show us what status the lock is in. The last column
-- shows how long this session has been waiting.
--
set numwidth 5
column state format a16 tru;
column event format a30 tru;
select /*+ordered */
dl.inst_id, s.sid, p.spid, dl.resource_name1,
decode(substr(dl.grant_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',request_level) as grant_level,
decode(substr(dl.request_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',request_level) as request_level,
decode(substr(dl.state,1,8),'KJUSERGR','Granted','KJUSEROP','Opening',
'KJUSERCA','Canceling','KJUSERCV','Converting') as state,
s.sid, sw.event, sw.seconds_in_wait sec
from gv$ges_enqueue dl, gv$process p, gv$session s, gv$session_wait sw
where blocker = 1
and (dl.inst_id = p.inst_id and dl.pid = p.spid)
and (p.inst_id = s.inst_id and p.addr = s.paddr)
and (s.inst_id = sw.inst_id and s.sid = sw.sid)
order by sw.seconds_in_wait desc
/


-- GES LOCK WAITERS:
-- This section will show us any sessions that are waiting for locks that
-- are blocked by other users. The inst_id will show us the instance that
-- the session resides on while the sid will be a unique identifier for
-- the session. The grant_level will show us how the GES lock is granted to
-- the user. The request_level will show us what status we are trying to obtain.
-- The lockstate column will show us what status the lock is in. The last column
-- shows how long this session has been waiting.

set numwidth 5
column state format a16 tru;
column event format a30 tru;
select /*+ ordered */
dl.inst_id, s.sid, p.spid, dl.resource_name1,
decode(substr(dl.grant_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',request_level) as grant_level,
decode(substr(dl.request_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',request_level) as request_level,
decode(substr(dl.state,1,8),'KJUSERGR','Granted','KJUSEROP','Opening',
'KJUSERCA','Cancelling','KJUSERCV','Converting') as state,
s.sid, sw.event, sw.seconds_in_wait sec
from gv$ges_enqueue dl, gv$process p, gv$session s, gv$session_wait sw
where blocked = 1
and (dl.inst_id = p.inst_id and dl.pid = p.spid)
and (p.inst_id = s.inst_id and p.addr = s.paddr)
and (s.inst_id = sw.inst_id and s.sid = sw.sid)
order by sw.seconds_in_wait desc
/

5-Check db_file_multiblock if interconnect latency is occured

SELECT inst_id,SUBSTR(NAME,1,30) ,SUBSTR(VALUE,1,50) from gv$parameter
where name ='db_file_multiblock_read_count'
/
6-Load distribution between instances (At the moment )


set pagesize 60 space 2 numwidth 8 linesize 132
column service_name format a20 truncated heading 'Service'
column instance_name heading 'Instance' format a10
column stime heading 'Service TimemSec/Call' format 999999999
Select service_name ,
instance_name,
elapsedpercall stime,
cpupercall cpu_time,
dbtimepercall db_time,
callspersec throughput
from gv$instance gvi,
gv$active_services gvas,
gv$servicemetric gvsm
where
gvas.inst_id=gvsm.inst_id
and gvas.name_hash=gvsm.service_name_hash
and gvi.inst_id=gvsm.inst_id
and gvsm.group_id=10
order by
service_name,
gvi.inst_id
/

7-DLM TRAFFIC INFORMATION


prompt '**************************************************************'
prompt 'How many tickets are available in the DLM. '
prompt 'If the TCKT_WAIT columns says "YES" then'
prompt ' we have run out of DLM tickets which could cause a DLM hang. '
prompt 'Make sure that you also have enough TCKT_AVAIL. '
prompt '**************************************************************'
set numwidth 5
select * from gv$dlm_traffic_controller
order by TCKT_AVAIL
/

8-If the interconnect is slow, busy, or faulty, you can look for dropped packets,
retransmits, or cyclic redundancy check errors (CRC). You can use netstat commands
to check the networks. On Unix, check the man page for netstat for a list of options.
Also check the OS logs for any errors and make sure that inter-instance traffic is
not routed through a public network.
With most network protcols, you can use 'oradebug ipc' to see which interconnects
the database is using:
SQL> oradebug setmypid
SQL> oradebug ipc

This will dump a trace file to the user_dump_dest. The output will look something
like this:
SSKGXPT 0x1a2932c flags SSKGXPT_READPENDING info for network 0
socket no 10 IP 172.16.193.1 UDP 43749
sflags SSKGXPT_WRITESSKGXPT_UP info for network 1
socket no 0 IP 0.0.0.0 UDP 0...
So you can see that we are using IP 172.16.193.1 with a UDP protocol.

9-

Under configured network settings at the OS. Check UDP, or other network protocol
settings and tune them. See your OS specific documentation for instructions on how
to do this. If using UDP, make sure the parameters relating to send buffer space,
receive buffer space, send highwater, and receive highwater are set well above the
OS default. The alert.log will indicate what protocol is being used. Example:
cluster interconnect IPC version:Oracle RDG
IPC Vendor 1 proto 2 Version 1.0
Refer to OS manual Changing network parameters to optimal values


10-Database İmport
Install imp.exe as resident image with shared address space
$ install add imp.exe/resident/share=addr
Increase default quotas for BEQ’s mailboxes
$ define/sys ORA_BEQ_MBXSIZ 64000
$ define/sys ORA_BEQ_MBXSBFQ 64000
Set DEFMBXBUFQUO to 64000
Set DEFMBXMXMSG to 64000


11-DBMS_STATS.GATHER_SCHEMA_STATS
Calling gather_schema_stats results in a database server process being created
The server process in not multithreaded
Typically consumes 100% of one CPU
Performance improvement achieved by affinitizing the server process to one CPU and increasing QUANTUM to 20.

12-Analyze the efficiency of sort operations
Determine the number of optimal, one pass and multipass operations


select SUBSTR(NAME,1,30) , SUBSTR(VALUE,1,50) from v$parameter where name ='pga_aggregate_target'
/

SELECT optimal_count, round(optimal_count*100/total, 2)
optimal_perc,
onepass_count, round(onepass_count*100/total, 2)
onepass_perc,
multipass_count, round(multipass_count*100/total, 2)
multipass_perc
FROM
(SELECT decode(sum(total_executions), 0, 1,
sum(total_executions)) total,
sum(OPTIMAL_EXECUTIONS) optimal_count,
sum(ONEPASS_EXECUTIONS) onepass_count,
sum(MULTIPASSES_EXECUTIONS) multipass_count
FROM v$sql_workarea_histogram
WHERE low_optimal_size > 64*1024)
/

SELECT
PGA_TARGET_FACTOR , round(PGA_TARGET_FOR_ESTIMATE/1024/1024) target_mb,
ESTD_PGA_CACHE_HIT_PERCENTAGE cache_hit_perc,
ESTD_OVERALLOC_COUNT
FROM v$pga_target_advice
/

13-size the sga and shared_pool


select sga_size, sga_size_factor as size_factor,
estd_physical_reads as estimated_physical_reads
from v$sga_target_advice order by sga_size_factor;


select
SHARED_POOL_SIZE_FOR_ESTIMATE ,SHARED_POOL_SIZE_FACTOR,
ESTD_LC_TIME_SAVED_FACTOR, ESTD_LC_LOAD_TIME
from V$SHARED_POOL_ADVICE
/

14-Tune the sql 's .Poor SQL or bad optimization paths can cause additional block gets via theinterconnect just as it would via I/O.

15-Set base priority of CRS queue to 12 CRS is running in batch By default, base priority of a batch queue is 4. So set base priority of CRS queue to 12
10-The use of locally managed tablespaces (instead of dictionary managed) with the 'SEGMENT SPACE MANAGEMENT AUTO' option can reduce dictionary and freelist block contention.

select
TABLESPACE_NAME ,EXTENT_MANAGEMENT ,ALLOCATION_TYPE ,
SEGMENT_SPACE_MANAGEMENT
from dba_tablespaces
/
16-A poor gc_files_to_locks setting can cause problems. In almost all cases in RAC, gc_files_to_locks does not need to set at all

SELECT inst_id,SUBSTR(NAME,1,30) ,SUBSTR(VALUE,1,50) from gv$parameter
where name ='gc_files_to_locks'
/


For details Refer to to Oracle Metalink Notes

Doc ID: 135714.1:Script to Collect RAC Diagnostic Information (racdiag.sql)
Doc ID: 181489.1:Tuning Inter-Instance Performance in RAC and OPS
Doc ID: 563566.1:gc lost blocks diagnostics