Friday, December 17, 2010

Database Crashes With ORA-00494

Symptoms

Database may crashed with the following error in the alert file

ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 5488'

Incident details in: d:\oracle\admin\ecore\diag\rdbms\ecore\ecore\incident\incdir_12529\ecore_lgwr_5484_i12529.trc
Killing enqueue blocker (pid=5488) on resource CF-00000000-00000000 by (pid=5484)
by killing session 3.1
Killing enqueue blocker (pid=5488) on resource CF-00000000-00000000 by (pid=5484)
by terminating the process
LGWR (ospid: 5484): terminating the instance due to error 2103
Fri Dec 17 03:05:09 2010
Instance terminated by LGWR, pid = 5484

Cause


The lgwr has killed the ckpt process, causing the instance to crash.
From the alert.log we can see:
That the database has waited too long for a CF enqueue, so the following error has been reported.
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by by 'inst 1, osid 5488'

Then the LGWR has killed the blocker, which was in this case the CKPT process which cause the instance to crash.
Checking the alert.log we can see that the frequency of redo log files switch is very high(almost every 1 min).

Solution


1-We usually suggest to configure the redo log switches to be done every 20~30 min to reduce the contention on the control files.

You can use the V$INSTANCE_RECOVERY view column OPTIMAL_LOGFILE_SIZE to
determine the size of your online redo logs. This field shows the redo log file size in megabytes that is considered optimal based on the current setting of FAST_START_MTTR_TARGET. If this field consistently shows a value greater than the size of your smallest online log, then you should configure all your online
logs to be at least this size.

References

BUG:7448854 - ORA-00494 CAUSE THE INSTANCE TO CRASH
Metalink Note: 753290.1

No comments: