Upon the Introduction of UNDO_RETENTION

Srinivas Maddali

Data Concurrency and Read Consistency

ROLLBACK or UNDO is the backbone of the READ CONSISTENCY mechanism provided by Oracle.

Multi-User Data Concurrency and Read Consistency mechanism make Oracle stand tall in Relational Database Management Systems (RDBMS) world.

In the words of Jonathan Lewis

Behind any simple action to modify the data held in the Oracle database, the database engine does an enormous amount of work in order to be able to offer maximum concurrency to the users whilst still maintaining consistency without conflict.

What element makes this mechanism to work so successfully?

The backbone for this ability is given to Oracle by System Change Number (SCN), which describes the point in status of the database. This is recorded by Oracle in Data Block Header (DBH).

Lot of information of the data block buffer headers is gotten from X$BH.

How a user’s ‘commit’, to his work done, is taken care Oracle?

When a user issues a ‘commit’ after he is done with his transaction, Oracle performs a commit in 2 ways.

Fast Commit – In simple terms if the data block is still in the buffer cache then record the SCN in the block header.

Delayed Block Cleanout – If the dirty block (means the modified block) has already been written to disk, then the next process to visit the block will automatically check the transaction entry in the undo segment header and find that the changes made to the block have actually been committed. The process then gets the SCN of the commit from the undo header transaction entry and writes it to the data block header to record the change as committed – this is known as delayed block cleanout.

How long it takes to cleanout the dirty buffer to be written to disk? No body can determine, but only the next reader of the modified undo block header.

Remember, so, it is the responsibility of the Reader of the modified/dirty block to write to the disk.

How the rollback/undo works?

When a user is performing DML activity on a set of data, the reader of the same data should be able to see the committed consistent set of data under the data consistency rule.

This means Oracle saves the original image of the data block being modified by the DML activity to ROLLBACK or UNDO segments that live in ROLLBACK or UNDO tablespace.

This helps to rollback the transaction the user did not like/want to commit or to provide a snapshot of the data to a user that wanted to see while another is modifying the data in another session.

Let us suppose:

  1. User X started modifying the data at 10:00 AM
  2. User Y issued a query to view data at 11:00 AM
  3. User X committed his work at 12:00 NOON

 

Oracle for User Y builds snapshot of the data to satisfy the query as under:

  1. Checks for the SCN number at 11:00 AM
  2. Then check for all those blocks that satisfy the criteria
  3. If a block is seen modified and marked with upper bound commit number to let this Oracle session reader process know that that block is to be written to the disk, the delayed block cleanout takes place.
  4. If more and more data is required an attempt is made to lookup rollback segment header’s transaction slot pointed to by the top of the data block. When it is realized that the transaction has been over written, an attempt is made to rollback the changes made to the rollback segment header to get the original transaction slot entry
  5. If it cannot be rolled back then throws up error ORA-01555.

In the own words of Oracle:

To manage the multi-version consistency model, Oracle must create a read-consistent set of data when a table is being queried (read) and simultaneously updated (written). When an update occurs, the original data values changed by the update are recorded in the database’s undo records. As long as this update remains part of an uncommitted transaction, any user that later queries the modified data views the original data values. Oracle uses current information in the system global area and information in the undo records to construct a read-consistent view of a table’s data for a query.

So, are all those SELECT statements are the built up snapshots.

While this being the case of Oracle way of working for the Database Users, let us not forget the fact that, Oracle is provided by the user a given space on the hard disk by creating ROLLBACK/UNDO tablespace for her activities as said by Jonathan. There are a couple of relational rules to be followed by Oracle set to herself by herself. No any single transaction can span more than a single segment. A segment may have more than a single transaction. There is a set limit for the number of transactions oracle can have in a single segment.

If Oracle fails to build the snapshot throws up ORA-01555 the most infamous Oracle Error.

How the number of segments is decided in Auto Undo Management?

It is an educated guess work.

If processes are set to 100 then SESSIONS are derived (1.1 * PROCESSES) + 5 = 115.

If Sessions are 115 then TRANSACTIONS are derived (1.1 * SESSIONS) = 126.

The parameter TRANSACTIONS_PER_ROLLBACK_SEGMENT defaults to 5.

Basing on that default values, as one UNDO segment accommodates 5 transactions each and if all the transactions are concurrent then 126/5 = 25 +/- segments are to be created.

There could a better algorithm followed by Oracle basing on the average number of transactions and concurrency of those transactions and many more.

The reason is, in some cases the educated guess work is right, some times almost and sometimes no way near to the facts.

 

Evolution of UNDO Management

Let us first understand the evolution of the ROLLBACK/UNDO Management in the recent past.

Description 7 8i 9i 10g
CREATE ROLLBACK SEGMENT Yes Yes No No
Online or offline ability Yes Yes No No
Set transaction to rollback segment <rollback_segment_name> (2) Yes Yes No Yes
Exec dbms_transaction.use_rollback_segment (‘SYSTEM’);(2) Yes Yes No Yes
Auto Undo Management available NA NA YES YES
UNDO_RETENTION NA NA YES YES
UNDO RETENTION GAURANTEE NA NA NO YES (1)
ORACLE FLASHBACK QUERY NA NA YES (3) YES (3)

Notes

(1) The guarantee is conditional

The UNDO_RETENTION parameter is ignored for a fixed size undo tablespace. The database may overwrite unexpired undo information when tablespace space becomes low.

For an undo tablespace with the AUTOEXTEND option enabled, the database attempts to honor the minimum retention period specified by UNDO_RETENTION. When space is low, instead of overwriting unexpired undo information, the tablespace auto-extends. If the MAXSIZE clause is specified for an auto-extending undo tablespace, when the maximum size is reached, the database may begin to overwrite unexpired undo information. This results naturally when a long running query reading from those undo blocks ORA-01555.

(2) This command gave the users to set a transaction to a given   rollback/undo segment.

This gives the user the capability to set a transaction to a set segment of his choice. This helps likely to avoid ORA-01555

(3) The success of the Flashback directly dependent upon the undo_retention time setting and guarantee of retaining the undo until the expiry of that retention period.

What is this ORA_01555?

Source: ORA-01555 “Snapshot too old” – Detailed Explanation

Metalink Doc ID:  Note: 40689.1

Last Revision Date:  08-AUG-2006

Applies to:  Oracle Server – Enterprise Edition – Version: 8.1.7 and Information in this document applies to any platform.

  1. Read Consistency

Oracle Server has the ability to have multi-version read consistency, which is invaluable to you because it guarantees that you are seeing a consistent view of the data (no ‘dirty reads’).

Delayed Block Cleanout

This is best illustrated with an example: Consider a transaction that updates a million row table. This obviously visits a large number of database blocks to make the change to the data. When the user commits the transaction Oracle does NOT go back and revisit these blocks to make the change permanent. It is left for the next transaction that visits any block affected by the update to ‘tidy up’ the block (hence the term ‘delayed block cleanout’).

Whenever Oracle changes a database block (index, table, cluster) it stores a pointer in the header of the data block which identifies the rollback segment used to hold the rollback information for the changes made by the transaction. (This is required if the user later elects to not commit the changes and wishes to ‘undo’ the changes made.)

Upon commit, the database simply marks the relevant rollback segment header entry as committed. Now, when one of the changed blocks is revisited Oracle examines the header of the data block, which indicates that it has been changed at some point. The database needs to confirm whether the change has been committed or whether it is currently uncommitted. To do this, Oracle determines the rollback segment used for the previous transaction (from the block’s header) and then determines whether the rollback header indicates whether it has been committed or not.

If it is found that the block is committed then the header of the data block is updated so that subsequent accesses to the block do not incur this processing.

The algorithm that educates us on ORA-01555

When a fixed tablespace is allocated:

The algorithm for using extents is as follows:

  1. A new extent will be allocated from the undo tablespace.
  2. If failed because no free extent available and we cannot autoextend, try stealing an expired extent from another undo segment.
  3. If failed because there is no extents expired, try reuse an unexpired extent from the current undo segment.
  4. If failed, try stealing an unexpired extent from another undo segment.
  5. If all failed, report an “Out-Of-Space” error.

When an autoextend-able datafiles are allocated to the UNDO tablespace:

Start DML transaction

Allocate new extent for the transaction in UNDO tablespace

If there is no space allocable then

If the data files are set to AUTOEXTENSION on then extend the data file and allocate the extent else

If they are not set for then try steeling the EXPIRED segments

If the expired segments are not available then re-use the unexpired segments

If failed to grab an extent for re-use then say “Out-Of-Space”

The above is an “OUT OF SPACE” ORA-01555 mechanism, as I understood from the docs and experiences.

The other reason and cause of the ORA-01555 which is not due to space, but because of

  1. Overwriting of one undo block by another, head eats the tail
  2. Delayed block cleanout data being over written due the reasons attributable to the configuration of the undo tablespace.
  3. Excessive UNDO_RETENTION setting with not commensurate space allocation.
  4. Nature of the database activity OLTP with high DML activity and small transactions. In this case my experience is that Oracle has gone to the extent of creating more than 75 with not very large segments but with different sized undo segments and high updates to the same data which demand more provision for more ITL slots by increasing the concurrent transaction capability thru increased INITRANS parameter value for the table/s that form part of the transaction.

Case: One logon script that fires a SQL for more than 3000 times a day and more than 10 tables are highly dynamic and the incidence of ORA-01555 is a minimum of 3 times. UNDO_RETENTION was increased as suggested by Oracle 3 times from 900 to 5400 to 10800 and ultimately they came of to always set that to max query length + some thing. This suggestion was not taken and let the animal break three sessions minimum a day.

  1. Nature of the database activity Hybrid DATA MART+DW. This database has a few MV. The worst experience with one MV is that one of the 6 base tables is daily updated thru a process. If the MV refresh is started soon after that process, we are sure to hit ORA-01555. The UNDO_RETENTION is reduced to 900 (default) then also if failed. If the refresh process is scheduled after 900 seconds, it is successfully completed with ORA-01555. The same is tested in Oracle 8i and there were no problems absolutely. This meant that we have benefits by not enabling that UNDO_RETENTION. With scheduled ETL and transformation processes the database has more than 30 +/- UNDO segments created.
  2. Bugs

Oracle 9i introduces UNDO_RETENTION

The informed purpose of AUTO UNDO management by Oracle is to automate the UNDO management as Oracle takes over

The creation of the undo segments required for the database

The management (taking offline /online) of those segments

Setting a segment for a transaction

The dropping of those segments

Oracle maintains all the information to nullify changes made to the database. Such information consists of records of the actions of transactions, collectively known as undo. Oracle uses the undo to do the following:

Rollback an active transaction

Recover a terminated transaction

Provide read consistency

Recovery from logical corruptions

One of the most innovative concepts is UNDO_RETENTION.

The Business Requirements that could have potentially be source of introduction of UNDO_RETENTION by Oracle, is, being able to recover lost/modified data, within a given fixed time frame, caused by accidental and/or unwanted DML activity.

This provision benefits Flash Back Query.

To quote from Metalink Doc ID: 153384.1 for the referential explanation for the initialization parameter is:

UNDO_RETENTION specifies (in seconds) the amount of committed undo information to retain in the database. You can use UNDO_RETENTION to satisfy queries that require old undo information to rollback changes to produce older images of data blocks. You can set the value at instance startup.

The UNDO_RETENTION parameter works best if the current undo tablespace has enough space for the active transactions. If an active transaction needs undo space and the undo tablespace does not have any free space, then the system will start reusing undo space that would have been retained. This may cause long queries to fail. Be sure to allocate enough space in the undo tablespace to satisfy the space requirement for the current setting of this parameter.

In Oracle 9i, the implementation of UNDO_RETENTION is “not guaranteed”.

There are bugs, UNDO_RETENTION, delayed block clean out, low space provision for UNDO have been responsible to make the life of an Oracle DBA engaged with tracing, diagnosing, discussing with OSS and possible discussions with his peers, friends in the DBA community.

Oracle should have introduced the UNDO_RETENTION with a holistic way not exclusively to meet the business requirement of ‘FLASHBACK QUERY’ with no guarantee attached to it.

It should be noted that

  1. There can never be a given set of tested SQL that are used always
  2. Ad-hoc queries are always possible
  3. The data modification is high in OLTP but can never be set to a given trend as every OLTP is unique by its nature, use and purpose
  4. The Hybrid Data Mart/Data Warehouse are always a challenge unless different application requirements are integrated while architecting the database.
  5. The application vendors take responsibility for the schema design but not for the database design. They do not provide any design for the database.
  6. The database design is dependent on the application and its nature and behavior. This can be done when every nut and bolt of the application is known at the architectural level. The vendor is never ready to provide that.

The fact that Flashback Query owes its birth to UNDO should not mean that UNDO tablespace is to pay for the FQ. This should make its life independent for the single reason that this concept cannot eat into READ CONSISTENCY mechanism of Oracle at her cost.

AUTOMATED UNDO MANAGEMENT restricted (onwards from Oracle 9i) creating multi-sized extents for UNDO segments.

Oracle cannot allow the clients to set the transaction to a specific UNDO/ROLLBACK segment and throws up

ORA-30019: Illegal rollback Segment operation in Automatic Undo mode

Oracle 10g also do not guarantee this if the UNDO tablespace is created with FIXED space while agreeing to alert the customer/client as and when the system is under pressure for space.

Oracle allows in 10g to set a transaction and made it legal.

In Oracle 10g R2 when fixed space is used for the UNDO tablespace, by default 85% of the allocated space is taken as the threshold limit and an alert is spit in the alert log file when that threshold is hit.

UNDO_RETENTION is guaranteed Oracle 10g as Oracle calculates the UNDO_RETENTION doing its math summing up max query length (of the SELECT statement usually) + 300 seconds as TUNED_UNDORETENTION.

And the client/customer agrees to configured datafiles associated with UNDO tablespace is set to AUTOEXTEND on mode and able to provide the space asked for.

Oracle advises us to leave the UNDO_RETENTION setting to default value (900 seconds).

In introduced capturing additional information to V$UNDOSTAT with the SQL information that caused ORA-01555 and also with the run-away SQL that has impacted the unwanted UPWARD revision that was caused to TUNED_UNDORETENTION time.

Upward and downward reset of the TUNED_UNDORETENTION does not mean any resizing of the data files associated with the UNDO tablespace/s by resetting the High water mark for the datafiles, which only enables the downsizing of the datafiles.

We now can easily identify the run away SQL as its Id is stored in the UNDO stats in the data dictionary.

The alerts that are deposited in the alert log file to notify the pressure on the space have their positive impact where adding space to UNDO tablespace is unbridled. This is not possible in majority shops.

 

What we want Oracle is to do for us?

Give back the ability to create a UNDO segment of user choice.

Disassociate FlashBack Query from the database UNDO management.

Granularized Flashback for a data segment can help disassociate the Flashback Query from Undo and UNDO Tablespace Management.

Oracle has to introduce new error mechanism that educates the user on the facts not throw up the infamous ORA-01555.

The stats say

Incident Description # Incidents
ORA-01555 with NOSPACE error counts NIL
ORA-01555 with SSOLD error counts 823
Flash Back Query 1

Let me give some more facts

Version Block size Undo in day Allocated space in Gig By UNDO per minute RETENTION ORA-1555 Remarks
9.2.0.8.0 4 641.95 6.31 15.21 10800 42 Bugs that set the max query length to millions of seconds + application problems with few more runaway queries
9.2.0.6.0. 8 6.9 0.5 0.08 10800 0
9.2.0.6.0. 4 0.1 0.13 0 10800 0
9.2.0.6.0. 8 17.79 8 0.21 10800 0
9.2.0.6.0. 8 133.7 26.66 1.58 10800 0
9.2.0.6.0. 8 6.87 3.91 0.24 3600 9
9.2.0.6.0. 8 15.88 2.93 0.19 10800 0
9.2.0.6.0. 4 0.12 0.6 0.01 900 0
9.2.0.6.0. 4 0.33 0.93 0.01 10800 0
9.2.0.6.0. 8 10.42 3.91 1.46 900 0
9.2.0.6.0. 8 0.9 1.95 0.01 10800 0
9.2.0.6.0. 8 0.98 0.01 10800 0
9.2.0.6.0. 4 0.52 0.56 0.01 10800 0
9.2.0.6.0 8 7.21 5 1 900 25 Bugs that set the max query length to millions of seconds and the scheduling of the processes immediately after the loads resulted in these errors when they are fixed this got fixed.
4 0.09 0.09 0 900 0
4 0.28 1.76 0 10800 0
8 23.98 0.75 3.38 900 0
4 2.21 0.57 0.05 10800 0
4 0.12 0.6 0 5400 0
4 2393.3 4 56.73 10800 0 Scheduled low reads
4 2.62 2.93 0.06 10800 0
8 443.76 28.58 5.26 10800 664 OLTP
8 1.46 0.49 0.02 10800 0
32 28.21 35.55 0.98 900 77 DATA MART+DW
4 333.69 1.5 7.91 10800 0
4 0.11 2 0 900 0
4 41.1 0.62 0.97 10800 0
8 20.83 29.1 0.73 3600 6 OLTP+OLAP
8 0.77 4 0.01 10800 0

The Database names are not given, as I do not want to divulge the private information. Some of the databases are home grown and other databases are to support the applications as provided by the vendors. Majority of the ORA-01555 are reported from those databases built to deploy the schema provided by the vendors.

It is easy for the application vendors to point finger to Oracle for ORA-01555.

The Undo extent management manual and auto make a difference

Description   Version Oracle 9i Manual
Nature of Database Block Size Initial Next
SYSTEM OLTP-Application supporting 9.2.0 8 56 56
RBS00 8 128 128
RBS01 8 1024 1024
RBS02 8 1024 1024
RBS03 8 1024 1024
RBS04 8 1024 1024
RBS05 8 1024 1024
RBS06 8 1024 1024
R01 8 4096 4104
R02 8 4096 4104
R03 8 4096 4104
R04 8 4096 4104
Description   Version Oracle 9i Auto
Nature of Database Block Size Initial Next
Segment 01 Ext 01 OLTP db 01 9.2.0 4 128 64
OLTP db 02 9.2.0 8 128 64
DM-DW db 01 9.2.0 32 128 64
Description   Version Oracle10g Auto
Nature of Database Block Size Initial Next 01
Segment 01 General Purpose 10.2.0 8 128 64
Description   Version Oracle 8i Manual
Nature of Database Block Size Initial Next 01
Segment 01 OLTP 8.1.7 8 120 120
Segment 02 OLTP 8 588 588
Segment 03 OLTP 8 592 592
Segment 04 OLTP 8 1,452 1,452
Segment 05 OLTP 8 1468 1468
Segment 06 OLTP 8 1,480 1,480
Segment 07 OLTP 8 1,520 1,520
Segment 08 OLTP 8 1,608 1,608
Segment 09 OLTP 8 1,668 1,668
Segment 10 OLTP 8 3,019 3,019
Segment 11 OLTP 8 5,856 5,856

Please Note:

(1)    Manual UNDO is after the manual upgrade of a database from 8.1.7 to 9.2.0

(2)    The R0, R1, R2, R3 created by Oracle.

(3)     The next extent was greater than the initial extent for those extents create by Oracle while the 8.1.7 extents have the next extent equal to the initial extent.

(4)    In the auto undo management we see that Oracle created bigger initial extents and their sizes are consistent when the number of blocks are considered for the initial extent (128 blocks) and next extent (64 blocks). This has a potential to contribute some kind imbalance. The intent of bigger initial extent is to accommodate the entire transaction in one block, which generally may not be the case.

What the undo header block contains?

Start dump data blocks tsn: 1 file#: 2 minblk 25 maxblk 25

buffer tsn: 1 rdba: 0x00800019 (2/25)

scn: 0x0000.00326134 seq: 0x03 flg: 0x00 tail: 0x61342603

frmt: 0x02 chkval: 0x0000 type: 0x26=KTU SMU HEADER BLOCK

Hex dump of block: st=0, typ_found=1

Dump of memory from 0x080B5600 to 0x080B7600

Extent Control Header

—————————————————————–

Extent Header:: spare1: 0      spare2: 0      #extents: 4      #blocks: 271

last map  0x00000000  #maps: 0      offset: 4080

Highwater::  0x00800330  ext#: 3      blk#: 39     ext size: 128

#blocks in seg. hdr’s freelists: 0

#blocks below: 0

mapblk  0x00000000  offset: 3

Unlocked

Map Header:: next  0x00000000  #extents: 4    obj#: 0      flag: 0x40000000

Extent Map

—————————————————————–

0x0080001a  length: 7

0x00800081  length: 8

0x00800189 length:128

0x00800309 length:128

Retention Table

———————————————————–

Extent Number:0 Commit Time: 1162767755

Extent Number: 1 Commit Time: 1162767755

Extent Number: 2 Commit Time: 1163385640

Extent Number: 3 Commit Time: 1163385640

TRN CTL:: seq: 0x0513 chd: 0x0005 ctl: 0x001d inc: 0x00000000 nfb: 0x0001

mgc: 0x8201 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)

uba: 0x00800330.0513.3a scn: 0x0000.00325bca

Version: 0x01

FREE BLOCK POOL::

uba: 0x00800330.0513.3b ext: 0x3  spc: 0x2fc

uba: 0x00000000.0513.05 ext: 0x3  spc: 0x1d48

uba: 0x00000000.0513.13 ext: 0x3  spc: 0x15ec

uba: 0x00000000.04c7.09 ext: 0x3b spc: 0x240

uba: 0x00000000.04c7.09 ext: 0x3b spc: 0x240

 TRN TBL::

index  state cflags  wrap#    uel  scn   dba   parent-xid    nub     stmt_num    cmt

(Lot of the dump is not known from that nor oracle is ready to document)

Commit time means as under:

Commit Time: 1162767755 is the elapsed seconds from 01:01:1970 00:00:00

How that can be achieved by Oracle?

Oracle has to introduce another background process to take care of the flashback query option associated with UNDO_RETENTION setting. This may be called like FBQM (Flash Back Query Monitor) or FBQMON (Flash Back Query Monitor) or any name that fits into the frame work of Oracle naming conventions.

This has to be associated with the requirement of a separate tablespace/table to be a repository for the UNDO data with supporting metadata, to manage the undo for flash back query purposes. There should be appropriate suggestions for various types are databases like basing on

(1)   DML activity

(2)   Transaction Size

(3)   Flash Back Query requirements

(4)   Data Recovery requirements

And the databases are also to be identified by their nature

  1. OLTP Databases with heavy but small transactions which require multiple small undo segments with Flash Back Option requirements
  2. OLTP Databases with moderate but longer transactions which require moderately large undo segments with Flash Back Query Requirements
  3. Data Mart Databases with batch loads and data massages and transformation requiring larger undo segments and Flash Back Query requirements
  4. OLAP Databases larger undo segments with dependency on Flash Back Query requirements
  5. Read intensive Data Warehouse databases with little DML activity generating less and with no dependency on Flash Back Query requirements
  6. Write once and Read many Data Warehouse with no DML activity with no dependency on Flash Back Query requirements

With this kind or further improved options UNDO_RETENTION can be seen as a valid and implement able option and meeting the long term demand of the Oracle Customers and Users. Other wise Oracle is to continue to advise switching off of this option as the “BEST PRACTICES” for avoiding ORA-01555 which is induced by this UNDO_RETENTION and the requirements to guarantee the implementation.

Appeal that there is a lot of unknown black matter in the translucent internals of the management of undo to me, this presentation is made as part my curiosity to understand and to over come my day to day issues and problems with this ORA-01555 and also request Oracle publicly to find a way to disassociate FLASHBACK QUERY option from the UNDO tablespace. This can be achieved by providing a new background process and metadata and repository for the undo on historical basis and querying method of that repository for the undo.

(This document was created on Oct 26 2006)