Upon the Introduction of UNDO_RETENTIONSrinivas Maddali
Data Concurrency and Read Consistency
ROLLBACK or UNDO is the backbone of the READ CONSISTENCY mechanism provided by Oracle.
Multi-User Data Concurrency and Read Consistency mechanism make Oracle stand tall in Relational Database Management Systems (RDBMS) world.
In the words of Jonathan Lewis
Behind any simple action to modify the data held in the Oracle database, the database engine does an enormous amount of work in order to be able to offer maximum concurrency to the users whilst still maintaining consistency without conflict.
What element makes this mechanism to work so successfully?
The backbone for this ability is given to Oracle by System Change Number (SCN), which describes the point in status of the database. This is recorded by Oracle in Data Block Header (DBH).
Lot of information of the data block buffer headers is gotten from X$BH.
How a user’s ‘commit’, to his work done, is taken care Oracle?
When a user issues a ‘commit’ after he is done with his transaction, Oracle performs a commit in 2 ways.
Fast Commit – In simple terms if the data block is still in the buffer cache then record the SCN in the block header.
Delayed Block Cleanout - If the dirty block (means the modified block) has already been written to disk, then the next process to visit the block will automatically check the transaction entry in the undo segment header and find that the changes made to the block have actually been committed. The process then gets the SCN of the commit from the undo header transaction entry and writes it to the data block header to record the change as committed - this is known as delayed block cleanout.
How long it takes to cleanout the dirty buffer to be written to disk? No body can determine, but only the next reader of the modified undo block header.
Remember, so, it is the responsibility of the Reader of the modified/dirty block to write to the disk.
How the rollback/undo works?
When a user is performing DML activity on a set of data, the reader of the same data should be able to see the committed consistent set of data under the data consistency rule.
This means Oracle saves the original image of the data block being modified by the DML activity to ROLLBACK or UNDO segments that live in ROLLBACK or UNDO tablespace.
This helps to rollback the transaction the user did not like/want to commit or to provide a snapshot of the data to a user that wanted to see while another is modifying the data in another session.
Let us suppose:
01. User X started modifying the data at 10:00 AM 02. User Y issued a query to view data at 11:00 AM 03. User X committed his work at 12:00 NOON
Oracle for User Y builds snapshot of the data to satisfy the query as under:
01. Checks for the SCN number at 11:00 AM 02. Then check for all those blocks that satisfy the criteria 03. If a block is seen modified and marked with upper bound commit number to let this Oracle session reader process know that that block is to be written to the disk, the delayed block cleanout takes place. 04. If more and more data is required an attempt is made to lookup rollback segment header’s transaction slot pointed to by the top of the data block. When it is realized that the transaction has been over written, an attempt is made to rollback the changes made to the rollback segment header to get the original transaction slot entry 05. If it cannot be rolled back then throws up error ORA-01555.
In the own words of Oracle:
To manage the multi-version consistency model, Oracle must create a read-consistent set of data when a table is being queried (read) and simultaneously updated (written). When an update occurs, the original data values changed by the update are recorded in the database's undo records. As long as this update remains part of an uncommitted transaction, any user that later queries the modified data views the original data values. Oracle uses current information in the system global area and information in the undo records to construct a read-consistent view of a table's data for a query.
So, are all those SELECT statements are the built up snapshots.
While this being the case of Oracle way of working for the Database Users, let us not forget the fact that, Oracle is provided by the user a given space on the hard disk by creating ROLLBACK/UNDO tablespace for her activities as said by Jonathan. There are a couple of relational rules to be followed by Oracle set to herself by herself. No any single transaction can span more than a single segment. A segment may have more than a single transaction. There is a set limit for the number of transactions oracle can have in a single segment.
If Oracle fails to build the snapshot throws up ORA-01555 the most infamous Oracle Error.
How the number of segments is decided in Auto Undo Management?
It is an educated guess work.
If processes are set to 100 then SESSIONS are derived (1.1 * PROCESSES) + 5 = 115.
If Sessions are 115 then TRANSACTIONS are derived (1.1 * SESSIONS) = 126. The parameter TRANSACTIONS_PER_ROLLBACK_SEGMENT defaults to 5.
Basing on that default values, as one UNDO segment accommodates 5 transactions each and if all the transactions are concurrent then 126/5 = 25 +/- segments are to be created.
There could a better algorithm followed by Oracle basing on the average number of transactions and concurrency of those transactions and many more.
The reason is, in some cases the educated guess work is right, some times almost and sometimes no way near to the facts.
Evolution of UNDO Management
Let us first understand the evolution of the ROLLBACK/UNDO Management in the recent past.
Notes
(1) The guarantee is conditional
The UNDO_RETENTION parameter is ignored for a fixed size undo tablespace. The database may overwrite unexpired undo information when tablespace space becomes low.
For an undo tablespace with the AUTOEXTEND option enabled, the database attempts to honor the minimum retention period specified by UNDO_RETENTION. When space is low, instead of overwriting unexpired undo information, the tablespace auto-extends. If the MAXSIZE clause is specified for an auto-extending undo tablespace, when the maximum size is reached, the database may begin to overwrite unexpired undo information. This results naturally when a long running query reading from those undo blocks ORA-01555.
(2) This command gave the users to set a transaction to a given rollback/undo segment.
This gives the user the capability to set a transaction to a set segment of his choice. This helps likely to avoid ORA-01555
(3) The success of the Flashback directly dependent upon the undo_retention time setting and guarantee of retaining the undo until the expiry of that retention period.
What is this ORA_01555?
Source: ORA-01555 "Snapshot too old" - Detailed Explanation Metalink Doc ID: Note: 40689.1Last Revision Date: 08-AUG-2006
Applies to: Oracle Server - Enterprise Edition - Version: 8.1.7 and Information in this document applies to any platform.
1. Read Consistency
Oracle Server has the ability to have multi-version read consistency, which is invaluable to you because it guarantees that you are seeing a consistent view of the data (no 'dirty reads').
2. Delayed Block Cleanout
This is best illustrated with an example: Consider a transaction that updates a million row table. This obviously visits a large number of database blocks to make the change to the data. When the user commits the transaction Oracle does NOT go back and revisit these blocks to make the change permanent. It is left for the next transaction that visits any block affected by the update to 'tidy up' the block (hence the term 'delayed block cleanout').
Whenever Oracle changes a database block (index, table, cluster) it stores a pointer in the header of the data block which identifies the rollback segment used to hold the rollback information for the changes made by the transaction. (This is required if the user later elects to not commit the changes and wishes to 'undo' the changes made.)
Upon commit, the database simply marks the relevant rollback segment header entry as committed. Now, when one of the changed blocks is revisited Oracle examines the header of the data block, which indicates that it has been changed at some point. The database needs to confirm whether the change has been committed or whether it is currently uncommitted. To do this, Oracle determines the rollback segment used for the previous transaction (from the block's header) and then determines whether the rollback header indicates whether it has been committed or not.
If it is found that the block is committed then the header of the data block is updated so that subsequent accesses to the block do not incur this processing.
The algorithm that educates us on ORA-01555
When a fixed tablespace is allocated:
The algorithm for using extents is as follows:
1. A new extent will be allocated from the undo tablespace. 2. If failed because no free extent available and we cannot autoextend, try stealing an expired extent from another undo segment. 3. If failed because there is no extents expired, try reuse an unexpired extent from the current undo segment. 4. If failed, try stealing an unexpired extent from another undo segment. 5. If all failed, report an "Out-Of-Space" error.
When an autoextend-able datafiles are allocated to the UNDO tablespace:
Start DML transaction Allocate new extent for the transaction in UNDO tablespace If there is no space allocable then If the data files are set to AUTOEXTENSION on then extend the data file and allocate the extent else If they are not set for then try steeling the EXPIRED segments If the expired segments are not available then re-use the unexpired segments If failed to grab an extent for re-use then say "Out-Of-Space"
The above is an “OUT OF SPACE” ORA-01555 mechanism, as I understood from the docs and experiences.
The other reason and cause of the ORA-01555 which is not due to space, but because of
01. Overwriting of one undo block by another, head eats the tail 02. Delayed block cleanout data being over written due the reasons attributable to the configuration of the undo tablespace. 03. Excessive UNDO_RETENTION setting with not commensurate space allocation. 04. Nature of the database activity OLTP with high DML activity and small transactions. In this case my experience is that Oracle has gone to the extent of creating more than 75 with not very large segments but with different sized undo segments and high updates to the same data which demand more provision for more ITL slots by increasing the concurrent transaction capability thru increased INITRANS parameter value for the table/s that form part of the transaction. Case: One logon script that fires a SQL for more than 3000 times a day and more than 10 tables are highly dynamic and the incidence of ORA-01555 is a minimum of 3 times. UNDO_RETENTION was increased as suggested by Oracle 3 times from 900 to 5400 to 10800 and ultimately they came of to always set that to max query length + some thing. This suggestion was not taken and let the animal break three sessions minimum a day. 05. Nature of the database activity Hybrid DATA MART+DW. This database has a few MV. The worst experience with one MV is that one of the 6 base tables is daily updated thru a process. If the MV refresh is started soon after that process, we are sure to hit ORA-01555. The UNDO_RETENTION is reduced to 900 (default) then also if failed. If the refresh process is scheduled after 900 seconds, it is successfully completed with ORA-01555. The same is tested in Oracle 8i and there were no problems absolutely. This meant that we have benefits by not enabling that UNDO_RETENTION. With scheduled ETL and transformation processes the database has more than 30 +/- UNDO segments created. 06. Bugs
Oracle 9i introduces UNDO_RETENTION
The informed purpose of AUTO UNDO management by Oracle is to automate the UNDO management as Oracle takes over
The creation of the undo segments required for the database The management (taking offline /online) of those segments Setting a segment for a transaction The dropping of those segments
Oracle maintains all the information to nullify changes made to the database. Such information consists of records of the actions of transactions, collectively known as undo. Oracle uses the undo to do the following:
Rollback an active transaction Recover a terminated transaction Provide read consistency Recovery from logical corruptions
One of the most innovative concepts is UNDO_RETENTION.
The Business Requirements that could have potentially be source of introduction of UNDO_RETENTION by Oracle, is, being able to recover lost/modified data, within a given fixed time frame, caused by accidental and/or unwanted DML activity.
This provision benefits Flash Back Query.
To quote from Metalink Doc ID: 153384.1 for the referential explanation for the initialization parameter is:
UNDO_RETENTION specifies (in seconds) the amount of committed undo information to retain in the database. You can use UNDO_RETENTION to satisfy queries that require old undo information to rollback changes to produce older images of data blocks. You can set the value at instance startup.
The UNDO_RETENTION parameter works best if the current undo tablespace has enough space for the active transactions. If an active transaction needs undo space and the undo tablespace does not have any free space, then the system will start reusing undo space that would have been retained. This may cause long queries to fail. Be sure to allocate enough space in the undo tablespace to satisfy the space requirement for the current setting of this parameter.
In Oracle 9i, the implementation of UNDO_RETENTION is "not guaranteed".
There are bugs, UNDO_RETENTION, delayed block clean out, low space provision for UNDO have been responsible to make the life of an Oracle DBA engaged with tracing, diagnosing, discussing with OSS and possible discussions with his peers, friends in the DBA community.
Oracle should have introduced the UNDO_RETENTION with a holistic way not exclusively to meet the business requirement of ‘FLASHBACK QUERY’ with no guarantee attached to it.
It should be noted that
01. There can never be a given set of tested SQL that are used always 02. Ad-hoc queries are always possible 03. The data modification is high in OLTP but can never be set to a given trend as every OLTP is unique by its nature, use and purpose 04. The Hybrid Data Mart/Data Warehouse are always a challenge unless different application requirements are integrated while architecting the database. 05. The application vendors take responsibility for the schema design but not for the database design. They do not provide any design for the database. 06. The database design is dependent on the application and its nature and behavior. This can be done when every nut and bolt of the application is known at the architectural level. The vendor is never ready to provide that.
The fact that Flashback Query owes its birth to UNDO should not mean that UNDO tablespace is to pay for the FQ. This should make its life independent for the single reason that this concept cannot eat into READ CONSISTENCY mechanism of Oracle at her cost.
AUTOMATED UNDO MANAGEMENT restricted (onwards from Oracle 9i) creating multi-sized extents for UNDO segments.
Oracle cannot allow the clients to set the transaction to a specific UNDO/ROLLBACK segment and throws up
ORA-30019: Illegal rollback Segment operation in Automatic Undo mode
Oracle 10g also do not guarantee this if the UNDO tablespace is created with FIXED space while agreeing to alert the customer/client as and when the system is under pressure for space.
Oracle allows in 10g to set a transaction and made it legal.
In Oracle 10g R2 when fixed space is used for the UNDO tablespace, by default 85% of the allocated space is taken as the threshold limit and an alert is spit in the alert log file when that threshold is hit.
UNDO_RETENTION is guaranteed Oracle 10g as Oracle calculates the UNDO_RETENTION doing its math summing up max query length (of the SELECT statement usually) + 300 seconds as TUNED_UNDORETENTION.
And the client/customer agrees to configured datafiles associated with UNDO tablespace is set to AUTOEXTEND on mode and able to provide the space asked for.
Oracle advises us to leave the UNDO_RETENTION setting to default value (900 seconds).
In introduced capturing additional information to V$UNDOSTAT with the SQL information that caused ORA-01555 and also with the run-away SQL that has impacted the unwanted UPWARD revision that was caused to TUNED_UNDORETENTION time.
Upward and downward reset of the TUNED_UNDORETENTION does not mean any resizing of the data files associated with the UNDO tablespace/s by resetting the High water mark for the datafiles, which only enables the downsizing of the datafiles.
We now can easily identify the run away SQL as its Id is stored in the UNDO stats in the data dictionary.
The alerts that are deposited in the alert log file to notify the pressure on the space have their positive impact where adding space to UNDO tablespace is unbridled. This is not possible in majority shops.
What we want Oracle is to do for us?
Give back the ability to create a UNDO segment of user choice. Disassociate FlashBack Query from the database UNDO management. Granularized Flashback for a data segment can help disassociate the Flashback Query from Undo and UNDO Tablespace Management.
Oracle has to introduce new error mechanism that educates the user on the facts not throw up the infamous ORA-01555.
The stats say
Let me give some more facts
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||