RMAN AND EMC DD ISSUE WITH AN ARCHIVE LOG WHEN RUNNING A RESTORE ON 1 OF 3 SERVERS (DOC ID 2116741.1)


APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.4 and later

            Oracle Database Cloud Schema Service - Version N/A and later

            Oracle Database Exadata Cloud Machine - Version N/A and later

            Oracle Cloud Infrastructure - Database Service - Version N/A and later

            Oracle Database Cloud Exadata Service - Version N/A and later

            Information in this document applies to any platform.

SYMPTOMS

Errors are reported when applying a restored archived log during database recovery on one server. The same file was restored to other servers/standby hosts, where the log was restored and applied successfully. Only a single standby server fails to apply the log.

CHANGES

 

CAUSE

EMC's DataDomain backup with RMAN

            

            https://www.emc.com/collateral/hardware/data-sheet/h6811-datadomain-ds.pdf

            

            DD Boost functionality:

            

            Simply put, DD Boost is software that enhances how backup servers and clients interact with a Data Domain backup appliance. It is based on Symantec's OST (Open Storage Technology) protocol and is a means of extending Data Domain features back to the source.

            The issue does not reproduce without DD Boost.

SOLUTION

Events can be set when allocating the RMAN channels to help identify block corruption. Common questions are what impact this has on the backups, whether they will take longer, and whether the event should always be on or used only to debug issues.

            

            - From this point onwards, run all backups with event 10466 turned on for the

              RMAN channels that are doing the backup. This event turns on additional

              corruption detection in the Oracle I/O layers beneath RMAN. The purpose of

              setting this event is to determine whether the corruption is introduced during

              the backup process or is occurring subsequent to the backup.

            

              Here is an example of an RMAN script that can be used to run with

              event 10466 enabled for only the RMAN backups, but not for any other

              I/O done by this database:

            

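              run
              {
              # allocate the media manager (sbt) channel that writes the backup
              allocate channel c1 type sbt;
              # set event 10466 only in the session of channel c1
              sql channel c1
              "alter session set events ''10466 trace name context forever, level 1''";
              # run the backup with the additional corruption detection enabled
              backup database;
              }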

            

            - Do aggressive RMAN validation of every backup that is written to the

              problematic storage. Ideally do validation of every backup immediately

              following its creation, and then periodically thereafter.
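
              As a minimal sketch of such a validation (an illustration, not part of
              the original analysis), the RMAN commands below read the backup pieces
              back from the media manager and check them without restoring any files.
              The channel definition is copied from the backup example and is an
              assumption; adjust it, and the range of archived logs, to the local
              configuration:

              run
              {
              # read the backups back through the same media manager channel type
              allocate channel c1 type sbt;
              # check that the datafile backups are readable and not corrupt
              restore database validate;
              # check the archived log backups in the same way
              restore archivelog all validate;
              }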

 

Successful completion of the backup indicates that RMAN, with event 10466 set, read each block, validated it, and added HARD block bits to the backup, similar to cksum at the OS level. If corruption is detected within the RMAN/Oracle layers, the backup fails. If corruption is detected only during validation, it was introduced after the blocks were passed to the media manager or OS layer. When Oracle hands off the blocks, the receiver returns an acknowledgment of receipt, and the data later read back is expected to be the data that was sent. Therefore, if corruption is detected only during validation while event 10466 is set, the OS, media manager, and storage vendors should be engaged.

 

In this case the cause was determined to be a hardware failure. The same archived logs were restored to 3 separate servers and applied to standby databases, and only 1 server consistently showed errors. That host was taken out of service.