In this document, I discussed various possible corruptions (HW and SW) and recovery and salvage processes. Many MOS docs and Oracle Docs and my experiences put together is this document.
The causes of corruption can be any one of the below reason or multiple reasons.
Bad IO hardware or firmware
OS issues
Oracle Bugs while writing to the files it has formatted (with 0s as 0s have no value unless you have digits on the left or right side)
Recovering NOLOGGING or UNRECOVERABLE (THAT CAN BE RECOVERED IN 12.2.0. (BEFORE 12.2.0 we hit ORA-01578
So, broadly the corruption is physical and logical. While logical corruption is recoverable physical is NOT. if you are using local disks with NO mirroring the OS corruption may NOT be recoverable.
Fixing corruption is based on the understanding of :
Determine the extent of corruption and also determining if that corruption is transient or permanent
Replacing the faulty HW and recovering the possible data after running the tools like ckdsk to block the corrupt OS blocks which can break Oracle block as multiple OS blocks can be formatted as Oracle Block. Based on the Oracle block sie, we may NOT be able to read data written to that block.
Determine which logical objects are impacted or affected
Determine the needed recovery method based on the affected data and object/s
If the database is 12.2.0 or higher at RMAN console
recover database not logged block;
This may be in the run block based the backup process defined using MML layer or disk.
The block/s to be recovered can be identified by querying a metadata table as under:
select * from v$nonlogged_block;
Before doing the not logged block recovery:
If a stand by database (physical) is configured then cancel the recovery.
If it is READONLY stand by database shutdown the database
If it is bidirectional or unidirectional Oracle Goldengate configuration, stop that extract and replcat processes.
After RMAN command return to the CLI command prompt, with no errors, then at SQL*Plus prompt rerun the command
SELECT * FROM V$NONLOGGED_BLOCK ;
It the process is NOT listing any block that is previously listed corrupt, you are good.
If the database version is below 12.2.0 then you may encounter
ORA-01578
Oracle 8i may list ORA-26040
Recovery cannot retrieve the NOLOGGING data.
NO data can be recovered
The solution is based on the basc configuration media recovery is enabled or NOT.
Block-level media recovery
If it is header-block the block can NOT be recovered
If that is data block then connect to RMAN conlsole
rman
connect target /
connect catalog catalog_user/catalog_user-password@catalog_database ----- this is optional
at the RMAN prompt
run { blockrecover datafile (datafile id) block (blck_id) } ------------ the inputs are from can be had from v$backup_corruption; table
alternatively, at RMAN prompt
list failure;
repair failure preview;
repair failure noprompt;
in 11.2 the RMAN recovery process sysntax changes are there.
recover datafile 1/n block n,n,n,n; datafile n block n,n,n;
and
recover datafile dn block n to n1 datafile dn block n to nn;
Restrictions:
Block 1 is always header and can NOT be recovered. RMAN behavior when you try to recover the header block is different. Block 1 (datafile header) is physically corrupt in this example with some garbage. The RMAN block recover command does not error out but alert log is updated:
Options of recovery:
Block media recovery as discussed above
Datafile recovery (if the database is NOT in media recovery mode … then there wi;ll be data loss)
Tablespace recovery (if the database is in no media recovery mode then you will suffer data loss)
Database recovery (if media recovery is NOT enabled then there will be data loss)
After complete recovery –- if it is table
analyze <owner>.<table_name> validate structure;
If it is index, the recovery process is simple. Drop the index and recreate the index. Before dropping the index, check if there are any FK constraints referencing. If there are
Alter table <owner>.table_name> disable constraint <pk_contraint_name>
Rebuild the PK
Disable the constraint
Drop the index
Create Index
Enable PK constraint
Enabling the FK – alter table <owner>.<child_key_table> enable constraint <constraint_name>;
Indexes also can be rebuilt. Based on the situation you are in, use that.
Salvaging the tables with corrupt data.
Oracle 7.2 onwards we have DBMS_REPAIR.SKIP_CORRUPT_BLOCK. The following script (MOS supplied) may be used:
REM Create the repair table in a given tablespace:
BEGIN
DBMS_REPAIR.ADMIN_TABLES (
TABLE_NAME => 'REPAIR_TABLE',
TABLE_TYPE => dbms_repair.repair_table,
ACTION => dbms_repair.create_action,
TABLESPACE => '&tablespace_name');
END;
/
REM Identify corrupted blocks for schema.object (it also can be done at partition level with parameter PARTITION_NAME):
REM Allow future DML statements to skip the corrupted blocks:
BEGIN
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
SCHEMA_NAME => '&schema_name',
OBJECT_NAME => '&object_name',
OBJECT_TYPE => dbms_repair.table_object,
FLAGS => dbms_repair.SKIP_FLAG);
END;
/
NOTE: the above code is an example. Please use that at your own risk.
LOB segments are to be handled differently
It is not possible to used DBMS_REPAIR on LOB segments.
If the corrupt LOB block is NOT referenced by any row in the table then it should be possible to CREATE TABLE as SELECT (CTAS) or export / drop / import the table as is.
If the corrupt LOB block is referenced by a row then it should be possible to select or export with a WHERE predicate that excludes the problem row/s.
If a standby database is running for the database that has corrupt blocks, then find the options available for recovery immediately when the corruption occurred. This may be difficult.
Use DBVERIFY on the files restored and recovered to rule out corruption is avoided.
Last Chance is when you have all the archived logs created after the datafile is created, drop the datafile with corrupt blocks. Recreate if
Shutdown the database.
Startup mount;
Alter database datafile ‘<datafile_full_name>’ drop offline;
Alter database create datafile ‘<datafile_full_name> size <nn> m;
Recover datafile ‘…’;
Alter database ‘<datafile_name>’ online;
Alter database open;
This activity can be online too. Then the database need NOT be in mount mode. It can be open. The objects and data stored, can throw errors when accessed.