Automatic Page Repair is a feature in SQL Server Always On Availability Groups that helps ensure high availability and data protection by automatically repairing corrupted pages detected in the database. Suppose your AG replica is healthy, but one of your AG databases becomes corrupt. You get a logical consistency error if you execute the DBCC CHECKDB command.
In this article, let's see how the Automatic Page Repair feature works.
- Detection of Corruption: When a replica in an Availability Group encounters a corrupted page (e.g., due to I/O subsystem issues), it logs an error in the SQL Server error log and the Windows Event Log
- Request for Repair: The replica that detects the corruption (usually the secondary replica) requests a fresh copy of the corrupted page from another replica (typically the primary replica).
- Page Fetching: The primary replica, upon receiving the request, reads the corresponding page from its own database. If the primary replica can successfully read the page (i.e., the page is not corrupted), it sends the page back to the requesting replica.
- Repair Execution: The requesting replica receives the healthy page and uses it to repair the corrupted page. Once the page is repaired, normal operations can continue.
- Replication Role: Primary and secondary replicas participate in this process. The primary replica usually has the most recent and consistent data, making it the source of the repair data.
- Automatic Nature: The entire process is automatic and does not require manual intervention, ensuring minimal downtime and maintaining database availability.
- Requirements: The Always On Availability Groups must be properly configured and synchronized. The secondary replica must be in a readable state to detect the corruption and initiate the repair process.
- Improved Data Availability: Automatic page repair ensures that the database remains available even in the event of page corruption.
- Reduced Administrative Overhead: Administrators do not need to manually intervene to repair corrupted pages, which reduces maintenance time and effort.
- Enhanced Data Protection: Ensures that corruption in one replica does not affect the overall availability and integrity of the database.
- Configuration: Automatic Page Repair is enabled by default when you configure Always On Availability Groups. There are no specific settings to enable this feature; it is part of the core functionality.
- Monitoring: You can monitor the status and events related to automatic page repair using SQL Server error logs and the Windows Event Log. Use SQL Server Management Studio (SSMS) and dynamic management views (DMVs) to check the health and status of Availability Groups and replicas.
Suppose you have a primary replica and two secondary replicas in an Availability Group. If one of the secondary replicas encounters a corrupted page during a read operation, it will:
- Log an error indicating the corruption.
- Request the page from the primary replica.
- Receive the page from the primary if it is not corrupted.
- Repair the corrupted page and continue normal operations.
By leveraging the automatic page repair feature, SQL Server Always On Availability Groups provide a robust mechanism to handle page corruption, ensuring high availability and data integrity.
Nice post