Monday, April 5, 2010

All about maintenance mode and chkdsk/defrag in failover clustering, including "cluster shared volumes"

Continuing from “storage architecture changes” and “how configuring storage with volume guid works”, today we will talk about the different options for running chkdsk/defrag in maintenance mode in Failover Clustering 2008/R2. We will also see how putting a physical disk resource in maintenance mode differs from putting a Cluster Shared Volume (CSV) in maintenance mode. Chkdsk has always been a challenge, especially in environments where disk sizes run into terabytes. I have seen many real-life scenarios where critical disk resources were unavailable to services and applications during production hours because chkdsk started running on them when they were mounted [brought online]. This happens because, when a disk resource is brought online, the cluster checks the dirty bit on the file system, and if the volume is dirty, chkdsk is started on it. What to do? Let's call Microsoft... and, as expected, they say “it's not recommended to stop chkdsk while it's running”. OK, fair enough, but the question remains: what to do? The disk is terabytes in size, and chkdsk may take hours, if not days, to finish. We cannot leave production-critical apps/services waiting on chkdsk for days.

You have two options: lose production time and wait for chkdsk to finish, or kill chkdsk from Task Manager at your own risk [assuming you have a full backup of the data]. This is where proper design and architecture of your highly available services comes in. Prevention is always better than cure, isn't it? There is a pretty decent blog covering how to stop chkdsk from running on 2003 servers [including cluster servers]; today we will see what options we have in Failover Clustering 2008/R2, which gives us much more control over chkdsk behavior.

DiskRunChkDsk: this private property of the Physical Disk resource determines whether the operating system runs chkdsk on a physical disk before attempting to mount it. It is documented as a Boolean: setting DiskRunChkDsk to FALSE causes the operating system to mount the disk without running chkdsk, while with DiskRunChkDsk set to TRUE (the default) the operating system runs chkdsk first and, if errors are found, takes action based on the ConditionalMount property. On 2008/R2, however, it is a DWORD (as the cluster.exe output later in this post shows) that takes one of the values listed below.

These settings can be changed using cluster.exe:
cluster res "cluster disk " /priv DiskRunChkDsk=value [see below for the possible values; 0 is the default]

Chkdsk options

0 (Default): Run Normal Check. If corrupt, run chkdsk to fix the problem. Normal Check: open the files in the root of the volume and check the volume dirty bit.

1 Run Verbose Check. If corrupt, run chkdsk to fix the problem. Verbose Check: Recursively open all files in the volume. Check volume dirty bit.

2 Run Normal Check. If corrupt, run chkdsk to fix the problem. If not corrupt, run chkdsk in read-only mode on the volume in parallel, i.e. the online proceeds (and might complete) while chkdsk runs in read-only mode against a snapshot of the volume.

3 Don’t do any File System check. Always run chkdsk on the volume.

4 Don’t do any File System check. Never run chkdsk, online disk without any File System check. Please note that this will also disable IsAlive/LooksAlive File System checks.

5 Run Verbose Check. If corrupt, fail online, don’t run chkdsk. User intervention required.

6 Suppresses volume creation/online/mounting during disk resource online. The disk is left in offline read-write mode, i.e. it is readable/writable using raw block-level I/O. [I doubt this is supported.]
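To keep the seven values straight, here is a small Python model of the list above. It is only a reading aid, not how the cluster service is implemented: Policy, DISK_RUN_CHKDSK and online_outcome are hypothetical names, and `corrupt` stands for "the file system check at online would find a problem" (for values that skip the check, corruption simply goes unnoticed).

```python
from typing import NamedTuple


class Policy(NamedTuple):
    check: str               # file system check at online: "normal", "verbose" or "none"
    chkdsk_if_corrupt: bool  # chkdsk runs (and online waits) when the check finds a problem
    chkdsk_always: bool      # chkdsk runs regardless of any check
    fail_if_corrupt: bool    # online fails instead of running chkdsk


# One entry per DiskRunChkDsk value, following the list above (hypothetical model).
DISK_RUN_CHKDSK = {
    0: Policy("normal", True, False, False),
    1: Policy("verbose", True, False, False),
    2: Policy("normal", True, False, False),   # if clean: read-only chkdsk on a snapshot in parallel
    3: Policy("none", False, True, False),
    4: Policy("none", False, False, False),    # also disables IsAlive/LooksAlive FS checks
    5: Policy("verbose", False, False, True),
    6: Policy("none", False, False, False),    # volume is not mounted at all
}


def online_outcome(value: int, corrupt: bool) -> str:
    """Describe what bringing the disk online looks like for a given setting."""
    policy = DISK_RUN_CHKDSK[value]
    if value == 6:
        return "no volume mounted (raw block I/O only)"
    if policy.chkdsk_always or (corrupt and policy.chkdsk_if_corrupt):
        return "chkdsk runs before online completes"
    if corrupt and policy.fail_if_corrupt:
        return "online fails; user intervention required"
    return "online"


for value in sorted(DISK_RUN_CHKDSK):
    print(value, "clean:", online_outcome(value, False), "| corrupt:", online_outcome(value, True))
```

Read this way, the trade-off from the start of this post becomes explicit: only 4, 5 and 6 never hold the online behind a chkdsk run, and each of them buys that by skipping checks, failing the online, or not mounting the volume at all.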

So let's say that, as a standard maintenance task, I need to run chkdsk every quarter or take a weekly backup. In that case I can leverage cluster maintenance mode for the maintenance activity. Administrators use tools such as ChkDsk and VSS as part of weekly maintenance operations to ensure that disks are functional and there are no operational issues. These tools require exclusive access to the volume while they run, and during that time applications cannot read from or write to the disk. The administrator expects the disk maintenance to succeed without a ChkDsk failure and without a failover of the disk that ChkDsk is run against. Under normal circumstances, however, cluster disk resources fail over when ChkDsk (in fix-errors mode), a VSS restore, or any other tool that locks or dismounts the volume is run against a clustered disk. These tools fail partway through because the cluster disk resource fails its health check, which causes the cluster service to fail the disk over to the other node. As a result, the node where the tool is running loses access to the disk.

The following checks are performed on any disk that is online and that is managed by the cluster:

File system level checks: At the file system level, the Physical Disk resource type performs the following checks:
LooksAlive: By default, a brief check is performed every 5 seconds to verify that a disk is still available. The LooksAlive check determines whether a resource flag is set. This flag indicates that a device has failed. For example, a flag may indicate that periodic reservation has failed. The frequency of this check is user definable.
IsAlive: A complete check is performed every 60 seconds to verify that the disk and the file system, or systems, can be accessed. The IsAlive check effectively performs the same functionality as a dir command that you type at a command prompt. The frequency of this check is user definable.

Device-level checks: At the device level, the Clusdisk.sys driver checks the persistent reservation (PR) table on the LUN every 3 seconds to make sure that only the owning node holds ownership and can access the drive.
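How soon does a locked volume get noticed? A rough Python sketch, assuming IsAlive polls land exactly at multiples of the default 60-second interval and that any poll landing while a tool holds the volume locked fails (real scheduling is not that tidy). The function name is hypothetical, and the `maintenance` flag anticipates the maintenance mode described next.

```python
from typing import Optional

IS_ALIVE_INTERVAL_S = 60  # default IsAlive period from the text (user definable)


def first_failing_is_alive(lock_start_s: int, lock_end_s: int,
                           maintenance: bool = False,
                           interval_s: int = IS_ALIVE_INTERVAL_S) -> Optional[int]:
    """Second at which the first IsAlive poll sees the volume locked, or None.

    Assumes polls fire at 0, interval_s, 2 * interval_s, ... and that a poll
    inside [lock_start_s, lock_end_s) fails.
    """
    if maintenance:
        return None  # Resource Monitor ignores health checks in maintenance mode
    next_poll = -(-lock_start_s // interval_s) * interval_s  # round up to a poll time
    return next_poll if next_poll < lock_end_s else None


print(first_failing_is_alive(30, 4 * 3600))                    # 60
print(first_failing_is_alive(30, 4 * 3600, maintenance=True))  # None
```

With the defaults, a chkdsk that locks the volume at second 30 is caught by the IsAlive poll at second 60, long before any multi-hour chkdsk could finish.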

Maintenance mode is a mechanism, provided through cluster.exe and the Failover Cluster API, that places the specified resource in a mode where health checking is disabled. After maintenance mode is enabled for a resource, Resource Monitor ignores health check calls on the resource even though the resource is left online. This allows tools like ChkDsk to run against a resource that is in maintenance mode. Administrators should note that while ChkDsk is running, the disk resource is not available to the application even though the resource is online. When you put a disk in maintenance mode, the setting is an in-memory state and is not saved in the cluster registry hive, so the change is not persistent: the next time the disk is taken offline and brought back online, it reverts to its standard behavior [we will see this later in the article].
If the state of a disk resource in maintenance mode changes, the maintenance mode setting is disabled. Specifically, maintenance mode will remain on until one of the following occurs:
You turn it off.
The node on which the resource is running restarts or loses communication with other nodes (which causes failover of all resources on that node).
For a disk that is not in Cluster Shared Volumes, the disk resource goes offline or fails.
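The lifetime rules above can be modeled as a tiny state machine. This is a toy Python model of the rules as listed, not the cluster service's implementation; the class ClusterDisk and the event names are hypothetical. `maintenance` is a plain attribute that is never written anywhere, mirroring the in-memory, non-persistent behavior described earlier.

```python
class ClusterDisk:
    """Toy model of when maintenance mode is switched off again."""

    NODE_EVENTS = {"node_restart", "node_lost_communication"}
    RESOURCE_EVENTS = {"offline", "failed"}

    def __init__(self, name: str, is_csv: bool) -> None:
        self.name = name
        self.is_csv = is_csv
        self.maintenance = False  # in-memory only; never saved anywhere

    def set_maintenance(self, on: bool) -> None:
        # The administrator turns it on or off explicitly.
        self.maintenance = on

    def handle(self, event: str) -> None:
        if event in self.NODE_EVENTS:
            # The node's resources fail over, so maintenance mode is lost.
            self.maintenance = False
        elif event in self.RESOURCE_EVENTS and not self.is_csv:
            # Offline/failure clears maintenance only for disks outside CSV.
            self.maintenance = False


disk = ClusterDisk("DataDisk", is_csv=False)  # hypothetical non-CSV disk
disk.set_maintenance(True)
disk.handle("offline")
print(disk.maintenance)  # False
```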

OK, fair enough, that's a lot of talking. I picked one of my Cluster Shared Volumes, “SR”; its DiskRunChkDsk value is 0 by default [as seen below]. I am going to put this CSV disk resource in maintenance mode and then run chkdsk on it. Ideally, you should either save state or, preferably, properly shut down all the VMs whose VHDs are placed on this CSV before putting the disk resource into maintenance mode. You will get a message that all dependent services and applications will be brought offline and that the Cluster Shared Volume won't be accessible from the C:\ClusterStorage namespace.

C:\>cluster.exe res SR /priv
Listing private properties for 'SR':
T Resource Name Value
-- -------------------- ------------------------------ -----------------------
D SR DiskRunChkDsk 0 (0x0)

As I didn't turn off my dependent VMs before doing this, the cluster service had to do it.

It also removed access through the \ClusterStorage\Volume path, while still allowing the owner node to access the volume through its identifier (GUID). This action also suspends direct I/O from other nodes, allowing access only through the owner node. As I mentioned earlier, “when you put a disk in maintenance mode, this setting is an in-memory state and is not saved in the cluster registry hive. This change is not a persistent change.” So we do not see this setting in the registry, nor does the cluster.exe output show 1 instead of 0, even after putting the disk in maintenance mode [which is rather strange: those properties should then either not be listed or perhaps be displayed differently].


Now that my CSV disk resource is in maintenance mode, I need its GUID to run chkdsk on it. For a non-CSV disk the GUID is not required, as it will have a drive letter. I can fetch the GUID either from mountvol.exe or from PowerShell, as shown below, though PowerShell is more reliable and easier if you have multiple CSV disk resources per node.

PS C:\Users\Administrator.UTOGWE> get-clustersharedvolume "SR" | fc *
class ClusterSharedVolume
Name = SR
State = Online
OwnerNode =
class ClusterNode
Name = labw2k8hypv-1
State = Up
SharedVolumeInfo =
class ClusterSharedVolumeInfo
FaultState = 4
FriendlyVolumeName = C:\ClusterStorage\Volume1
Partition =
class ClusterDiskPartitionInfo
Name = \\?\Volume{ee372d39-9ec6-11de-b4ae-0017a4770008}
DriveLetter =
DriveLetterMask = 0
FileSystem = NTFS
FreeSpace = 618369548288
MaintenanceMode = True
RedirectedAccess = False
Id = 8e62ea00-9763-4c21-86d1-4a05708be24a
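If you have many CSVs and want to script this step, the volume GUID path can be pulled out of text like the output above (or out of a mountvol.exe listing, which prints the same paths with a trailing backslash). A minimal Python sketch; the function name is hypothetical, and the regex assumes the standard \\?\Volume{GUID} form.

```python
import re

# \\?\Volume{GUID}, e.g. \\?\Volume{ee372d39-9ec6-11de-b4ae-0017a4770008}
VOLUME_GUID_PATH = re.compile(
    r"\\\\\?\\Volume\{[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\}"
)


def volume_guid_paths(text: str) -> list[str]:
    """Return every volume GUID path in the text, without a trailing backslash."""
    return VOLUME_GUID_PATH.findall(text)


sample = r"Name = \\?\Volume{ee372d39-9ec6-11de-b4ae-0017a4770008}"
print(volume_guid_paths(sample))
```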

Running mountvol.exe without parameters also lists the possible values for VolumeName along with their current mount points.

Once I have the GUID, it's a piece of cake: I just need to run the chkdsk or defrag command, and here we go.

C:\>chkdsk \\?\Volume{ee372d39-9ec6-11de-b4ae-0017a4770008}\
The specified volume name does not have a mount point or drive letter.
C:\>chkdsk /f \\?\Volume{ee372d39-9ec6-11de-b4ae-0017a4770008}
The type of the file system is NTFS.
Volume label is SR.
CHKDSK is verifying files (stage 1 of 3)...
256 file records processed.
File verification completed.
0 large file records processed.
0 bad file records processed.
0 EA records processed.
0 reparse records processed.
CHKDSK is verifying indexes (stage 2 of 3)...
324 index entries processed.
Index verification completed.
0 unindexed files scanned.
0 unindexed files recovered.
CHKDSK is verifying security descriptors (stage 3 of 3)...
256 file SDs/SIDs processed.
Security descriptor verification completed.
34 data files processed.
Windows has checked the file system and found no problems.
786428927 KB total disk space.
182458068 KB in 43 files.
32 KB in 36 indexes.
0 KB in bad sectors.
90215 KB in use by the system.
65536 KB occupied by the log file.
603880612 KB available on disk.
4096 bytes in each allocation unit.
196607231 total allocation units on disk.
150970153 allocation units available on disk.
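If you script this maintenance run, you will want to check the result rather than eyeball it. A minimal Python sketch that parses the summary lines shown above, assuming English-language chkdsk output in exactly this format; the function name parse_chkdsk_summary is hypothetical.

```python
import re


def parse_chkdsk_summary(output: str) -> dict:
    """Extract the verdict and size figures from chkdsk's (English) summary."""
    patterns = {
        "total_kb": r"^\s*(\d+) KB total disk space",
        "bad_sectors_kb": r"^\s*(\d+) KB in bad sectors",
        "available_kb": r"^\s*(\d+) KB available on disk",
        "unit_bytes": r"^\s*(\d+) bytes in each allocation unit",
    }
    summary = {"no_problems": "found no problems" in output}
    for key, pattern in patterns.items():
        match = re.search(pattern, output, re.MULTILINE)
        summary[key] = int(match.group(1)) if match else None
    return summary


sample = """Windows has checked the file system and found no problems.
786428927 KB total disk space.
0 KB in bad sectors.
603880612 KB available on disk.
4096 bytes in each allocation unit.
"""
info = parse_chkdsk_summary(sample)
print(info["no_problems"], info["available_kb"])  # True 603880612
```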

Well, this is one way; there is another, easier way to do this via PowerShell. Another reason to fall in love with PowerShell.

PS C:\Users\Administrator.UTOGWE> get-help repair-clustersharedvolume
Run repair tools on a Cluster Shared Volume locally on a cluster node.
Repair-ClusterSharedVolume -ChkDsk [-VolumeName] [-Parameters ] []

Repair-ClusterSharedVolume -Defrag [-VolumeName] [-Parameters ] []

This cmdlet runs chkdsk.exe or defrag.exe on a CSV volume. It will turn maintenance on for the volume, move the cluster resource to the node running this cmdlet, run the tool, and then turn maintenance off for the volume. This cmdlet has to run locally on one of the cluster nodes. To run remotely, use PowerShell Remoting.
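The sequence the help text describes can be sketched in Python to show why the order matters. The callables are hypothetical stand-ins for the real cluster operations, and wrapping the tool run in try/finally so that maintenance is always switched back off is my own design choice; the help text does not say what the cmdlet does if the tool fails.

```python
from typing import Callable


def repair_csv(volume: str, local_node: str,
               set_maintenance: Callable[[str, bool], None],
               move_to_node: Callable[[str, str], None],
               run_tool: Callable[[str], int]) -> int:
    """Maintenance on -> move to this node -> run tool -> maintenance off."""
    set_maintenance(volume, True)
    try:
        move_to_node(volume, local_node)
        return run_tool(volume)
    finally:
        # Runs even if the move or the tool raises, so the CSV is not left
        # in maintenance mode with its health checks ignored.
        set_maintenance(volume, False)


log = []
exit_code = repair_csv(
    "SR", "labw2k8hypv-1",
    set_maintenance=lambda v, on: log.append(f"maintenance {v} {'on' if on else 'off'}"),
    move_to_node=lambda v, n: log.append(f"move {v} -> {n}"),
    run_tool=lambda v: 0,
)
print(log, exit_code)
```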

Once chkdsk/defrag has finished, you need to turn off maintenance mode [Repair-ClusterSharedVolume does this for you] and manually start the virtual machines that depend on the CSV disk resource.

I hope this article has given you an insight into the various disk maintenance options we have with failover clustering and how they help us make the right decisions when architecting a high-availability solution.

Gaurav Anand
