Overview of WAFL_check
Categories: Administration, NearStore, Data ONTAP 7G
Answer
This KB will discuss the following aspects of WAFL_check:
- What is WAFL_check?
- What is the difference between WAFL_check and wafliron?
- What should be done prior to running WAFL_check?
- How long does it take to run WAFL_check?
- How to run WAFL_check
- Can WAFL_check be stopped?
- WAFL_check Phases
- Where to find the changes made by WAFL_check
- Can WAFL_check be run on a Deduplicated (SIS) volume?
- Can WAFL_check be run on a SnapMirror/SnapVault destination?
- Can WAFL_check be run on a SnapLock aggregate/volume?
- Can WAFL_check be used on a foreign, inconsistent aggregate?
- Can WAFL_check be used to delete Snapshot copies?
- Can WAFL_check be used on a 64-bit aggregate?
- Can the inconsistent aggregate be moved to another storage system before starting WAFL_check?
Warning: As of Data ONTAP 7.3.x, the wafliron optional commit feature is available, and NetApp recommends running wafliron with optional commit first, ahead of WAFL_check.
What is WAFL_check?
WAFL_check is a Data ONTAP(R) diagnostic tool used to check WAFL(R) file systems. It is normally run on WAFL inconsistent volumes or aggregates in order to correct the inconsistencies. WAFL_check should only be run under the instruction of NetApp Technical Support as it can alter file systems and may result in data loss if used incorrectly. WAFL_check should never be run on striped aggregates, including striped member aggregates. Wafliron should be used instead.
If the root aggregate/volume is marked WAFL inconsistent, the storage system will be unable to boot until the aggregate is checked or another aggregate is designated as root and a new root FlexVol volume is created on that aggregate.
An aggregate or volume can be marked inconsistent for several reasons. One of the most common causes is a parity inconsistency due to FC-AL loop instability.
What is the difference between WAFL_check and wafliron?
WAFL_check and wafliron are both diagnostic tools used to check WAFL file systems. If WAFL_check is run, the administrator can choose whether or not to commit changes. Wafliron run without optional commit will make changes as it runs and report these changes. The administrator has no choice over which changes wafliron will commit unless the optional commit option is used.
Wafliron can be run while the storage system is online and serving data from volumes/aggregates not being checked. WAFL_check, however, must be run from the Special Boot Menu and the storage appliance will not be serving data until the WAFL_check completes and the administrator chooses to commit changes.
NetApp Technical Support should always be consulted before running either wafliron or WAFL_check.
Note: WAFL_check can take a long time to run and the storage appliance will not serve ANY data during this time. It will remain unavailable from the network and only be accessible from a console connection.
What should be done prior to running WAFL_check?
Several steps need to be taken to prepare to run WAFL_check.
- Identify and resolve the cause of the file system inconsistency.
If the inconsistency was caused by FC-AL loop instability or errors, loop testing should be performed to isolate the problem. NetApp FC-AL diagnostics can be used for troubleshooting.
Note: If the cause of the inconsistency is unresolved prior to starting WAFL_check, then the WAFL_check may be unable to correct the inconsistencies properly. Additionally, since the original problem still exists, the aggregate/volume could become inconsistent again.
- Connect to a console port on the filer using a laptop or PC. If a laptop is used, ensure that it is connected to AC power and that any power management/hibernation feature that might shut down the laptop after a period of time is disabled. It is critical that the laptop remain on for the entire duration of the WAFL_check.
For FAS3000/FAS6000 series storage systems, the Remote LAN Management (RLM) card can be used to connect to the storage system's console. From a PC, open an SSH session to the RLM and enter system console. Additional details on using the RLM can be found in the Data ONTAP System Administration Guide.
WARNING: NetApp BUG 224882 tracks a problem in which WAFL_check may fail to prompt to commit changes if the RLM system console is detached around the time this commit message is printed. Be sure that the RLM console session is not disconnected while running WAFL_check. This bug is first fixed in Data ONTAP 7.2.4.
The storage system requires the following settings on the Terminal Emulator:
- Bits per second: 9600
- Data bits: 8
- Parity: None
- Stop bits: 1
- Flow Control: hardware
- Set up the laptop/PC to log all storage system console output to a file
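The console-logging step above can be sketched as a shell pipeline. The 'cu' invocation, serial device path, and log filename here are assumptions; any terminal emulator that can log a session (minicom -C, screen -L) works equally well:

```shell
# Sketch: capture every line of the console session to a log file for
# later review by NetApp Technical Support.
LOG="wafl_check_console.log"
# cu -l /dev/ttyS0 -s 9600 | tee "$LOG"   # real serial capture (needs hardware)
# Stand-in input so the pipeline can be illustrated without a serial port:
printf 'Phase 1: Verify fsinfo blocks.\n' | tee "$LOG"
```

Piping through tee keeps the console visible on screen while writing a complete transcript to disk.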
How long does it take to run WAFL_check?
The time it takes to complete WAFL_check depends on many factors, and as such the time cannot be accurately calculated. Factors affecting the total run time include:
- Mean size of files in the file system being checked
- Number of inodes in use
- Layout of the data on a volume/aggregate
- Size of the volume/aggregate
- Number of file system inconsistencies if they exist
- Storage system's CPU speed
- Storage system's Memory
- Speed of the disk drives (i.e. 5400 RPM vs. 7200 RPM vs. 10000 RPM vs. 15000 RPM)
- Data ONTAP version
- Number of FlexVol volumes contained in the aggregate being checked
In general, WAFL_check will take several hours to several days depending on the above factors. WAFL_check runs through several phases during the file system check. In most cases, the scan of inode file normal files (phase 5.3b for a FlexVol volume or phase 3b for a traditional volume) will take the most time, since this comprises the bulk of the data to be checked.
In Data ONTAP 7.2.3 and later, WAFL_check includes time estimates during FlexVol volume check phases 5.3b (scanning volume inodes) and 5.4 (checking volume directories). For example:
Selection (1-5)? WAFL_check aggr1
...
Checking volume flexvol1
...
Phase [5.3b]: Scan inode file normal files.
(inodes 3.56% done) 2 hrs 15 min estimated time remaining
(inodes 5.84% done) 2 hrs 41 min estimated time remaining
(inodes 8.13% done) 2 hrs 49 min estimated time remaining
...
Phase [5.4]: Scan directories.
(dirs 5.00% done) 0 hrs 50 min estimated time remaining
(dirs 10.00% done) 0 hrs 49 min estimated time remaining
(dirs 15.00% done) 0 hrs 58 min estimated time remaining
...
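The estimates in the output above are consistent with simple linear extrapolation from elapsed time and percent complete. The following sketch illustrates that idea; the exact formula Data ONTAP uses is an assumption and may smooth or weight recent progress differently:

```python
# Sketch of the extrapolation behind WAFL_check's "estimated time remaining"
# lines (hypothetical reconstruction, not the actual Data ONTAP code).
def estimate_remaining(elapsed_min: float, percent_done: float) -> float:
    """Linear extrapolation: remaining = elapsed * (100 - p) / p."""
    if percent_done <= 0:
        raise ValueError("no progress yet; estimate undefined")
    return elapsed_min * (100.0 - percent_done) / percent_done

# 5 minutes in, 3.56% of inodes scanned -> roughly 2 hrs 15 min remaining,
# matching the first estimate line in the example output above.
remaining = estimate_remaining(5.0, 3.56)
print(f"{int(remaining // 60)} hrs {int(remaining % 60)} min estimated time remaining")
```

Because the estimate is recomputed from whatever fraction is done so far, it fluctuates early in a phase (as in the 2:15 / 2:41 / 2:49 sequence above) and stabilizes as more of the inode file is scanned.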
How to Run WAFL_check
WAFL_check is run from the Special Boot Menu. As such, console access to the storage appliance is required to run WAFL_check. Console output should be logged to a file when running WAFL_check so that the output can be reviewed by NetApp Technical Support.
To access the Special Boot Menu, press Ctrl+C when prompted during boot.
Note: It is important to use the same release of Data ONTAP that the storage system is running unless otherwise instructed by NetApp Technical Support.
Note: For storage systems that have floppy disk drives, a set of Data ONTAP boot floppies is required. Boot the storage system using the OS boot floppy diskettes.
The storage system will boot to the following Special Boot Menu:
(1) Normal boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Initialize all disks.
(5) Maintenance mode boot.
Selection (1-5)?
The WAFL_check option is hidden from this menu. Do NOT select one of options 1 - 5. Instead, type WAFL_check. This will start the file system check. It will ask for confirmation before checking each aggregate/volume. Check all aggregates/volumes unless instructed otherwise.
Note: To check a specific traditional volume or aggregate, use WAFL_check volume_name/aggregate_name.
Note: When checking an aggregate, all associated FlexVol volumes will also be checked. It is not possible to check a single FlexVol volume within an aggregate.
After starting the WAFL_check, the storage administrator should watch the console for the first 20 - 30 minutes to ensure WAFL_check is progressing and not logging excessive errors. If excessive errors are seen, NetApp Technical Support should be contacted immediately.
If WAFL_check finds problems on a volume, it will ask for confirmation before committing changes after checking each volume.
WARNING: NetApp Technical Support should be consulted before committing any changes found by WAFL_check. Failure to do so may result in data loss.
In order to commit WAFL_check changes, you must enter "y" for yes.
After WAFL_check is finished and the changes are committed or rejected, the storage system will prompt for reboot. If this is a floppy boot storage system, make sure no floppies are in the floppy drive. Then press any key to reboot.
Can WAFL_check be stopped?
WAFL_check can only be stopped by power-cycling the storage system. Since WAFL_check does not make any changes until the storage administrator chooses to commit changes at the end of the check, it is safe to stop WAFL_check. However, any checks that were done to the point that it was stopped will be lost. Therefore, it is best to let WAFL_check run to completion unless otherwise advised by NetApp Technical Support.
WAFL_check Phases
When running WAFL_check on a traditional volume, it will check the volume in several phases. When running WAFL_check on an aggregate, it will check both the aggregate and the contained FlexVol volumes.
Note: WAFL_check is a diagnostic tool, and its usage and output is subject to change.
The different phases are summarized in the following table:
Phase | Traditional volume | Aggregate | FlexVol volume |
1 | Verify fsinfo blocks | Verify fsinfo blocks | Verify fsinfo blocks |
2 | Verify metadata indirect blocks | Verify metadata indirect blocks | Verify metadata indirect blocks |
3 | Scan inode file | Scan inode file | Scan inode file |
3a | Checks WAFL special metadata files | Checks WAFL special metadata files | Checks WAFL special metadata files |
3b | Checks normal (user data) files | Checks normal (user data) files | Checks normal (user data) files |
3c | Checks files that had been marked for deletion | Checks files that had been marked for deletion | Checks files that had been marked for deletion |
4 | Scan directories | Scan directories | Scan directories |
5 | N/A | Scans FlexVol volumes | N/A |
5a | N/A | Checks volume inodes and verifies the aggregate's access to the FlexVol volumes | N/A |
5b | N/A | Verifies contents within the FlexVol volumes | N/A |
6 | Clean up | Clean up | Clean up |
6a | Finds lost streams (for example, CIFS metadata/ACLs) | Finds lost streams (for example, CIFS metadata/ACLs) | Finds lost streams (for example, CIFS metadata/ACLs) |
6b | Finds lost files and moves them to lost+found | Finds lost files and moves them to lost+found | Finds lost files and moves them to lost+found |
6c | Finds lost blocks and moves them to lost+found | Finds lost blocks and moves them to lost+found | Finds lost blocks and moves them to lost+found |
6d | Checks blocks used | Checks blocks used | Checks blocks used |
The following are examples of the output seen on the console when running WAFL_check.
WAFL_check on a traditional volume:
Selection (1-5)? WAFL_check vol1
Checking vol1...
WAFL_check NetApp Release 7.2.3
Starting at Tue Oct 23 20:30:06 GMT 2007
Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
Phase 3: Scan inode file.
Phase 3a: Scan inode file special files.
Phase 3a time in seconds: 0
Phase 3b: Scan inode file normal files.
Phase 3b time in seconds: 2
Phase 3 time in seconds: 2
Phase 4: Scan directories.
Phase 4 time in seconds: 2
Phase 6: Clean up.
Phase 6a: Find lost nt streams.
Phase 6a time in seconds: 0
Phase 6b: Find lost files.
Phase 6b time in seconds: 7
Phase 6c: Find lost blocks.
Phase 6c time in seconds: 0
Phase 6d: Check blocks used.
Phase 6d time in seconds: 0
Phase 6 time in seconds: 7
Clearing inconsistency flag on volume vol1.
WAFL_check total time in seconds: 11
Commit changes for volume vol1 to disk? y
Inconsistent vol vol1 marked clean.
WAFL_check output will be saved to file /vol/vol1/etc/crash/WAFL_check
Press any key to reboot system.
WAFL_check on an aggregate:
Selection (1-5)? WAFL_check aggr0
Checking aggr0...
WAFL_check NetApp Release 7.2.3
Starting at Tue Oct 23 18:52:17 GMT 2007
Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
Phase 3: Scan inode file.
Phase 3a: Scan inode file special files.
Phase 3a time in seconds: 1
Phase 3b: Scan inode file normal files.
(inodes 99.74% done)
(inodes 100.00% done)
Phase 3b time in seconds: 1762
Phase 3 time in seconds: 1763
Phase 4: Scan directories.
Phase 4 time in seconds: 0
Phase 5: Check volumes.
Phase 5a: Check volume inodes
Phase 5a time in seconds: 0
Phase 5b: Check volume contents
Checking volume flexvol1...
Phase [5.1]: Verify fsinfo blocks.
Phase [5.2]: Verify metadata indirect blocks.
Phase [5.3]: Scan inode file.
Phase [5.3a]: Scan inode file special files.
Phase [5.3a] time in seconds: 27
Phase [5.3b]: Scan inode file normal files.
(inodes 100.00% done) 0 hrs 0 min estimated time remaining
Phase [5.3b] time in seconds: 3964
Phase [5.3] time in seconds: 3992
Phase [5.4]: Scan directories.
Phase [5.4] time in seconds: 6
Phase [5.6]: Clean up.
Phase [5.6a]: Find lost nt streams.
Phase [5.6a] time in seconds: 0
Phase [5.6b]: Find lost files.
Phase [5.6b] time in seconds: 16
Phase [5.6c]: Find lost blocks.
Phase [5.6c] time in seconds: 0
Phase [5.6d]: Check blocks used.
Tue Oct 23 20:29:53 GMT
Phase [5.6d] time in seconds: 19
Phase [5.6] time in seconds: 35
Volume flexvol1 WAFL_check time in seconds: 4033
(No filesystem state changed.)
Phase 5b time in seconds: 4098
Phase 6: Clean up.
Phase 6a: Find lost nt streams.
Phase 6a time in seconds: 0
Phase 6b: Find lost files.
Phase 6b time in seconds: 5
Phase 6c: Find lost blocks.
Phase 6c time in seconds: 0
Phase 6d: Check blocks used.
Phase 6d time in seconds: 1
Phase 6 time in seconds: 6
Clearing inconsistency flag on aggregate aggr0.
WAFL_check total time in seconds: 5867
Commit changes for aggregate aggr0 to disk? yes
Inconsistent aggr aggr0 marked clean.
WAFL_check output will be saved to file /etc/crash/aggregates/aggr0/WAFL_check on the root volume
Where to find the changes made by WAFL_check
The results of a WAFL_check are stored in /etc/crash/WAFL_check on the storage system's root volume (pre-Data ONTAP 7G) or in a /etc/crash folder within each volume to which changes were made (Data ONTAP 7.0 and later). After an AutoSupport message is generated due to the reboot following WAFL_check, these files are rotated to WAFL_check.0, WAFL_check.1, etc.
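The rotation described above can be sketched as follows. This is a hypothetical illustration of the naming scheme (WAFL_check becomes WAFL_check.0, WAFL_check.0 becomes WAFL_check.1, and so on), not the actual Data ONTAP implementation:

```python
import tempfile
from pathlib import Path

def rotate(directory: Path, base: str = "WAFL_check") -> None:
    """Shift base -> base.0, base.0 -> base.1, ... (scheme as described above;
    the exact mechanism Data ONTAP uses is an assumption)."""
    # Rename the highest-numbered files first so nothing is overwritten.
    suffixed = sorted(
        (p for p in directory.glob(base + ".*") if p.suffix[1:].isdigit()),
        key=lambda p: int(p.suffix[1:]),
        reverse=True,
    )
    for p in suffixed:
        p.rename(directory / f"{base}.{int(p.suffix[1:]) + 1}")
    current = directory / base
    if current.exists():
        current.rename(directory / f"{base}.0")

# Demo in a scratch directory standing in for /etc/crash:
d = Path(tempfile.mkdtemp())
(d / "WAFL_check").write_text("run 2")
(d / "WAFL_check.0").write_text("run 1")
rotate(d)
print(sorted(p.name for p in d.iterdir()))  # ['WAFL_check.0', 'WAFL_check.1']
```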
Can WAFL_check be run on a SnapMirror/SnapVault destination?
It is possible to run WAFL_check on a SnapMirror(R) or SnapVault(R) destination, but this will break the SnapMirror/SnapVault relationships if changes are needed.
Depending on the changes made by WAFL_check, it may be possible to resync the SnapMirror/SnapVault relationships following the completion of the WAFL_check. However, resync is not guaranteed to succeed. In some cases, the relationships may need to be reinitialized.
Note: After WAFL_check is run on a destination volume for Volume SnapMirror, a "block type initialization" scan will automatically start on the traditional/FlexVol volume that was checked. Until this scanner completes, Volume SnapMirror relationships cannot be resynchronized, updated, or initialized. This limitation is tracked as NetApp Bug 142586. Please review the Bugs Online report to determine the versions of Data ONTAP containing the fix.
The 'block type initialization' scan may take several days to complete depending on the size of the FlexVol volume and the load on the storage system. To check the status of the scan, use the wafl scan status command in priv set advanced mode:
storage1> priv set advanced
storage1*> wafl scan status
Volume sm_dest:
Scan id Type of scan progress
1 block type initialization snap 0, inode 58059 of 30454809
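The progress field in that output ("inode 58059 of 30454809") can be turned into a completion percentage to gauge how far along the scan is. A small sketch, where the line format is taken from the example above and the regex is an assumption:

```python
import re

# Parse the 'wafl scan status' progress field into a completion percentage
# (hypothetical helper; field layout assumed from the example output above).
line = "1    block type initialization    snap 0, inode 58059 of 30454809"
m = re.search(r"inode (\d+) of (\d+)", line)
done, total = map(int, m.groups())
print(f"block type initialization: {100.0 * done / total:.2f}% complete")
```

Sampling this percentage over time also gives a rough rate, from which the remaining days for a large volume can be projected.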
Can WAFL_check be run on a SnapLock aggregate/volume?
WAFL_check can be run on both SnapLock(R) Compliant and SnapLock Enterprise volumes and aggregates. However, SnapLock Compliant volumes have some restrictions that may prevent wafliron from functioning properly. NetApp Technical Support should always be consulted before starting wafliron on a SnapLock aggregate/volume.
Can WAFL_check be used on a foreign, inconsistent aggregate?
An aggregate or traditional volume will be marked foreign and be offlined if it is moved to a storage system other than the one that created it. This could occur with an inconsistent aggregate if it is moved to another storage system in order to run WAFL_check while minimizing the downtime on the original storage system. The aggregate may or may not be in a degraded state.
WARNING: Do NOT attempt to bring the aggregate online. If the aggregate is brought online from Maintenance Mode, the following error will be generated:
Volume (aggrname) is inconsistent and has a degraded raidgroup with dirty parity. This volume can not be brought online prior to doing the recommended steps for recovery, as it raises the risk of further system panic. If this is a replica volume, the recommended steps for recovery are to run WAFL_check at source and then execute "snapmirror initialize" on this volume, otherwise run WAFL_check on the volume.
In order to run WAFL_check on a foreign, inconsistent aggregate, the aggregate must first be restricted in order to allow the system to mark the aggregate as a native aggregate. To do this:
1. Boot the filer to Maintenance Mode.
2. Run aggr restrict.
3. Exit Maintenance Mode.
4. Reboot the filer to the Special Boot Menu.
5. Start WAFL_check on the aggregate.
The aggr restrict command can also be used on traditional volumes.
If the aggregate is not restricted using the above procedure, the following error will be generated when WAFL_check is started:
WAFL_check: volume/aggregate (aggrname) is foreign and cannot be checked.
Can WAFL_check be used to delete Snapshot copies?
WAFL_check can delete Snapshot(TM) copies using the -snapshots flag. It should only be used under the direction of NetApp Technical Support.
Note: Once the Snapshot copies are chosen for deletion, WAFL_check will automatically start on the aggregate and associated FlexVol volumes.
To do this, boot to the Special Boot Menu and enter WAFL_check -snapshots. WAFL_check will then prompt whether each Snapshot copy on the aggregate and associated FlexVol volumes should be deleted.
Given below is an example output:
(1) Normal boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Initialize owned disks (28 disks are owned by this filer).
(4a) Same as option 4, but create a flexible root volume.
(5) Maintenance mode boot.
Selection (1-5)? WAFL_check -snapshots aggr1
Checking aggr1...
WAFL_check NetApp Release 7.2.2
Starting at Tue Jun 5 23:22:40 GMT 2007
Snapshot 19 of aggregate aggr1 with mod time Mon Jun 4 23:00:02 GMT 2007
Delete? yes
Deleting
Snapshot 20 of aggregate aggr1 with mod time Tue Jun 5 04:00:02 GMT 2007
Delete? no
Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
Phase 3: Scan inode file.
Phase 3a: Scan inode file special files.
Phase 3a time in seconds: 0
Phase 3b: Scan inode file normal files.
Phase 3b time in seconds: 0
Phase 3 time in seconds: 1
Phase 4: Scan directories.
Snapdir: directory references unused snapshot: hourly.3. Unlinking.
Inode 67, type 2: Setting link count to 6 (was 7).
Phase 4 time in seconds: 0
Phase 5: Check volumes.
Phase 5a: Check volume inodes
Phase 5a time in seconds: 0
Phase 5b: Check volume contents
Snapshot 18 of volume flexvol1 in aggregate aggr1 with mod time Sat Jun 2 04:00:01 GMT 2007
Delete? yes
Deleting
Snapshot 22 of volume flexvol1 in aggregate aggr1 with mod time Mon Jun 4 16:00:02 GMT 2007
Delete? no
Checking volume flexvol1...
Phase [5.1]: Verify fsinfo blocks.
Phase [5.2]: Verify metadata indirect blocks.
Phase [5.3]: Scan inode file.
Phase [5.3a]: Scan inode file special files.
Phase [5.3a] time in seconds: 0
Phase [5.3b]: Scan inode file normal files.
Phase [5.3b] time in seconds: 1
Phase [5.3] time in seconds: 1
Phase [5.4]: Scan directories.
Snapdir: directory references unused snapshot: nightly.1. Unlinking.
Inode 67, type 2: Setting link count to 9 (was 10).
Phase [5.4] time in seconds: 0
Phase [5.6]: Clean up.
Phase [5.6a]: Find lost nt streams.
Phase [5.6a] time in seconds: 0
Phase [5.6b]: Find lost files.
Phase [5.6b] time in seconds: 0
Phase [5.6c]: Find lost blocks.
Phase [5.6c] time in seconds: 0
Phase [5.6d]: Check blocks used.
Phase [5.6d] time in seconds: 0
Phase [5.6] time in seconds: 0
Volume flexvol1 WAFL_check time in seconds: 2
Directory link counts fixed: 1
Invalid snapshot directory entries cleared: 1
WAFL_check output will be saved to file /vol/flexvol1/etc/crash/WAFL_check
Phase 5b time in seconds: 9
Phase 6: Clean up.
Phase 6a: Find lost nt streams.
Phase 6a time in seconds: 0
Phase 6b: Find lost files.
Phase 6b time in seconds: 5
Phase 6c: Find lost blocks.
Phase 6c time in seconds: 0
Phase 6d: Check blocks used.
Phase 6d time in seconds: 1
Phase 6 time in seconds: 6
WAFL_check total time in seconds: 17
Directory link counts fixed: 1
Invalid snapshot directory entries cleared: 1
Commit changes for aggregate aggr1 to disk? yes
WAFL_check output will be saved to file /etc/crash/aggregates/aggr1/WAFL_check on the root volume
Press any key to reboot system.
Can WAFL_check be used on a 64-bit aggregate?
Data ONTAP 8.0 7-Mode includes a new type of aggregate called a 64-bit aggregate. WAFL_check cannot be used to perform file system checks on 64-bit aggregates. Please contact NetApp Support for assistance.
Can the inconsistent aggregate be moved to another storage system before starting WAFL_check?
It is possible to move the inconsistent aggregate to another storage system in order to run WAFL_check. This action is usually taken if the original storage system contains multiple aggregates and only one non-root aggregate is inconsistent. In order to prevent downtime on the other aggregates, the inconsistent aggregate can be moved to another storage system.
Before moving the inconsistent aggregate to a new storage system, the following conditions must be met on the new storage system:
- It is running the same release of Data ONTAP.
- It can accept the additional storage of the inconsistent aggregate without reaching maximum capacity. To determine the maximum capacity, refer to the NetApp System Configuration Guide.
- The disks, shelves, and shelf modules are supported on the new storage system. To verify compatibility, refer to the NetApp System Configuration Guide.
- Downtime on the new storage system is acceptable.