WAFL File Folding Explained
While I was delivering training recently, one of my partners asked about a WAFL feature called “file folding”, so I thought I’d take a moment to detail this lesser-known NetApp feature.
File Folding is a Data ONTAP feature that saves space when a user re-writes a file with the same data.
More specifically, it checks the data in the most recent Snapshot copy. If that data is identical to the data that would go into the Snapshot copy currently being created, Data ONTAP references the previous Snapshot copy instead of using disk space to write the same data into the new one.
Although File Folding saves disk space, it can also impact system performance, as the system must compare block contents when folding a file. If the folding process reaches a maximum limit on memory usage, it is suspended. When memory usage falls below the limit, the processes that were halted are restarted.
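To picture the suspend-and-restart behavior, here is a minimal toy sketch in Python. It is not how Data ONTAP implements it: the class name FoldingQueue, the memory_limit value, and the per-request "cost" are all hypothetical, invented purely for illustration.

```python
from collections import deque


class FoldingQueue:
    """Toy model: folding requests are suspended while memory use is at the cap."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit  # hypothetical cap, arbitrary units
        self.memory_used = 0
        self.suspended = deque()          # requests halted because of the cap
        self.started = []                 # requests that were allowed to run

    def submit(self, name, cost):
        """Start a folding request, or suspend it if it would exceed the cap."""
        if self.memory_used + cost > self.memory_limit:
            self.suspended.append((name, cost))
            return False
        self.memory_used += cost
        self.started.append(name)
        return True

    def release(self, cost):
        """Free memory, then restart suspended requests that now fit."""
        self.memory_used -= cost
        while (self.suspended
               and self.memory_used + self.suspended[0][1] <= self.memory_limit):
            name, pending_cost = self.suspended.popleft()
            self.memory_used += pending_cost
            self.started.append(name)
```

With a cap of 10 units, a second 6-unit request is held back until the first one releases its memory, at which point it is restarted automatically.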
This feature has long been part of Data ONTAP and currently exists within 7-Mode (not Cluster-Mode) of Data ONTAP 8.1. It can be enabled with the command:
options cifs.snapshot_file_folding.enable on
But what exactly is going on behind the scenes?
There are two stages to this process: (1) determining which files are candidates for folding and (2) actually folding the files into the Snapshot copy. Let’s walk through both of these stages in detail:
In the first stage, once WAFL has received a message for a file that may be a candidate for folding, it retrieves the inode for that file via pathname (directory) lookups, parsing the directory and mapping it to a root inode.
In other words, we’re now able to construct a file handle in order to retrieve the block (inode) from disk.
WAFL then grabs the corresponding inode of that file from the most recent snapshot (remember, when a snapshot is created, a new root inode is created for that snapshot). If no matching inode is found, the file is not in the most recent snapshot, and no folding will occur.
But let’s assume that WAFL does find a snapshot inode corresponding to the active file.
As the blocks are loaded into the buffer, the file folding process compares the blocks of the active file system against the snapshot blocks, block for block. If every block is identical, it’s time to transition to the second stage.
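The first stage can be sketched roughly as follows. This is a simplified Python model, not WAFL code: the active file system and the snapshot are represented as plain dictionaries mapping a path to file contents (standing in for the inode lookups), and the function names split_blocks and is_folding_candidate are my own. The 4 KB block size matches WAFL's block size.

```python
BLOCK_SIZE = 4096  # WAFL works in 4 KB blocks


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def is_folding_candidate(path, active_fs, snapshot_fs, block_size=BLOCK_SIZE):
    """Return True if the file exists in both trees with identical blocks.

    active_fs and snapshot_fs are hypothetical dicts of path -> bytes that
    stand in for the inode lookups described above.
    """
    active_data = active_fs.get(path)
    snapshot_data = snapshot_fs.get(path)
    # No matching file in the most recent snapshot: nothing to fold into.
    if active_data is None or snapshot_data is None:
        return False
    active_blocks = split_blocks(active_data, block_size)
    snapshot_blocks = split_blocks(snapshot_data, block_size)
    if len(active_blocks) != len(snapshot_blocks):
        return False
    # Block-for-block comparison of active data against snapshot data.
    return all(a == s for a, s in zip(active_blocks, snapshot_blocks))
```

A file that was rewritten with the same contents passes the check, while a file that changed, or one that did not exist when the snapshot was taken, does not.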
In the second stage of File Folding, a data block is “freed” by updating the corresponding bit in the active map of the active file system (indicating that the block is unallocated). Next, the block in the snapshot is allocated by updating the bit in the active map that corresponds to the snapshot block (indicating that it is now in use). WAFL then updates the block pointer in the parent buffer to reference the block number of the block in the snapshot.
Finally, the parent buffer is tagged as “dirty” so that the change is written out to disk at the next Consistency Point (CP).
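Here is a small Python sketch of that second stage, under heavy simplification. The active map is modeled as a list of booleans indexed by block number, and ParentBuffer and fold_block are hypothetical names of my own, not WAFL structures or functions.

```python
class ParentBuffer:
    """Toy stand-in for the buffer that holds a file's block pointers."""

    def __init__(self, pointers):
        self.pointers = list(pointers)  # block numbers the file points at
        self.dirty = False              # True means "flush at the next CP"


def fold_block(active_map, parent, slot, snapshot_block):
    """Repoint one slot of the parent buffer at the snapshot's block."""
    old_block = parent.pointers[slot]
    if old_block == snapshot_block:
        return  # already sharing the snapshot block, nothing to do
    active_map[old_block] = False        # free the duplicate active block
    active_map[snapshot_block] = True    # mark the snapshot block as in use
    parent.pointers[slot] = snapshot_block
    parent.dirty = True                  # written out at the next CP
```

For example, if the active file points at block 5 and the identical snapshot block is number 2, folding clears bit 5, sets bit 2, repoints the file at block 2, and marks the parent buffer dirty.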
And now you know probably more than you’ve ever wanted to know about File Folding!