ReFS vs NTFS – Introduction
Many companies use disk-based storage for backups, backup copies, replicas, etc., to adhere to, or better still exceed, the 3-2-1 rule. There are a number of organic ways in which this can be achieved and, for many, infused with the DNA of their preferred storage vendor, this is deemed preferable or more affordable when compared to dedicated deduplicating backup appliances such as Dell EMC Data Domain, ExaGrid and HPE StoreOnce. I will not get into the weeds regarding those solutions in this blog; that is an entirely different conversation.
That being said, if we abstract the underlying hardware from the repository, for most the repository will be formatted using either ReFS or NTFS. Granted, I know we can use Linux repositories for this, and there are the obvious CIFS and NFS backend options for *nix file systems, however it is widely accepted and evidenced that local storage is a better option than a filer-based share (CIFS or NFS) for a Windows-based repository, and there is a multitude of information online confirming this.
Lastly, not every company has enough in-house *nix skills to rely on a *nix-based backup repository, or indeed maintain it effectively. Offloading support and maintenance to a service provider or storage vendor can be an attractive option when it comes to deduplicating appliances; additionally, MSPs are more than capable of providing the same via BaaS (Backup as a Service) or as an extension of an MSA (Managed Service Agreement).
Calm Seas v Stormy Waters?
“ReFS has hardly been plain sailing over the past year!”
Many of you will be looking at the featured image of this post thinking “ReFS has hardly been plain sailing over the past year!” You would be right; internally (at Novosco) we found a solution last year that worked for us and let us provide stable and reliable backups and restores. I wanted to look at the data on the basis that, once all the block clone issues are resolved (which for most appears to be the case following Microsoft’s most recent patch), ReFS will at some point be stable and reliable.
My goal was to analyse the data with regards to the capacity savings that can be realised, along with some other factors. To this end, I focused on block clone and GFS synthetic full backups on ReFS, compared against GFS full backups on NTFS employing deduplication, with both given the same data.
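To make the comparison concrete, here is a minimal, file-system-agnostic sketch of why block cloning changes the capacity picture for synthetic fulls. The figures (a 300GB compressed full, 15GB of changed blocks per weekly full) are illustrative assumptions of mine, not measurements from this test, and the model deliberately ignores NTFS post-process deduplication, which is exactly what the test measures against:

```python
# Toy model of why block cloning (ReFS "fast clone") saves space on
# GFS synthetic fulls. All sizes in GB; figures are illustrative only.

def space_used(weekly_fulls, weekly_change, full_size, block_clone):
    """Return total GB consumed by `weekly_fulls` synthetic full backups.

    full_size     -- size of one compressed full backup
    weekly_change -- changed blocks rolled into each new synthetic full
    block_clone   -- True if the file system can reference existing
                     blocks instead of rewriting them (ReFS fast clone)
    """
    if block_clone:
        # The first full is written in full; each later synthetic full
        # clones unchanged blocks and only writes the changed ones.
        return full_size + (weekly_fulls - 1) * weekly_change
    # Without block cloning, every synthetic full occupies its full size
    # on disk (until something like Windows dedupe reclaims the overlap).
    return weekly_fulls * full_size

refs_gb = space_used(8, 15, 300, block_clone=True)    # 300 + 7*15 = 405
ntfs_gb = space_used(8, 15, 300, block_clone=False)   # 8 * 300 = 2400
print(refs_gb, ntfs_gb)
```

Under these assumed numbers, eight retained fulls consume roughly 405GB with block cloning versus 2,400GB of raw file size without it; the open question this test explores is how much of that gap NTFS deduplication claws back.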
As it stands, many people are using ReFS in production right now and many more will be considering the move. I think if you are considering the move, there has never been a better time to start experimenting with ReFS. Kudos to Microsoft and Veeam for their collaborative efforts to address the issues early adopters have faced. Let’s be honest, a well-designed Veeam backup platform can push any disk-based storage to its limits; Veeam is very efficient at moving large amounts of data quickly.
Let’s not forget, backups are by no means a “light” operation; we have platforms in production that can pretty much saturate uplinks to repositories relentlessly (TBs per hour) until backup completion. This is not a light workload for the underlying file system either, as evidenced by some teething pains throughout 2017 and thus far in 2018.
Why are you running this test?
Part of my role within Novosco is to contribute to the technical validation of projects and solutions within our team. Looking at the data lets you plan and ensure that the solution being considered for a customer is not only technically valid, but will remain so throughout its planned lifetime. Adding capacity or performance to a backup solution post-deployment means having to spend more money; good planning avoids having to “move the goalposts”.
What were the conditions of the test?
I wanted to look at real data; synthetic tests are nice but, as we all know, nothing beats running something in production to get accurate results. To this end, it had to be real data with real daily change rates, and it had to be the same data targeting all repositories.
Having access to an ever-increasing number of highly capable platforms, a decision was made to perform a daily copy of 8 virtual servers to 8 different repositories over the course of 8 weeks, or viewed another way, to “further protect” some data on a platform that would not feel any impact from the tests. To that end, a test plan was created.
2TB LUNs were created on a SAN and presented as drives to a server. All repositories were configured with “Per VM Files” to, amongst other things, facilitate data analysis at the VM level. Repositories were created as follows:
ReFS vs NTFS - Veeam Repository Configuration
| Repository | File System | Block Size | Veeam Compression |
| --- | --- | --- | --- |
| ReFS Optimal 4K | ReFS | 4K | Optimal |
| ReFS Optimal 64K | ReFS | 64K | Optimal |
| NTFS Optimal 4K | NTFS | 4K | Optimal |
| NTFS Uncompressed 4K | NTFS | 4K | Uncompressed |
| NTFS Dedupe Friendly 4K | NTFS | 4K | Dedupe Friendly |
| NTFS Optimal 64K | NTFS | 64K | Optimal |
| NTFS Uncompressed 64K | NTFS | 64K | Uncompressed |
| NTFS Dedupe Friendly 64K | NTFS | 64K | Dedupe Friendly |
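The “Block Size” column above is the file-system allocation unit (cluster) size chosen at format time. A file always occupies whole clusters, so larger clusters waste a little slack at the end of each file, but for the multi-gigabyte files a backup repository holds that overhead is negligible. A quick sketch of the round-up arithmetic (the file size here is a hypothetical example of mine, not from the test):

```python
import math

def allocated_size(file_bytes, cluster_bytes):
    """Bytes actually consumed on disk: files round up to whole clusters."""
    return math.ceil(file_bytes / cluster_bytes) * cluster_bytes

FILE = 1_000_037  # a hypothetical ~1MB metadata file

# Slack grows with cluster size, but is capped at one cluster per file.
print(allocated_size(FILE, 4 * 1024))    # 1,003,520 bytes on a 4K volume
print(allocated_size(FILE, 64 * 1024))   # 1,048,576 bytes on a 64K volume
```

At most one cluster per file is wasted, so on a repository holding a handful of large backup files the difference between 4K and 64K slack is measured in kilobytes against terabytes of data, which is why the comparison above focuses on capacity behaviour rather than slack.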
What was the flow of data for the test?
The data flow from hosts to test repositories was as follows:
- VMs are backed up to a capable BaaS node
- VMs are copied from the BaaS node to another location via backup copy jobs
- VMs are copied from the BaaS node to the test repositories outside of the normal backup window
What types of servers did you use for testing?
So now I had 8 repositories that I could target; I needed real-world data to populate them with. I wanted a cross-sample of VMs with the varied roles and change rates that most companies would have, selected on the basis of their daily change rate and provisioned capacity, bearing in mind that I would need to store 8 weeks’ worth of full backups, plus daily incremental backups, within a 2TB footprint for up to 600GB of VMs.
As such, one of each of the following types of server was selected:
- Web Application
- Domain Controller
- Exchange Hybrid
- Web Server
- Light Application
- Network Services
What exactly were you looking to compare?
I wanted to be able to make a direct comparison between the various block sizes, file systems, compression and deduplication settings that, from what I have seen over the years, are often used in backup copy jobs. Having the same 8 VMs copied to 8 different repositories every day for 8 weeks was a great way to see exactly how the different settings compared.
Granted, I knew this was not going to be 100% accurate for everyone; given the limits of finite resources it is a relatively small sample size, but the mixture of roles on the test VMs covered a nice cross-section of the types of servers in everyday use and as such would provide a good indication. I wanted a blend of structured, unstructured and binary data to report against, and these servers fulfilled those requirements within the constraints of the repository space available for testing.
When can we see results?
Results are analysed in Part 2: