Blog - Craig Rodgers

July 12, 2022 (updated August 22, 2022)

Some Thoughts Following Cloud Field Day 14

Cloud Field Day 14 had some great presentations around controlling cloud storage, data, and networks. I made some predictions in my previous blog about what was going to be presented and, for the most part, they were all touched upon, granted some more than others. The standard was high, and I would encourage people to check out the presentations. Before I discuss some of my key takeaways from the event below, I would like to offer a huge thanks to Stephen Foskett, his team, my fellow delegates and the presenters, for taking the time not only to present but also to answer delegate questions. It’s always a fun event and that takes effort!

Has storage recently become software?

No, the concept of Software Defined Storage (SDS) has been around for over a decade now. We have been able to provision and define storage for quite some time, using software policies and commands to abstract what’s happening at the hardware layer. Historically this was as simple as defining a RAID group using multiple drives to present a LUN. Complexity and performance grew when we began tiering slow drives with faster ones, which led to the obvious additional tier of solid state.

These tiers were provisioned independently or combined into a pool with vendors building in some smarts around data placement depending on activity. Storage vendors differentiated themselves on how well they were able to perform this function. Variance was decided on performance, scalability, features, and price. Storage vendors made relationships with hardware vendors to provide their own individual edge using faster CPUs, more RAM, better connectivity, and reliability. The value add was the appliance package and the software and interface to control the hardware within.

So, What if Anything Has Changed?

Nothing, apart from the whole cloud adoption thing. You cannot buy an array from your preferred storage vendor and ship it to a hyper scaler and, even if you could, why would you want to? Logistical impossibilities aside, it goes against the cloud operating model from a financial perspective. This created a divide between how you did storage on prem and how you did it in the public cloud. Change has recently come in how traditional storage companies have pivoted to become cloud companies, extending their stacks and feature sets into the public cloud. Hybrid multi-cloud is now a possibility, potentially working with the same vendors you use for your on prem stack, albeit in the public clouds.

This is a big deal.

Why is this a big deal?

Enterprises move slowly and are often carrying significant technical debt, deep engineering familiarity with their preferred vendor, or both. Cloud native approaches have brought about new ways of doing things, and many enterprises have invested in the people and technologies required to do this effectively, but not all are as far along. Trusted vendors offering customers help with the complexities of managing a hybrid cloud will be very attractive options. For example, Amazon FSx for NetApp ONTAP lets you use the technologies you have been using for years to manage data better in AWS, and soon Azure and Google.

Furthermore, NetApp are giving customers Cloud Manager to help alleviate and consolidate issues with managing aspects of the hybrid multi-cloud. One thing evident is the fact that companies want alternatives to native cloud storage.

What About Cloud Native Options?

Well, we saw a few good options there too. Weka and Lightbits presented some amazing options to scale performance massively in multiple clouds using relatively hardware agnostic approaches. This is a similar shift to that of traditional storage vendors moving away from selling hardware and software towards realising business outcomes. Whilst all storage companies have the ability to provide both on prem and cloud based storage, they are coming at it from different angles. If you want a polished product to present file or object based storage, Weka offers a cloud file system with NFS, SMB and S3 compatible access. Lightbits have taken a different approach, offering block storage via NVMe over TCP and delivering more of a storage engine to do block very well at scale within the public clouds or on prem. Both companies leverage hardware provided by a 3rd party and play well with various Intel accelerators.

What about Cloud Management?

We saw great presentations from Zerto, Morpheus, Alkira and Komprise, all offering different ways of effectively managing clouds. Abstracting multiple clouds was a recurring theme; however, they each had their own specific areas of focus and can add value to many enterprises. Abstraction is a common theme for companies right now, increasing the control and visibility of elements under a given solution’s scope. As the number of layers of abstraction continues to grow, aggregation tools add value by centralising operations in developer friendly ways, leveraging the plethora of APIs available and often trying to wrap them up under one.

What’s the takeaway?

Vendors are providing ever increasing ways to control your hybrid cloud infrastructures. Alkira offered something I have not seen before covering the network layer, something I imagine we will see more of in future. Visibility and control are the drivers, leveraging existing relationships with legacy storage providers and/or public clouds, but the overall theme was heavily centred on getting multiple clouds working better together, no bad thing.

As well as reviewing social media using the #CFD14 hashtag you can watch Cloud Field Day 14 content here:

https://techfieldday.com/event/cfd14/

-Craig Rodgers

June 20, 2022

An Introduction to Cloud Field Day 14 #CFD14

I have arrived in Silicon Valley again for another Cloud Field Day event and I’m looking forward to several presentations, as well as getting to actually physically meet people again. In this post, I share some thoughts on the upcoming presenters and make a few educated guesses as to what they will be presenting, simply for the fun of it.

What is Cloud Field Day?

In case you are not aware or are new to Tech Field Day (TFD) events, Cloud Field Day (CFD) is an event where companies present their latest and greatest, via live streams, on the Tech Field Day website and LinkedIn. Streams are also recorded and added to the Tech Field Day YouTube channel for later viewing.

Isn’t that just a webinar?

Nope, these events are interactive as the companies presenting are doing so to a number of industry experts who ask the questions you are thinking about when watching, allowing greater insight and benefits to watching the streams. I am one of a number of delegates who will be attending both in person and virtually to comment on or challenge companies on their content, to help viewers make decisions on vendor offerings and solutions. Delegates are not afraid to ask difficult questions, one of my personal favourite aspects of the events and why I watched so many prior to being involved.

Great, who is presenting at CFD14?

The following companies are presenting, in chronological order.

Zerto

Zerto, a Hewlett Packard Enterprise company – Zerto provides a continuous software experience for DR, backup and data mobility.

Zerto will be presenting their latest updates around ransomware recovery, AWS, Zerto for Kubernetes and more. Zerto have been in the replication and ransomware protection game for a while now and even offer some backup services. I expect this presentation to focus on new features in Zerto 9.5 such as additional RBAC and MFA along with Zerto Kubernetes Manager and other new features for security and ransomware protection.

NetApp

NetApp are an enterprise storage leader with a current focus on data and the cloud.

Whilst I do not know what NetApp are going to present, if I had to guess, I would bet it will be around their recent certification with VMware to offer Storage as a Service (STaaS) as a supplemental NFS storage repository to VMware cloud services running on AWS, Azure and Google. NetApp are the first storage vendor to be approved for this type of offering, a logical progression given their on-premises enterprise storage reach and the adoption of hybrid cloud.

Morpheus Data

Morpheus Data – Modernising apps and managing hybrid clouds for enterprise

I looked into Morpheus Data last year whilst architecting a multi-tenant cloud platform, and they have an interesting backstory that led to where they are today. Essentially, they exist from the need to control and abstract silos of resources to facilitate a hybrid cloud management layer, integrating with most virtualisation and cloud stacks. Will be interesting to see what they present at #CFD14 and if I had to guess here, I would assume something around VMware automation given their recent announcements, let’s see.

Weka

Weka – Next generation data platform for challenging AI and high-performance workloads

Weka aim to be a cloud filesystem providing tiered file services with variable block dedupe and compression, and most recently joined the ranks of the few that offer fingerprinting to reduce data capacity requirements through data similarity. I’m fairly sure this presentation will be aimed at their recent updates in V4 to move from being AWS native to supporting multiple clouds: AWS, Azure, Google and Oracle.

Alkira

Alkira – Reinvented networking for the cloud era connecting and integrating network services across clouds

Alkira will be presenting a solution to help demystify the complexities involved in effectively managing a multi cloud network. Cloud Area Networking is often difficult to manage and even harder when factoring in DR, so it will be interesting to see how challenges can be alleviated and addressed. I would guess an abstraction layer for automation in these regards will be presented.

Lightbits Labs

Lightbits Labs – scale out disaggregated storage platform that performs like local flash for the cloud native DC

Lightbits Labs have recently won a BIG Innovation Award 2022 for the industry’s first solution to provide NVMe over TCP and have built a close relationship with Intel to leverage various Intel workload accelerators, to reduce latencies and increase performance at every step. I expect to see a massively scalable solution here for a Cloud Data Platform, should be interesting given some of their recent announcements and something that would slot nicely into the hyper scalers given the advertised scaling capabilities.

Komprise

Komprise – Intelligent data mobility solution for unstructured data providing visibility, mobility and value

Komprise have focused on providing intelligent data tiering to minimise cloud costs such as egress charges. Keeping hot data in a cloud and constantly pulling it out costs a lot of money. Komprise will be showing Smart Data Workflows to identify where data should reside. Insight into unstructured data is important long term and leans into FinOps to govern cloud spending. I expect to see benefits around another layer for ransomware protection given recent announcements; the more layers protecting, the better I feel.

How do I watch?

You can watch #CFD14 from the main Tech Field Day event page and confirm the presentation schedules here:

https://techfieldday.com/event/cfd14/

You can also follow Twitter and LinkedIn using #CFD14 for live streams. Enjoy!

May 5, 2019 (updated May 6, 2019)

Why I am attending Veeam On

Why are you attending Veeam On in Miami?

Blatantly obvious reasons aside, many people ask me why I like Veeam. My answer is always some permutation of the following:

Veeam keeps me safe in presales, sales, engineering and architecture.

Personally, I am responsible for data and I have pondered the question, if I owned a company, would that cadence continue?

The answer is unequivocally, yes.

Knowing quite well how to architect and deploy Veeam, tightly integrated with whatever platform or solution I am involved in, I literally sleep better knowing I can restore any data that I have backed up, in a plethora of ways. To this end, put simply, I love Veeam products.

On top of this, having been involved within the vCommunity, the excellent Veeam User Group in the UK (VUGUK) and now worldwide via the Veeam Vanguard program, I have been fortunate enough to meet and interact with some of the most remarkable people online, who have always willingly and openly shared their knowledge and experiences. This contributes even further to the expertise needed to deliver platform availability. Events like these give you access to a collective pool of worldwide knowledge and to people who are passionate about the products. That passion, when coupled with a hungry mind, can yield mutually beneficial results not only for you, but for your company as a whole and, ultimately, the end user or customer.

Why should anyone attend Veeam On?

To ask questions, ask multiple people, ask anyone! You will get an incredibly balanced set of views, opinions, best practices and real-world experiences, from people who are doing it, day in, day out, that information is pure gold. Having seen some information on the presentations via the breakout sessions, I know I will certainly be learning and networking, then taking home valuable information and contacts with me, that will benefit not only myself, but Novosco as a whole.

The icing on the cake for me, will be having personal access to industry experts on a face to face level.

When the hotel and flights to Miami cost less than the list price of taking the VMCE course in the UK, it’s a no brainer.

Are you taking any training?

Yes! Having passed numerous exams ranging from the trusty VMware VCP to Cisco CCNP level exams, I have been seeking VMCE for a while. Now is my chance to take the course and enjoy what I hope will be blissful surroundings in the evenings, taking a stroll down no less than Miami Beach! On top of that, I know I will meet amazing people to network with and get insight from their experiences, which in turn, will help me deliver results that will further protect not only me and my company, but ultimately the customer. What’s not to like?

So, what is Happening at Veeam On 2019?

The chance to learn and network with passionate people and also join or partake in a great community

Massively Discounted Veeam Training for the new content:

Veeam Certified Engineer (VMCE)

VMCE-Advanced Design & Optimization (ADO)

Veeam Orchestration: Disaster Recovery Design & Validation (Beta)

https://www.veeam.com/veeamon/vmce

61 Breakout Sessions!

https://www.veeam.com/veeamon/breakout-sessions

Speakers at the Event:

https://www.veeam.com/veeamon/speakers

Keynote Speakers

Guest Speakers

Veeam Technical Experts

And last but by no means least, the Veeam Party with Flo Rida!

Really looking forward to attending this event and if you see me, be sure to say hello!

-Craig Rodgers

April 2, 2018 (updated February 24, 2019)

ReFS vs NTFS – Introduction (Part 1)

ReFS vs NTFS – Introduction

Many companies use disk-based storage for backups, backup copies, replicas etc. to adhere to or, better still, exceed the 3-2-1 rule. There are a number of organic ways in which this can be achieved and for many, infused with the DNA of their preferred storage vendor, this is deemed preferable or more affordable when compared to dedicated deduplicating backup appliances, such as Dell EMC Data Domain, ExaGrid and HPE StoreOnce. I will not get into the weeds regarding these solutions in this blog; that is an entirely different conversation.

That being said, if we abstract the underlying hardware from the repository, for most, the repository will be formatted using either ReFS or NTFS. Granted, I know we can use Linux repositories for this and the obvious CIFS & NFS backend options for *nix file systems; however, it is widely accepted and evidenced that local storage is a better option than a filer based share (CIFS & NFS) for a Windows-based repository, and there is a multitude of information online confirming this.

Lastly, not every company has enough in house *nix skills to rely on a *nix based backup repository or indeed maintain it effectively. Offloading support and maintenance to a service provider or storage vendor can be an attractive option when it comes to deduplicating appliances. Additionally, MSPs are more than capable of providing the same via BaaS (Backup as a Service) or as an extension of an MSA (Managed Service Agreement).

Calm Seas v Stormy Waters?

Many of you will be looking at the featured image for this post thinking “ReFS has hardly been plain sailing over the past year!” You would be right. Internally (at Novosco) we found a solution last year that worked for us and let us provide stable and reliable backups and restores. I wanted to look at the data on the basis that, once all the blockclone issues are resolved (which for most appears to be the case following Microsoft’s most recent patch), ReFS will at some point be stable and reliable.

My goal was to analyse the data with regards to the capacity savings that can be realised, along with some other factors. To this end, I focused on blockclone and GFS synthetic full backups to make a comparison against NTFS and GFS full backups employing deduplication when they are given the same information.

As it stands, many people are using ReFS in production right now and many more will be considering the move. I think if you are considering the move, there has never been a better time to start experimenting with ReFS. Kudos to Microsoft and Veeam for their collaborative efforts to address the issues early adopters have faced. Let’s be honest, a well designed Veeam backup platform can push any disk-based storage to its limits, Veeam is very efficient at moving large amounts of data quickly.

Let’s not forget, backups are by no means a “light” operation. We have platforms in production that can pretty much saturate uplinks to repositories relentlessly (TBs per hour) until backup completion. As a result, this is not a light workload for the underlying file system either, as evidenced by some teething pains throughout 2017 and thus far in 2018.

Why are you running this test?

Part of my role within Novosco is to contribute to the technical validation of projects and solutions within our team. Looking at the data lets you plan and ensure that the solution being considered for a customer is not only technically valid now, but will remain so throughout its planned lifetime. Adding capacity or performance to a backup solution post deployment means having to spend more money; good planning avoids having to “move the goalposts”.

What were the conditions of the test?

I wanted to look at real data, synthetic tests are nice but as we all know, nothing beats running something in production to get accurate results. To this end, it had to be real data with real daily change rates and it had to be the same data targeting all repositories.

Having access to an ever-increasing number of highly capable platforms, a decision was made to perform a daily copy of 8 virtual servers to 8 different repositories over the course of 8 weeks or from another view, “further protect” some data, on a platform that would not feel any impact from the tests. To that end, a test plan was created.

2TB LUNs were created from a SAN and presented as drives to a server. All repositories were configured with “Per VM Files” to, amongst other things, facilitate data analysis at the VM level. Repositories were created as follows:

ReFS vs NTFS - Veeam Repository Configuration

Repository	File System	Block Size	Veeam Compression
ReFS Optimal 4K	ReFS	4K	Optimal
ReFS Optimal 64K	ReFS	64K	Optimal
NTFS Optimal 4K	NTFS	4K	Optimal
NTFS Uncompressed 4K	NTFS	4K	Uncompressed
NTFS Dedupe Friendly 4K	NTFS	4K	Dedupe Friendly
NTFS Optimal 64K	NTFS	64K	Optimal
NTFS Uncompressed 64K	NTFS	64K	Uncompressed
NTFS Dedupe Friendly 64K	NTFS	64K	Dedupe Friendly
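
The table above is effectively a small test matrix, and it can be handy to hold it as data when it comes to analysing results later. The following is a minimal Python sketch of that idea and not something used in the original test; the `Repository` class and `build_test_matrix` function are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """One row of the repository table (hypothetical helper type)."""
    file_system: str
    block_size: str
    compression: str

    @property
    def name(self) -> str:
        # Mirrors the naming used in the table, e.g. "NTFS Optimal 64K"
        return f"{self.file_system} {self.compression} {self.block_size}"


def build_test_matrix() -> list:
    """Return the 8 repositories listed in the table."""
    repos = []
    # ReFS was only tested with Optimal compression, at both block sizes
    for block_size in ("4K", "64K"):
        repos.append(Repository("ReFS", block_size, "Optimal"))
    # NTFS was tested with all three compression settings at both block sizes
    for block_size in ("4K", "64K"):
        for compression in ("Optimal", "Uncompressed", "Dedupe Friendly"):
            repos.append(Repository("NTFS", block_size, compression))
    return repos


if __name__ == "__main__":
    for repo in build_test_matrix():
        print(repo.name)
```

Holding the matrix this way makes it easy to group results by file system, block size or compression setting without retyping repository names.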

What was the flow of data for the test?

The data flow from hosts to test repositories was as follows:

VMs are backed up to a capable BaaS node
VMs are copied from the BaaS node to another location via backup copy jobs
VMs are copied from the BaaS node to the test repositories outside of the normal backup window

What types of servers did you use for testing?

So now I had 8 repositories that I could target, I needed real-world data to populate them with. I wanted to take a cross sample of VMs with the varied roles and change rates that most companies would have. I selected the following based on their daily change rate and provisioned capacity, bearing in mind that I would need to be able to store 8 weeks’ worth of full backups, plus daily incremental backups, within a 2TB footprint for up to 600GB of VMs.

As such, one of each of the following types of server was selected:
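
The space budget described above can be roughed out with some simple arithmetic. The sketch below is not part of the original test plan: the function name, the data reduction ratio, the daily change rate and the number of incrementals kept are all hypothetical values chosen for illustration. Only the 8 weekly fulls, the 600GB of VMs and the 2TB LUN size come from the text. It deliberately ignores block cloning and post-process deduplication, so treat it as a naive worst case.

```python
def required_capacity_gb(source_gb, weekly_fulls=8, daily_incrementals=7,
                         reduction_ratio=2.0, daily_change_rate=0.05):
    """Naive estimate of on-disk capacity for fulls plus incrementals.

    reduction_ratio   -- assumed compression ratio applied by the backup job
    daily_change_rate -- assumed fraction of source data changing per day
    Both defaults are hypothetical, not measured values.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    full_gb = source_gb / reduction_ratio
    incremental_gb = source_gb * daily_change_rate / reduction_ratio
    return weekly_fulls * full_gb + daily_incrementals * incremental_gb


if __name__ == "__main__":
    lun_gb = 2 * 1024                   # 2TB LUN per repository
    needed = required_capacity_gb(600)  # up to 600GB of VMs
    print(f"Naive estimate: {needed:.0f} GB needed, {lun_gb} GB available")
```

Plugging in your own change rate and compression ratio gives a quick feel for how much you would be relying on block cloning or deduplication to stay within a fixed footprint.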

Application
Web Application
Database
Domain Controller
Exchange Hybrid
Web Server
Light Application
Network Services

What exactly were you looking to compare?

I wanted to be able to make a direct comparison between the various block sizes, file systems, compression and deduplication settings, that are often used in backup copy jobs from what I have seen over the years. Having the same 8 VMs, copied to 8 different repositories every day for 8 weeks was a great way to see exactly how the different settings compared.

Granted, I knew this was not going to be 100% accurate for everyone. Due to the limits of finite resources it is a relatively small sample size, but the mixture of roles on the test VMs covered a nice cross section of the types of servers in everyday use and would therefore provide a good indication. I wanted a blend of structured, unstructured and binary data to report against, and these servers fulfilled those requirements within the constraints of the repository space available for testing.

When can we see results?

Results are analysed in Part 2:

ReFS vs NTFS – Initial Analysis (Part 2)

-Craig Rodgers

April 2, 2018 (updated June 16, 2024)

ReFS vs NTFS – Initial Analysis (Part 2)

This is Part 2 of a series, if you have not read Part 1 you can do so here:

ReFS vs NTFS – Introduction (Part 1)

How was Veeam configured?

Lastly, I required a configuration for the backup copy jobs, to test the file systems as best I could. To this end I decided to create backup copy jobs that targeted the same 8 servers, with 7 incremental and 8 weekly backup copies configured via GFS. The final jobs looked something like this:

I want to see results!

Okay, following on from my previous post I sought to share information that was garnered from early results. I wanted to see how the initial ingestion of data looked, to make a comparison between the different storage techniques. With that in mind, before any post processing was applied, I looked at the file systems and compared the sizes of the per VM .VBK files (these are full backups in Veeam).

Interestingly, 64K ReFS formatted drives have an additional file system overhead once formatted, when compared to 4K. Feel free to test this yourself: create two new, large, identically sized thin provisioned disks on a VM running Windows 10 1703 or above, or Server 2016, format the drives as ReFS with 4K and 64K block sizes, and look at the used and free space.
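
If you would rather script that check than read the numbers from Explorer, here is a minimal Python sketch using the standard library’s `shutil.disk_usage`. The drive letters `E:\` and `F:\` are placeholders and the function names are hypothetical; substitute whatever the two freshly formatted volumes are mounted as.

```python
import shutil


def volume_usage(path):
    """Return total, used and free bytes for the volume containing path."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        "used_percent": used_percent,
    }


def extra_used_bytes(path_4k, path_64k):
    """How many more bytes are in use on the 64K volume than on the 4K one."""
    return shutil.disk_usage(path_64k).used - shutil.disk_usage(path_4k).used


if __name__ == "__main__":
    # Placeholder drive letters for the two empty ReFS test volumes
    vol_4k, vol_64k = "E:\\", "F:\\"
    for label, path in (("4K", vol_4k), ("64K", vol_64k)):
        info = volume_usage(path)
        print(f"ReFS {label}: {info['used_bytes']} bytes used "
              f"({info['used_percent']:.3f}%)")
    print(f"Extra overhead on 64K: {extra_used_bytes(vol_4k, vol_64k)} bytes")
```

Run it straight after formatting, before anything is written to either volume, so that the difference reflects file system overhead only.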

I will have to find out why this is. Nothing turned up after a quick Google, so I assume it’s something quite low down in the ReFS filesystem. I have publicly shared the graphs below, and all other graphs used, via Power BI, and I would encourage people to expand the content below and have a closer look.

Use the double-sided arrow in the bottom right corner to view full screen, these are interactive graphs.

So, what can we learn from the above?

Right away we can confirm that Veeam default job settings give us a varying amount of data reduction, before the data even lands on the repository. As expected the DB server with its structured data achieves the best reduction in space, servers with binary data see the least amount of savings. Check the two graphs below for 4K and 64K block sizes comparison:

How well did they dedupe?

Similarly, as expected raw uncompressed data overwhelmingly achieved the best levels of deduplication, I have included the ReFS repository data to compare, obviously there was no post process operation on the ReFS repositories. The chart below shows the results of enabling deduplication on the NTFS repositories and compares the capacity of the repository before and after deduplication had completed:

Initial observations look good…

Over the course of the next 8 weeks the copy jobs did their thing, and during the tests I also made an unexpected discovery regarding the behaviour of dedupe-friendly vs optimal and uncompressed that caught me off guard.

The main backups are stored using optimal compression. When a copy job for the optimal or uncompressed repositories runs, the Veeam data mover service sends the data in its deduplicated, compressed form over the network to the mount server for the target repository and the blocks get written.

My observations for dedupe-friendly seemed to show that when the source optimal compression blocks were read from the backup, the data mover service inflates the data again and reprocesses it into dedupe-friendly and sends the relatively inflated dedupe-friendly data over the network. You can observe this behaviour in the slides below:

If you are unsure as to what the job name means, please refer to the jobs screenshot above. In short, jobs assume a default of optimal compression; DF = Dedupe-Friendly and UC = Uncompressed.

Images are best viewed full screen:

I will ask some of my Veeam friends to confirm if this behaviour is expected. I imagine it is to increase the retention capabilities on deduplicating appliances such as Dell EMC Data Domain and HPE StoreOnce. To be fair, I’m not sure if the data mover service on an ExaGrid decompresses this before it commits to disk either; if so, it may be better to send optimal blocks over the network to increase throughput if that is a constraint, especially over a WAN link!

What was the usage on repositories over the course of the 8 weeks of testing?

The server that hosted the LUNs for the repositories was monitored by an agent that logged drive usage every couple of minutes to our monitoring platform. This would have created a silly amount of data points, so in the end I settled for 478 data points per repository for file system usage over the course of the 8 weeks. This provided capacity reporting every 2 hours 51 minutes, and I was able to export this to a CSV and analyse the data.
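
As a quick sanity check on those numbers, 478 samples spaced 2 hours 51 minutes apart cover roughly 56.8 days, which lines up with the 8 week test window. The post does not say how the raw readings were thinned out, so the sketch below simply assumes evenly spaced selection; `downsample` and `covered_span` are hypothetical helper names, and the raw series is a stand-in rather than real monitoring data.

```python
from datetime import timedelta


def covered_span(points, interval):
    """Total time covered by a number of samples at a fixed interval."""
    return interval * points


def downsample(samples, target_points):
    """Pick target_points evenly spaced items, keeping the first and last."""
    if target_points <= 0:
        raise ValueError("target_points must be positive")
    if len(samples) <= target_points:
        return list(samples)
    if target_points == 1:
        return [samples[0]]
    # Fractional stride across the index range, rounded to the nearest index
    step = (len(samples) - 1) / (target_points - 1)
    return [samples[round(i * step)] for i in range(target_points)]


if __name__ == "__main__":
    print(covered_span(478, timedelta(hours=2, minutes=51)))
    # Stand-in for 8 weeks of readings taken every 2 minutes
    raw = list(range(8 * 7 * 24 * 30))
    reduced = downsample(raw, 478)
    print(len(raw), "->", len(reduced))
```

Evenly spaced selection keeps the overall shape of a capacity curve, although it can miss short-lived peaks between the retained samples.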

Below you can see how each respective compression setting per file system compared for 4K and 64K block sizes:

I think this is the first time we start to see the true nature of ReFS vs NTFS, the ReFS gradient is smooth and predictable, the graphs for NTFS look choppy and come in waves. Additionally, 4K blocks and 64K blocks appear to be very similar in results:

What if you only stored full backups and ignored the dailies?

Due to the repositories containing ReFS and NTFS filesystems, to make a fair comparison I had to chop off the first week and last week and use the 6 weeks in the middle for the next graph. I did not want to report on potentially skewed results. Once I had all the other reporting data I needed, I removed the first and last weeks from the repositories and ran scrubbing and garbage collection on all NTFS volumes; the ReFS volumes had the same backup copies removed.

The following graph is the middle 6 weeks of the backup test:

This is the only graph looking at 6 weeks of data; all others report on 8 weeks.

There is a lot of information in this graph. Initially, the capacity savings of processed data in the NTFS uncompressed repositories are impossible to ignore; however, you also cannot ignore the additional space required to ingest the data. If a long-term retention repository is your goal, then within the constraints of NTFS deduplication (1TB officially, although I have seen 4TB restored without issue in testing), uncompressed offers huge gains in terms of data reduction, 20:1 in this case, for free, with Windows.

10:1 can be achieved using dedupe-friendly albeit using additional network bandwidth. Almost 4:1 can be attained using optimal compression which works around the 1TB officially supported file system limits nicely depending on data type. With ReFS and Optimal compression, we can achieve an approximate 2.5:1 ratio using the data in this test, obviously your real-world mileage may vary. On some deployments I have seen it as high as 3.5:1.
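
To put those ratios into more tangible terms, a reduction ratio of N:1 means the stored data is 1/N of the original, so 20:1 is a 95% saving and 2.5:1 is a 60% saving. The short Python sketch below converts the ratios quoted above into stored capacity; the 100TB figure is a hypothetical example and the function names are mine, not anything from Veeam or Windows.

```python
# Ratios as quoted above for this particular test data set
QUOTED_RATIOS = {
    "NTFS uncompressed": 20.0,
    "NTFS dedupe-friendly": 10.0,
    "NTFS optimal": 4.0,   # "almost 4:1"
    "ReFS optimal": 2.5,   # approximate
}


def stored_tb(logical_tb, ratio):
    """Capacity consumed on disk for a given amount of logical data."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return logical_tb / ratio


def savings_percent(ratio):
    """Percentage of capacity saved at a given N:1 reduction ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 100.0 * (1.0 - 1.0 / ratio)


if __name__ == "__main__":
    logical_tb = 100.0  # hypothetical amount of backup data
    for name, ratio in QUOTED_RATIOS.items():
        print(f"{name}: {stored_tb(logical_tb, ratio):.1f} TB stored, "
              f"{savings_percent(ratio):.0f}% saved")
```

Note how the returns diminish: going from 10:1 to 20:1 only saves a further 5% of the original capacity, which is worth weighing against the extra ingest space and bandwidth involved.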

What are your thoughts on 4K block sizes?

4K blocks offer no benefits, if anything, they will be a hindrance long term if for nothing else other than increased volume fragmentation.

Part 3 is available here:

https://www.craigrodgers.co.uk/index.php/2018/04/02/refs-vs-ntfs-consulsion-part-3/

-Craig Rodgers

April 2, 2018February 24, 2019

ReFS vs NTFS – Conclusion (Part 3)

This is Part 3 of a series, if you have not read parts 1 or 2 you can do so here:

ReFS vs NTFS – Introduction (Part 1)

ReFS vs NTFS – Initial Analysis (Part 2)

4K blocks aside then, what else have we learnt?

If we look at the 64K repositories over the 8 weeks of testing, we can make a direct comparison between the various compression settings that make sense to deploy in the real world:

Why does the Lego man at the beginning of the NTFS graphs transform into a somewhat offensive gesture?

Initially, the settings in Windows deduplication were set to deduplicate files older than 5 days. I changed it to 1 day (where the 3rd Lego man appears chopped in half). That aside, it is clear at this point that there is an additional overhead required when using uncompressed to avail of the benefits realised through deduplication.

Realistically, to see any benefit you would need 2.5 to 3 times the amount of used capacity on your platform to be able to ingest data. Bear in mind we also need to allow for unexpected increases in data, file server migration, DB cluster rebuild, Exchange DAG rebuild or upgrade, and all these things create relatively huge amounts of changed data “over the norm”. However, such a repository would yield huge retention capabilities.
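
As a rough illustration of that rule of thumb, the Python sketch below turns the 2.5x to 3x multiplier into a capacity range, with an optional allowance for unexpected bursts of changed data. The function name and the burst allowance parameter are hypothetical; the post gives no figure for how much breathing room to add, so pick one that suits your own platform.

```python
def ingest_capacity_tb(used_tb, burst_allowance=0.0, low=2.5, high=3.0):
    """Return a (low, high) raw capacity estimate in TB.

    used_tb         -- capacity currently used on the source platform
    burst_allowance -- extra fraction for unexpected change (hypothetical)
    low, high       -- the 2.5x to 3x multipliers from the text
    """
    if used_tb < 0 or burst_allowance < 0:
        raise ValueError("used_tb and burst_allowance must not be negative")
    factor = 1.0 + burst_allowance
    return used_tb * low * factor, used_tb * high * factor


if __name__ == "__main__":
    low_tb, high_tb = ingest_capacity_tb(10.0)
    print(f"10 TB used -> {low_tb:.1f} to {high_tb:.1f} TB to ingest")
```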

Don’t forget, Veeam, like every other backup system on the planet, does not know what causes huge amounts of changed data if present in CBT, which is short for Changed Block Tracking. This is the mechanism commonly used by backup software to detect changes to a physical or virtual server hard drive at the block level.

Additionally, CBT does not tell you whether new VMs or changed blocks are a new SQL cluster, an Exchange DAG node or upgrade, a malware infection, etc. You really need to scope in some breathing room for unexpected bursts in changed data. Furthermore, it is now impossible to ignore not only the performance of ReFS but also the predictable trend line it provides.

Either way it is a good idea to try and scope enough flash storage in the caching layer, to allow for a burst of speed when ingesting data and performing transforms. If you can handle your daily change rate plus a comfortable overhead, you should be in a good position.

At what point commercially, does it become more effective to simply add more space and enjoy faster backup copies and restores using ReFS vs the additional capacity and hardware support requirements of NTFS?

Great question, and to quote every single person who has provided me with IT training over the years, “it depends”. If you remember back to Part 1, I referred to the organic nature of NTFS and ReFS repository configurations. It is impossible to calculate this on a general basis; you must take everything in a platform into consideration to give an accurate result. That being said, you also need to weigh the speed of ReFS against the capacity benefits of NTFS.

I know some people prefer line graphs so here is one comparing all 64K repositories:

From this we immediately learn that in our scenario, up to 3 weeks of retention, ReFS is better, hands down: performance gains aside, there is little difference in processed capacity. At 5 weeks ReFS is holding its own on capacity vs NTFS, but after this point the benefits of deduplication really start to kick in.

Once more, we can confirm the predictable nature of ReFS vs the varying logarithmic curves of NTFS with deduplication. In essence, so far it boils down to performance vs retention.

What about drive wear and tear & the churn rate, how will that affect any decision?

If using NTFS with dedupe-friendly compression or uncompressed backups, there is a requirement to read and write significantly more data than with ReFS or NTFS using optimal compression. That will indeed translate into additional drive wear and tear: more IOPS and more data reads and writes, which equates to more head movements etc. If we look at the amount of changed data over time for 64K repositories we observe the following:

This is a massively underestimated graph and one that is often ignored. Supportability is a keystone in infrastructure design, yet it is often overlooked outside of cross-patching network uplinks, SAN controllers and host failover capacity. What options do we have for disk arrays?

RAID 5
RAID 6
RAID 2.0+ (Huawei Storage rebuild times are insane vs RAID 5/6: hours and minutes vs days and hours)
JBOD
- Storage Spaces is great if you are Microsoft; however, its handling of failed disks in the real world frankly leaves a lot to be desired from a supportability standpoint. If you have ever experienced a failed caching drive you will know what I am referring to.

So that leaves RAID 6 & Huawei; using RAID 5 with the IOPS and drive capacities required for backup jobs is practically insane. Thankfully, Microsoft have declared ReFS is not supported on any hardware virtualised RAID system. How is that deduplicating appliance looking now?

Wait, what?

Microsoft do not support RAID 5, 6 or any other type of RAID for ReFS. The official support* for ReFS means you use Storage Spaces, Storage Spaces Direct or JBOD.

https://docs.microsoft.com/en-us/windows-server/storage/refs/refs-overview

Excerpt from above:

Does that mean every instance of ReFS worldwide using hardware virtualised RAID is technically unsupported by Microsoft?

Technically, yes. That being said, and as previously mentioned, Microsoft have been working very closely with Veeam to resolve issues in ReFS with regards to Blockclone. Veeam have at best “spearheaded” and at worst been “early pioneers” of Blockclone technology. When you tie that with their close working relationship, both parties need ReFS to be stable and surely have common motives for wanting an increased adoption of ReFS.

The obvious Hyper-V benefits are a clear indicator here as to their motivation, so this has in many ways been the saving grace for backups, however, you must bear in mind that technically it is not a supported implementation if you are using hardware virtualised RAID.

At some point this will come into play as their get out of jail card.

So, what are my options?

Thus far, Microsoft seem to be working to resolve issues that, by proxy, contribute towards supporting ReFS functionality on RAID volumes; however, at some point this could change.

For now, if you have ReFS in play, it can work and will continue to receive the collaborative efforts of Microsoft and Veeam. If you are looking at a new solution, it is impossible to ignore the fact that you must use an alternative to hardware virtualised RAID if you want a supported ReFS deployment in future.

ReFS works in these circumstances and there are a plethora of reports to that effect. I have yet to see a failure in an ReFS restore; that being said, if you follow the Veeam forums, there are some who have seen otherwise. Without platform access it is impossible to tell if there were other factors involved.

As it stands, ReFS for the most part is working well now and is probably your best bet for a primary or indeed secondary backup repository, taking all these considerations into account. As previously discussed, for me, this means a 64K block size ReFS formatted, reverse incremental backup target will be your best all round primary backup storage. With regards to a second copy, ReFS is great for fast transforms; however, you may be happy trading performance for retention, in which case backup copies can target an NTFS volume.

Plan ahead folks and look at your data before you commit to a solution!

If you have any comments or thoughts on this series, my opinions or anything else, please let me know via comments below or if you prefer, via Twitter or LinkedIn. I was genuinely interested in these results myself and I hope some of you were as well.

If you would like to view all the Power BI Graphs in one place you can do so here:

Additionally they can be viewed online in a browser here:

https://app.powerbi.com/view?r=eyJrIjoiZTZiNDY0MTgtMjFhNC00MjNlLWFlYzQtYmM0NTYwMmRmM2VjIiwidCI6IjhhNDc2ZjRiLTdjMzgtNDE5Mi05OTFkLWUyZjYxMWNkZDllNiIsImMiOjh9&pageName=ReportSection1

* Update 20/02/2019 – Microsoft ReFS Support on Hardware RAID *

Following some sterling work by Anton Gostev, Andrew Hansen & their respective teams, Microsoft changed their support stance on ReFS running on Hardware RAID.

Excerpt from Gostev’s Weekly Digest:

“Huge news for all ReFS users! Together with many of you, we’ve spent countless hours discussing that strange ReFS support policy update from last year, which essentially limited ReFS to Storage Spaces and standalone disks only. So no RAID controllers, no FC or iSCSI LUNs, no nothing – just plain vanilla disks, period. As you know, I’ve been keeping in touch with Microsoft ReFS team on this issue all the time, translating the official WHYs they were giving me and being devil’s advocate, so to speak (true MVP eh). Secretly though, I was not giving up and kept the firm push on them – just because this limitation did not make any sense to me. Still, I can never take all the credit because I know I’d still be banging my head against the wall today if one awesome guy – Andrew Hansen – did not join that Microsoft team as the new PM. He took the issue very seriously and worked diligently to get to the bottom of this, eventually clearing up what in the end appeared to be one big internal confusion that started from a single bad documentation edit.
Bottom line: ReFS is in fact fully supported on ANY storage hardware that is listed on Microsoft HCL. This includes general purpose servers with certified RAID controllers, such as Cisco S3260 (see statement under Basic Disks), as well as FC and iSCSI LUNs on SAN such as HPE Nimble (under Backup Target). What about those flush concerns we’ve talked about so much? These concerns are in fact 100% valid, but guess what – apparently, Microsoft storage certification process has always included the dedicated flush test tool designed to ensure this command is respected by the RAID controller, with data being protected in all scenarios – including from power loss during write – using technologies like battery-backed write cache (for example, S3260 uses supercapacitor for this purpose). Anyway – I’m super excited to see this resolved, as this was obviously a huge roadblock to ReFS proliferation.”

-Craig Rodgers

December 19, 2017 (updated February 24, 2019)

Forward or Reverse Incremental?

Petrol vs Diesel? Apples & Oranges? Is it a matter of preference? Moving forward seems logical? Is moving backwards better in some cases? I can think of a few real-world examples that fit both cases but in the sense of Veeam backups, which one is better?

Better is a very general term…

I suppose it is, so let’s weigh things up at a more granular level. Veeam have a mountain of reference data on the functional differences, performance & best practices for each type of backup. A lot of the time it comes back to stun settings on VM snapshot removal, and to the backup window given the additional I/O overhead of creating a reverse incremental on the target backup repository. These are both common denominators that can be easily compared, as VM X shows Y and Z using Forward vs Reverse. There are other factors, which I seldom see referenced, that need to be considered to compare the two properly.

Manageability?

Yes.

What is your personal preference?

Reverse incremental with a weekly active full, for longer than most.

Why then if most others use Forward Incremental?

These days times have changed. Most platforms & backup targets have a cache layer that takes the initial I/O hit, from SSD cache drives in a NAS or SAN to multiple enterprise grade NVMe drives in a physical backup server or hardware deduplication appliance. RAM is awesome for this as well.

Why more RAM?

More RAM makes things smoother, faster and more stable, because caching reduces stress on numerous systems such as disk, network and storage, 100% fact. Pity the prices for DDR4 have gone up in 2017. Thankfully Samsung have decided to increase output significantly, along with others such as SK Hynix and Micron, which may bode well for better prices, especially from 2019 onwards. Moving from last year’s oversupply to this year’s shortage has hit hardest those without the buying power of a Worldwide Cloud Service Provider, although even those providers have been paying a lot more this year to ensure their continued hardware landscape growth.

Stun times are the same using forward or reverse incremental now?

No. I am not going to misrepresent here: reverse incremental takes longer to back up because more IOPS must take place on the target repository. As a result the virtualisation layer snapshots will grow larger and will take longer to remove, and that’s a fact. However, if you are looking at a new backup system, look at your current snapshot removal times. I am almost 100% certain that any new backup system you deploy will have significantly reduced snapshot removal times using reverse incremental backups vs your old forward incremental backups. If it does not, let me know, because something is drastically wrong with the solution you are deploying. Feel free to create some test VMs with some equal change data via a robocopy script or the like from file servers, or whatever other mechanism you are comfortable with. Just do not create a second backup of a production VM: back up once, copy many, which avoids CBT issues in the real world. Daily I see snapshot removal times in the low single digit seconds using reverse incremental though.

So why bother to take longer to back them up?

90% of the time, what backup are you restoring? The most recent backup? Probably. Especially in a disaster scenario, you will be restoring the last backup, and having that backup available as a full will significantly improve restore speed. Need a lower RTO? Having to restore through incremental backup chains has an overhead; the same can be said for older reverse incremental chains.
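To make the chain overhead concrete, here is a deliberately simplified model that just counts how many backup files a restore has to read. It assumes a single chain made of one full plus its increments and ignores synthetic fulls, compression and file sizes; the function and parameter names are mine.

```python
def files_to_read(mode, points_since_full, points_back=0):
    """Count the backup files read to restore a point in a single chain.

    mode: "forward" or "reverse".
    points_since_full: number of increments in the chain besides the full.
    points_back: 0 restores the latest point, 1 the one before it, and so on.
    """
    if not 0 <= points_back <= points_since_full:
        raise ValueError("points_back must be within the chain")
    if mode == "forward":
        # The full is the oldest file; every increment up to the target is replayed.
        return 1 + (points_since_full - points_back)
    if mode == "reverse":
        # The full is the newest file; reverse increments are applied to go back.
        return 1 + points_back
    raise ValueError("mode must be 'forward' or 'reverse'")


# Restoring the most recent point after six daily increments.
print(files_to_read("forward", 6))  # full plus six increments
print(files_to_read("reverse", 6))  # just the full
```

The further back you go in a reverse incremental chain the more files are involved, which is the same overhead described above, only in the opposite direction.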

You mentioned manageability?

I did, so you are running a backup system, something happens that causes an abnormal amount of changed data, worst case you run all flash systems and you get a ransomware infection that does not get caught. So now you have huge amounts of changed data that needs to be backed up as far as any backup system knows from CBT data. If your repository fills up for this or any other reason, what are your options in a forward incremental job? Maybe you are insane and the repository free space is not monitored, the email warnings from Veeam telling you that space was running low were ignored for whatever the reason, you are now at near zero bytes free space on your backup repository. What do you do with a forward incremental backup target repository to get backups working again quickly for whatever reason?

Add more space?

For free? I’d love a few of those storage systems. What if you used some of the un-provisioned space last time and you don’t have enough? Using reverse incremental you can remove older parts of the backup chain and retain the ability to perform a full restore and not interrupt the backup chain, because it grows backwards.

The process is simple, ensure no jobs are active, put the repository into maintenance mode, remove older retention point files to achieve the desired amount of free space, re-scan the repository, check the backups for the job and forget all unavailable restore points.
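If you want to work out in advance which files to remove, the sketch below picks the oldest reverse increments until a target amount of space would be freed. It is only a planning aid: it works on a list you supply, touches nothing on disk and does not talk to Veeam. The file names, dates and sizes are made up, and it assumes the full is the .vbk file and the reverse increments are the .vrb files.

```python
from datetime import date


def pick_points_to_remove(files, bytes_needed):
    """Return the names of the oldest .vrb files that free at least bytes_needed.

    files: iterable of (name, created, size_bytes) tuples.
    The .vbk full is never selected, so the latest restore point survives.
    """
    candidates = sorted(
        (f for f in files if f[0].lower().endswith(".vrb")),
        key=lambda f: f[1],  # oldest first
    )
    chosen, freed = [], 0
    for name, _created, size in candidates:
        if freed >= bytes_needed:
            break
        chosen.append(name)
        freed += size
    if freed < bytes_needed:
        raise ValueError("removing every reverse increment would not free enough space")
    return chosen


# Hypothetical repository contents (sizes in arbitrary units).
example = [
    ("job.vbk", date(2017, 12, 18), 500),
    ("job-15.vrb", date(2017, 12, 15), 40),
    ("job-16.vrb", date(2017, 12, 16), 60),
    ("job-17.vrb", date(2017, 12, 17), 50),
]
print(pick_points_to_remove(example, 90))
```

The remaining steps (maintenance mode, re-scan, forgetting unavailable restore points) still happen in the Veeam console as described above.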

Veeam help link?

https://helpcenter.veeam.com/docs/backup/hyperv/remove_missing_point.html?ver=95

November 18, 2017 (updated February 23, 2019)

Hello World

It’s been in the back of my mind to write a blog, as with everything else that gets done, I’ve decided to make the time.

“Why is it time to give something back publicly?”

For many years, I, along with every other IT engineer on the planet, have made use of other people’s writings or musings through whatever medium was available at the time. Let’s be honest, you only know what you know, you learn from exposure and sometimes playing about with technology isn’t enough. Often time can be a constraint and this is when people turn to forums, blogs, help sections or whatever else can be garnered from a Google search in time of need or interest. I now feel it is time I gave something back because I feel I have benefited so much from the writings of others. To this end, I have decided another way I can give something back to the community publicly, outside of internal assistance, is to share my findings via a blog. I have no idea what will end up here, it could be a VDI solution I am working on or a Firewall mechanism used in work via a change or project. One thing I know will feature regularly is my love of virtualisation, storage, backups & networking.

“Why?”

Running VMs and making them available. Day to day I am lucky enough to be part of an amazing team and culture in Novosco. We are an agile MSP headquartered in Belfast, Northern Ireland with a few other offices dotted around the UK and Ireland servicing everything from NHS Trusts to Housing Associations, Premiership Football clubs and more. I wanted to join Novosco because of what I could see from the outside a few years ago; on the inside, it’s been even better. Novosco really support engineers, putting you through any certification you want to sit that is featured anywhere in the company’s product / tool set. They give us a test lab to play with technology, over 1 TB of RAM and enough cores to build something meaty, flash storage and proper SAS drives to boot, with 40 Gbit per host networking. Whilst this might not appeal to some, having a playground can be a gateway to deeper understanding and for me, that’s what it’s all about. Understanding, knowing and dare I say loving the technology I get exposed to, day in, day out. When you couple a great company, great people, great culture and engineers with passion, under a management team with real vision, the sky is the limit. I can honestly say for me personally, none of this would be possible without internal colleagues and the people who make the time to write articles, blogs, how to guides, forum answers etc etc etc.

“What do I do?”

My day job centres around providing solutions & support to our larger enterprise clients in London and further afield. My daily toolset & some of the technologies I am certified in include, in no particular order:

Virtualisation – VMware
Networking – Cisco / Huawei / MPLS
Storage – EMC / vSAN / Others
Backups & Replication – Veeam Availability / Zerto
Firewalls – Fortinet & Cisco ASA
MS Stack – AD and anything below or above up in the cloud

“Who cares?”

Honestly, I do not know, let’s see what happens. If my musings help in any way, and I hope they will, let me know, either way I’ll keep on rambling, feedback is as important here for me as it is anywhere else in my day job.

“Coffee?”

How good, is good coffee? It’s great!

-Craig Rodgers

Technical Architect @ http://www.novosco.com
