Addressing vSAN Performance Issues

By George Crump

When considering a VMware alternative, addressing vSAN performance issues and data protection shortcomings is paramount. Incorporating a vSAN into a hyperconverged infrastructure (HCI) is a common practice for alternative solutions. However, despite the theoretical price advantages of vSANs, many IT professionals disqualify HCI due to performance, reliability, and hardware inflexibility. Improving vSAN by addressing these issues is critical so organizations can lower licensing costs and storage hardware costs.

Addressing vSAN performance issues requires innovations found in ultraconverged infrastructure, not the bolt-on workarounds common with HCI. In our previous article, we prioritized addressing data protection issues in the context of HCI and VMware alternatives, as any performance improvement would be futile without robust data protection measures. This article covers the I/O performance issues of the vSAN approach, why it limits VMware alternative selection, and how to address them.

vSAN Refresher

vSAN is a category of storage software that operates in virtual environments, scaling capacity and performance as nodes are added. It should allow the use of off-the-shelf, server-class storage inside the same servers that run the hypervisor. The original goal of a vSAN was to reduce storage costs and simplify architecture and operations by eliminating the need for expensive storage controllers and high markups on storage media. However, concerns over data availability and performance, plus cost savings that never actually materialized, have stunted the growth of what should be a dominant architecture.

Improving vSAN Core

One key challenge of most vSAN technology is that it runs as a virtual machine on the hypervisor, relegating storage to a secondary role. This lack of tight integration means that the vSAN does not directly understand the resources, such as CPU and memory, that it shares with the hypervisor. Without cooperation between the two processes in sharing and allocating those resources, optimizing vSAN performance is difficult.

Vendors have attempted to overcome this by creating vSAN-Ready nodes, which, loosely translated, means overpowered nodes that compensate for the overhead of running storage as a separate VM. The problem with these vSAN-Ready nodes is that they break the original promise of less expensive storage.

VergeOS vSAN Integrates into the Core

VergeOS takes a different approach from other HCI vendors. It is an ultraconverged infrastructure (UCI), meaning the storage functionality and network services are tightly integrated into the hypervisor. With VergeOS, there is one efficient piece of software to install, and the integration means that the services (storage, hypervisor, and network) are aware of each other and can allocate resources more accurately. This efficiency and more accurate resource allocation mean our vSAN, VergeFS, does not require a particular vSAN-Ready node; standard off-the-shelf server hardware, including, in most cases, the hardware you already own, will deliver excellent performance.

Improving vSAN Connect

Any scale-out environment requires robust internode connectivity and communication to sustain performance at scale. This requirement is magnified in an HCI environment because the hypervisor and the network software also need to coordinate between nodes, creating a heavy amount of east-west communication. Because HCI solutions run these three components separately, each process requires its own communication lane, effectively tripling the network load. Lastly, most HCI vendors try to leverage legacy protocols that are not optimized for this type of traffic, which increases the “chattiness” of the communication.

The lack of integration and the dependence on legacy protocols place a significant burden on the node interconnect. This burden limits scale both technically and practically while also complicating network design. The workaround, once again, is to throw hardware at the problem. Some solutions recommend 25GbE as a minimum interconnect, while others suggest NVMe connectivity.

While not directly related to performance, another aspect of the vSAN connect is supporting external arrays, specifically FC-SAN arrays. Most HCI solutions do not support these systems. Even if the VMware alternative can address all the performance issues of its vSAN, customers are unwilling to replace the existing FC-SAN array prematurely.

VergeOS vSAN Optimized Fabric

Thanks to its integration, the VergeOS vSAN, VergeFS, can leverage all the attributes of VergeFabric, our software-defined networking. The integration also reduces the communication lanes to one, increasing efficiency. Additionally, VergeFS uses a custom network protocol that provides active-active port utilization: it automatically load-balances traffic across ports during internode communication, delivering near port-speed performance. The protocol also scales beyond two ports; for example, customers can use quad-port network interface cards (NICs), and the protocol will use all four ports for maximum performance and resiliency.
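To make the idea of active-active port utilization concrete, here is a minimal Python sketch. It is not the VergeFS protocol, whose internals are not described here; it assumes a simple round-robin policy, and the `PortBalancer` class and the `eth0` to `eth3` port names are hypothetical. It shows only the general behavior: every healthy port carries traffic, and a failed port is skipped rather than leaving a standby port idle.

```python
class PortBalancer:
    """Round-robin over healthy ports (illustrative policy, not VergeFS's)."""

    def __init__(self, ports):
        self.ports = list(ports)      # fixed ordering of all known ports
        self.healthy = set(ports)     # ports currently able to carry traffic
        self._next = 0                # position of the next candidate port

    def mark_down(self, port):
        self.healthy.discard(port)

    def mark_up(self, port):
        if port in self.ports:
            self.healthy.add(port)

    def pick(self):
        """Return the next healthy port, skipping any that are down."""
        for _ in range(len(self.ports)):
            port = self.ports[self._next % len(self.ports)]
            self._next += 1
            if port in self.healthy:
                return port
        raise RuntimeError("no healthy ports available")


# Hypothetical quad-port NIC: all four ports share the load.
balancer = PortBalancer(["eth0", "eth1", "eth2", "eth3"])
before_failure = [balancer.pick() for _ in range(8)]

# One port fails: traffic continues on the remaining three.
balancer.mark_down("eth1")
after_failure = [balancer.pick() for _ in range(6)]
print(before_failure)
print(after_failure)
```

A production protocol would weigh link speed, queue depth, and in-flight transfers rather than rotating blindly, but the contrast with an active-passive pair, where half the ports sit idle, is the same.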

VergeOS vSAN also supports external FC-SANs so customers can leverage their existing investments. In the future, these customers can move to a vSAN to lower storage costs, continue to invest in their FC-SAN or run in a hybrid configuration.

Improving vSAN Data Optimization

Virtualized environments often contain a significant amount of redundant data. Each VM typically runs one of two operating systems, so many VMs store nearly identical operating system files that together consume a lot of capacity. There is often redundancy across applications and at least some redundancy within user data. As a result, most vSANs have a data deduplication capability that delivers a 3:1 to 5:1 gain in effective capacity. The problem is that the way various vSAN technologies implement deduplication exacts a noticeable toll on storage I/O performance.
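The arithmetic behind a deduplication ratio is easy to show with a short Python sketch. It is a generic illustration of block-level deduplication by content fingerprint, not any vendor's implementation; the 4 KiB block size, the `dedup_ratio` helper, and the four toy "VMs" are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a vendor-specified value


def dedup_ratio(volumes, block_size=BLOCK_SIZE):
    """Return (logical_blocks, unique_blocks, ratio) for a list of byte strings."""
    fingerprints = set()   # content hashes of blocks that must actually be stored
    logical = 0            # blocks the VMs believe they have written
    for data in volumes:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            logical += 1
            fingerprints.add(hashlib.sha256(block).digest())
    unique = len(fingerprints)
    ratio = logical / unique if unique else 0.0
    return logical, unique, ratio


# Four hypothetical VMs share a three-block "OS image"; each adds one unique block.
os_image = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
vms = [os_image + bytes([i]) * BLOCK_SIZE for i in range(4)]

logical, unique, ratio = dedup_ratio(vms)
print(f"{logical} logical blocks, {unique} stored, ratio {ratio:.2f}:1")
```

In this toy case, 16 logical blocks reduce to 7 stored blocks. Real ratios depend on how much of the data set is shared, which is why environments with many similar VMs tend toward the higher end of the range.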

The reason for the negative performance impact is twofold, but the source is the same: lack of efficiency. vSANs don’t have the “luxury” of dedicated resources that storage arrays do; resources must be shared among the storage, virtualization, and network functions. First, because the storage software is a separate application with its own VM(s), the deduplication process runs externally to the network software and hypervisor, meaning that redundant data is handled multiple times. Second, vendors often add deduplication years after introducing the vSAN software. The result is that the deduplication process must run against the data being processed and update the vSAN software, which in turn must update the hypervisor; essentially, metadata is constantly updating metadata, creating significant overhead. Solving the overhead problem, again, requires overbuying processor and RAM resources.

VergeOS vSAN Hypervisor-Aware Deduplication

Not all deduplication is created equal, and addressing vSAN performance issues requires an algorithm designed specifically for converged infrastructure. For example, VergeOS’ global inline deduplication has been integrated into the product from day one, and it runs within the same code as the hypervisor and the fabric, making both deduplication-aware. The result is deduplication efficiency multiplied across storage, processing, and network connections. The tight “from day one” integration also means that a single process updates all metadata and avoids the “metadata updating metadata” overhead.

Improving vSAN Data Distribution Performance

Most vSAN architectures try to distribute data across the nodes in the infrastructure, which should give them a performance advantage over traditional storage arrays. However, most HCI solutions are not as scalable as traditional SANs, nor do they deliver greater network performance.

As mentioned above, part of the performance shortcoming is due to the lack of fabric optimization. Another problem is that most vSAN solutions burden this distribution process with erasure coding for data protection. The overhead of creating parity for each block of data written and then distributing that parity takes its toll on performance. It also forces more data across the network. As a result, what should be an obvious advantage of a scale-out vSAN architecture, leveraging multiple network segments and processors, is lost.

Parity-based protection schemes like RAID or erasure coding also significantly impact performance when trying to maintain operations in a failed state, such as after a drive or server failure. In a single-parity configuration of either technique, one drive failure means the organization is just one more drive failure away from data loss. Any request for data on the failed drive must be answered by recalculating that data from parity in real time. Finally, when the drive is replaced, customers face a further performance impact while the replacement drive is rebuilt.
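The cost of a degraded read can be sketched in a few lines of Python. The example below uses single XOR parity (RAID-5 style) as a simplified stand-in for parity-based protection in general; production erasure codes such as Reed-Solomon use more involved math but share the property that lost data has to be recomputed from the survivors. The function names and the four-block stripe are assumptions for illustration only.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def degraded_read_parity(stripe, parity, lost):
    """Rebuild stripe[lost] from every surviving block plus parity.

    Returns (data, blocks_read)."""
    survivors = [b for i, b in enumerate(stripe) if i != lost] + [parity]
    return xor_blocks(survivors), len(survivors)


def degraded_read_mirror(copies, lost):
    """Read the surviving copy of a mirrored block. Returns (data, blocks_read)."""
    survivor = next(c for i, c in enumerate(copies) if i != lost)
    return survivor, 1


# A four-block stripe protected by one parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_blocks(stripe)

# The drive holding block 1 fails.
rebuilt, parity_reads = degraded_read_parity(stripe, parity, lost=1)
mirrored, mirror_reads = degraded_read_mirror([stripe[1], stripe[1]], lost=0)

print(rebuilt == stripe[1], parity_reads, mirror_reads)
```

Here the parity scheme must read four surviving blocks and XOR them to serve a single block, while the mirrored layout reads one surviving copy with no computation. The gap widens as stripes get wider, and the same recalculation is repeated for every block during a rebuild.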

VergeOS vSAN Optimizes Data Distribution

VergeOS vSAN also distributes data across all the storage-contributing nodes in an instance. Again, once redundant data is identified and its metadata is updated, it does not travel across the network or burden another processor. Unique data is simultaneously written to two separate drives on two different nodes.
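A rough Python sketch of this write path follows. It is a hypothetical model, not VergeOS code: the `place_block` function, the in-memory fingerprint index, and the hash-based choice of a starting drive are all assumptions. It illustrates only the two behaviors described above: a duplicate block results in a metadata lookup with no data movement, and a unique block lands on two drives that belong to different nodes.

```python
import hashlib


def place_block(block, index, drives):
    """Decide where a block goes.

    drives: list of (node, drive) tuples contributing storage.
    index:  dict mapping content fingerprint -> list of two placements.
    Returns (placements, data_was_written)."""
    fingerprint = hashlib.sha256(block).hexdigest()
    if fingerprint in index:
        # Duplicate: only metadata is consulted, no data crosses the network.
        return index[fingerprint], False

    start = int(fingerprint, 16) % len(drives)   # illustrative starting point
    primary = drives[start]
    # Walk the drive list until a drive on a different node is found.
    for step in range(1, len(drives)):
        candidate = drives[(start + step) % len(drives)]
        if candidate[0] != primary[0]:
            index[fingerprint] = [primary, candidate]
            return index[fingerprint], True
    raise ValueError("drives on at least two nodes are required")


# Hypothetical three-node instance.
drives = [("node1", "d0"), ("node1", "d1"),
          ("node2", "d0"), ("node2", "d1"),
          ("node3", "d0")]
index = {}

first, wrote_first = place_block(b"unique payload", index, drives)
second, wrote_second = place_block(b"unique payload", index, drives)
print(first, wrote_first, wrote_second)
```

Because the index records exactly where both copies live, a read during a drive or node failure can go straight to the surviving copy.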

VergeOS is unique because it understands the exact location of every data segment in the environment, including redundant segments. Also, all drives are active; any drive can respond to an existing IO request. No mathematical formula needs to be executed to store redundant data, nor is a formula required to respond to a read request during a failed state.

Thanks to ioGuardian, a capability built into VergeOS, the instance can survive multiple simultaneous drive failures without impacting data availability. Drive rebuilds are performed via the ioGuardian server, further ensuring the consistency of the production instance.

Improving vSAN RAM Utilization

Because the VMware alternatives’ vSANs run as a subordinate process to the hypervisor, they have a limited understanding of the total RAM resources available. The storage VMs can’t be aggressive in their use of RAM as a cache, for example. Also, because deduplication is a separate process, data in the cache isn’t necessarily deduplicated, making RAM utilization even less efficient.

VergeOS vSAN Storage-Aware RAM Allocation

VergeOS’ storage service knows exactly how much RAM is available for caching. As a result, it can use this RAM aggressively without risking VM performance. Also, since the cache algorithms run alongside the deduplication algorithm, only unique data is stored in the cache, stretching the effective capacity of the available RAM by 3x to 4x or more.
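The effect of caching only unique data can be illustrated with a small Python sketch. It is a generic content-addressed LRU cache, not the VergeOS cache; the `DedupAwareCache` class, its two-block capacity, and the VM names are hypothetical. The point is that many logical addresses can map to a single cached block.

```python
import hashlib
from collections import OrderedDict


class DedupAwareCache:
    """LRU cache keyed by content fingerprint rather than by address."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()   # fingerprint -> data, oldest first
        self.addr_map = {}            # (vm, lba) -> fingerprint

    def put(self, vm, lba, data):
        fingerprint = hashlib.sha256(data).digest()
        self.addr_map[(vm, lba)] = fingerprint
        if fingerprint in self.blocks:
            self.blocks.move_to_end(fingerprint)   # already cached: refresh only
            return
        self.blocks[fingerprint] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)        # evict least recently used

    def get(self, vm, lba):
        fingerprint = self.addr_map.get((vm, lba))
        if fingerprint is None or fingerprint not in self.blocks:
            return None                            # cache miss
        self.blocks.move_to_end(fingerprint)
        return self.blocks[fingerprint]


# Four hypothetical VMs cache the same OS block at the same logical address.
cache = DedupAwareCache(capacity_blocks=2)
os_block = b"kernel" * 100
for vm in ("vm1", "vm2", "vm3", "vm4"):
    cache.put(vm, 0, os_block)

hits = [cache.get(vm, 0) is not None for vm in ("vm1", "vm2", "vm3", "vm4")]
print(len(cache.blocks), hits)
```

All four VMs get cache hits while occupying a single cache slot. An address-keyed cache of the same size would have held the block for only two of them. For brevity, the sketch never prunes stale address mappings, which a real cache would need to do.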

Improve vSAN with VergeOS Ultraconverged Infrastructure

VergeOS is an ultra-performant vSAN capable of delivering performance similar to, if not better than, any externally attached storage array. Unlike many other storage systems whose original intent was file serving, it was designed from the ground up to run virtual workloads. VergeOS is a single piece of software that controls the life of an I/O from the hardware to the virtual drive, not layers of software, often from different vendors, stitched together into a management interface.

Further Reading

Are All VMware Alternatives the Same?

As IT professionals evaluate VMware alternatives, they may wonder if all options are the same. VergeIO stands out by offering an integrated solution that optimizes hypervisors, storage, and networking. With deep code-level expertise, VergeIO ensures higher performance, reliability, and simplicity, making it a superior alternative.

The Sustainability Benefits of Efficient Infrastructure Software

Explore how efficient infrastructure software enhances sustainability by reducing server count, energy use, and electronic waste. Learn about the benefits of eliminating vendor lock-in, extending hardware lifespan, and integrating backup best practices while achieving significant cost savings compared to traditional solutions like VMware.

On-Premises IT Can Learn from MSPs

On-premises IT can adopt best practices from MSPs to streamline operations, improve responsiveness, and enhance efficiency. MSPs excel in efficiency, scalability, and security. Modern infrastructure software, like VergeOS, integrates virtualization, storage, and networking services, extending server longevity and supporting diverse hardware configurations while ensuring high availability and reducing costs.