Always On vs Selectable Inline Data Reduction in All Flash Arrays
Inline data reduction has become standard on all flash arrays (AFAs). Data reduction does one thing: it reduces the cost of an AFA by storing less physical data, either by removing multiple copies of the same data (deduplication) or by encoding data more compactly (compression), which increases the effective capacity.
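To make "removing multiple copies of the same data" concrete, here is a minimal sketch of block-level deduplication. It is a toy illustration only, not any vendor's engine: the fixed 4 KiB block size, the SHA-256 fingerprint and the function names are all assumptions chosen for the example.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}   # fingerprint -> block contents, stored once
    layout = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # only the first copy is kept
        layout.append(fingerprint)
    return store, layout


def reduction_ratio(data: bytes, store: dict) -> float:
    """Logical bytes written divided by physical bytes actually stored."""
    stored = sum(len(block) for block in store.values())
    return len(data) / stored if stored else 1.0


def rebuild(store: dict, layout: list) -> bytes:
    """Reassemble the original data from the unique blocks."""
    return b"".join(store[fingerprint] for fingerprint in layout)
```

Writing eight identical blocks followed by two identical blocks of different content stores only two blocks for ten written, which is a 5:1 ratio. The hashing and lookup on every write is also where the extra work in the IO path comes from.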
Workloads that benefit from data reduction
Data reduction is not a fit for all workloads. Deduplication offers very limited reduction for databases but is excellent for VDI and VSI (6-10:1). Compression is good for databases (2-3:1) but is not a major benefit for VDI and VSI. Don't be fooled by marketing spiel: inline data reduction adds storage latency because it is an extra step in the IO path. Data reduction is therefore best used on a platform with great sustained low latency performance, which is why it was never popular on disk arrays for active data. A true enterprise storage array should let you decide when to activate data reduction and when not to, so that it fits all your workloads. It is worth noting that the best data reduction comes from a solution that offers both deduplication and compression, as this gives you the most flexibility across different workloads.
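The ratios quoted above translate into effective capacity and cost with simple arithmetic. The sketch below is illustrative only: the ratio ranges are the ones given in this article, while the 10 TB usable capacity, the price and the function names are hypothetical values picked for the example (1 TB is taken as 1000 GB).

```python
# Reduction ratio ranges quoted in this article (low, high).
RATIO_RANGES = {
    "vdi_vsi_dedup": (6.0, 10.0),
    "database_compression": (2.0, 3.0),
}


def effective_capacity_range_tb(usable_tb: float, workload: str):
    """Return the (low, high) effective capacity in TB for a workload."""
    low, high = RATIO_RANGES[workload]
    return usable_tb * low, usable_tb * high


def cost_per_effective_gb(price: float, usable_tb: float, ratio: float) -> float:
    """Price divided by effective capacity in GB (1 TB = 1000 GB)."""
    effective_gb = usable_tb * ratio * 1000
    return price / effective_gb


# Hypothetical array: 10 TB usable, price of 100,000 in any currency.
vdi_low, vdi_high = effective_capacity_range_tb(10, "vdi_vsi_dedup")
db_low, db_high = effective_capacity_range_tb(10, "database_compression")
print(f"VDI/VSI: {vdi_low:.0f}-{vdi_high:.0f} TB effective")
print(f"Database: {db_low:.0f}-{db_high:.0f} TB effective")
print(f"Cost at 2:1: {cost_per_effective_gb(100_000, 10, 2.0):.2f} per GB")
```

Under these assumed numbers, the same array is worth 60-100 TB to a VDI workload but only 20-30 TB to a database, which is why the ratio has to be judged per workload rather than per array.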
Always On Data Reduction
Many AFA vendors offer always on data reduction with no ability to turn it off. However, always on is not a benefit or a feature; it is actually a compromise. The core reason these vendors need always on is that they don't control the SSDs: write performance is very poor and suffers further when the devices go into garbage collection. These vendors have to reduce the number of writes to those SSDs to try to minimise this impact, and this reduction doesn't always happen. Always on actually creates inconsistent write performance, because the performance of each IO depends on how fast the data reduction engine is, how overloaded it is, whether there is enough cache during a heavy write and whether there are enough resources to run the engine. I know this because I have seen it on a number of customer sites during real workload testing. Putting critical databases on these 'always on' arrays therefore won't give you great consistent performance, which means this 'feature' is not suited to all workloads.
Selectable Data Reduction
Very few AFA vendors offer selectable data reduction, and Violin Memory is the only enterprise vendor that can do this. It gives control back to the administrator, who can choose which LUNs are data reduced and which are not. A major benefit is that if your database needs really low, consistent latency, you can turn data reduction off on those LUNs while leaving it on for the LUNs used for VDI, VSI, dev test and non-critical databases. This means you can choose between performance and capacity, and take full control of how your storage operates and how your workloads perform.
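The per-LUN decision described here can be sketched as a small policy function. This is a hypothetical illustration of the reasoning, not Violin Memory's management interface: the Lun class, the workload labels and the rule that unknown workloads default to off are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    name: str
    workload: str                   # hypothetical label, e.g. "vdi" or "database"
    latency_critical: bool = False  # set by the administrator


# Workloads this article suggests leaving data reduction on for.
REDUCTION_FRIENDLY = {"vdi", "vsi", "dev_test", "database"}


def reduction_enabled(lun: Lun) -> bool:
    """Decide whether inline data reduction should be on for this LUN."""
    if lun.latency_critical:
        return False  # performance wins over capacity
    return lun.workload in REDUCTION_FRIENDLY


luns = [
    Lun("oltp-prod", "database", latency_critical=True),
    Lun("vdi-pool", "vdi"),
    Lun("reporting", "database"),
]
for lun in luns:
    state = "on" if reduction_enabled(lun) else "off"
    print(f"{lun.name}: data reduction {state}")
```

The point of the sketch is that the choice is made per LUN: the critical database keeps its latency while the VDI pool and the non-critical database still gain capacity on the same array.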
Shelf vs Controller Based Data Reduction
Some vendors run data reduction in the controllers (or heads), whereas Violin Memory runs it in the flash array (shelf). Data reduction is very CPU and DRAM intensive. The disadvantage of running it in the controllers is that you introduce a bottleneck into a solution that is meant to alleviate bottlenecks. Data reduction performance and scale are bound by the amount of CPU and DRAM in the controllers above the shelves. That's why there is always a low limit on the number of shelves those vendors can have behind the controllers.
Violin Memory runs data reduction in the shelves themselves. The shelves are highly intelligent, which is a major benefit of a ground up design over using commodity SSDs. Each shelf has its own CPU and DRAM, which removes the constraint from the controllers so they can concentrate on tasks such as capacity pool management, metro clustering and sync replication. This means performance is not constrained by the amount of CPU and DRAM in the controllers, leading to much better sustained performance and greater scale. It also means that controllers don't need to be replaced every 3 years, unlike with competing arrays.
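The scaling argument can be shown with a deliberately simplified model. It assumes that data reduction throughput is limited only by CPU cores, that a controller pair's cores are shared evenly across the shelves behind it, and that every shelf in the shelf-based design brings the same number of cores. The core counts and function names are hypothetical, and real arrays are limited by DRAM and interconnect as well.

```python
def controller_based_cores_per_shelf(controller_cores: int, shelves: int) -> float:
    """Fixed controller CPU is divided among all shelves behind it."""
    return controller_cores / shelves


def shelf_based_cores_per_shelf(cores_per_shelf: int, shelves: int) -> float:
    """Each shelf brings its own CPU, so the per-shelf budget stays constant."""
    return float(cores_per_shelf)


# Hypothetical sizes: 32 controller cores versus 8 cores in every shelf.
for shelves in (1, 2, 4, 8):
    shared = controller_based_cores_per_shelf(32, shelves)
    local = shelf_based_cores_per_shelf(8, shelves)
    print(f"{shelves} shelves: controller-based {shared:.1f} cores/shelf, "
          f"shelf-based {local:.1f} cores/shelf")
```

In this model the controller-based design starts with more cores per shelf but falls to parity at four shelves and below it after that, while the shelf-based budget never changes as capacity is added.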
Conclusion
Data reduction is definitely a key component in improving the economics of an AFA. It enables businesses to gain more capacity within a smaller footprint, leading to a lower $/GB. However, choosing the right implementation of data reduction is critical to success. Workloads behave differently, so there isn't one way that works for everything. Selectable will always be more beneficial than always on, as it gives you the flexibility to fit the storage to the workload rather than the other way round. Finally, choosing a solution with controller-based data reduction will actually limit the gains from a flash array. A solution where the shelves carry out this intensive task is a much better architecture, offering better performance and greater scale.