Storage Performance in the Cloud

Not so long ago, Oracle announced its Universal Credits, which allow cloud customers to use a single pool of funds to consume various current and future PaaS/IaaS services. Customers who already hold Oracle Technology licenses can also bring their own licenses (BYOL) to Oracle PaaS services and pay only a low flat fee for the Cloud infrastructure and tooling, which I thought was great news for many customers.

As a Principal Solution Architect at Oracle, I was particularly interested in storage performance in the Cloud, so I managed to get access to a high I/O compute instance running in Oracle Cloud to check this out. This high I/O instance is based on the OCIO1M shape, which comes with 1 OCPU, 15 GB RAM and a 400GB Non-Volatile Memory Express (NVMe) volume. NVMe is the next-generation interface/protocol developed specifically for Solid State Drives (SSDs) by a consortium of vendors, and it is far superior to SATA, the incumbent and still dominant interface for connecting an SSD to a PC. The difference between NVMe and SATA is summarised here. In my view, NVMe revolutionises the I/O stack by optimising the command submission and data paths and by supporting massively parallel operations, i.e. up to 64K I/O queues with up to 64K outstanding commands per queue.

It is also worth noting that an OCPU (Oracle Compute Unit) is equivalent to one dedicated Intel Xeon core with hyper-threading enabled. Each OCPU therefore represents two CPU threads without oversubscription, which differs from other public cloud vendors. By not oversubscribing the CPU, Oracle leaves more headroom for downstream I/O scalability and broader I/O bandwidth options. A detailed comparison between OCPU and vCPU is explained here. In any case, I'd like to focus on I/O, since it has traditionally been perceived as the major scalability bottleneck in enterprise data centres. Let's check out the I/O operations per second (IOPS), latency, and throughput or bandwidth (normally measured in megabytes per second, MB/s) of the 400GB NVMe volume attached to the high I/O compute instance.

In order to run such I/O tests, one needs a sound testing tool, sometimes called a benchmarking tool. I found Mike Jung's wiki page extremely handy and decided to use one of the most recommended I/O performance benchmarking tools, the Flexible I/O Tester (FIO), which anyone can download from GitHub. Given how versatile FIO is in terms of the I/O it can generate, I felt I should start simple.
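For completeness, getting FIO onto the instance is straightforward. The following is a minimal sketch: the GitHub repository is axboe/fio, and on Oracle Linux the tool may also be installable directly from the yum repositories.

# build and install from source
git clone https://github.com/axboe/fio.git
cd fio && ./configure && make && sudo make install

# or, if the package is available in your distro's repositories
sudo yum install -y fio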

First, on the OCIO1M instance running Oracle Linux, I formatted the 400GB NVMe volume with ext4, the most commonly used Linux filesystem, and mounted the formatted partition (/dev/xvdz1) at /u01. Arguably, I could have run the tests against the raw device (before formatting it) to generate even better results. However, most workloads in the Cloud today use cooked devices, i.e. filesystems, hence I chose a filesystem over a raw device for my tests.
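For reference, the preparation looked roughly like the following. This is a minimal sketch assuming the device is exposed as /dev/xvdz (as on my instance); your device name may differ.

# partition the NVMe-backed block device
sudo parted --script /dev/xvdz mklabel gpt mkpart primary ext4 0% 100%
# create the ext4 filesystem and mount it at /u01
sudo mkfs.ext4 /dev/xvdz1
sudo mkdir -p /u01
sudo mount /dev/xvdz1 /u01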

Next, I used the following two commands (one for random write, the other for random read) to run 16 jobs simultaneously with a 4K block size for up to 180 seconds against the ext4 filesystem on the 400GB NVMe volume. I hoped these 16 concurrent jobs would push the workload up to a reasonable level. Note that DirectIO is enabled with --direct=1 and the ioengine is left at its synchronous default for both commands; these two parameters largely determine the workload characteristics. DirectIO bypasses the filesystem buffer cache so each I/O request is sent straight to disk, while synchronous I/O means the process waits until one I/O request completes before issuing the next. So I fired both commands one after the other, and very quickly the results were displayed in front of me; both were stunning.

fio --name /u01/test --direct=1 --rw=randwrite --bs=4k --size=1G --numjobs=16 --time_based --runtime=180 --group_reporting

fio --name /u01/test --direct=1 --rw=randread --bs=4k --size=1G --numjobs=16 --time_based --runtime=180 --group_reporting

As the output showed, the IOPS for random write and random read reached 72K and 86K respectively, and the random write/read throughputs (bandwidths) were 287MB/s and 346MB/s respectively, which is awesome. Latency-wise, the average latencies for random writes and reads in these tests were 0.2ms and 0.18ms respectively, which is also amazing. If you want to know how to interpret FIO output, please click here.
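As a quick sanity check, throughput is approximately IOPS multiplied by block size: 72K × 4KB ≈ 288MB/s and 86K × 4KB ≈ 344MB/s, which lines up closely with the 287MB/s and 346MB/s bandwidth figures reported above.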

Now, what about scalability? Would the IOPS and bandwidth scale? What if your system needs more capacity and faces a heavier load? To answer those questions, I spun up another high I/O instance with an 800GB NVMe volume (OCIO2M) and tweaked the FIO parameters to --ioengine=libaio (asynchronous I/O) and --iodepth=32, keeping the same 16 concurrent jobs and 4K block size, to simulate heavier workloads. The random write and read IOPS were boosted to 297K and 311K respectively, while the throughputs roughly quadrupled to 1.1GB/s for writes and 1.2GB/s for reads. So yes, it is highly scalable.
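For reference, the tweaked commands would look something like the following. This is a sketch reconstructed from the parameters listed above; the file path, size and runtime are assumed to match the earlier runs.

fio --name /u01/test --direct=1 --rw=randwrite --bs=4k --size=1G --numjobs=16 --ioengine=libaio --iodepth=32 --time_based --runtime=180 --group_reporting

fio --name /u01/test --direct=1 --rw=randread --bs=4k --size=1G --numjobs=16 --ioengine=libaio --iodepth=32 --time_based --runtime=180 --group_reporting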

It is important to understand that any test like this is highly dependent on the workload characteristics and the hardware being tested. The tests I ran here were not intended to reveal theoretical limits but to give a perspective on what is possible. There is no doubt that the results were quite mind-blowing, as storage performance has long been the notable exception to the so-called Moore's Law. They also give an indication of the performance improvement from traditional SATA-based SSDs to NVMe-based SSDs. For example, AWS states that its Provisioned IOPS SSD (io1) volumes max out at 20K IOPS per volume and 75K IOPS per instance, charged at US$0.138 per GB-month plus US$0.072 per provisioned IOPS-month, on top of the cost of running the EC2 instance. Oracle, by contrast, does not charge separately for IOPS, and the high I/O compute shapes (with the NVMe volume and its IOPS included) are priced very competitively for I/O-intensive workloads.
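To put those list prices in perspective: a single 400GB io1 volume provisioned at the 20K IOPS per-volume maximum would cost roughly 400 × US$0.138 + 20,000 × US$0.072 ≈ US$55 + US$1,440 ≈ US$1,495 per month for the volume alone, before the EC2 instance cost is counted.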

Bottom line: NVMe is a revolutionary technology that is reshaping the entire storage market. It will have a profound impact on enterprise data centres, not only because of its broad adoption by public cloud vendors such as Oracle, but also because of its impact on software development. I'm glad that Oracle has taken this seriously and made it available in its PaaS/IaaS Cloud services. If you want to try it yourself, why not sign up for the US$300 of free credits and give it a go?

Disclaimer: The views expressed in this article are my own and do not necessarily reflect the views of Oracle. FIO has a variety of parameters for defining workload characteristics, and these can affect the test results significantly even on the same infrastructure; for that reason my results should never be treated as formal benchmark figures.

Comments

The comparison to AWS really should be to the I3 instance type, which also has direct-attached NVMe SSD, rather than to provisioned IOPS EBS, which is network-attached and replicated across data centers.

The problem with direct-attached SSD is that a failure of the host hardware will take out the storage; it then becomes the burden of the end customer to provide an HA solution. Network-attached storage will always perform at a lower level than direct-attached, but it is much more resilient and can survive node failure. Also, with rare exceptions (e.g. the T2 instances), the major cloud providers do not oversubscribe CPU.
