Standard Object Storage Benchmarks
I'd love to hear your thoughts on what makes a good test set for comparing the performance of object storage protocols (primarily S3 and Swift). I've talked with others in the business, and there doesn't seem to be a standard set today. Chime in via the comments and let's figure it out together.
My thoughts so far:
- The CosBench tool seems pretty well suited for this
- The test should include the job files and a script to launch from the CLI (see the launcher sketch after this list)
- There should be a document that clearly explains how to read the results
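To make the launcher bullet concrete, here's a minimal sketch in Python. It assumes CosBench's bundled cli.sh accepts a `submit` command with a job file; the COSBENCH_HOME path and the workload.xml name are placeholders for whatever the test set standardizes on, so adjust to your installation:

```python
#!/usr/bin/env python3
"""Minimal sketch of a CLI launcher for a CosBench job.

Assumes CosBench is unpacked at COSBENCH_HOME and that its bundled
cli.sh takes a 'submit' command followed by a workload XML file;
verify against your CosBench version before relying on this.
"""
import os
import subprocess
import sys

COSBENCH_HOME = os.environ.get("COSBENCH_HOME", "/opt/cosbench")  # placeholder path


def submit(job_file: str) -> None:
    """Submit a CosBench job file and echo the controller's response."""
    cli = os.path.join(COSBENCH_HOME, "cli.sh")
    result = subprocess.run(
        [cli, "submit", job_file],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)


if __name__ == "__main__":
    submit(sys.argv[1] if len(sys.argv) > 1 else "workload.xml")
```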
Thoughts on this?
Thoughts on the test set?
Be sure to expand the comments so that we can have a healthy discussion!
Personally, I think there are several benchmarking philosophies, and they can help determine which tool to use. My perspective is that many tools focus on running various workloads and reporting results, but don't necessarily help you figure out where underlying problems may exist, so I wrote my own to meet my own needs, called getput (since putget was already taken at the time). Getput does NOT run workloads; rather, you give it a specific object size, run time, and number of threads to test against. You can also specify multiple sizes and multiple thread counts, and it will run against the various permutations for very extensive profiling. The key point is repeatability, which you don't necessarily get from workloads, which are more statistics-based. In about an hour I can generate numbers for object sizes from 1K to about 100M over parallel thread counts from 1 to 100 or more, and using multiple clients I can drive the thread count up to well over 1000!

Let's say I want to compare the performance of 1K objects against 10K objects over a long period of time. I don't want a mix of object sizes or PUTs/GETs polluting my numbers. If I want to tune the system or add/remove servers, I want to see exactly what happens with those object sizes. If I want to see what else is happening on the system with respect to CPU, network, memory, etc., I also want to see it for fixed object sizes, and measure it all with collectl (a shameless plug for another tool I wrote ;)).

Using this methodology I've found a number of performance bugs the Swift developers didn't even know existed, even though they'd been there for years. For example, who would ever have guessed that a 7887-byte PUT was twice as fast as a 7888-byte PUT? Workload tools just tell you that an overall workload performed at X. I'm not saying don't run workload-based tools; what I am saying is that perhaps you need multiple tools to get the whole picture.
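To make the fixed-size idea concrete, here's a rough sketch in Python using boto3 against an S3-compatible endpoint. This is not getput's actual code: the endpoint, bucket name, and parameters are placeholders, and getput itself does much more (GETs, multiple sizes and thread counts, latency reporting). But it shows the core loop of one size, one thread count, one run time, with nothing else polluting the numbers:

```python
"""Sketch of a getput-style fixed-size PUT test: a single object size,
thread count, and run time. Not getput's actual code; the endpoint and
bucket are placeholders, and the bucket/container must already exist.
"""
import os
import time
from concurrent.futures import ThreadPoolExecutor

import boto3

ENDPOINT = "http://swift-proxy.example.com:8080"  # placeholder endpoint
BUCKET = "getput-test"                            # placeholder bucket
OBJECT_SIZE = 1024   # one fixed size, e.g. 1K; rerun with 10K to compare
THREADS = 10
RUN_SECONDS = 60


def put_worker(thread_id: int) -> int:
    """PUT objects of one fixed size until the run time expires."""
    s3 = boto3.client("s3", endpoint_url=ENDPOINT)  # one client per thread
    payload = os.urandom(OBJECT_SIZE)
    deadline = time.monotonic() + RUN_SECONDS
    count = 0
    while time.monotonic() < deadline:
        s3.put_object(Bucket=BUCKET, Key=f"t{thread_id}-obj{count}", Body=payload)
        count += 1
    return count


with ThreadPoolExecutor(max_workers=THREADS) as pool:
    totals = list(pool.map(put_worker, range(THREADS)))

total = sum(totals)
print(f"{total} PUTs of {OBJECT_SIZE}B in {RUN_SECONDS}s "
      f"across {THREADS} threads = {total / RUN_SECONDS:.1f} ops/s")
```

Loop that over a list of sizes and thread counts and you get the permutation sweep described above, with each cell of the sweep still individually repeatable before and after any tuning change.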
CosBench looks quite good for this. The main question is what kind of workload we should emulate, since an archive-type workload is completely different from a web-app data workload. The other question is what we want to evaluate: the implemented solution as a whole, the storage software, or the hardware underneath the storage software?
CosBench is the standard answer, indeed.
I think a spread of 4K, 32K, 128K, 1M, and 4M object sizes makes sense, in a matrix of 80% sequential read, 80% sequential write, and random mixed 70% read / 30% write. Thoughts?
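For discussion's sake, that proposal expands to a 15-cell matrix (5 sizes x 3 patterns). A trivial sketch, reading the 80% figures as 80/20 read/write splits (my interpretation) and using labels of my own rather than CosBench syntax:

```python
# Enumerate the proposed test matrix: 5 object sizes x 3 access patterns.
# Pattern names and ratios are labels for discussion, not CosBench config.
SIZES = ["4K", "32K", "128K", "1M", "4M"]
PATTERNS = [
    ("seq-read",   {"read": 80, "write": 20}),  # 80% sequential read
    ("seq-write",  {"read": 20, "write": 80}),  # 80% sequential write
    ("rand-mixed", {"read": 70, "write": 30}),  # random mixed 70/30
]

for size in SIZES:
    for name, mix in PATTERNS:
        print(f"{size:>5} {name:<10} read={mix['read']}% write={mix['write']}%")
```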
David - always here to help :)