The Power Of Memcached
Today I’m going to talk about the power of Memcached and take you through a simple web application that will be optimized on a single low end server to provide lightning fast results to many simultaneous queries. Now, the key is how we do this in a scalable and affordable way.
So with that said let’s talk about our sample web 2.0 application. I have mocked up a tiny site that presents name data per the social security administrations open data initiative as my sample set. Our goal for our web application is to search this data and provide a chart showing the history of names and popularity. In theory it sounds simple, well it gets tough fast as you look at the sample data set.
First let’s take a look at some of the data. I dumped out a csv file by state in order for us to import it properly into the database schema that we will be making which looks like this.
Pretty basic in this example. Broken out by state, our data looks like this:
With that said we need to create our database:
CREATE SCHEMA IF NOT EXISTS `namehistory` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci ;
USE `namehistory` ;
-- -----------------------------------------------------
-- Table `namehistory`.`birthnames`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `namehistory`.`birthnames` (
`idbirthnames` INT NOT NULL AUTO_INCREMENT ,
`name` VARCHAR(45) NOT NULL ,
`year` YEAR NOT NULL ,
`instances` INT NOT NULL ,
`gender` CHAR NOT NULL ,
`stateid` INT NOT NULL ,
PRIMARY KEY (`idbirthnames`) ,
UNIQUE INDEX `birthname_UNIQUE` (`name` ASC) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `namehistory`.` state`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `namehistory`.`state` (
`idstate` INT NULL AUTO_INCREMENT ,
`shortdesc` VARCHAR(45) NOT NULL ,
`longdesc` VARCHAR(8) NOT NULL ,
PRIMARY KEY (`idstate`) ,
UNIQUE INDEX `shortdesc_UNIQUE` (`shortdesc` ASC) ,
UNIQUE INDEX `longdesc_UNIQUE` (`longdesc` ASC) )
ENGINE = InnoDB;
USE `namehistory` ;
So let’s populate our database with a quick script that will import our csv files and create our database entries.
php -f importcsv.php filename="/var/www/html/findaname/csv/WY.TXT"
This script loads the database with the parsed csv files based on state and matches them to the "stateid".
Now our host has a generic mysql db running with our information. We install memcached and set that to run and make sure the correct memcached php extensions are installed.
We will now restart apache and access our demonstration application. Now I’m not a designer so it will look rather basic. However, it gets the point across.
Our first run will be without any memcache running for query caching. So let’s give it a generic query for “sam”.
Now what we notice is that sam takes quite some time to generate this chart from the data in our database. What happens here is the request is sent and then we search the database. Then we aggregrate the results, process them and return the json data.
If we run it again we can see that the time it takes is pretty consistent right now.
However, it is still too slow in general and it is only going to get worse. I doubt that our goal is for only one person to use it at a time. So let’s setup a load tester for the query.
We will start with 100 requests a second and see how we fair. We will also set a limit of 1000 completed or 5 minutes total which ever happens first.
1:35 seconds into the test, we are not really doing so good. We have 1000 queued requests and 7729 discarded requests with only 746 Completed requests. Once it’s done we will delve into some of the stats.
Well that’s a little depressing... It took us 2:01 seconds to get back 1000 successful searches for the same data and we gave up on 9703 as discarded because it couldn’t handle any more in the queue.
Even worse, we look at our response times for completion we went from 2.5 seconds for the search up to over 20 seconds.
Simply put our website now is not sufficient. It can’t handle any high volume of users.
Now let’s try doing the same thing but we’re going to turn on memcached. The first query follows query regular execution and takes place against the db. After that query memcached stores the results in memory with a key value store and we start to see this:
Well, as you can see this is starting to look significantly better. I’m down to 41ms of latency on this request. This is mostly my round trip since I am VPNed and on a public consumer grade connection multiple states away. Let’s have some fun and hit it with the same requirements as last time on the load test.
Well, isn’t this a pleasant surprise we sent 1001 requests at 100/sec in only 10 seconds. That’s exactly what we want to see. Let’s take a look at some of our stats:
Now we see that the latency of requests are well within normal limits. Let’s see how this does with 200 requests a second for 1000 completed requests:
Not too shabby, in fact with a little apache optimization we could probably handle this load even better.
To summarize, I did not apply performance tuning on either Apache or MySQL. I purposely left them in their default state to showcase a typical non-optimized deployment. By making the decision on the backend to alleviate that load on the DB instance we can see massive gains. These gains would be difficult to achieve with DB tuning alone.
Memcached or any memory based Key/Value store is an excellent way to cache and optimize non-realtime data. However, it does require significant thought into how to best implement. The most important aspect is what can be cached and for how long. In our example we've taken a small single low end server and allowed it to handle high volume which in turn can be scaled into a farm based approach.
This is further outlined in my other article attached on my profile called "Designing For Scale From Day 1" that goes further into these concepts. If you would like to discuss horizontal scalability or hosting strategies please feel free to contact me directly at Mark.Lee@pclsolutions.com.
Honestly, one of the smartest people in the video space I know. Easily!
Nice article!
What a great comprehensive approach to a great technology! Very smartly written, easy to understand and well thought out in your approach! Very impressed with your publication! Keep up the great work!
Great writeup with actual information. You don't see this kind of post every day. Well done!