High-Speed Microservices

Rick H.

Published May 7, 2015

High-Speed Microservices

Microservices Architecture | High-Speed Microservices

This article endeavors to explain high speed microservices architecture. If you are unfamiliar with the term microservices, you may want to first read this blog post on microservices by Michael Brunton and if have more time on your hands this one by James Lewis and Martin Fowler.

High speed microservices is a philosophy and set of patterns for building services that can readily back mobile and web applications at scale. It uses a scale up and out versus a scale out model to do more with less hardware. A scale up model uses in-memory operational data, efficient queue hand-off, and async calls to handle more calls on a single node.

In general the cloud scale out model, employs a sense of reckless abandon. If your app has performance issues, no problem spin up 100 servers. Still not fast enough. Try 1000 servers. This has a cost. This pattern of high-speed microservices does not replace a cloud scale out model per se. It just makes a cloud scale out model more cost effective. Do more with less.

This is not to say that there are not advantages to the cloud scale out model. This is to say that an ability to do more with less hardware has cost savings, and you can still scale out as needed in the cloud.

The beauty of high-speed microservices is it gets back to OOP roots where data and logic live together in a cohesive understandable representation of the problem domain, and away from separation of data and logic. Since data lives with the service logic that operates on it. Also less time is spent dealing with cache coherency issues as the services own or lease the data (own for a period of time). The code tends to be smaller to do the same things.

You can expect to write less code. You can expect the code you write to run faster. To a true developer and software engineer this is a boon. Algorithm speed matters again. It is not dust on the scale while you wait for a response from the database. This movement frees you to do more in less time and to have code that runs orders of magnitudes faster than typical IO bound, cloud lemming services.

There are many Java frameworks and libraries for microservices that you can use to build a high speed microservice system. Vertx, Akka, Kaftka, Redis, Netty, Node.js, Go Channels, Twisted, QBit Java Microservice Lib, etc. are all great tools and technology stacks to build these types of services. This article is not about any particular technology stack or programming language but a more abstract coverage of what it means to build these type of high speed services devoid of language or technology stack preference.

The model described in this article is the inverse of how many applications are built. It is not uncommon to need 3 to 20 servers for a service where in a traditional system you might need 100s or even 1,000s. Your EC2 bill could be cut into 1/10th the cost for example. This is not just supposition but actual observation. In addition the code is more cohesive.

In this model, you typically add extra services to enable fail over support not to scale out per se. You will reduce the amount of servers needed and your code base will be more coherent if you adopt these strategies.

You may have heard, keep your services stateless. We are recommending the opposite. Make your services own their operational data. Your services should still be ephemeral but not stateless.

Attributes of High speed services
High speed services have the following attributes:

High speed services are in-memory services
High speed services do not block
High speed services own their operational data
Scale out involves sharding services
Reliability is achieved by replicating service stores

An in-memory service is a service that runs in-memory. An in-memory services is non-blocking. An in-memory service can load its data from a central data store. In-memory services can load data it owns asynchronously and does not block. It continues to service other requests while the data is loading. It streams in data from its service store whenever leased data is not found, i.e., it faults the data into the services in streams.

At first blush it appears that an in-memory service can achieve its in-memory from using a cache. This is not the case. An in-memory service can use caches but and in-memory service owns its core operational data. Cached data is only from other services that own their data. Only the service itself can edit its operational data. No other application or service can edit a services operational data.

Single writer rule: Only one service at any point in time can edit service particular set of service data

In-memory services either own their data or own their data for a period of time. Owning the data for a period of time is a lease model.

Think of it this way. Data can only be written to by one service at any give point in time. Cache inconsistencies and cache control logic is the root of all evil. The best way to keep data in sync with caches is to never use caches or use them sparingly. It is better to use a service stores that can keep up with your application vending the data as needed in a lease model. Or to create longer leases on service data to improve speed. More on leasing and service stores is described later.

The best way to keep data in sync with caches is to never use caches

Avoid the following:

Caching (use sparingly)
Blocking
Transactions
Databases for operational data

Embrace the following:

In-memory service data and data faulting
Sharding
Async callbacks
Replication / Batching / Remediations
Service Stores for operational data

Data faulting and data leasing seem a lot like caching. The key difference is ownership of data and the single writer principle.

Imagine a mobile app with a set of services that contains user data. The first call to any service checks to see if that users data is already loaded in the service. If the user data is not loaded into the service than the call from the mobile app is put into a memory queue and the call waits for the user data to get loaded asynchronously. The call in this case is a data structure with information about the call, no threads are blocked. The service continues to handle other calls and the service gets notified when the user data loads, and executes the call on the user data load event then. Since we can get many calls to load user data in a given second, we do not load each user one at a time but we load 100 user at a time or 1000 users at a time or we batch load all requests every 50ms (or both) of all user requests in that time. (We micro batch user loads also sometimes called pipelining). Loading the user data when it needed is called data faulting. Loading 1,000 users at a time or all users in the last 50ms or all users since the last user load is called batching request also called micro-batching or pipelining. Batching requests is combining many requests into a single message to optimize IO throughput. Data faulting is the same way your OS loads disk segments into memory pages for virtual memory.

High speed services employ the following:

Timed/Size Batching (Micro-batching or pipelining calls)
Callbacks
Call interception to enable data faulting from the service store
Data faulting for elasticity

Data ownership

The more data you can have in-memory the faster your services can run. Not all use cases and data fit this model. Some exceptions can be made. The more important principle is data ownership. This principle comes from the canonical definition of microservices.

In-memory is a means to an end. Mostly to facilitate non-blocking. The more important point is to have the service own their operational data instead of just being a view into shared data. The more important principle is the single writer principle and the avoidance of cache for operational data.

Let's say that some data is historical data, and historical data rarely gets edited, but it does get edited. Then in this scenario it might make no sense at all to not load the historical data from a database and then update the database directly and skip the service store altogether since the usage is rare and unlikely to hamper the overall performance of the system. However the more operational data can be served from replicated system memory, the better.

If size of the data is an issue remember that you can shard the services and you can also fault data into a service server in batches or streams (using micro-batching). These two vectors should allow most if not all of the operational data to be loaded into memory and enable the single writer principle. Think of this as more of the Pareto principle. You don't need all. You just need the set of data in-memory that is going to give you the SLA that you need. All would be nice. But you can only have 20% of the data that is faulted in and still have a really fast system.

A lease can be 1/2 hour, 8 hours, or some other period of time. Once the lease has expired which could be based on the last time that service data was used, then the data just waits in the service store and is evicted from the service.

Why Lease? Why not just own?
Why not just own data out right. Well you can if the service data is small enough. Leasing data provides a level of elasticity. This allows you to spin up more nodes. If you optimize and tune the data load from the service store to the service then loading users data becomes trivial and very performant. The faster and more trivial the data fault loading, the shorter you can lease the data and the more elastic your services are. In like manner services can save data periodically to the service store or even keep data in a fault tolerant local store (store and forward) and update the service store in batches to accommodate speed and throughput. Leasing data is not the same as getting data from the cache. In the lease model only one service node can edit the data at any given time.

Service Sharding / Service Routing
Elasticity is achieved through leasing and sharding. A service server node owns service data for a period of time. All calls for that users data is made to that server. In front of a series of service servers is a service router. A service router could be an F5 (network load balancer) that maintains server/user affinity through an HTTP header. A service router could be a more complex entity that knows more about the problem domain and knows how to route calls to other back end services. This is the API gateway.

Fault tolerance
The more important the data and the more replication and synchronization that needs to be done. The more important the data the more resources that are needed to ensure data safety.

If a service node goes down, a service router can select another service node to do that work from the service discovery. The service data will be loaded in an async/data faulting/batch. If the service was sending updates to changes then no state is lost except the state that was not sent to the service store since the last update. The more important the state/data, the more synchronization that should be done when the data is modified. For example, the service store can send an async confirmation of a save, the service could enqueue a response to the client. The client or service tier could opt to add retry logic if it does not get a response from the server. You can also replicate calls to services. You can also create a local store and forward for important calls. You could also use a persistent queue like Kafka or Kinesis.

Fault tolerance, service router and service discovery are essential to building a reactive Java microservices architecture.

Service Store
The primary store for a high speed service system is a service store. A service store can treat the service data as opaque. A service store is not a database. A service store is not a cache either. A service store may also keep the data in-memory. The primary function of a service store is vending data quickly to services that are faulting data in. The data should be sent to the services in streams. The calls to the service store need to be pipelined and async. The service store should employ micro-batching to maximize IO throughput.

A services store also takes care of data replication for data safety and safe storage. A service store should be able to bulk save data (stream saves) and bulk load data (stream loads) to/from a service and to/from replicas. A service store like the service itself should never block. Responses are sent asynchronously. WebSocket or sockets are a great mechanisms to send responses from a service store to a given number of services. JSON or some form of binary JSON is a good transport and storage mechanism for a service store.

Service stores are elastic and typically sharded but not as elastic as service servers. Service stores employ replication and synchronization to limit data loss. Service stores are special servers so the rest of your application can be elastic and more fault tolerant. It is typical to over provision service stores to allow for a particular span of growth. Adding new nodes and setting up replication is more deliberate than it is with services. The service store and the leasing model is what enables the microservice services to be elastic. This allows the microservice services to be ephemeral and yet own (via a lease) their data.

By special servers, we mean service store servers might use special hardware like disk level replication and servers which might employ additional monitoring. All service data saved to a high-speed microservice service store should be saved in at least two servers (prefer three if using replicas to distribute read operations). There is a certain level of replication that is expected. Service stores may also keep a transaction log (Kafka) so that others processes can follow the log and update databases for querying and reporting.

In high-speed services, databases are only for reporting, long term storage, backup, etc. All operational data is kept and vended out of the service stores which maintain their own replication and backups for recovery. All modifications to data is done by services. Service stores typically use JSON or some other standard data format for long term storage for both the transaction logs for storage into secondary databases.

A service store is the polar opposite of Big Data. A service store is just operational data. One could tail the transaction logs of a service store to create Big Data.

Active Objects / Threading model:

To minimize complex synchronization code that can become a bottle neck one should employ some form of Active Objects pattern for stateful, high-speed services. One could use an event bus system like Vertx or Node.js or an Actor system like Akka or GO channels or Python Twisted or of course QBit and build their own Active Object service system. Queues and messaging are essential if you want to handle back pressure and create reactive applications.

The active object pattern separates method execution from method invocation for objects that each reside in their own thread of control (or in the case of QBit for a group of objects that are in the same thread of control).

The goal of Active Objects is to introduce concurrency, by using asynchronous method invocation and a scheduler for handling requests. The scheduler could be a resumable thread draining a queue of method calls. This scheduler could also check to see if data needed for this call was already loaded into the system and fault data in from the service store into the running service before the call is made.

The Active Object pattern consists of six elements:

A client proxy to provide an interface for clients. The client proxy can be local (local client proxy) or remote (remote client proxy).
An interface which defines the method request on an active object.
A queue of pending method requests from clients.
A scheduler, which decides which request to execute next which could for example delay invocation until service data if faulted in or which could reorder method calls based on priority or which could work with several related services from one scheduler allowing said services to make local non-enqueued calls to each other.
The implementation of the active object methods. Contains your code.
A service callback for the client to receive the result.

Conclusion

High-speed in-memory Microservices is a real thing. We have employed it to great effect. You can reduce your resource spend, and increase the coherence of your domain model. It is not unheard of to reduce compute resources while also improving the maintainability of your code base and increasing developer throughput. It is not really a change to something new, but more like embracing object-oriented programming in a cloud centric world.

Developing high-speed microservices (tools needed)
Docker, Rocket, Vagrant, EC2, boto, Chef, Puppet, testing, perf testing.

Service discovery and health
Consul, etcd, Zookeeper, Nagios, Sensu, SmartStack, Serf, StatsD.

Similarities to plain microservices?

Data ownership, standalone process, container which has all parts needed, docker is the new war file, etc.

What makes high speed microservices different from plain microservices?

Async, Non-blocking, more focus on data ownership and data faulting.More focus on being reactive.

Glossary:

Service Store : Sharded fault tolerant opaque storage of service data. The service store enables services to be elastic.
Service Server : A server that hosts one or more services.
High Speed Service: An in-memory high speed service that is non-blocking that owns its service data.
Database: Something that does reporting or long term backups for a high speed service system. A database never holds operational data (there is an uncommon exception to this rule beyond the scope of this article).
Service Router: The first tier of servers which decide where to route calls to which services based on sharding rules which can be simple or complex. Simple rules could be handled by ha_proxy or an F5. Complex rules can be handled by service routing tier.

Rick Hightower, the author of this blog post is the primary author of QBit and Boon. Boon is a very fast JSON parser, string parsing, and reflection lib. QBit is a fast in-memory service queue lib that employs Actor-like Active Objects.

200 M message in-proc per second, 1 M RPC calls per second = QBit

The above shows the high-speed nature of Boon and QBit.
I put them there to pique your interest.

QBit is a lot more than just raw speed.

Learn more about QBit:

High-Speed Microservices

Rick H.