Tracking Actions Through Multiple Micro Services

In a monolithic application tracking what happens when a user clicks a button is relatively straightforward. You would normally get a request in the web server access log showing a response code. Hopefully there will be some application logging you can use to determine what was happening at the time of that request. The logs usually mix multiple requests at the same time but it's usually possible to identify the bits relating to your request with some knowledge of how the system operates.

In our micro service based architecture this very traditional approach is simply not workable. Each micro service will have it's own access logs and application logs. Attempting to tie these multiple pieces together to see what happened in response to a specific user action is next to impossible where that action triggers so many requests across multiple micro services.

In the telecoms industry they experienced the same problem in attempting to trace the path of an individual call through multiple networks so they could accurately bill each other and the end user. To resolve this they came up with the notion of a "call detail record" . This is made up of a unique identifier for the specific communication plus a bunch of metadata. The record does not include the transmitted data.

We can use the same principle in our logging to enable a user action to be easily tracked through. Such a system would even make a monolithic application easier to log. Since we are dealing with HTTP requests we will call our record a "Request Detail Record", or RDR.

When a request in initiated we will create the record. Part of creating the record will be creating a unique ID. This unique ID will be passed into each subsequent request to micro services that are used to produce the final response. Upon receipt of the final response the record will be updated with details of the response. Every log entry made during the full life cycle of the request must contain the unique request detail record identifier.

For example, an automated ajax call goes out to get the latest KPIs for your platform.

  • Before the request is sent, an RDR is created.
  • The unique RDR identifier is added to the outgoing request headers and it is sent to the reporting micro service.
  • The reporting service logs it's own actions with the RDR identifier as part of each entry
  • If the reporting service needs to call any other service the unique RDR id is added to the headers of those requests and the receiving micro services log using the RDR id as part of the log entry.
  • Upon final return to the front end of the results of the call the RDR is updated and the next call in the queue can be sent.

For a manually triggered event, such as the click of a button, the same principles apply.  

  • If the event is handled by AJAX then it is added to the queue as above.
  • If the button causes a form submission then 
  • the RDR is created and the form is submitted with the RDR id as part of the submission.
  • upon completion of the request (when the page is reloaded) the RDR is updated.

RDR Unique ID

Each RDR needs a unique id. This does not have to be cryptographically secure, just unique. It is helpful when reading logs for the identifier to be human readable, but not essential.

An example might be some combination of <platform short name>-<user identifier>-<action code>-<timestamp>-<5 random chars> which might look like

blog-paul-search-2019-06-19_11:29:21-f4xyp

RDR contents and storage

The RDR itself should be stored along with all other logging information.  A great way to do this is using logstash and elastic search, but that's a story for another post.

The RDR could contain

  • Originating platform name
  • Originating IP address
  • User ID or logged in person name
  • URL being called
  • calling time stamp
  • response code received
  • response received timestamp

Links to the full request (body and headers) and full response (body and headers) would be useful where reproducing the initial requests is needed, however the RDR must not contain that data. The services that receive the HTTP requests should be logging this information themselves and we don't need to repeating.

Retrieving the log of an action

So, now we have an identifier for the action the user has asked for we can trace it through all the micro services and see exactly what the system logged for that particular unique requests without us having to worry about all the other requests going on for different purposed at the same time.

Retrieving it from elastic search is a trivial query to get everything with the RDR we want and order it by log timestamp. File based logs? A simple grep using the RDR, maybe piped into sort.

Suddenly finding out what has happened is simple and accurate and no guesswork is needed.

Really helpful, practical advice. I remember working with CDRs, but didn't appreciate they were a unifying thread of data that consolidated multiple services or processes. Once you describe the concept, it's relatively straightforward as an approach - even if it's not necessarily technically straightforward. Your recommendation to develop an RDR is truly something that could help me right now. Thanks, Paul. I like the idea of storing it to elastic-search, because this would give you operational intelligence searchability that enables you to identify a particular data-item or incidences around a particular event. So, if you have anything to write about that, I'd like to read it.

Like
Reply

To view or add a comment, sign in

Others also viewed

Explore content categories