Mapping Business Requirements to Solution Delivery: User Profiling for Data Tiering Decisions
Part 1: The User Profile Test
Longtime users of Elasticsearch know that Elastic is FAST. Its speed is one of many reasons that Elasticsearch is ranked the #1 Search Engine and has been widely adopted across use cases generally falling into three categories: Enterprise Search, Observability and Security.
While speed is great, it also has a direct correlation to cost and therefore needs to be considered as part of the solution requirements.
In November of 2020, Elastic announced Searchable Snapshots, an Elasticsearch feature powering new Cold and Frozen Data Tiers. By May 2021, both tiers were GA. These tiers offer significant cost optimization for Elastic workloads, with savings that can scale to $100Ks+ per year.
The Benefit: Significant Cost Saving Opportunities! Yay!
The Challenge: Generally speaking, my team works with administrators delivering Elastic-as-a-Service to their internal teams. With this, two issues arise:
Over the last year, my team has engaged in deep discovery with both existing and prospective customers to understand how to think about data tier distribution and the associated cost savings opportunity. This is Part 1 of 2 on the results.
TLDR: Distribution across Data Tiers is better thought of as a business requirement, rather than a technical requirement. Therefore, decisions on data tier distribution should be driven by the Users of the Solution, rather than the Administrators.
As there is not a good way to get the type of information we are looking for from Elasticsearch metrics today, I created a behavioral 'User Profile Test' which works as follows:
Step 1: Administrators go to 2-3 trusted users/teams and ask the following question. (Note: For best results, this question should be asked blindly, with the users/teams not knowing why the question is being asked. This can be done by email, Slack, etc.)
"Complete the following with respect to your day-to-day use of {InsertCurrentToolName}*
Where: X is your most common search, Y is the next common search, etc."
Example answers from three real teams:
Interestingly, User Profile responses differed by customer even when the use cases aligned. We did not observe trends that would allow us to say: "For any Logging Use Cases: 7 Days Hot, 30 Days Warm, etc." This further supports the conclusion that data tier distribution is a business requirement and should be considered based on the particular use case rather than a generalized "industry standard".
*Important: While {InsertCurrentToolName} can be an Elasticsearch cluster, the test is equally applicable to non-Elastic tools, e.g. Splunk, Datadog, etc.
Step 2: Map the written response(s) to a table and discuss.
Here's an example of what this looks like for Example C:
"With our existing tool, 40-45% of the time I am searching the last 7 days, 40-45% of the time I am searching the last 30 days, 10% of the time I search the full 365 days, and when I do it takes 5+ min. to get a response"
You'll note that while Elastic offers 4 Data Tiers, there is no requirement to use all 4 tiers. Therefore, we use this test to answer not only how we should distribute data across tiers but also which tiers we should use at all. For the particular example, we are justified to distribute 7 Days to the Hot Tier and 23 Days to the Warm Tier (this gives us a total of 30 Days across Hot and Warm). Given only 10% of usage is beyond 30 Days and the fact that users report a high tolerance for slower search, we distribute all data greater than 30 days directly to the Frozen Data Tier. At this point, we don't have a clear reason to include the Cold Tier and so it is omitted.
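The reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool used in the study: the `SearchWindow` type, the `assign_tiers` and `to_ilm_policy` helpers, the 15% "rare usage" cutoff, the midpoint shares (0.425 for "40-45%") and the repository name `my-snapshot-repo` are all hypothetical. The output of `to_ilm_policy` follows the general shape of an Elasticsearch ILM policy body (phases with `min_age` and a `searchable_snapshot` action), but it is not submitted to a cluster here and would need review before real use.

```python
import json
from dataclasses import dataclass

TIER_ORDER = ["hot", "warm", "cold", "frozen"]


@dataclass
class SearchWindow:
    days: int     # look-back window the user searches, in days
    share: float  # fraction of the user's searches that use this window


def assign_tiers(windows, tolerates_slow_search, rare_share=0.15):
    """Map a user profile to a list of (tier, days) pairs.

    Assumed rule of thumb: the shortest window goes to hot, other
    frequently used windows go to warm, and rarely used windows go to
    frozen if users tolerate slow searches, otherwise to cold.
    """
    plan = []
    covered = 0  # days already assigned to a faster tier
    coldest = 0  # index of the coldest tier used so far
    for i, window in enumerate(sorted(windows, key=lambda w: w.days)):
        if i == 0:
            tier = "hot"
        elif window.share > rare_share:
            tier = "warm"
        elif tolerates_slow_search:
            tier = "frozen"
        else:
            tier = "cold"
        # Data only moves toward colder tiers as it ages.
        coldest = max(coldest, TIER_ORDER.index(tier))
        tier = TIER_ORDER[coldest]
        days = window.days - covered
        if plan and plan[-1][0] == tier:
            plan[-1] = (tier, plan[-1][1] + days)
        else:
            plan.append((tier, days))
        covered = window.days
    return plan


def to_ilm_policy(plan, snapshot_repository):
    """Turn a tier plan into a dict shaped like an ILM policy body."""
    phases = {}
    start = 0  # age in days at which data enters the phase
    for tier, days in plan:
        actions = {}
        if tier in ("cold", "frozen"):
            actions["searchable_snapshot"] = {
                "snapshot_repository": snapshot_repository
            }
        min_age = "0ms" if start == 0 else f"{start}d"
        phases[tier] = {"min_age": min_age, "actions": actions}
        start += days
    return {"policy": {"phases": phases}}


# Example C from the article, using midpoints of the quoted ranges.
windows = [
    SearchWindow(days=7, share=0.425),
    SearchWindow(days=30, share=0.425),
    SearchWindow(days=365, share=0.10),
]
plan = assign_tiers(windows, tolerates_slow_search=True)
policy = to_ilm_policy(plan, snapshot_repository="my-snapshot-repo")
print(plan)
print(json.dumps(policy, indent=2))
```

For Example C this yields 7 days hot, 23 days warm and the remaining 335 days frozen, with no cold phase, which matches the distribution described above. Changing `tolerates_slow_search` to `False` sends the older data to cold instead, which is one way to see how the user's stated tolerance drives the tier choice.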
In cases where multiple teams provided responses, we use the same model side-by-side to map each team's answer and compare/contrast overlaps in requirements.
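A side-by-side comparison like this can be roughed out in Python. The sketch below is an assumption-laden illustration, not the model used with customers: "Team C" reuses the Example C distribution from this article, while "Team B" and its numbers are purely hypothetical, as are the helper names. The combining rule (keep data in a faster tier for as long as any team needs it there) is one conservative option among several and should be discussed with the teams rather than taken as given.

```python
TIERS = ("hot", "warm", "cold", "frozen")

# Days of data per tier for each team.
team_plans = {
    "Team C": {"hot": 7, "warm": 23, "frozen": 335},  # Example C in this article
    "Team B": {"hot": 14, "warm": 16, "cold": 60},    # hypothetical, for illustration
}


def boundaries(plan):
    """Return the cumulative age (in days) at which data leaves each tier.

    A tier the team does not use inherits the previous boundary.
    """
    end = 0
    result = {}
    for tier in TIERS:
        end += plan.get(tier, 0)
        result[tier] = end
    return result


def combine(plans):
    """Merge team plans by taking the latest boundary for every tier."""
    ends = [boundaries(plan) for plan in plans.values()]
    merged = {}
    previous = 0
    for tier in TIERS:
        latest = max(end[tier] for end in ends)
        if latest > previous:
            merged[tier] = latest - previous
        previous = latest
    return merged


def side_by_side(plans):
    """Render one row per tier and one column per team."""
    names = list(plans)
    lines = ["tier".ljust(8) + "".join(name.rjust(10) for name in names)]
    for tier in TIERS:
        cells = "".join(str(plans[name].get(tier, "-")).rjust(10) for name in names)
        lines.append(tier.ljust(8) + cells)
    return "\n".join(lines)


combined = combine(team_plans)
print(side_by_side(team_plans))
print("combined:", combined)
```

The table makes the overlaps visible (both teams stop needing warm data at day 30), and the combined plan shows what a single shared policy would cost in tier terms if every team's requirement had to be met at once.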
Takeaways from this past year of study...
Conclusion: We now know how to think about data tier distribution, and have a repeatable test we can use to get metrics for analysis. We still need to understand cost implications so that we can fully define the savings opportunity.
To learn more about Searchable Snapshots and Elastic Data Tiers, see the resources below: