Elasticsearch for...Regular Java Developers
[This is a bit long. If you just want to use the code, go ahead and clone this repo.]
A few years ago, when I started playing with Elasticsearch, I expected the tool to be more 'Java friendly'. After all, ES is a service written in Java, backed by a search engine (Lucene) that is also written in Java!
At the time, I didn't like the official Java client because I was looking for something that felt more like Hibernate and Spring ORM. Spring Data Elasticsearch was also a disappointment, for several reasons: it explicitly required an @Id embedded in the document, and it was coupled to the binary transport protocols (Thrift) as well as to the Elasticsearch jar itself, which in turn transitively brings along several companions such as Lucene, Netty and JNA, to name a few.
To put it simply, besides basic CRUD support, what I was initially looking for boiled down to 4 main features:
1) Transparent bi-directional type conversion:
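import java.time.LocalDateTime;
import org.junit.Assert;
ITypedApi<VO> api = ...
VO vo = new VO();
vo.timestamp = LocalDateTime.now();
vo.title = "A short summary";
vo.someNumber = 1;
String autoID = api.insert(vo); // returns the auto-generated document ID
VO recovered = api.get(autoID);
Assert.assertEquals(vo, recovered); // VO must override equals() for this to hold
vo.title = "No Title";
api.saveOrUpdate(autoID, vo); // update the same document by its ID
recovered = api.get(autoID);
Assert.assertEquals(vo, recovered);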
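The contract above boils down to converting a typed object into a JSON-like document and back without losing anything. Below is a minimal, dependency-free sketch of that idea, not the wrapper itself (which relied on Jackson): `InMemoryTypedApi`, `toDoc` and `fromDoc` are hypothetical names, documents are modeled as `Map<String, Object>` the way a parsed JSON object would be, `LocalDateTime` is stored as ISO-8601 text, and everything lives in memory instead of a cluster.

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.function.Function;

public class TypedApiSketch {

    // Same shape as the VO above; equals/hashCode make the round-trip comparison meaningful.
    static final class VO {
        LocalDateTime timestamp;
        String title;
        int someNumber;

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof VO)) return false;
            VO other = (VO) o;
            return someNumber == other.someNumber
                    && Objects.equals(timestamp, other.timestamp)
                    && Objects.equals(title, other.title);
        }

        @Override
        public int hashCode() {
            return Objects.hash(timestamp, title, someNumber);
        }
    }

    // Hypothetical in-memory stand-in for ITypedApi: stores JSON-like maps, hands back typed objects.
    static final class InMemoryTypedApi<T> {
        private final Map<String, Map<String, Object>> documents = new HashMap<>();
        private final Function<T, Map<String, Object>> toDoc;
        private final Function<Map<String, Object>, T> fromDoc;

        InMemoryTypedApi(Function<T, Map<String, Object>> toDoc, Function<Map<String, Object>, T> fromDoc) {
            this.toDoc = toDoc;
            this.fromDoc = fromDoc;
        }

        String insert(T value) {
            String id = UUID.randomUUID().toString(); // stands in for an auto-generated _id
            documents.put(id, toDoc.apply(value));
            return id;
        }

        void saveOrUpdate(String id, T value) {
            documents.put(id, toDoc.apply(value)); // create or replace
        }

        T get(String id) {
            Map<String, Object> doc = documents.get(id);
            return doc == null ? null : fromDoc.apply(doc);
        }
    }

    // VO -> document: LocalDateTime is written as ISO-8601 text, e.g. "2020-01-02T03:04:05".
    static Map<String, Object> toDoc(VO vo) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("timestamp", vo.timestamp.toString());
        doc.put("title", vo.title);
        doc.put("someNumber", vo.someNumber);
        return doc;
    }

    // Document -> VO: the inverse conversion, so get(insert(vo)) equals vo.
    static VO fromDoc(Map<String, Object> doc) {
        VO vo = new VO();
        vo.timestamp = LocalDateTime.parse((String) doc.get("timestamp"));
        vo.title = (String) doc.get("title");
        vo.someNumber = ((Number) doc.get("someNumber")).intValue();
        return vo;
    }

    public static void main(String[] args) {
        InMemoryTypedApi<VO> api = new InMemoryTypedApi<>(TypedApiSketch::toDoc, TypedApiSketch::fromDoc);
        VO vo = new VO();
        vo.timestamp = LocalDateTime.now();
        vo.title = "A short summary";
        vo.someNumber = 1;

        String autoID = api.insert(vo);
        System.out.println(vo.equals(api.get(autoID))); // true

        vo.title = "No Title";
        api.saveOrUpdate(autoID, vo);
        System.out.println(vo.equals(api.get(autoID))); // true
    }
}
```

Because the stored document is a fresh map, mutating `vo` after `insert` does not silently change what is "in the index"; only `saveOrUpdate` does, which mirrors how a remote store behaves.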
2) JDK 8 Streaming support (Bulk Inserts and Paged Fetching):
Stream<SomeType> source = ...
BulkInsertResult result = api.bulkInsert(source); // wraps cnt, ok, err...
/*
Tunable chunked evaluation: use sensible defaults and allow clients
to control the balance between remote round trips (REST calls)
and memory footprint (page size)
*/
Stream<SomeType> scroll = api.query(..., optionalTuning);
// The stream is lazily populated with data from the storage backend
scroll.forEach(t -> {...});
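The chunked evaluation idea can be sketched with nothing but the JDK: an iterator that asks a page fetcher for the next page only when the current one is exhausted (wrapped into a `Stream`, so pages are pulled as the stream is consumed), plus a helper that groups a source stream into fixed-size batches for bulk requests. `PagedFetcher`, `pagedStream` and `forEachBatch` are hypothetical names; in a real client the fetcher would wrap something like a scroll or `search_after` request, and the page size is the knob that trades round trips for memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ChunkedStreams {

    // Hypothetical backend call: up to pageSize items starting at offset (fewer when data runs out).
    @FunctionalInterface
    interface PagedFetcher<T> {
        List<T> fetch(int offset, int pageSize);
    }

    // Lazily concatenates pages: the next page is fetched only when the current one is used up.
    static <T> Stream<T> pagedStream(PagedFetcher<T> fetcher, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        Iterator<T> it = new Iterator<T>() {
            private Iterator<T> page = Collections.emptyIterator();
            private int offset = 0;
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                while (!page.hasNext() && !exhausted) {
                    List<T> next = fetcher.fetch(offset, pageSize); // one round trip
                    offset += next.size();
                    exhausted = next.size() < pageSize; // a short page means no more data
                    page = next.iterator();
                }
                return page.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return page.next();
            }
        };
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    // Splits a source stream into batches of batchSize, e.g. one bulk request per batch.
    static <T> void forEachBatch(Stream<T> source, int batchSize, Consumer<List<T>> bulkSender) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<T> batch = new ArrayList<>(batchSize);
        Iterator<T> it = source.iterator();
        while (it.hasNext()) {
            batch.add(it.next());
            if (batch.size() == batchSize) {
                bulkSender.accept(new ArrayList<>(batch));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) bulkSender.accept(batch);
    }

    public static void main(String[] args) {
        List<Integer> storage = IntStream.range(0, 10).boxed().collect(Collectors.toList());
        int[] calls = {0};
        PagedFetcher<Integer> fetcher = (offset, size) -> {
            calls[0]++;
            return storage.subList(Math.min(offset, storage.size()), Math.min(offset + size, storage.size()));
        };

        // limit(3) stops consuming early, so only the first page of 4 is ever fetched.
        System.out.println(pagedStream(fetcher, 4).limit(3).collect(Collectors.toList())); // [0, 1, 2]
        System.out.println(calls[0]); // 1

        // Prints "bulk of 4", "bulk of 4", "bulk of 2".
        forEachBatch(storage.stream(), 4, batch -> System.out.println("bulk of " + batch.size()));
    }
}
```

Since a `Stream` is pull-based, short-circuiting operations such as `limit` or `findFirst` translate directly into fewer remote calls, which is the main payoff of exposing query results this way.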
3) Constructs that benefit from a strong, statically typed language: ES's JSON parlance is great and well built, but I don't want my team or me googling every time we need to nest a range query inside a should clause. Yes, I shamelessly want to be able to use my editor's content assist.
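Here is a minimal sketch of what "content assist instead of googling" can look like: a handful of hypothetical Java types (`Query`, `Term`, `Range`, `Bool`, not part of any official client) that can only be combined in valid ways and render themselves to Query DSL JSON, such as a `range` query nested in a `bool` query's `should` clause. JSON escaping and most query options are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class TypedQueries {

    // Every query node knows how to render itself as Query DSL JSON.
    interface Query {
        String toJson();
    }

    // {"term":{"<field>":"<value>"}} (values are not escaped in this sketch)
    static final class Term implements Query {
        private final String field;
        private final String value;

        Term(String field, String value) {
            this.field = field;
            this.value = value;
        }

        public String toJson() {
            return "{\"term\":{\"" + field + "\":\"" + value + "\"}}";
        }
    }

    // {"range":{"<field>":{"gte":..,"lte":..}}}; unset bounds are omitted.
    static final class Range implements Query {
        private final String field;
        private Long gte;
        private Long lte;

        Range(String field) { this.field = field; }

        Range gte(long v) { this.gte = v; return this; }

        Range lte(long v) { this.lte = v; return this; }

        public String toJson() {
            List<String> bounds = new ArrayList<>();
            if (gte != null) bounds.add("\"gte\":" + gte);
            if (lte != null) bounds.add("\"lte\":" + lte);
            return "{\"range\":{\"" + field + "\":{" + String.join(",", bounds) + "}}}";
        }
    }

    // {"bool":{"must":[..],"should":[..]}}; empty clauses are skipped.
    static final class Bool implements Query {
        private final List<Query> must = new ArrayList<>();
        private final List<Query> should = new ArrayList<>();

        Bool must(Query q) { must.add(q); return this; }

        Bool should(Query q) { should.add(q); return this; }

        public String toJson() {
            List<String> clauses = new ArrayList<>();
            if (!must.isEmpty()) clauses.add("\"must\":" + array(must));
            if (!should.isEmpty()) clauses.add("\"should\":" + array(should));
            return "{\"bool\":{" + String.join(",", clauses) + "}}";
        }

        private static String array(List<Query> queries) {
            return queries.stream().map(Query::toJson).collect(Collectors.joining(",", "[", "]"));
        }
    }

    public static void main(String[] args) {
        Query q = new Bool()
                .must(new Term("title", "summary"))
                .should(new Range("someNumber").gte(1).lte(10));
        // Wrapped in {"query": ...}, this is what a _search request body carries.
        System.out.println("{\"query\":" + q.toJson() + "}");
    }
}
```

The point of the design is that the compiler, not the documentation, tells you that `should` takes a `Query` and that `Range` exposes `gte`/`lte`, so the editor can complete the nesting for you.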
4) Using indices primarily as fast, distributed and fault-tolerant MultiMap-like data structures, leaving full-text capabilities to be manually activated by advanced users. In earlier ES versions, indices were created with sub-optimal metafields such as _all, and even in more recent versions some under-the-hood behaviors, like automatically indexing 2 versions of the same field (one analyzed, plus the raw keyword value), remain out-of-the-box defaults.
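One way to get that key-value-first behavior is a dynamic template that maps every incoming string to a single `keyword` field, instead of the default analyzed `text` field plus `.keyword` sub-field. The sketch below only assembles the create-index request body in Java; it assumes Elasticsearch 7.x typeless mappings (6.x expects the mapping nested under a type name), and the template name `strings_as_keywords` is arbitrary. Full-text analysis can still be switched on explicitly, per field, by whoever needs it.

```java
public class KeywordOnlyIndex {

    // Body for PUT /<index>: every dynamically detected string becomes a plain keyword
    // (exact match, not analyzed), so no second analyzed copy of the field is indexed.
    static String createIndexBody() {
        return "{"
                + "\"mappings\":{"
                + "\"dynamic_templates\":[{"
                + "\"strings_as_keywords\":{"
                + "\"match_mapping_type\":\"string\","
                + "\"mapping\":{\"type\":\"keyword\"}"
                + "}"
                + "}]"
                + "}"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(createIndexBody());
    }
}
```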
[You might be asking: "Why use a full-text search engine as a key-value store?"
First and foremost, the kind of applications I build and the kind of contracts I work on (Airspace & Defense) seldom require 'did you mean...' searches and often indulge me with expensive, state-of-the-art hardware. I mean, if you're given the opportunity to work with brand-new boxes full of NVMe SSDs, you must do them justice by deploying software that takes full advantage of their capabilities.
Secondly, I have worked a lot with Lucene, I know the framework's internals, and I know it's blazing fast for my purposes, so any performance hiccups would most likely come from excessive network calls or something else, not from the core engine. When it comes to storing and processing reasonable amounts of data, knowing a technology's constraints and caveats in depth is something I hold dear. After researching and testing competing technologies such as Solr, Neo4j and OrientDB, it felt like Elasticsearch offered the best set of the non-functional capabilities I was looking for.
Last but not least, my team is very heterogeneous when it comes to experience and skill levels, but one thing was certain: no one was acquainted with the concepts and techniques of full-text search, and everyone was smart enough to instantly grasp the notion of a key-value store. I wanted everyone, from the interns to the senior developers, to embrace the technology and, more importantly, to quickly become productive with it.]
When I decided to start coding my own 'elastic-crud-wrapper', there was no official Java REST Client yet, and I wasn't sure where to start. Luckily for me, Daniel Mitterdorfer had published an article, a benchmark comparing the Transport Client with the REST Client, which basically concludes: "Yes, there's a bit of overhead in using JSON over HTTP, but the decoupling and API stability you get from using REST certainly offset the performance benefits of the transport protocol."
After playing a bit with basic operations using the REST client, I was convinced this was the way to go: all I needed was to properly model some Java classes and let Jackson do its thing. The result was an easy-to-use API that has been helping my peers and me 'work with Elasticsearch without knowing that you're using it'.
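To make the "JSON over HTTP plus a few Java classes" idea concrete, here is a sketch of the request side built with the JDK 11 `java.net.http` API, not with the Elasticsearch client used for the wrapper. It assumes Elasticsearch 7.x document endpoints (`POST /<index>/_doc` for an auto-generated ID, `PUT /<index>/_doc/<id>` to create or replace, `GET /<index>/_doc/<id>` to fetch); `DocRequests` and the `vos` index are hypothetical, and the JSON body is passed in as a plain string, which is exactly where a mapper such as Jackson would plug in.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class DocRequests {

    private final URI baseUri; // e.g. http://localhost:9200 (assumed cluster address)
    private final String index;

    DocRequests(URI baseUri, String index) {
        this.baseUri = baseUri;
        this.index = index;
    }

    // POST /<index>/_doc : Elasticsearch generates the _id (the insert(vo) case)
    HttpRequest insert(String json) {
        return HttpRequest.newBuilder(baseUri.resolve("/" + index + "/_doc"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // PUT /<index>/_doc/<id> : creates or replaces the document (the saveOrUpdate(id, vo) case)
    HttpRequest saveOrUpdate(String id, String json) {
        return HttpRequest.newBuilder(docUri(id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // GET /<index>/_doc/<id> : the stored JSON comes back wrapped under "_source"
    HttpRequest get(String id) {
        return HttpRequest.newBuilder(docUri(id)).GET().build();
    }

    // URLEncoder targets form encoding, so spaces ("+") are turned into %20 for a path segment.
    private URI docUri(String id) {
        String encoded = URLEncoder.encode(id, StandardCharsets.UTF_8).replace("+", "%20");
        return baseUri.resolve("/" + index + "/_doc/" + encoded);
    }

    // Actually sends a request; needs a reachable cluster at baseUri.
    static String send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) throws Exception {
        DocRequests api = new DocRequests(URI.create("http://localhost:9200"), "vos");
        HttpRequest request = api.saveOrUpdate("42", "{\"title\":\"No Title\",\"someNumber\":1}");
        System.out.println(request.method() + " " + request.uri()); // PUT http://localhost:9200/vos/_doc/42

        // Pass --send to run it against a live cluster.
        if (args.length > 0 && "--send".equals(args[0])) {
            System.out.println(send(request));
        }
    }
}
```

Keeping request construction separate from sending makes the mapping between wrapper methods and REST endpoints easy to unit-test without a running cluster.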
[Disclaimer: I have no ties to elastic.co or any of its partners. I wrote this essay because I really enjoyed working with their product and want it to thrive even further!]