Challenge: Building a data-driven application on top of a data warehouse
Background
Recently, I started working at Nexite. Nexite collects unique data: the locations of items in stores. On top of that data we want to build decision-enabling dashboards for our customers.
We want self-service, general-purpose dashboards (for several stakeholders) with preset widgets. Customers should have both detailed and aggregated views, along with several filtering options.
We are in the process of building the backend data warehouse (DWH) to support these requirements. The DWH is built on top of BigQuery.
We are now trying to pick the right solution for connecting the dashboards to the DWH. This service is referred to as the `data service`.
We have several options for implementing the data service, and I thought it would be interesting to publish this article and see what solutions exist out there.
Architecture
Data service implementation
Build it ourselves
Build the data application the regular way: we will build a service that serves and protects the data. We will have full flexibility in the API this service provides, and full control over the SQL queries we use.
We will also need to build the presentation layer, using a JavaScript UI framework or library that supports plotting, such as D3.
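To make "serve and protect the data" a bit more concrete, here is a minimal sketch of the query-building part of such a service. Everything in it is an assumption for illustration: the table name `item_locations`, the column names, and the `WidgetQuery` shape are hypothetical and not our real schema. It only builds the SQL text and a parameter dictionary, using BigQuery's `@name` named-parameter syntax; actually running the query is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical allow-lists: only these columns may be grouped or filtered on.
ALLOWED_DIMENSIONS = {"store_id", "category", "day"}
ALLOWED_FILTERS = {"store_id", "category"}


@dataclass
class WidgetQuery:
    customer_id: str  # should come from the authenticated session, never from the client
    dimensions: list[str]
    filters: dict[str, str] = field(default_factory=dict)


def build_query(q: WidgetQuery) -> tuple[str, dict[str, str]]:
    """Return (sql, params) for an aggregated view scoped to one customer."""
    if not q.dimensions:
        raise ValueError("at least one dimension is required")
    for dim in q.dimensions:
        if dim not in ALLOWED_DIMENSIONS:
            raise ValueError(f"dimension not allowed: {dim}")
    for col in q.filters:
        if col not in ALLOWED_FILTERS:
            raise ValueError(f"filter not allowed: {col}")

    # The customer filter is always present, so one tenant cannot read another's rows.
    params = {"customer_id": q.customer_id}
    where = ["customer_id = @customer_id"]
    for col, value in sorted(q.filters.items()):
        # Column names come from the allow-list; values travel only as parameters.
        params[f"f_{col}"] = value
        where.append(f"{col} = @f_{col}")

    cols = ", ".join(q.dimensions)
    sql = (
        f"SELECT {cols}, COUNT(*) AS item_count "
        "FROM item_locations "  # hypothetical table name
        f"WHERE {' AND '.join(where)} "
        f"GROUP BY {cols}"
    )
    return sql, params
```

The point of the sketch is the trade-off: we get full control over the SQL, but allow-lists, tenant scoping, and parameter handling all become code we have to write and maintain ourselves.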
My concerns about this approach:
- Time to market - Since the data application is going to be part of the critical path, we must have a production-grade application from day one. Our resources are limited; my team, for instance, is only 3 engineers (1 frontend, 1 backend and me).
- Industry standards - Building it yourself means you need to find the industry standards yourself, and you need to decide which standards you want to implement and which are not worth the investment at the stage the project is at.
Managed BI Service
Use a fully managed analytics service, such as Looker, Redash or even Power BI.
These kinds of services offer two configurations that can help us:
- Building both the semantic layer and the presentation layer in the BI service, and placing the presentation layer (the dashboards and widgets) in the Nexite console using iframes. The iframe would use some kind of seamless authentication in order to protect the data.
- Building only the semantic layer in the BI service. The presentation layer will be built by our frontend engineers, and the widgets will interact directly with the BI service through its API.
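For the iframe option, one common way to get "seamless authentication" is for our backend to hand the console a short-lived signed token that the embedded dashboard can verify. The sketch below is an assumption about how this could look, not the API of any specific BI service: it builds an HS256-signed JWT-style token with only the Python standard library, and the claim names (`customer_id`, `dashboard`) and function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # Base64url without padding, as used by JWTs.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_embed_token(secret: bytes, customer_id: str, dashboard: str,
                     ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Create a short-lived token that pins the iframe to one customer and dashboard."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"customer_id": customer_id, "dashboard": dashboard,
               "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_embed_token(secret: bytes, token: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature is valid and the token has not expired."""
    header_b64, payload_b64, sig_b64 = token.split(".")  # ValueError if malformed
    # The algorithm is fixed to HS256 here; the header's "alg" field is never trusted.
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if payload["exp"] < current:
        raise ValueError("token expired")
    return payload
```

The console would put the token in the iframe URL, and the customer identity would come from the signed payload rather than from anything the browser can edit. Whether a given BI service accepts tokens in this shape is something to check in its embedding documentation.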
The time to market of the iframe option looks pretty fast, and I guess many types of visualization already exist, which would give us the freedom to try out and explore widgets. But I am afraid the console website and the dashboards will not look homogeneous, and that the level of customization we will have over the presentation layer will be limited.
The API-only option will save us the time and effort of building such an API, and we will be focused on building the semantic layer, which is awesome. Except for price, are there downsides?
The biggest concerns I have here:
- The pricing model is per user of the BI service, while it seems that we are not going to charge our customers per user. Our customers could have many users in each account, and we don't want to buy a license for each of them.
- I am not sure these systems were designed for this kind of use case, and I am afraid I am missing gotchas in this design.
Open Source Solution
The open source projects I have in mind are Apache Superset, Redash or even Metabase. The configuration options are pretty close to the ones I described for the managed BI service.
But since it's open source, I don't need to be concerned about the pricing model.
I also think that the level of customization is higher when the project is open source. Am I too naive here?
Summary
I am really torn over which of the above solutions will produce the best quality for the least effort.
First and foremost, I want to find the solution that will help the team and me stay focused on what makes Nexite so special, which is our data.
I want the team and me to be focused on extending the DWH, on slicing and dicing the data in many interesting ways, and on presenting that data clearly to our customers.
If you have read this far, I really appreciate it. If you have suggestions, or want to share your experience in solving these kinds of challenges, please use the comments.
BTW: Nexite is hiring, so if building data apps on top of this data interests you, don't hesitate to contact me as well.
Thank you
I had the same challenge at kin. We eventually went with option 1, but as you mentioned, it required a full-time frontend engineer and BI people for the data layers. I can also advise you to check out SiSense.
Should all the users be able to create/modify dashboards? If not, you might want to consider QuickSight, where you have session capacity pricing and not just per-user pricing. I'd also recommend scheduling a session with an AWS Analytics expert on https://floor28.co.il/meet-aws-expert (it's free, and they can help look into your exact requirements).