Data as a Service... Org

This is a repost from my blog at leadernode.com. Check it out for weekly thoughts and insights on data.

I was recently thinking about the purpose of the Data Team at Teachable, or in general, at any organization. This was in the context of figuring out the correct relationship between data and product engineering, which is not the same in every org; but the purpose of the data team, I think, is the same in every org, at least at a high level. The data team is a service organization, and exists to make data accessible to the rest of the organization. What that covers varies from place to place, and the data team can have different levels of autonomy and ownership at different companies, but at the heart of the matter, the data team is there so everybody can get the data they need.

As I say, every organization is different, so your mileage may vary in applying my learnings to your situation; but I have been on (or simply been) the data team at several startups, a very large media company, and, slightly tangentially, at a large university, so I’ve seen a lot of ways of organizing things, and I feel like I’ve picked up some useful ways of thinking about data over the past… let’s say 10… years. Here are some things that I think are important when considering data accessibility in your organization.

  • Everyone in the company is a customer of the data team, and the data team must deliver good customer service.
  • The data team owns the data infrastructure and architecture, but it must make its APIs conform with the organization’s needs.
  • The data team should constantly be striving to make data more accessible than it already is, and actively seeking ways to do so.

We don’t always think of software engineers in the context of customer service. I think a lot of developers think of customers a little bit abstractly, as in, ‘I’m doing the tasks assigned by the product manager to add a feature that the customers want,’ and most likely the developer rarely if ever meets a customer or discusses their needs with them. A data engineer, though, should absolutely be thinking concretely about customers, because the data engineer’s customer is also her colleague, someone within the same company who is trying to achieve the same ultimate goal (the success of the organization). We on the data team must always approach requests we receive in this context, keeping in mind that in a bigger sense, we’re on the same team as our customers. 

But more than that, it’s useful to keep the service context in mind so that we never become dismissive of the requests we receive. Nobody is asking us to do work because they are lazy or incompetent, and we shouldn’t ever approach requests thinking that they are, any more than we would approach feature requests from customers with that kind of mindset. If we don’t understand why someone needs the data they’re requesting, or we think they are asking for the wrong data, we need to reach out to them and get clarifying information, just like we would do if external customers made requests that didn’t make sense to us.

Of course, delivering good customer service is just the foundation. On top of that you need to build a solid (infra)structure; and to make that accessible, you obviously need good entrypoints. I have had to interface with a lot of bad APIs in my life, and if you’re an engineer, I bet you have too. But what really is a bad API? Some of the ones I have hated the most are extremely widely used, and it’s possible that there are people who like them just fine. What makes them bad, for me, is that they don’t fit my use cases. An example, which is pretty common, and something you’ve likely run into if you are in this line of work: lots of RESTful APIs provide event-based data (tell me what happened on X date) but no way to make a relative query (give me all the records that have changed since X date, i.e., since the last time I queried). That’s very frustrating when you are trying to ingest data into your data warehouse! It means you have to jump through all kinds of hoops to get the records you need; usually there’s some search service where you can retrieve the IDs of the records you need, and then submit a batch request on some completely different API, or something like that… but it’s always added steps and added complexity.
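To make the "changed since" idea concrete, here is a minimal sketch of the kind of query I wish more APIs offered. Everything in it is an assumption for illustration: the `Record` type, the `updated_at` field, and the `changed_since` function are hypothetical names, and an in-memory list stands in for whatever the service actually stores. It is not any particular vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    """Hypothetical record shape; `updated_at` is set on every create or edit."""
    id: int
    payload: str
    updated_at: datetime


def changed_since(records, since):
    """Return the records modified strictly after `since`, oldest first,
    plus a cursor the caller can pass as `since` on the next call."""
    changed = sorted(
        (r for r in records if r.updated_at > since),
        key=lambda r: (r.updated_at, r.id),
    )
    # If nothing changed, hand back the same cursor so the caller can retry later.
    next_cursor = changed[-1].updated_at if changed else since
    return changed, next_cursor


def utc(day):
    """Helper for the example: midnight UTC on a day in January 2024."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)


records = [
    Record(1, "created", utc(1)),
    Record(2, "created, then edited", utc(5)),
    Record(3, "created", utc(3)),
]

# The consumer last synced on January 2 and asks only for what changed after that.
page, cursor = changed_since(records, since=utc(2))
```

The consumer stores `cursor` and sends it back on the next sync, so a single call replaces the search-then-batch dance. A real service would also need to think about pagination and about records that share a timestamp with the cursor, which this sketch ignores.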

My point? Don’t design APIs to deliver what you want to deliver; design them to deliver what your customers want. That’s not bad advice for externally-facing services, but you can’t necessarily know what your external customers want, or how they’d prefer to get it. You absolutely can know what your internal customers want, and you have a responsibility to ask them, and make your APIs work in a way that works for them. Sometimes you might not be crazy about how another team (your colleagues, don’t forget) architects their consumer. It might be inefficient, perhaps because that team is beholden to its own internal processes. I don’t like to be absolutist about things, so there may be exceptions to this, but in general, it’s not the data team’s business to worry about how other engineers are doing their work. If you have practical advice, feel free to give it, but unless there’s a compelling reason not to, we want to make our services provide data in a format and interface that works for our customers… rather than making them jump through hoops to conform to us.

If you don’t make it easy to use your data, what’s going to happen? It’s likely that you’ll lose your customers in the end. If it’s hard for people to get data from you, they’re going to find ways to get it elsewhere. It probably won’t be your data: maybe instead of your nice clean, vetted data warehouse, they’ll just pull straight from the messy production database. Maybe they’ll rely on the dashboards built into the third party services they use. If you believe that your organization will be most successful if it uses the data that you provide (and if you are adding value then this ought to be the case), then you want to make sure it’s as easy as possible for your colleagues to access your data.

Beyond the question of making it easy to use your APIs, you want to always be working on getting better data into people’s hands. Across the organization, there are always needs for information that’s not available. Anywhere people have information gaps, or are relying on slow or inaccurate third parties, there are opportunities to develop new services that will improve the organization’s chance of success. The data team should be constantly seeking these out and developing solutions to fill in the gaps.

Author: Peter Jaffe