REST Modeling: designing a future-friendly API

I wrote this article because I see a lot of REST APIs in my role and many of these APIs make the same modeling mistakes. I also believe there are not enough articles that discuss best practices of API modeling: that is, how to model parent-child relationships and the impacts of the modeling decisions we make.

For example: When should you nest child objects (resources) inside other objects? What are the trade-offs? When would you choose not to? Will you live to regret today’s design choices later?

What this article isn’t: The internet is chock full of helpful articles that describe RESTful APIs. These are essentially checklists that cover the commandments of REST: use nouns, use versions, use HTTP status codes, etc.  This article assumes you have read that stuff or that you can Google it.

You are likely familiar with OOAD – Object Oriented Analysis and Design. You’re probably already preaching DRY (don’t repeat yourself) and Single Responsibility Principle. Think of this as object-oriented design for RESTful APIs.

The Requirements. We start with example requirements of a very simple Human Resources API that provides access to Departments and Employees. We know that Departments have Employees, so there’s a parent-child relationship. 

They’re just web pages. In REST terminology, our simple Departments and Employees are “resources”. You could also think of them as objects, but we’re talking REST here. A REST resource is like a web page. Personally, thinking of API resources as web pages helps me to visualize why REST is so powerful. REST resources should have a consistent URI (path/name), be cacheable and stateless -- just like static web pages are. 

When you think of API resources as web pages, you can use existing HTTP technology: proxy servers, cache systems and more to serve these resources. You could literally write each of your API resources to a file system as text files and serve them untouched if your requirements allowed.  API resources are obviously a bit more clever than static files, but it is good to stay grounded in our REST roots. We have complex security layers, composition and collaboration of services and other features that make our API lives more interesting.

Back to our example: given these requirements, we’ve already practiced on a couple of “Build an API In Five Minutes” demos and come up with a first rough draft design…

/departments/     // returns a department collection
/departments/{deptId} // returns a department resource
   - Id
   - Name
   - (other attributes)
   - Employees      // a collection of employees in this department.
     - Employee
       - Id
       - Name
       - (other attributes)
 
/departments/{deptId}/employees/{id} // returns 1 employee for a department.
  - Employee 
    - Id
    - Name
    - (other attributes)

There are some good things about this design. The /departments/{id}/employees/ method nicely models the parent-child relationship. It’s intuitive that the employees are part of this department. 

There are several disadvantages to this design. 

The REST problem. The first issue is that this design violates the REST principle of accessing an Employee resource through a consistent, single URI. In the design above, the Employees are accessed two ways. Sure, pragmatic developers can often drift away from REST religion for a bunch of reasons, but in this case we would find ourselves maintaining our Employees collection twice. We duplicate our test cases, too.

At this point, we usually hear API designers speak of how they’re doing the API consumers a favor. Consumers only have to make one call – not two! All those employees are pre-fetched for the consumer! Think of the savings! Not so fast. While it’s good to anticipate how a consumer will call your API, it’s a slippery slope to assume that one aggregated call will perform better. Modern clients (JavaScript included) can issue several requests concurrently. Multiple smaller asynchronous HTTP calls are often more efficient than one large call and allow the consumer to get only the data they need, when they need it.
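
To make the concurrency point concrete, here is a minimal Python sketch. The fetch function is a stand-in I made up that sleeps instead of doing real HTTP, and the two paths are only illustrative. The idea is simply that two small requests issued together take roughly as long as the slower one, not the sum of both.

```python
import asyncio

async def fetch(path: str) -> dict:
    """Hypothetical stand-in for an HTTP GET; sleeps to simulate latency."""
    await asyncio.sleep(0.01)
    return {"path": path}

async def load_department_view(dept_id: int):
    # Both requests are in flight at the same time.
    department, employees = await asyncio.gather(
        fetch(f"/departments/{dept_id}"),
        fetch(f"/departments/{dept_id}/employees"),
    )
    return department, employees

department, employees = asyncio.run(load_department_view(5))
```

A consumer that only needs the department can skip the second call entirely, which is the other half of the argument.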

Imagine that a future service extends ours, wrapping our HR API. This upstream service needs to apply new business logic that returns only active (or only inactive) Employees. It would have to apply this filtering in two places: at the Department resource as well as the Employee resource. Ideally, you would only want to apply this logic one time.

The Cardinality problem.  If a parent resource has many children, it’s said to have high cardinality. If it has few children, then low cardinality. In our example above, the Employees collection could have a high cardinality. A department might have many employees. When you choose to include a child collection of high cardinality, your parent resource may expand in size. With larger resources, you’re spending more time serializing and transmitting these “on the wire” over the network. Also, you’re likely spending more time at the database assembling them from tabular data, assuming you’re using a relational database. These can hurt your performance.
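
A rough way to see this is to measure the serialized size of a department as its embedded collection grows. This is only a sketch with made-up data and field names, not a benchmark:

```python
import json

def department_embedded(dept_id: int, employee_count: int) -> dict:
    """Design 1 shape: every employee is nested inside the department."""
    employees = [{"id": i, "name": f"Employee {i}"} for i in range(employee_count)]
    return {"id": dept_id, "name": "Engineering", "employees": employees}

def department_alone(dept_id: int) -> dict:
    """The department without its children."""
    return {"id": dept_id, "name": "Engineering"}

def payload_bytes(resource: dict) -> int:
    """Size of the JSON body that would go on the wire."""
    return len(json.dumps(resource).encode("utf-8"))

small = payload_bytes(department_embedded(5, 10))
large = payload_bytes(department_embedded(5, 10_000))
alone = payload_bytes(department_alone(5))
```

The parent's size tracks the number of children, while the department on its own stays constant no matter how many people are hired.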

The Pagination problem. Will you ever need to return your data in pages instead of returning it all in one response? Pagination is a key way to improve performance, if done right.  When it comes to pagination, one size does not fit all. Don’t assume you have to expensively ask the database for 300 rows of data, stream those back to the client, do the math to create page numbers for every X sets, only to have the customer view the first two pages, then search again. Modern databases support “skip and take” or continuation tokens to efficiently grab just the first page, then let the user “click for more results”. Fresh results are then added to the bottom of the list.
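
Here is a small sketch of the two styles, using an in-memory list in place of a database. The function names and the idea of using the last seen ID as the token are my assumptions for illustration. A real database would do the filtering with an indexed query rather than scanning a list.

```python
from typing import List, Optional, Tuple

# 300 made-up rows, already sorted by id, standing in for a database table.
EMPLOYEES = [{"id": i, "name": f"Employee {i}"} for i in range(1, 301)]

def page_by_skip_take(skip: int, take: int) -> List[dict]:
    """Offset paging: skip N rows, then take the next page."""
    return EMPLOYEES[skip:skip + take]

def page_by_token(after_id: Optional[int], take: int) -> Tuple[List[dict], Optional[int]]:
    """Continuation paging: return rows after the last id the client saw,
    plus a token for the next page (None when nothing is left)."""
    remaining = [e for e in EMPLOYEES if after_id is None or e["id"] > after_id]
    rows = remaining[:take]
    next_token = rows[-1]["id"] if len(remaining) > take else None
    return rows, next_token

first_page, token = page_by_token(None, 25)
second_page, token = page_by_token(token, 25)
```

Either way, the client asks for one page at a time and nobody computes page numbers for rows that are never viewed.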

If our Employees collection ever needs to be paginated, we’re going to have a tough time passing pagination parameters to our child collection. It can be done, but it’s ugly and our API customers won’t be impressed that we added this complex hack so we can return a big resource. Were we to add these pagination parameters, we would appear to change the internal state of a resource with a GET. For example, “/departments/5/employees?page=1” would return different internal data than “/departments/5/employees?page=2”.

The Identification problem. Another issue is about identifying Employee resources. Employees have an ID, but we decided to get Employees through the Department resource as “/departments/{deptId}/employees/{employeeId}”. This means that Employees are also identified by a department ID. So you can’t ever access an employee just by its ID, for example “/employees/{id}”. You also need to know that Employee’s Department ID. This can be very limiting.

The Flexibility problem. With this design, we would have a hard time getting a list of Employees regardless of department. We would have to query each department, get a list of Employees, and then move to the next department’s employees. Our API consumers will not be happy with this.
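
As a sketch of what that looks like from the consumer's side, assume some made-up data shaped like the first design, with a request log so we can count the calls:

```python
# Made-up data shaped like the first design: employees exist only inside departments.
DEPARTMENTS = {
    1: {"id": 1, "name": "Engineering",
        "employees": [{"id": 32, "name": "Ada"}, {"id": 445, "name": "Grace"}]},
    2: {"id": 2, "name": "Finance",
        "employees": [{"id": 21, "name": "Linus"}]},
}

request_log = []  # records every simulated API call

def get_department_ids() -> list:
    request_log.append("/departments/")
    return list(DEPARTMENTS)

def get_department(dept_id: int) -> dict:
    request_log.append(f"/departments/{dept_id}")
    return DEPARTMENTS[dept_id]

def all_employees() -> list:
    """Walk every department just to build one flat list of employees."""
    employees = []
    for dept_id in get_department_ids():
        employees.extend(get_department(dept_id)["employees"])
    return employees

everyone = all_employees()
```

Two departments cost three requests here. The count grows by one for every department added, whether or not the consumer cares about departments at all.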

Counterpoint: FIGMO, it works as designed. Having listed all these design problems, it also might be just fine. Perhaps you are certain that Employees will only ever be in one Department. You also were promised that there will only be a handful of employees per department and so your size will not get out of hand. You’ve also been assured that we’ll never (ever) want to query Employees outside of a Department. If that’s true, rest assured that your requirements will never change and roll forward with confidence. 

Experience, however, tells us otherwise. We’ll change our design to be more future friendly. As the saying goes, “APIs, like diamonds, are forever.” Getting customers off of API version v1 and onto v2 is a lot harder than it sounds. Development costs money and customers hate to redevelop to your new version just because you tell them to. A year from now we find ourselves with double maintenance, back-patching v1 with v2 fixes because that important customer doesn’t want to keep up.

When would we want to embed child resources in a parent this way? When some children must be identified by their parent. Take the example of an order (or invoice) resource with a child collection of order details. Order details are not usually accessed outside of the context of an order. Even though they may have an OrderDetailId, we will typically be in the context of an order. Order details are a great example of embedding a child collection in a parent resource. They don’t make sense to the application otherwise.

Design 2 – “Thinking in batches”

Let’s modify the design to create two top-level resource endpoints (URI): /departments and /employees.

/departments/{deptId} // returns a department resource
   - Id
   - Name
   - (other attributes)
   - Employees
       - [32,445,21,56,75] //Array of EmployeeId
 
/employees/{id} // returns an employee resource.
   - Employee
     - Id
     - Name
     - (other attributes)
 

This second design moves the Employee to its own URI, which solves the identification and REST problems. The “/departments/{deptId}” resource returns the Employees collection as (hopefully) a small batch of Employee IDs. The API consumer would then call the Employee URI for each ID, perhaps in a multithreaded manner. This somewhat solves the pagination problem too. But this is not ideal.

Alternatively, we could pass the batch of Employee IDs into a “/employees/search” method as a search term and get back a collection of Employees. This is “unnecessarily clever” and can be simplified. Whenever I’m tempted to do “batch processing” with web services, it’s an indication that I’m thinking about the problem wrong.

We are still in the mindset that the Department is driving the parent-child relationship. Let’s turn that around and let Employees drive.

Design 3 – “Fixing the issues”

We still have two endpoints, but now we don’t return an Employee collection in the Department.  

/departments/{deptId} // returns a department resource
   - Id
   - Name
   - (other attributes)
   - Employees
     - /employees/department/{deptId}  
 
/employees/department/{deptId}  // returns a collection of employee resources.
 
/employees/{id} // returns an employee resource.
   - Employee
     - Id
     - Name
     - (other attributes)

With this design, we ask the Employees resource to find Employees by Department ID. The consuming application will know to query the “/employees/department/{departmentId}” resource to get a list of Employees for that department. This Employee collection:

  • is intuitive to the API consumer
  • grows with our data
  • lets us paginate the list of employees easily
  • lets us reference the Employee resource with a consistent, reusable URI
  • lets us cache the Department resource and not have to update it when Employees are added
  • lets us easily cache the Employee resource regardless of their department
  • has no cardinality issues
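
To show how the three resources in this design fit together, here is a minimal in-memory sketch. The data, the departmentId field, and the skip and take parameter names are assumptions of mine, not part of the design above.

```python
# Made-up data; each employee carries its department id (an assumed field name).
EMPLOYEES = {
    32: {"id": 32, "name": "Ada", "departmentId": 5},
    445: {"id": 445, "name": "Grace", "departmentId": 5},
    21: {"id": 21, "name": "Linus", "departmentId": 7},
}
DEPARTMENTS = {
    5: {"id": 5, "name": "Engineering"},
    7: {"id": 7, "name": "Finance"},
}

def get_department(dept_id: int) -> dict:
    """GET /departments/{deptId}: no embedded employees, only a pointer."""
    department = dict(DEPARTMENTS[dept_id])
    department["employees"] = f"/employees/department/{dept_id}"
    return department

def get_employee(employee_id: int) -> dict:
    """GET /employees/{id}: one URI per employee, whatever the department."""
    return EMPLOYEES[employee_id]

def get_employees_by_department(dept_id: int, skip: int = 0, take: int = 50) -> list:
    """GET /employees/department/{deptId}, with hypothetical skip/take paging."""
    matches = sorted(
        (e for e in EMPLOYEES.values() if e["departmentId"] == dept_id),
        key=lambda e: e["id"],
    )
    return matches[skip:skip + take]
```

Notice that adding an employee changes nothing in what get_department returns, which is why the Department resource can stay cached.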

We can improve our design even further by adding a hyperlink directly in our Department resource:

/departments/{deptId} // returns a department resource, e.g. abc123
   - Id
   - Name
   - (other attributes)
   - Employees
     - http://api.example.com/employees/department/abc123

This is a simple example of REST’s HATEOAS (Hypermedia As The Engine Of Application State). Now we’re doing our customers a favor. Think of the savings!
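
From the consumer's side, following the link might look like the sketch below. The follow function is a stand-in I made up that looks up canned responses instead of calling the network, and the employee data is invented.

```python
from urllib.parse import urlparse

# The department as the client received it, link included.
department = {
    "id": "abc123",
    "name": "Engineering",
    "employees": "http://api.example.com/employees/department/abc123",
}

# Canned responses keyed by path, standing in for the real API.
RESPONSES = {
    "/employees/department/abc123": [{"id": 32, "name": "Ada"}],
}

def follow(link: str) -> list:
    """Hypothetical stand-in for an HTTP GET on a link taken from a resource."""
    return RESPONSES[urlparse(link).path]

# The client never builds the employees URL itself; it uses what it was given.
employees = follow(department["employees"])
```

If we later move the employees collection, clients that follow the link keep working, while clients that hard-coded the path do not.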

Other Design Thoughts

The API is your database.  With web services, whether REST or SOAP, “The API is the new Database.” As an API consumer, we don’t care how the data is stored. We only care how the API behaves.   If your developers are used to always querying the database, there will be an adjustment as they begin to accept that the API is the only way to access the database.  

Use Query Driven Design. Design your API based on how your applications will query it. Be ruthless about this. Do not provide open-ended methods to access data, such as 

String whereClause = "where X=1 and Z=2 and somefield in (32, 33, 34)";
String apiUrl = "/employee/megasearch?criteria=" + whereClause;

Methods like this are very difficult to optimize and to guarantee a response time (service level) for. If your app is going to need that query, make a method for it and think about how you will optimize for response time.
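
By contrast, a purpose-built method might look something like this sketch. The data, the function name, and the cap of 100 are assumptions for illustration. Because the query is known ahead of time, we can prepare an index for exactly that access path and bound the size of the response.

```python
from collections import defaultdict

# Made-up data standing in for the employee store.
EMPLOYEES = [
    {"id": 32, "departmentId": 5, "active": True},
    {"id": 445, "departmentId": 5, "active": False},
    {"id": 21, "departmentId": 7, "active": True},
]

# An index built for the one query we have promised to support.
ACTIVE_BY_DEPARTMENT = defaultdict(list)
for employee in EMPLOYEES:
    if employee["active"]:
        ACTIVE_BY_DEPARTMENT[employee["departmentId"]].append(employee)

MAX_TAKE = 100  # hypothetical upper bound on any single response

def find_active_employees(dept_id: int, take: int = 50) -> list:
    """One named query with a bounded result, instead of a free-form where clause."""
    take = max(1, min(take, MAX_TAKE))
    return ACTIVE_BY_DEPARTMENT.get(dept_id, [])[:take]
```

A method like this has one shape of work to do, so it is much easier to reason about its response time than about an arbitrary where clause.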

It is okay to be “app-selfish”. In the new mindset of NoSQL (and SQL) databases, you don’t need to have one database be both a transactional (for apps) and a reporting database. One size does not fit all. Optimize your transactional DB for high performance transactions. Then ship your data off of your transactional DB to your reporting DB and optimize this reporting DB for reporting.

Don’t let your back end drive your front end. It seems obvious to say, but if you have a slow middle tier or back end database and you connect an internet-scale API to it, you will have a slow API.  Very few legacy databases can support high performance requirements of an internet-scale API. 

Similarly, many APIs are modeled poorly because they are based on inflexible back end constraints (9-5 availability, low concurrent users, slow response times). Don’t let the sins of the past bleed into your shiny new API. Using the app-selfish approach, consider an app-selfish, read-optimized database for your API that is synchronized with your slow back end data. Synchronize with replication, with event-based messaging when data changes, or whatever it takes. Don't try to force your legacy back end to meet internet-scale demands. It just won't work. Build an internet-scale API and loosely couple it to your slow legacy data. Accept that your legacy brick-and-mortar system is slow. Your new API doesn’t have to be.
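
One way to picture the event-based option is a read store that the API owns and that is updated from change events published by the legacy system. This is only a sketch: the event names and payload shape are hypothetical, and a plain dict stands in for the read-optimized database.

```python
# The API's own read-optimized copy; a dict stands in for a real store.
read_store = {}

def apply_event(event: dict) -> None:
    """Apply one change event from the legacy back end to the read store."""
    employee = event["employee"]
    if event["type"] == "EmployeeChanged":      # hypothetical event name
        read_store[employee["id"]] = employee
    elif event["type"] == "EmployeeRemoved":    # hypothetical event name
        read_store.pop(employee["id"], None)

def get_employee(employee_id: int):
    """GET /employees/{id} served from the read store, never from the legacy system."""
    return read_store.get(employee_id)

# A few made-up events, in the order they arrived.
for event in [
    {"type": "EmployeeChanged", "employee": {"id": 32, "name": "Ada"}},
    {"type": "EmployeeChanged", "employee": {"id": 21, "name": "Linus"}},
    {"type": "EmployeeChanged", "employee": {"id": 32, "name": "Ada L."}},
    {"type": "EmployeeRemoved", "employee": {"id": 21}},
]:
    apply_event(event)
```

Reads never touch the slow system, so its availability window and response times stop being the API's problem. The trade-off is that the copy can lag slightly behind the source.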

very well articulated... covering the major design hiccups that we encounter ...

Nice one David. Good to see you putting your knowledge out there. Your message has certainly remained consistent. I'd like to share this with my team.
