Fluency and Readability in API Programming

DISCLAIMER: This post expresses purely my own opinions and reflects my past and present experiences as an engineer. It does not reflect the opinions of past or present employers.

Flow control mechanisms are among the first tools an engineer learns: how to build different code paths depending on one or more Boolean conditions. Most of us know about, but rarely stop to think about, the complexity such statements add, especially in the context of the larger application we're building, and particularly when we're busy translating specifications written in our native language into code. In industry, there is debate over what code readability means. It is sometimes interpreted as "does the code read like English?" or "does the code read like the requirements and cover the edge cases?" This interpretation leads engineers astray when designing certain systems. Code should be written concisely and fluently, using the full vocabulary of the language in which it is written rather than that of the specification it was translated from.

APIs: A Case Study

Since Web APIs are publicly accessible, they make a good case study. Let's say I want to build a website that uses a search API, and the first feature I require is inequality-based search. A classic use case for this feature is breaking price ranges into cost bands on an e-commerce site: $0 to $10, $10 to $20, and so on. On the client side, the final URL might look like this:

https://my-api.example.com/api/items/search?min_cost=10&max_cost=20

If you think this is a poor design, I agree, but examining some APIs hosted by Twitter [1] [2], Facebook [3], and Walmart [4] shows it is more common than one would expect. All specify a min and max search parameter in some form, though Walmart specifies inclusive-only semantics in a single parameter, and neither Twitter nor Facebook even documents whether the range endpoints are inclusive or exclusive. Since HTTP parameters are key/value pairs, inequalities require some sensible default, such as exclusivity (greater than, less than), plus additional flags to specify whether each end of the range is inclusive. This simple design might not seem problematic at first, until one considers a simplified version of what the server has to do with these values to generate a human-readable, SQL-like query:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.commons.lang3.StringUtils;

public String getCostQuery(Map<String, String> parameters) {
  // Double.valueOf throws NumberFormatException for non-numeric input
  String val = parameters.get("min_cost");
  Double min = val == null ? null : Double.valueOf(val);
  val = parameters.get("max_cost");
  Double max = val == null ? null : Double.valueOf(val);
  if (max == null && min == null) {
    throw new IllegalArgumentException("min_cost or max_cost is required");
  }
  // handle "greater-than-or-equal" and "less-than-or-equal"
  boolean includeMin = Boolean.parseBoolean(parameters.getOrDefault("include_min", "false"));
  boolean includeMax = Boolean.parseBoolean(parameters.getOrDefault("include_max", "false"));
  String minOp = ">", maxOp = "<";
  if (includeMin) minOp = ">=";
  if (includeMax) maxOp = "<=";
  List<String> clauses = new ArrayList<>();
  if (min != null) clauses.add(String.format("(cost %s %g)", minOp, min));
  if (max != null) clauses.add(String.format("(cost %s %g)", maxOp, max));
  return StringUtils.join(clauses, " AND ");
}

The server uses > when include_min is false and >= when include_min is true; likewise, it uses < or <= depending on include_max. This limited use case requires:

  • Checking the inputs to ensure they are numbers. This is necessary even when using a mechanism like a Spring formatter, which would instead require a Map of already-converted objects as the input parameter and instanceof checks rather than String parsing.
  • Checking whether the extrema are included and, if so, changing the operators.
  • Combining the clauses, which can be one or two inequalities per field, and throwing an error if both are missing. The logic would be worse still without a convenience function like Apache Commons StringUtils.join().

The number of possible queries for this single-parameter search is large compared to the input size:

cost > 10
cost >= 10
cost < 20
cost <= 20
cost > 10 AND cost < 20
cost > 10 AND cost <= 20
cost >= 10 AND cost <= 20
cost >= 10 AND cost < 20
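To make the count concrete, here is a small sketch that enumerates all 16 combinations of the four inputs (whether min_cost and max_cost are present, and the values of include_min and include_max), using fixed bounds of 10 and 20 and the same operator selection as getCostQuery. The class CostQueryCombinations is hypothetical, and the sketch assumes that supplying neither bound is the error case, which it skips.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: enumerates every combination of the four inputs to
// getCostQuery (min present, max present, include_min, include_max), using
// fixed bounds of 10 and 20, and collects the distinct queries produced.
final class CostQueryCombinations {

  static Set<String> distinctQueries() {
    Set<String> queries = new LinkedHashSet<>();
    boolean[] choices = {false, true};
    for (boolean hasMin : choices) {
      for (boolean hasMax : choices) {
        for (boolean includeMin : choices) {
          for (boolean includeMax : choices) {
            if (!hasMin && !hasMax) {
              continue; // error case: neither bound supplied
            }
            String minClause = "cost " + (includeMin ? ">=" : ">") + " 10";
            String maxClause = "cost " + (includeMax ? "<=" : "<") + " 20";
            if (hasMin && hasMax) {
              queries.add(minClause + " AND " + maxClause);
            } else {
              // Only one bound is present, so the other include flag is ignored.
              queries.add(hasMin ? minClause : maxClause);
            }
          }
        }
      }
    }
    return queries;
  }

  public static void main(String[] args) {
    Set<String> queries = distinctQueries();
    queries.forEach(System.out::println);
    System.out.println(queries.size() + " distinct queries"); // 8 distinct queries
  }
}
```

Sixteen input combinations collapse to the eight queries listed above plus the error case, because each include flag only matters when its bound is present. That interaction between parameters is exactly what tests have to cover.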

Even discounting error conditions in this simple case, there are as many as 2^7 = 128 possible execution paths that need consideration in testing. This API may not seem terrible to the client or the developer implementing it, but there are a number of issues to consider:

  1. Unit and system testing such an API is going to be woefully unpleasant and time-consuming, since its cyclomatic complexity will be high.
  2. For consistency, every numerical search attribute requires min, max, include_min, and include_max parameters, quickly devolving into a maintenance nightmare for both client and server. At best, the query strings are unwieldy; at worst, they can be the source of obscure bugs. For example, the length of an HTTP GET URL may be limited by the browser, even though the HTTP specification sets no limit. Internet Explorer, the bane of many a web developer's existence, had limits as low as 2048 characters through version 8.0 [5]. An alternative is to use HTTP POST and move the query string into the request payload. This removes the parameter list from the length-limited URL, but POST conventionally denotes a "write" operation, so the result is an inconsistent API.
  3. The use case currently excludes an equality operator. Using the API as is, equality would look awful from a client standpoint: "A >= B AND A <= B". Having five parameters (one each for equality, min, max, include min, and include max) is even more onerous and constitutes bad design, particularly since some of the parameters become mutually exclusive. Instead, APIs settle for more limited syntax and capabilities.
  4. Even minor enhancements will make the implementation unbearably complex:

Enhancement #1: Supporting disjunction ("or"), such as "version=2 or version=3" or the SQL-like clause "hour in (1, 3, 5, 7)".

Enhancement #2: Allowing date and/or time ranges. In this case, another set of "if" statements might be needed, on both client and server, to identify and handle the same syntax for different types. Some languages, such as JavaScript and Python, can make this simpler and safer with "duck typing" [6]. Languages like Java have type-conversion mechanisms, such as a Spring formatter, but these still don't solve the problem that a query language might not support the same inequality operators for date and time types. This, in turn, could convince engineers to convert datetime values to epoch time, a long integer, which has its own issues that I won't delve into here.

Enhancement #3: Consider the idea that cost is not a fixed point but a range; for example, allowing third-party sellers to list their prices and showing the price range across all sellers in search results. This enhancement is more difficult, since the parameter names min and max, which are what most of us think of as the bounds of a range, have already been taken. If the convention is to search price=x, min_price=y, etc., the convention breaks down: "min_min_price" is nonsense. Searching for ranges of ranged values is really a search for intersecting intervals.
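Once ranged values are treated as intervals, the search test itself needs only two comparisons. The following is a minimal sketch, assuming numeric, non-empty intervals; the Interval class is hypothetical, and unbounded ends are represented by infinities. Two intervals intersect exactly when each one's lower bound lies below the other's upper bound, or equals it with both endpoints inclusive.

```java
// Minimal sketch: numeric, non-empty intervals; the class is hypothetical.
// Unbounded ends use Double.NEGATIVE_INFINITY / Double.POSITIVE_INFINITY
// and should be marked exclusive.
final class Interval {
  final double lower;
  final boolean lowerInclusive;
  final double upper;
  final boolean upperInclusive;

  Interval(double lower, boolean lowerInclusive, double upper, boolean upperInclusive) {
    this.lower = lower;
    this.lowerInclusive = lowerInclusive;
    this.upper = upper;
    this.upperInclusive = upperInclusive;
  }

  // Some value lies in both intervals iff each lower bound sits below the
  // other interval's upper bound (or touches it with both ends inclusive).
  boolean intersects(Interval other) {
    return below(lower, lowerInclusive, other.upper, other.upperInclusive)
        && below(other.lower, other.lowerInclusive, upper, upperInclusive);
  }

  private static boolean below(double lo, boolean loInclusive, double hi, boolean hiInclusive) {
    return lo < hi || (lo == hi && loInclusive && hiInclusive);
  }

  @Override
  public String toString() {
    return (lowerInclusive ? "[" : "(") + lower + ", " + upper + (upperInclusive ? "]" : ")");
  }

  public static void main(String[] args) {
    Interval search = new Interval(10, true, 20, true); // [10, 20]
    Interval[] candidates = {
      new Interval(8, true, 12, true),                        // overlaps
      new Interval(20, false, 30, false),                     // touches 20, but 20 is excluded
      new Interval(Double.NEGATIVE_INFINITY, false, 9, true), // entirely below
    };
    for (Interval c : candidates) {
      System.out.println(c + " intersects " + search + ": " + search.intersects(c));
    }
  }
}
```

The same check works for seller price ranges: a listing with prices from 8 to 12 matches a [10, 20] search because the two ranges share the values 10 through 12.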

Let's revisit Walmart's API for a moment. Its range semantics are good, but not expressive enough. My prior employer, RealMassive, used an interval syntax for ranges [7] that has some interesting properties:

1. Compact expression. All inequalities and equalities can be expressed with the same parameter value. There is no need to have a min_value, max_value, include_min, include_max, etc.

2. Familiar form. The inequalities are expressed as intervals from early math education, which are more familiar than some newly invented form. Consistency and familiarity are hallmarks of good design [8].

3. Simplified comprehension. Ranged searches of ranged values are easily expressed in the API and easily understood by humans. If I use a search interval of [10, 20] and the candidates' intervals are [8, 12], (20, 30), and so on, it is easy to picture these as ranges on a number line where [8, 12] matches and (20, 30) does not.

4. "One syntax to rule them all": foo=3 for equality, foo=3|4 for "in list," foo=(3,) for exclusive inequality, foo=(,10] for inclusive inequality, and, as described in #3, every other combination of bounds as well. There is also implied support for other types: dates, datetimes (with or without time zone), and ranges of ranged values.

4a. Simplified client-side use. The query parameter can be produced in many different languages with minimal flow control. Even C, known for its verbosity, can compactly produce any bounded interval: printf("parameter=%c%g,%g%c", low_operator, lower_bound, upper_bound, high_operator). The bracket characters can easily be stored in a map/dictionary and selected with a key/value lookup instead of an "if" statement. The "in list" syntax can be handled just as easily with something like Java's String.format("parameter=%s", StringUtils.join(args, "|")) or a library equivalent.

4b. Simplified server-side use on a number of dimensions:

  • Syntax is either correct or not, and can be validated with regular expressions, which, though more complex, are more expressive and comprehensive than "if" statements. Web APIs are a language and should be validated with more rigorous methods than unit tests alone.
  • Regular expression matching improves success and error handling by treating validation as a language-recognition problem. Parameters can be extracted easily via capture groups rather than complex parsing. Though unwieldy at times, a regular expression tells the caller more than a simplified error message does: the API need only return a general error message and the failing regular expression as part of the response. The programmer consuming the API can then use a separate tool to validate one-off inputs or, better yet, incorporate the regular expression into their own automated integration tests.
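Points 4a and 4b can be sketched together in Java. The grammar below is an assumption inferred from the examples above (foo=3, foo=3|4, foo=(3,), foo=(,10]) with a comma between the bounds; it is not RealMassive's documented syntax, and the IntervalSyntax class and its methods are hypothetical. The client side picks bracket characters with a map lookup; the server side validates with two regular expressions, reads the bounds from capture groups, maps brackets to operators with another lookup, and returns the failing patterns in its error message.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical grammar inferred from the article's examples; not a documented API.
final class IntervalSyntax {
  private static final String NUM = "-?\\d+(?:\\.\\d+)?";
  // Groups: 1 = opening bracket, 2 = lower bound, 3 = upper bound, 4 = closing bracket.
  // The lookahead requires at least one bound, so "(,)" is rejected.
  static final Pattern INTERVAL = Pattern.compile(
      "^(?=.*\\d)([\\[(])\\s*(" + NUM + ")?\\s*,\\s*(" + NUM + ")?\\s*([\\])])$");
  // Equality ("3") or "in list" ("3|4|5").
  static final Pattern LIST = Pattern.compile("^" + NUM + "(?:\\|" + NUM + ")*$");

  // Lookups replace "if" statements when choosing brackets and operators.
  private static final Map<Boolean, Character> OPEN = Map.of(true, '[', false, '(');
  private static final Map<Boolean, Character> CLOSE = Map.of(true, ']', false, ')');
  private static final Map<String, String> OPERATOR =
      Map.of("[", ">=", "(", ">", "]", "<=", ")", "<");

  // Client side (4a): a null bound means "unbounded" and is left empty.
  static String interval(Double lower, boolean lowerInclusive,
                         Double upper, boolean upperInclusive) {
    return OPEN.get(lowerInclusive) + fmt(lower) + "," + fmt(upper) + CLOSE.get(upperInclusive);
  }

  static String inList(List<Double> values) {
    return values.stream().map(IntervalSyntax::fmt).collect(Collectors.joining("|"));
  }

  // Formats 10.0 as "10" and 12.5 as "12.5"; null becomes an empty endpoint.
  private static String fmt(Double v) {
    return v == null ? "" : BigDecimal.valueOf(v).stripTrailingZeros().toPlainString();
  }

  // Server side (4b): validate with the patterns and read values from capture groups.
  static String toClause(String field, String value) {
    Matcher m = INTERVAL.matcher(value);
    if (m.matches()) {
      List<String> clauses = new ArrayList<>();
      if (m.group(2) != null) {
        clauses.add("(" + field + " " + OPERATOR.get(m.group(1)) + " " + m.group(2) + ")");
      }
      if (m.group(3) != null) {
        clauses.add("(" + field + " " + OPERATOR.get(m.group(4)) + " " + m.group(3) + ")");
      }
      return String.join(" AND ", clauses);
    }
    if (LIST.matcher(value).matches()) {
      String[] values = value.split("\\|");
      return values.length == 1
          ? field + " = " + values[0]
          : field + " IN (" + String.join(", ", values) + ")";
    }
    // General error plus the failing patterns, so callers can test against them.
    throw new IllegalArgumentException("Invalid value for " + field + ": expected "
        + INTERVAL.pattern() + " or " + LIST.pattern());
  }

  public static void main(String[] args) {
    String range = interval(10.0, true, 20.0, false);
    System.out.println(range);                                        // [10,20)
    System.out.println(toClause("cost", range));                      // (cost >= 10) AND (cost < 20)
    System.out.println(toClause("cost", "(,10]"));                    // (cost <= 10)
    System.out.println(toClause("cost", inList(List.of(3.0, 4.0)))); // cost IN (3, 4)
  }
}
```

Because the clause text is copied only from capture groups that the patterns have already constrained to numbers and brackets, the server never interpolates unvalidated input. Supporting dates would mean adding another pattern for the bound, not another tier of "if" statements.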

Conclusion

API design and usage are two areas where improperly used flow control results in bloated code. In this case, the fluent solution draws on linguistics and language theory rather than flow control. Even something as simple as a lightweight search API can grow horribly complex when flow control is overused.

Citations:

[1] https://twitter.com/search-home#

[2] https://twitter.com/search-advanced

[3] https://developers.facebook.com/docs/graph-api/reference/age-range/

[4] https://developer.walmartlabs.com/docs/read/Search_API

[5] https://support.microsoft.com/en-us/kb/208427

[6] https://en.wikipedia.org/wiki/Duck_typing

[7] http://docs.realmassive.apiary.io/#reference/spaces/search-for-a-space/fast-search-forspaces

[8] https://www.cs.umd.edu/users/ben/goldenrules.html


More articles by David L.
