Optimize your API

Minimize the amount of data returned by a single API request by allowing users to customize what you send to them. There are many ways you can do this:

  • Pre-aggregate metrics such as counts and averages so your user doesn't have to do these calculations themselves.
  • Empower your users to only request the data they actually need for a given use case, for example specific fields or specific quantities.
  • Allow your user to pre-fetch related records in the same request.
  • Make sure there are ways for the user to download and cache reference data such as categories.

If you ensure that your users can minimize the amount of data they request from your API, you will save data and reduce the energy footprint of your application, not only for your own customers but also for your customers' customers.
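To illustrate the last item in the list above, here is a minimal sketch of how a consumer might cache reference data such as categories instead of downloading it on every page view. The class name `ReferenceDataCache`, the one-hour lifetime and the `fetch_categories` function are all assumptions made for this example; a real client would call your API inside the fetch function.

```python
import time


class ReferenceDataCache:
    """Keeps one copy of slow-changing reference data for a fixed time."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch          # callable that performs the real request
        self._ttl = ttl_seconds      # assumed lifetime; tune it to your data
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        # Only go back to the API when nothing is cached or the copy is stale.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


calls = []


def fetch_categories():
    # Hypothetical stand-in for a request to a categories endpoint.
    calls.append("request")
    return ["news", "tutorials"]


cache = ReferenceDataCache(fetch_categories)
first = cache.get()
second = cache.get()   # served from the cache, no second request
print(len(calls))      # 1
```

The API side can help by documenting how long such data is safe to keep.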

Example: A Blog with comments

Suppose you are creating an API for a blog with comments. A poor API might expose an endpoint like the following as the only way to get a list of the posts for a given month:

GET /posts/?year=2021&month=10
[
  {
    "id": 1,
    "title": "My first article",
    "date": "2021-10-10",
    "body": "This is the full body of the article"
  },
  //... ALL posts follow
]

To find the comments for a post, the user must send a second request to the comments endpoint:

GET /posts/1/comments/
[
  { "author": "Fred" , "comment": "I totally agree"},
  { "author": "Barney" , "comment": "Good post"}
]

Let's say your customer is creating a "browse posts per month" feature for their site. All they need from your API is the title, the date and the number of comments for each post in a given month, and they want to paginate through 10 posts at a time. With no way to limit what is sent back, they must make a potentially large request for all of the month's posts, and then one more request per post to get its comment count. Every one of these responses includes data they do not need.
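The cost of this design can be made concrete with a small simulation. The sketch below stands in for the two endpoints with in-memory data (the 30 posts, their sizes and the `get` helper are invented for illustration) and counts how many requests the consumer needs to render a single page of 10 posts.

```python
# Hypothetical stand-in data for the two endpoints of the poor API.
POSTS = [
    {"id": i, "title": f"Post {i}", "date": "2021-10-10", "body": "x" * 5000}
    for i in range(1, 31)
]
COMMENTS = {
    post["id"]: [{"author": "Fred", "comment": "I totally agree"}] * 2
    for post in POSTS
}

request_count = 0


def get(path):
    """Simulate a GET request; every call counts as one HTTP round trip."""
    global request_count
    request_count += 1
    if path == "/posts/?year=2021&month=10":
        return POSTS  # the whole month, bodies included
    post_id = int(path.split("/")[2])  # "/posts/<id>/comments/"
    return COMMENTS[post_id]


def browse_month_naive():
    posts = get("/posts/?year=2021&month=10")
    page = posts[:10]  # pagination happens only after everything is downloaded
    return [
        {
            "title": p["title"],
            "date": p["date"],
            # One extra request per post, just to count its comments.
            "commentCount": len(get(f"/posts/{p['id']}/comments/")),
        }
        for p in page
    ]


rows = browse_month_naive()
print(request_count)  # 11: one for the posts, plus one per post on the page
```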

Now imagine you used a different model: you give the user some control over what they receive in the response, and you pre-aggregate the comment count so that it can be requested directly via the posts endpoint:

GET /posts/?year=2021&month=10&fields=title,date,commentCount&skip=0&take=10
[
  {
    "title": "My first article",
    "date": "2021-10-10",
    "commentCount": 2
  },
  //... more posts follow
]

Our API now lets the user fine-tune the amount of data we return through the query string of the request: they can request only specific fields (using the fields key) and specify the number of records they want returned (through the skip and take keys). No data is wasted, and this will likely save compute time and thus speed up the request.
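On the server side, handling these query string keys could look roughly like the sketch below. It is not tied to any web framework: `list_posts` is a hypothetical handler that takes the request URL, and `POSTS` is an assumed in-memory store in which `commentCount` is already kept up to date when comments are added.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed store; commentCount is pre-aggregated rather than computed per request.
POSTS = [
    {"id": 1, "title": "My first article", "date": "2021-10-10",
     "body": "This is the full body of the article", "commentCount": 2},
    {"id": 2, "title": "My second article", "date": "2021-10-17",
     "body": "Another full body", "commentCount": 0},
    {"id": 3, "title": "A later article", "date": "2021-11-02",
     "body": "Yet another full body", "commentCount": 5},
]


def list_posts(url):
    query = parse_qs(urlsplit(url).query)

    # Filter by month using the ISO date prefix, e.g. "2021-10-".
    prefix = f"{int(query['year'][0]):04d}-{int(query['month'][0]):02d}-"
    matching = [post for post in POSTS if post["date"].startswith(prefix)]

    # Pagination: skip N records, then take at most M.
    skip = int(query.get("skip", ["0"])[0])
    take = int(query.get("take", ["10"])[0])
    page = matching[skip:skip + take]

    # Field selection: keep only the requested keys, ignoring unknown names.
    if "fields" in query:
        fields = query["fields"][0].split(",")
        page = [{name: post[name] for name in fields if name in post} for post in page]
    return page


result = list_posts(
    "/posts/?year=2021&month=10&fields=title,date,commentCount&skip=0&take=10"
)
print(result)
```

In a real service the filtering, paging and field selection would usually be pushed down into the database query, so that unneeded data is never loaded in the first place.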

Doing this also gives us confidence that our API will scale, because we can now predict (and set limits on) the amount of data a request might return. With our original API implementation, a single month may have many posts (a metric we might not have control of) and each post may have many comments (a metric we almost certainly do not have control of!). That's potentially a lot of data!
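Setting such a limit can be as simple as clamping the paging values before they are used. In the sketch below, `MAX_TAKE`, `DEFAULT_TAKE` and `parse_page` are illustrative names and values, and falling back to defaults on malformed input is just one possible policy (rejecting the request would be another).

```python
MAX_TAKE = 100      # assumed upper bound on records per response
DEFAULT_TAKE = 10   # assumed page size when the caller does not specify one


def parse_page(skip_raw, take_raw):
    """Return a (skip, take) pair that never asks for more than MAX_TAKE records."""
    try:
        skip = int(skip_raw) if skip_raw is not None else 0
        take = int(take_raw) if take_raw is not None else DEFAULT_TAKE
    except ValueError:
        # Malformed numbers fall back to the defaults in this sketch.
        return 0, DEFAULT_TAKE
    return max(0, skip), min(max(1, take), MAX_TAKE)


print(parse_page("20", "100000"))  # (20, 100)
```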

Using a specification

Whilst you can design a bespoke API that is optimal for your users, there are a number of specifications out there that provide the recommended optimisations and more out of the box: two in particular are JSON-API and GraphQL.

Both of these provide, by design, implementations that allow users to only request the data they need and offer conventions for patterns such as pagination and pre-fetching related records.
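As a rough illustration of what these conventions look like in JSON-API, the sketch below builds a request URL using its `fields[TYPE]` parameter for sparse fieldsets, `include` for related records, and the `page` parameter family for pagination. The resource type `posts`, the `/posts` path and the helper name are assumptions for this example, and `page[offset]` / `page[limit]` is only one of the pagination strategies a server may choose to support.

```python
from urllib.parse import urlencode


def json_api_posts_url(fields, include, offset, limit):
    """Build a JSON-API style query for a hypothetical `posts` resource."""
    params = {
        "fields[posts]": ",".join(fields),   # sparse fieldset for type "posts"
        "include": ",".join(include),        # related resources to pre-fetch
        "page[offset]": offset,              # offset-based paging (server's choice)
        "page[limit]": limit,
    }
    # Keep brackets and commas readable instead of percent-encoding them.
    return "/posts?" + urlencode(params, safe="[],")


url = json_api_posts_url(["title", "date"], ["comments"], 0, 10)
print(url)
# /posts?fields[posts]=title,date&include=comments&page[offset]=0&page[limit]=10
```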

As well as giving you a battle-tested pattern to base your API around, these specifications are likely to earn you the thanks of your customers for the improved developer experience they receive when consuming your API. Not only are responses now tailorable, they are also predictable, because both JSON-API and GraphQL are schema-based. This means that you can auto-generate your documentation and your API consumers can auto-generate much of their client code!