Making the Most of JSON-API

Phil Sturgeon
APIs You Won't Hate
Oct 30, 2017


With JSON-API (and any other compound document-based advice) starting to feel a little outdated in 2017 thanks to HTTP/2, it’s tough to continue recommending it to people.

That said, there are plenty of reasons for people to still be using JSON-API for the near future:

  1. Existing APIs with an underlying codebase tightly coupled to JSON-API, that cannot serialize multiple formats
  2. Teams that are all very used to JSON-API and are on tight deadlines to deliver a working solution
  3. APIs that are consumed in the browser, especially those which need support for mobile browsers, or operating systems older than Windows 10, or OS X 10.11 (source)

Let’s look at a few of the problems I’ve hit using JSON-API in production for two years.

Scope for Compound Documents

At a carpooling company, we had trips which could contain many riders. We exposed /trips?include=riders and everything was great, until the web and iOS clients let us know months later they’d been filtering on the client side to map only active riders.

Client-side filtering is a concern as it overworks the application server and database server. It also forces clients to wait on unnecessary data to be transferred over the network, and makes the browser use more juice than it needs to.

Some developers thought it was obvious to restrict the “riders” relationship to only “active riders”, and pushed code to make that happen. It turns out that broke the Android client, which actually had a Previous Riders screen and was expecting the “riders” relationship to have all statuses, not just active.

My suggestion was to deprecate the riders include, and create active_riders and inactive_riders includes, a trend which we continued throughout the life of that API.

Avoiding default scopes is important, as the “default” for one application might be different to another. Don’t try to assume what default scopes should be for every client, just make it very clear what relationships have scope by putting it in the name.

"relationships": {
  "active_riders": {
    "data": [
      { "type": "people", "id": "1" }
    ]
  },
  "inactive_riders": {
    "data": [
      { "type": "people", "id": "2" },
      { "type": "people", "id": "3" },
      { "type": "people", "id": "4" }
    ]
  },
  "pending_riders": {
    "data": [
      { "type": "people", "id": "10" }
    ]
  }
}

This avoids the trouble of expectation not matching reality, avoids accidental changes to reality, and makes it very clear what you’re getting.

This works so long as you’re careful to handle this code on the backend, and you’re able to fetch your data in a way that isn’t now three times slower. If you’re querying a database, for example, you need to avoid the temptation to make three separate extra queries:

  1. ... WHERE status = active
  2. ... WHERE status = inactive
  3. ... WHERE status = pending

If you can avoid doing that, performance should not suffer too badly from multiple relationships. Basically you need to fetch everything then split it yourself, which… is what we were trying to avoid all the clients doing. At least adding HTTP caching headers will hopefully help clients cache that result.

Some people try to solve this in another way: filtering relationships.
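To make "fetch everything then split it yourself" a bit more concrete, here is a minimal Python sketch. It assumes the riders for a trip have already been loaded with one query into a list of dicts with `id` and `status` keys; the function name `split_riders_by_status` and the row shape are made up for illustration and are not from any real codebase.

```python
from collections import defaultdict

# Statuses used in the example above; a real API may have more.
STATUSES = ("active", "inactive", "pending")


def split_riders_by_status(riders):
    """Partition rows from ONE query into named, scoped relationships.

    `riders` is assumed to be a list of dicts with "id" and "status" keys
    (a hypothetical row shape, e.g. the result of a single
    SELECT ... WHERE trip_id = ? query).
    """
    groups = defaultdict(list)
    for rider in riders:
        # Resource identifier objects carry string ids in JSON-API.
        groups[rider["status"]].append({"type": "people", "id": str(rider["id"])})

    # Always emit every relationship, even when it is empty, so clients
    # never have to guess whether a scope is missing or just has no members.
    return {
        f"{status}_riders": {"data": groups.get(status, [])}
        for status in STATUSES
    }


rows = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "inactive"},
    {"id": 3, "status": "inactive"},
    {"id": 4, "status": "inactive"},
    {"id": 10, "status": "pending"},
]
relationships = split_riders_by_status(rows)
```

The database is hit once and the grouping happens in memory, so adding a fourth scoped relationship later costs another entry in `STATUSES` rather than another query.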

Filtering Relationships

If by default the whole relationship is everything, most SQL-minded API developers will then think: I need a WHERE clause. JSON-API doesn’t really provide anything for this, but does suggest using a query string parameter called ?filter=.

The spec is extremely vague about filtering:

The filter query parameter is reserved for filtering data. Servers and clients SHOULD use this key for filtering operations.

Note: JSON API is agnostic about the strategies supported by a server. The filter query parameter can be used as the basis for any number of filtering strategies.

That’s all the spec says, but various conventions exist. The most popular suggestion is filter[riders][status]=active, which takes the relationship as the first array key, then the resource field as the next key, before finally taking the value.

Filtering like this is not so bad when it’s a tightly controlled whitelist of possible fields, but many folks implementing JSON-API get a bit carried away and allow anyone to filter by anything. This is rather common with SQL-minded developers, but an API should not be treated like an SQL database, especially if you’re handing out API keys to developers who could perform query-like requests that you don’t know about.

Allowing people to filter by any field on any relationship on an API collection that’s pulling from a database would lead to slow queries. It’s impossible to know which indexes to add ahead of time. The first rule of API development is “Clients will surprise you.” Trying to index everything is damaging to performance in a bunch of ways.

Restricting the ways in which clients can filter is a good idea, as is optimizing the ways in which they do. As clients request further filters, weigh each one up, and maybe add it if you think it’s a good idea.

It’s a little tough to code and document these nested filter options, but you’ll need to figure it out, or suffer from awful performance.
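One way to keep a tight whitelist is sketched below in Python. It parses the filter[relationship][field]=value convention and rejects anything not explicitly allowed. The `ALLOWED_FILTERS` table, the `UnsupportedFilter` exception and the `parse_filters` function are all hypothetical names for illustration; JSON-API itself does not define any of this.

```python
import re
from urllib.parse import parse_qsl

# Hypothetical whitelist: relationship name -> fields clients may filter on.
# Only add an entry once the matching query is indexed and documented.
ALLOWED_FILTERS = {"riders": {"status"}}

# Matches keys shaped like filter[riders][status]
FILTER_KEY = re.compile(r"^filter\[(\w+)\]\[(\w+)\]$")


class UnsupportedFilter(ValueError):
    """Raised for a filter that is not whitelisted (map it to a 400 response)."""


def parse_filters(query_string):
    """Return {relationship: {field: value}} for whitelisted filters only."""
    filters = {}
    for key, value in parse_qsl(query_string):
        match = FILTER_KEY.match(key)
        if match is None:
            continue  # not a nested filter parameter, e.g. include=riders
        relationship, field = match.groups()
        if field not in ALLOWED_FILTERS.get(relationship, set()):
            raise UnsupportedFilter(f"Cannot filter {relationship} by {field}")
        filters.setdefault(relationship, {})[field] = value
    return filters


parsed = parse_filters("include=riders&filter[riders][status]=active")
```

Failing loudly on an unknown filter, rather than silently ignoring it, means a client finds out straight away that it is getting unfiltered data, and you find out which filters people actually want.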

Limitations on Relationships

Another tricky choice is figuring out how many items you put into a relationship. Do you put 10? 100? Literally all of them?

At first you’ll say “Oh there won’t ever be more than X”, and again, that initial assumption is almost certainly going to change. The carpooling company said “Oh there’s never going to be more than 5 riders, because a car won’t fit more than 5!”

Then we added vans with 20 riders, and I’m sure at some point there would have been actual buses with 100. After a few months there could be several hundred riders with various statuses, and that’s getting to be some chunky JSON.

JSON-API offers fantastic advice in the Pagination section:

A server MAY provide links to traverse a paginated data set (“pagination links”).

Pagination links MUST appear in the links object that corresponds to a collection. To paginate the primary data, supply pagination links in the top-level links object. To paginate an included collection returned in a compound document, supply pagination links in the corresponding links object.

Basically they recommend shoving some "next" links in there.

"relationships": {
  "active_riders": {
    "links": {
      "related": "https://api.example.com/riders?trip_id=23124",
      "next": "https://api.example.com/riders?trip_id=23124&cursor=learn-about-cursors-in-the-book"
    },
    "data": [
      { "type": "people", "id": "11" },
      { "type": "people", "id": "2819" },
      { "type": "people", "id": "3223" },
      { "type": "people", "id": "36" },
      { "type": "people", "id": "643" },
      { "type": "people", "id": "3434" },
      { "type": "people", "id": "232" },
      { "type": "people", "id": "1123" },
      { "type": "people", "id": "1563" },
      { "type": "people", "id": "1357" }
    ]
  }
}

Ooo, links. Otherwise known as HATEOAS! That’s Hypermedia as the Engine of Application State, which in its simplest form is often “Hey, you can go and get more stuff over here if you want it.”

Putting in just a few items for the relationship (say… 10!), then having a URL for clients to fetch more stuff if they want it is awesome. They can go grab that either when the end user wants it, or preemptively.

This is a lovely combination of two approaches. On one side it’s exactly the sort of thing REST is asking you to do, whilst still providing a bit of an optimization for HTTP/1.1-stuck clients scared to make multiple requests.
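As a rough sketch of how a server might assemble such a relationship, the Python below caps the inline data at 10 items and only adds a "next" link when there is more to fetch. The `build_relationship` function, the `PAGE_SIZE` constant and the opaque `next_cursor` argument are assumptions for illustration; how the cursor is generated is left out entirely.

```python
from urllib.parse import urlencode

PAGE_SIZE = 10  # how many identifiers to inline; pick what suits your API


def build_relationship(trip_id, rider_ids, next_cursor=None,
                       base_url="https://api.example.com/riders"):
    """Build a relationship object with a capped data array and links.

    `rider_ids` is assumed to hold up to PAGE_SIZE + 1 ids, so that one
    extra row tells us whether another page exists. `next_cursor` is an
    opaque, hypothetical value produced elsewhere.
    """
    links = {"related": f"{base_url}?{urlencode({'trip_id': trip_id})}"}

    has_more = len(rider_ids) > PAGE_SIZE
    if has_more and next_cursor is not None:
        query = urlencode({"trip_id": trip_id, "cursor": next_cursor})
        links["next"] = f"{base_url}?{query}"

    return {
        "links": links,
        "data": [{"type": "people", "id": str(i)} for i in rider_ids[:PAGE_SIZE]],
    }


ids = [11, 2819, 3223, 36, 643, 3434, 232, 1123, 1563, 1357, 99]
active_riders = build_relationship(23124, ids, next_cursor="abc")
```

Clients that only need a preview never touch the link, and clients that need everything follow "next" until it stops appearing.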

You do of course need to make sure you have that /riders?trip_id=23124 endpoint, and you’ll probably need to make sure you have all the same query string filter options as you do with filter[riders][status].

By the time you have all these things done twice, you start to wonder if it would be better to have just the one way of doing things… and that’s where I’m at these days.

I advise against using includes, and if you do use them you need to really restrict how and where you use them, and also put links in for all of your relationships, pagination, related actions, and anywhere else that makes sense.

Summary

For those really set on using JSON-API, watch out for these troubles I’ve bumped into, and check out other articles by other smart folks who are getting into it.

Avoid building a slow SQL-over-HTTP API that appears to work for a few months, and fails horribly once your database tables are a little better stocked than your test data. Sure it’ll fulfill any query that anyone could possibly want, but it’ll do it poorly. 😅

One cool thing about using JSON-API for now is that you can offer includes and links. Then you can slowly move over to a more hypermedia-based approach as HTTP/2 usage becomes more widespread, deprecating and removing includes as clients upgrade to just making web requests to nice, HTTP-level cacheable collections, with top-level query strings and no messy compound document stuff getting in the way.

Personally I want to ditch JSON-API and pass most of the work off to HTTP/2, RFC 7807, RFC 7386, JSON HyperSchema, and maybe knock up a baaaasic little micro-standard that just covers items, collections, and a handful of little bits of metadata like pagination, sorting, filters, etc., but that’s a topic for another day.
