Ravi Talks Tech

How to Design an API

API Server Guide Prelude

Did all those college classes about data structures, operating systems, and computer networks actually prepare you to create scalable web architectures?

May 7, 2023

What is this?

Regardless of what prestigious university CS curriculums may lead you to believe, most graduates aren’t going on to do groundbreaking theoretical computer science work.1 In fact, the curriculums often don’t even include courses that focus on what most people will end up doing: building web apps.2 Coding bootcamps can provide a lot of practical experience, but even they usually focus more on the “how” than the “why”. The “why”, though, is the most interesting part, and it’s what puts the engineering in a software engineer’s job.

This post is the preface to a three-part series of guides which will cover how to build a back-end API for a web app. In this series, I want to explain not only how to create an API, but also the sorts of design considerations and trade-offs that you make along the way. Even though I’m pretty sure I’ve perfected the art of software engineering in my five years of industry experience, it is possible that the architecture decisions I made are not the ones you would have made. But in explaining the decision process, my hope is that you will be able to analyze my choices and then make the ones that best fit your needs.

This post will discuss some of the background information for the guides, namely the design approach I’ll be taking as well as the tech stack I’ve chosen. Because I want to focus more on the design aspect of the server, in later parts I may gloss over some implementation details, especially where the implementation is similar to something I’ve already described. For reference, the full implementation will be available in this GitHub repository, with tags marking the implementation up to each section.

The full series is linked here:

  • Prelude: How to Design an API Server
  • Part 1: Building a Server
  • Part 2: Creating Users (to be published)
  • Part 3: Posting and Commenting (to be published)

What is an API server?

Before we start designing an API server, it would help to first know what exactly that is. An Application Programming Interface (API), in its broadest sense, is an interface defining the way that two different programs communicate (or even two parts of the same program). The program that defines the API provides a standard way for other programs to make use of its functionality without needing to know how that functionality is implemented. The ability to compose the functionality of many different programs to create something new is the bedrock of all modern software. As an example, you could write a tool that combines the NOAA weather API with Twilio’s SMS API to send your SO a text in the morning when there is a forecast for rain so they won’t steal your umbrella for the umpteenth time.
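To make that concrete, here’s a rough sketch of such a mashup in Node.js. The endpoint shapes and response fields are approximations of the NOAA and Twilio APIs (check their docs for the real contracts), and the environment variables are invented for the example:

```javascript
// Hedged sketch: check the NOAA forecast and, if rain is expected, send
// an SMS through Twilio. Assumes Node 18+ (built-in fetch); endpoint
// shapes and response fields are approximations of the real APIs.
async function warnAboutRain(lat, lon) {
  // api.weather.gov resolves a coordinate to a forecast URL. NOAA asks
  // clients to identify themselves via the User-Agent header.
  const headers = { 'User-Agent': 'umbrella-guard (you@example.com)' };
  const point = await (await fetch(`https://api.weather.gov/points/${lat},${lon}`, { headers })).json();
  const forecast = await (await fetch(point.properties.forecast, { headers })).json();
  const upcoming = forecast.properties.periods[1]; // roughly the next forecast period

  if (/rain/i.test(upcoming.shortForecast)) {
    // Twilio sends an SMS via a form-encoded POST to its Messages resource,
    // authenticated with HTTP Basic auth. Env vars are invented for the example.
    const sid = process.env.TWILIO_SID;
    await fetch(`https://api.twilio.com/2010-04-01/Accounts/${sid}/Messages.json`, {
      method: 'POST',
      headers: {
        Authorization:
          'Basic ' + Buffer.from(`${sid}:${process.env.TWILIO_TOKEN}`).toString('base64'),
        'Content-Type': 'application/x-www-form-urlencoded',
      },
      body: new URLSearchParams({
        From: process.env.TWILIO_NUMBER,
        To: process.env.SO_NUMBER,
        Body: `Rain in the forecast (${upcoming.shortForecast}). Leave my umbrella alone!`,
      }),
    });
  }
}
```

Neither service needs to know anything about the other; the published APIs are the only contract involved.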

This broad definition of an API covers a wide range of types of interfaces. It even covers interfaces as low level as Linux’s syscall API, which allows direct access to the functionality of the Linux kernel. However, when people use the term API or API server, they often mean interfaces exposed over the Internet via HTTP, and usually following REST architectural patterns (more on REST later). The NOAA and Twilio APIs linked above are both examples of this. Like those, you could develop an API with the intention that others can use your service, or you may be developing one for your own use, to support the functionality of your iOS app for instance.

Principles of design

For an API server, there are two main parts to the design: the API contract and its implementation. For the API contract, you are defining a specific way in which you require your consumers to format their requests, and you are committing to responding appropriately to those properly formatted requests. Getting this contract right at the beginning is important because once you publish your commitment, people will start using and relying on your API, making it harder to change. Part of crafting your contract well is making it as simple and understandable as possible. A good way to do this is by following established norms; we’ll explore how this works with the REST conventions and HTTP in the next section.

For the API implementation, it is less important to get the architecture perfect at the beginning because you can change it without affecting the contract. However, three attributes of your implementation will directly impact the quality of your service: its reliability, its maintainability, and the ease of iterative development on it. These areas are easy enough to understand abstractly, but if your ambition is to grow your service to massive scale, the complexity of each starts to increase rapidly. While the strategies for scaling such systems are interesting, covering them all would be a large enough task to fill an entire book3, so this series will focus on strategies applicable at the scope of a single repository.

The two main areas that I’ll touch upon are: (1) dealing with component management within a single application and (2) writing code you are confident is correct. Component management concerns both how you break your code up into workable, small-scoped parts and how you wire those parts together. Confidence in your code comes from writing code that is testable and is actually tested. I’ll expand further on these points throughout the rest of the series, but for the testability part, my approach is largely based on the practice of test-driven development (TDD).4
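As a small preview of both ideas, here’s a sketch (all names invented for illustration) of a component that takes its database dependency as a parameter, along with a Mocha/Chai test that exercises it with a fake:

```javascript
// A small, testable component: the data access object is injected rather
// than hard-coded, so tests can substitute a fake. Names are invented
// for illustration.
function makeUserService(userDb) {
  return {
    async getUsernames() {
      const users = await userDb.findAll();
      return users.map((user) => user.username);
    },
  };
}

// A Mocha/Chai test, the kind you would write first in TDD fashion.
const { expect } = require('chai');

describe('userService.getUsernames', () => {
  it('returns just the usernames', async () => {
    const fakeDb = {
      findAll: async () => [{ username: 'ravi' }, { username: 'ada' }],
    };
    const service = makeUserService(fakeDb);
    expect(await service.getUsernames()).to.deep.equal(['ravi', 'ada']);
  });
});
```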

One more thing to note: for the sake of clarity in the examples, I may write some code in a style that differs from accepted norms, but I’ll try to point out where I do (limited to the extent of my expertise in the different technologies).

The LinkDump API

In this series, we will be developing the LinkDump API, an entirely original service where you can post links and comment on others’ posts (please refrain from fact checking). The API will allow for: the registration and retrieval of users, the creation and retrieval of posts, the creation and retrieval of comments, and an inbox which notifies you when someone has commented on your post.

You can look through the full API specification, which is formatted as a YAML file representing an OpenAPI spec. To make it easier to read, you can use a tool that renders the spec in a friendlier format, like the online Swagger Editor (at some point I will try to host a pre-rendered version of the spec).

The API follows RESTful (Representational State Transfer) conventions in its design. REST is a set of design principles and constraints which have become the de facto standard for web APIs.5 The main principles are the idea of breaking out the data the API stores into resources, having a standard interaction pattern with these resources, and making client-server communication stateless.6 In this case, I’ve decided to break up the data we need to store into four resources: users, messages, posts, and comments. When designing an API, this will be the first and most foundational decision you will have to make.

Each of these resources has its own path, like “/posts” for dealing with the post resource. Resources can also be nested, as in the case of the comments (at “/posts/:post-id:/comments”). The most common way to interact with the resources is by using the HTTP methods. An interaction is defined by three things: the path, the HTTP method, and the request body (only necessary for some methods). Taking posts as an example resource, these are some of the commonly used interaction patterns:

Method | Path | Body? | Description
------ | ---- | ----- | -----------
GET | /posts | No | Get a list of all the posts
POST | /posts | Yes | Create a new post with the data from the request body
GET | /posts/:post-id: | No | Get the data for the post with the specified ID
PUT | /posts/:post-id: | Yes | Replace the data for the specified post with the data in the request body
PATCH | /posts/:post-id: | Yes | Update the specified post only for certain fields specified in the request body
DELETE | /posts/:post-id: | No | Delete the specified post

This list is not exhaustive, and some APIs may stray from these conventions. APIs also do not need to necessarily implement all the functionality.

The LinkDump API only supports the retrieval and creation interactions (i.e. no PUT, PATCH, or DELETE). It also has endpoints for functionality outside of the resources, specifically “/register” for adding new users. If you want to play around with the API, you can make requests to my reference implementation via Postman (please be nice so I don’t have to routinely cleanse the database).
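For example, a client might exercise the two supported interaction styles on the posts resource like this (the base URL is a placeholder and the post fields are illustrative):

```javascript
// Hypothetical client calls against the LinkDump API. The base URL is a
// placeholder and the post fields are illustrative. Assumes Node 18+.
const BASE = 'https://linkdump.example.com/api';

async function demo() {
  // GET /posts: retrieve the list of posts.
  const posts = await (await fetch(`${BASE}/posts`)).json();
  console.log(posts);

  // POST /posts: create a new post from the request body.
  await fetch(`${BASE}/posts`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ title: 'Neat article', url: 'https://example.com' }),
  });
}
```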

The tech stack

The ideal tech stack is one using all the hottest new technology trends, so you can prove how hip and cool a developer you are. I can only aspire to that level of engineering clout, however, so I’ve gone with a different strategy. For this implementation of the API, I have tried to choose technologies that are flexible in their design and also relatively popular. This way the tools will not force a specific architectural approach on us, so we can compare different options, and their popularity means solutions to common problems are easily Googleable.

The database: MongoDB

MongoDB is a NoSQL document-based database. When choosing a database, the biggest choice you make is between a relational database and a NoSQL one. NoSQL databases do not store data in the form of tables like traditional relational databases do. As a consequence, they either don’t support, or make more computationally expensive, certain operations like ACID transactions, joins on relational keys, and schema-based validation. Also, they usually provide only eventual consistency, which means that after writing to a record, there may be a period of time when clients requesting that record will get the old value. It is up to the applications using the database to handle these shortcomings.

Why in the world would you choose a NoSQL database if it just makes more work for yourself? The biggest reason is that they make it easier to scale the database up to huge sizes and geographical distributions. The CAP theorem famously proves that out of data consistency, database availability, and partition tolerance, you can only have two guarantees. As mentioned earlier, NoSQL databases sacrifice full consistency, preferring availability and partition tolerance. Partition tolerance is mandatory to achieve massive scale; there is an upper limit to how much processing power and storage a single machine can have, so to scale, you need to be able to utilize multiple machines. When you use multiple machines, there is always a chance that network or hardware failure will cause a communication gap between them (i.e. a partition). Being designed with partition tolerance in mind, NoSQL database technologies usually make it easy to just throw more machines at a database to increase its capacity and performance. They also make it easier to add fields to your records because they generally do not require that all the records in a collection must have the same schema.

For the application we will build, it is unlikely that you will end up supporting millions of users, so the drawbacks of using a NoSQL database likely outweigh the benefits. However, if you were building an application that you hope will reach that scale, it could make sense (though it isn’t a given) to start with a NoSQL database to avoid the complexity of switching to one later down the road. The schema flexibility also makes it easy to quickly prototype an API. Amongst the NoSQL options, MongoDB is good partly because its document-based storage suits our application, but mostly because it is very popular, and so the fixes to all the esoteric error messages you might encounter are waiting for you on StackOverflow.
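To make the schema-flexibility point concrete, here’s a sketch using the official Node driver (the connection string and document fields are placeholders; we’ll set the driver up properly later in the series):

```javascript
// Sketch of MongoDB's schema flexibility with the official Node driver.
// The connection string and document fields are placeholders.
const { MongoClient } = require('mongodb');

async function demo() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const posts = client.db('linkdump').collection('posts');

  // Two documents with different shapes can live in the same collection;
  // adding the `tags` field requires no migration or schema change.
  await posts.insertOne({ title: 'First post', url: 'https://example.com' });
  await posts.insertOne({ title: 'Second post', url: 'https://example.org', tags: ['news'] });

  await client.close();
}
```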

The server runtime: Node.js (JavaScript)

A brief history

Before getting to the discussion, a brief primer on the history of JavaScript. Feel free to skip this section if you just want the rationale.

JavaScript was released in 19957 for use in Netscape Navigator, then by far the most popular browser available. Until the release of WebAssembly much later on, JavaScript was the only language you could use to make HTML interactive, and as a result, all website developers had to be familiar with it.

As JavaScript left its infancy and grew more popular, its browser runtimes started adding more functionality. In the early 2000s, Web 1.0 was slowly giving way to Web 2.0, where websites transitioned from being simple static pages which required loading a new page to change the view, to dynamic applications that could send and receive data from servers. This was built on the back of the AJAX paradigm popularized by some of Google’s web apps, like Gmail. The AJAX paradigm (Asynchronous JavaScript + XML) involved making requests to the server asynchronously instead of blocking the application to wait for a response, and so users were able to continue to interact with the web page while it was communicating with the server. This programming style eventually culminated in the HTML5 standard event loop, which provided a standardized way for executing asynchronous JavaScript code.

In 2009, Ryan Dahl released Node.js, a server-side runtime that broke JavaScript free from browsers for the first time. He took the new and very performant Google V8 JavaScript engine from Chrome and coupled it with a slightly modified version of the browser event loop along with necessary system functionality like file I/O. Node.js thus became the first popular runtime to implement an event loop model of asynchronous computing. Over time, many web engineers found that the event loop model was as natural a fit on the server as on the browser since web servers generally have to make a lot of I/O calls to databases and other web servers. Because of this and the fact that most web engineers already knew JavaScript anyway, the community rapidly grew.
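To see why that model suits servers, note that an I/O call doesn’t block the event loop. In the sketch below, the timer stands in for a database or HTTP request:

```javascript
// The event loop in miniature: the timer stands in for a database or
// HTTP call. Execution continues past it, and the callback runs only
// when the simulated I/O completes.
console.log('request received');

setTimeout(() => {
  console.log('database reply handled'); // runs last, after the "I/O" finishes
}, 100);

console.log('still free to serve other requests'); // runs before the callback
```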

For a deeper dive, you can also check out this very thorough history of JavaScript, JavaScript: The First 20 Years.

Why Node.js (and why JavaScript)?

JavaScript was not a language designed for professional-grade, enterprise-scale applications. Brendan Eich, the creator of the language, on JavaScript’s original purpose:

The idea was to make something that Web designers, people who may or may not have much programming training, could use to add a little bit of animation or a little bit of smarts to their Web forms and their Web pages.8

In designing a language in which it was incredibly easy to write snippets of code, he also designed a language in which it was incredibly easy to write bugs. JavaScript provides you many ways to shoot yourself in the foot.
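A few of the classic footguns:

```javascript
// A few classic JavaScript footguns.
console.log(0 == '');            // true: loose equality coerces types
console.log('2' + 1);            // '21': + concatenates when a string is involved
console.log('2' - 1);            // 1: - coerces both operands to numbers
console.log(typeof null);        // 'object': a long-standing quirk
console.log([10, 9, 1].sort());  // [1, 10, 9]: default sort is lexicographic
```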

However, as touched on in the history section, because of the event loop programming model and the fact that browser apps require it, JavaScript is still immensely popular for web applications. The biggest selling point for JavaScript is the public package registry provided by npm, the de facto standard JavaScript code registry. Software developers publish packages for others to use, most of them for free; at over one million packages, npm claims to be the largest software registry in the world. The immense popularity of JavaScript also makes it easy to find learning resources and answers to questions.

If you’re entirely unfamiliar with JavaScript, I would recommend taking some time to learn some of the basics. Mozilla and FreeCodeCamp provide free resources to learn.

As an aside, if I were creating a project I wanted to actively maintain and develop, I personally would prefer TypeScript, a statically typed superset of JavaScript that helps you catch errors even before running your code. However, TypeScript introduces a lot of new syntax, restrictions, and set-up requirements which can be confusing at first, so we will stick to vanilla JavaScript for this guide.

Node.js packages

The rest of the tech stack will be composed of packages from npm, which we will make use of in our server code. These are some of the most relevant ones.

Express

Express is a web application framework for Node. This module will give our application the ability to respond to HTTP requests with functions we define. For example, we will route the path /api/users to a function which fetches and returns a list of the users from the database.

Express is a minimal, unopinionated web framework that supports request handling through the use of middleware, which allows for an easy way to compose functionality by describing a sequence of actions to be taken on each request. The simple, unopinionated nature of the library suits itself well for a tutorial, and it enjoys a level of popularity that has made it the de facto standard Node.js web framework.

The Express framework has a lot of design patterns that define the way it works; if you want to get a feel for the framework, I recommend going through their hello world example and some of their guide.
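As a taste of what that looks like, here’s a minimal sketch in the spirit of their hello world, with one logging middleware and the users route mentioned above (the user data is stubbed):

```javascript
// Minimal Express sketch: one middleware plus a users route. The user
// data is stubbed in place of a real database query.
const express = require('express');
const app = express();

// Middleware runs on every request before the route handlers.
app.use((req, res, next) => {
  console.log(`${req.method} ${req.path}`);
  next(); // hand off to the next middleware or route
});

app.get('/api/users', (req, res) => {
  const users = [{ username: 'ravi' }]; // stand-in for a database fetch
  res.json(users);
});

app.listen(3000, () => console.log('listening on port 3000'));
```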

The MongoDB Node Driver

The basic reason why we need the MongoDB Node driver is fairly obvious: it’s how our server will be able to talk to our database. However, there are alternatives to the stock driver that we could use, like Mongoose or Prisma. The stock driver exposes MongoDB functionality to us at basically the same level as making requests to a bare MongoDB shell. In comparison, those alternative libraries provide additional functionality, like the ability to describe the schema of your documents so you can operate on data models instead of making untyped queries. They function as an ORM, which we’ll discuss more later. The functionality they provide is quite useful, but we will stick to the stock driver because we will be implementing some of that same functionality ourselves as a learning exercise and a way to explore certain design decisions.
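To give a feel for the difference, here is a rough contrast; the identifiers are invented, and the Mongoose half is a sketch rather than a drop-in example:

```javascript
// Rough contrast between the stock driver and an ORM-style library.
// Assume `db` (a connected driver database handle) and `postId` are in
// scope, and that this code runs inside an async function.

// Stock driver: untyped queries, much like the MongoDB shell.
const post = await db.collection('posts').findOne({ _id: postId });

// Mongoose (sketch): declare a schema once, then operate on a model.
const mongoose = require('mongoose');
const Post = mongoose.model('Post', new mongoose.Schema({
  title: { type: String, required: true },
  url: { type: String, required: true },
}));
const samePost = await Post.findById(postId);
```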

Others

We will use a decent number of other Node packages in this project, but most don’t warrant a deep explanation. Here are some of them that are worth mentioning though:

  • Passport: Passport is an Express middleware that takes care of user authentication. Passport is quite flexible in terms of the types of authentication methods it supports, but we will just make use of it for basic authentication.
  • Mocha / Chai / Sinon: These three tools will support our testing. Mocha is a full-featured test runner, which makes it simple to write and organize our tests. Chai is an assertion library, which lets us assert that our code returns the results we expect. Sinon lets us substitute fake objects for the ones passed to functions under test so we can verify how they were interacted with. The main reason for choosing these tools is their ease of set-up and their popularity.
  • mongodb-memory-server: mongodb-memory-server is an in-memory MongoDB server configurable within Node. It is not particularly useful for production use-cases, but it will provide us with a clean way to do integration tests, as sketched below.
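Here’s the promised sketch of how those testing pieces fit together; the collection and assertions are invented for illustration:

```javascript
// Sketch: a Mocha suite backed by mongodb-memory-server, so each run
// gets a throwaway in-memory MongoDB. Collection and data are invented.
const { MongoMemoryServer } = require('mongodb-memory-server');
const { MongoClient } = require('mongodb');
const { expect } = require('chai');

let mongod, client, db;

before(async () => {
  mongod = await MongoMemoryServer.create(); // spin up the in-memory server
  client = new MongoClient(mongod.getUri());
  await client.connect();
  db = client.db('test');
});

after(async () => {
  await client.close();
  await mongod.stop(); // tear everything down after the suite
});

describe('posts collection', () => {
  it('round-trips a document', async () => {
    await db.collection('posts').insertOne({ title: 'hello' });
    const found = await db.collection('posts').findOne({ title: 'hello' });
    expect(found.title).to.equal('hello');
  });
});
```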

What’s next?

Was that too much pre-reading? If you’ve persevered, though, in the next part of the series we’ll finally start to get our hands dirty building the server. The next part will start with installing the first few packages we need, then continue on to setting up and coding a simple server which responds to a single request. In the part after, we’ll start implementing the first routes of the API.

See you in part 1!

Footnotes

  1. See the course requirements at Stanford, Berkeley, and MIT for example, particularly the upper division courses.

  2. Not scientific, but in Stack Overflow’s 2022 developer survey, almost 50% of users identified as back-end (presumably for web apps).

  3. Paraphrasing Conway’s Law, a well designed system depends on a well designed team organizational structure. Any large software project will necessitate a similarly large number of developers, so to fully capture strategies for scaling, you need to think not only about the design of the broad system architecture but also about how to divide your organization into a team structure that can actually implement it. An approach that has been gaining a lot of traction in the last few years is splitting applications up into microservices worked on by autonomous teams, a model popularized by Spotify. This approach is not without its faults though: Spotify itself would go on to rethink its team organization as it grew. On the software architecture side, many technologies and practices have been developed to deal with the problems of reliability and maintainability. By designing your application to work with cloud computing auto-scaling functionality, you not only make it easy to scale your services up and down, you also get automatic handling of hardware failures and even some software failures. At large scale, you also have to think about ways to make sure your services are running properly; you may need to implement heavy-weight technologies like log aggregation or distributed tracing just to be able to debug issues. The total scope of the topic is quite large and the best practices are ever-changing, but I will try to cover more of it in the future.

  4. If you want an intro to TDD, the basic principles are fairly simple and covered well by web articles, like this one from IBM Developer.

  5. REST is by no means the only architectural approach to designing APIs. One approach that’s rising in popularity is GraphQL which allows you to request only specific fields of a resource or to request related fields from multiple resources in a single query. Another approach is Remote Procedure Calls (RPC), which lets you make calls to code on other machines with the same syntax as making a normal local function call. The most popular RPC framework is gRPC.

  6. REST purists will argue that to be RESTful, you also must not use local identifiers in return objects and must instead provide full URLs to encode the relationships between resources. For example, when querying for a post, instead of returning the ID of the user who created the post, you would return the URL of the user resource. This is the HATEOAS (Hypermedia as the Engine of Application State) principle, which can be one of the constraints of REST. The benefit of this approach is that clients should theoretically be able to navigate your API without any prior knowledge, simply using the URLs provided in the responses. However, in practice, this is rarely done because it adds non-negligible complexity to the API and to generating responses in the server. Generally, clients will need to have access to documentation anyway to fully understand the API behavior, reducing the benefit of HATEOAS. For more on this, see this article.
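  In JSON terms, the difference looks roughly like this (the shapes are illustrative):

```javascript
// Illustrative response shapes: a plain ID reference vs. a HATEOAS link.
const plainResponse = { id: 42, title: 'Neat article', author: 17 };
const hateoasResponse = {
  id: 42,
  title: 'Neat article',
  author: { href: 'https://linkdump.example.com/api/users/17' },
};
```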

  7. Though that press release frequently mentions Java, the two languages are only loosely related; the name went through many iterations, and the creator of JavaScript himself suggested it was chosen by Netscape as a marketing ploy.

  8. From “JavaScript creator ponders past, future” on InfoWorld, 2008.
