Dealing with Authorization over HTTP

Photo by Alp Duran / Unsplash

This week I've been researching concepts related to authentication so I can learn how to build an authentication microservice. One of those concepts is how authorization works over HTTP and the constraints that have to be handled. (To learn more about what authorization is, visit the following article.)

HTTP: A Stateless Protocol

The web is all about communication between web clients and servers. A client, usually a web browser, makes a request that is handled by a server. HTTP is the protocol that serves as the foundation for data exchange on the web. It is a stateless protocol, which means that every request the client makes must carry all the information the server needs to respond. No state is held across requests. For servers that only serve static content, this is straightforward: all requests are treated the same regardless of the client, and the response is a resource such as an HTML document. For dynamic server applications, the response to a request can vary based on who makes it.

Let's take a real-world example here. You log onto Facebook (or any social media app) and you see information related to you. You see who you're friends with, the posts you made, etc. The client has made requests to grab information related to you, the user. Now, if John Doe were to log onto his Facebook account, he should expect his information, not yours. The web server is using some information in each request to determine who is making the request and what needs to be retrieved.

How do we deal with authorization over a stateless protocol?

What information a user has access to is the definition of authorization. The question now is how we determine what a user has access to. There are a few ways this can be handled:

  1. Sessions
  2. Tokens
  3. JWTs

To demonstrate the difference between these three mechanisms, I will be going off the first 10 minutes of the following video. (I found this to be a great watch, and I highly recommend it if you're interested in how authentication works.)

Sessions

An example where the clients are pinned to the servers

Sessions are the old-school way of holding onto state over a stateless protocol. When a user logs in, the server creates a session with an associated user object. What is returned to the user is a session id that can be stored in a cookie. Every time the user makes a request to the backend, this session id is passed along with the request. The server uses the session id to determine which user's information should be returned.
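As a minimal sketch of this flow, assume the session store is a plain in-memory dictionary. The names `SESSIONS`, `log_in`, and `current_user` are made up for illustration and do not come from any framework.

```python
import secrets

# Hypothetical in-memory session store: session id -> user object.
# It lives inside one server process, which is why clients end up pinned to it.
SESSIONS = {}

def log_in(user):
    """Create a session for the user and return the id to store in a cookie."""
    session_id = secrets.token_urlsafe(32)  # unguessable random id
    SESSIONS[session_id] = user
    return session_id

def current_user(cookies):
    """Look up the user for a request from its session cookie, if any."""
    return SESSIONS.get(cookies.get("session_id"))

session_id = log_in({"id": 1, "name": "John Doe"})
print(current_user({"session_id": session_id}))  # the user stored at login
print(current_user({"session_id": "unknown"}))   # None
```

Because `SESSIONS` exists only in this process, a second server started alongside it would not recognize the cookie.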

Sessions are a safe bet since they are a proven method that works. They are secure, but they come with some downsides. Sessions make the server stateful. While this solves the issue of working over a stateless protocol, it also means that the client is required to only interact with the server that has the session (session pinning). With modern web applications, this constraint makes it difficult to scale up with increased traffic.

Tokens

An example of using a token in each request

If we want client-server interactions to be flexible, then sessions are not the way to go. This is where tokens come in. Let's say that instead of having a monolithic service that handles application data and user info, we have multiple services. We can have one server to run the user service and multiple app services on several servers. When users log in, they hit the user server and receive a token in the API response. In our case, this token can be some large random string (like a GUID). When the user sends a request to any of the app servers, they provide that token as part of the request. The app service doesn't know which user made the request. Instead, it calls the user service to validate the token. If the token is valid, the user service returns the user object mapped to the token.
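The flow above can be sketched with two small classes. This is only an illustration: `UserService` and `AppService` are hypothetical names, and the call between them is an ordinary method call standing in for what would be a network request.

```python
import secrets

class UserService:
    """Hypothetical user service: issues opaque tokens and maps them to users."""

    def __init__(self):
        self._tokens = {}  # token -> user object; this is the remaining state

    def log_in(self, user):
        token = secrets.token_hex(16)  # large random string, similar to a GUID
        self._tokens[token] = user
        return token

    def validate(self, token):
        """Return the user mapped to the token, or None if it is unknown."""
        return self._tokens.get(token)


class AppService:
    """Hypothetical app service: holds no user state of its own."""

    def __init__(self, user_service):
        self._user_service = user_service

    def handle_request(self, token):
        # In a real deployment this would be a network call on every request.
        user = self._user_service.validate(token)
        if user is None:
            return {"status": 401}
        return {"status": 200, "user": user["name"]}


users = UserService()
app_a, app_b = AppService(users), AppService(users)
token = users.log_in({"id": 1, "name": "John Doe"})
print(app_a.handle_request(token))    # any app server can serve the request
print(app_b.handle_request(token))
print(app_a.handle_request("bogus"))  # unknown token is rejected
```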

The separation of responsibilities between the app service and the user service allows for multiple app servers to run and handle requests at the same time. The same user can send requests to any of the app servers as long as they pass their token along. It is now the responsibility of the app server to figure out the user associated with the token. This solves the session-pinning problem because requests are not pinned to a specific app server. It does come with its downsides. We now have a case where all the app servers will be talking to the user service on every request. The app servers and the user server are going to be very chatty. It couples the User API to all services and it makes the User API slightly stateful. Is there a way we can improve this?

JWT

JWTs solve the service coupling issue, but before we get into that let's explain what a JWT is.

JWT stands for JSON Web Token. It is a compact string made of Base64url-encoded JSON segments that is signed either with a public/private key pair or with a shared secret via HMAC (Hash-based Message Authentication Code). The purpose of JWTs is to create a standard way for 2 entities to communicate with each other in a stateless and secure manner.

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
JWT Example

A JWT consists of 3 parts:

  1. Header
  2. Payload
  3. Signature

The header usually consists of 2 parts: (1) the type of the token, which is JWT, and (2) the signing algorithm, such as HMAC SHA-256 or RSA.

{
  "alg": "HS256",
  "typ": "JWT"
}
Base-64 Decoded JWT Header Content
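To see where that header content comes from, the example token can be split on its dots and each segment decoded. The sketch below uses only the Python standard library, and `decode_segment` is a helper name chosen for this example. Decoding needs no key, so anyone holding the token can read the header and payload.

```python
import base64
import json

def decode_segment(segment):
    """Base64url-decode one JWT segment and parse it as JSON."""
    padded = segment + "=" * (-len(segment) % 4)  # JWTs strip the '=' padding
    return json.loads(base64.urlsafe_b64decode(padded))

token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ."
    "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
)

header_b64, payload_b64, signature_b64 = token.split(".")
header = decode_segment(header_b64)
payload = decode_segment(payload_b64)

print(header)   # {'alg': 'HS256', 'typ': 'JWT'}
print(payload)  # {'sub': '1234567890', 'name': 'John Doe', 'iat': 1516239022}
```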

The payload is the portion that consists of claims - statements about the user entity. This information can include the identity of the user, their permissions, etc. 3 types of claims can be a part of the payload:

  1. Registered Claims
  2. Public Claims
  3. Private Claims

Registered claims are predefined claims that are defined in RFC 7519, the JWT specification published by the Internet Engineering Task Force (IETF). Examples include iss (issuer), sub (subject), and exp (expiration time). These claims are optional. A list of these claims can be found in the JWT specification linked here.

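Public claims are different from registered claims since they are not defined in the RFC itself. Anyone using JWTs can define them, but to avoid name collisions they should either be registered in the IANA JSON Web Token Claims registry or use a collision-resistant name such as a URI.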

Lastly, we have private claims which are custom claims that are shared between parties. JWT payloads are very flexible, and it's up to the parties using the JWT to decide what is encoded in the payload.
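As an illustration, here is what a payload mixing the three kinds of claims might look like, along with two small checks a receiver could run on it. The issuer URL, the email address, the `permissions` claim, and the helper names are all assumptions made up for this sketch.

```python
import time

now = int(time.time())

# Hypothetical payload mixing the three kinds of claims.
payload = {
    # Registered claims (defined in RFC 7519)
    "iss": "https://auth.example.com",  # issuer (example value)
    "sub": "1234567890",                # subject: the user id
    "exp": now + 15 * 60,               # expires 15 minutes from now
    # Public claim (a name registered with IANA)
    "email": "john@example.com",
    # Private claim (agreed on between our own services)
    "permissions": ["posts:read", "posts:write"],
}

def is_expired(claims, at=None):
    """Return True once the 'exp' time is reached (no 'exp' means no expiry)."""
    at = int(time.time()) if at is None else at
    return "exp" in claims and at >= claims["exp"]

def can(claims, permission):
    """Check the hypothetical private 'permissions' claim."""
    return permission in claims.get("permissions", [])

print(is_expired(payload))            # False: the token was just created
print(can(payload, "posts:write"))    # True
print(can(payload, "posts:delete"))   # False
```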

The signature is used to verify the authenticity of a JWT. It is created by taking the encoded header and payload, along with a secret key, and applying the algorithm named in the header. With HMAC, the secret is shared between the sender and the receiver and the result is a keyed hash value. With a public/private key pair, the sender signs with the private key and the receiver verifies with the public key. This value is then included as the signature in the JWT. When the receiver receives an HMAC-signed JWT, they can use the same secret key to regenerate the hash value and compare it to the signature included in the JWT. If the two values match, the receiver can be sure that the JWT has not been tampered with and that it was issued by someone who holds the secret.
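The HMAC case can be sketched with the Python standard library. This is an illustration of the idea, not a complete implementation: `sign`, `verify`, and `SECRET` are names made up for the example, and in practice a maintained JWT library would also validate the `alg` header and claims such as `exp`.

```python
import base64
import hashlib
import hmac
import json

def b64url(data):
    """Base64url-encode bytes without '=' padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def sign(payload, secret):
    """Build an HS256-signed JWT from a payload dict."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    mac = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return signing_input + "." + b64url(mac)

def verify(token, secret):
    """Recompute the signature and compare it to the one in the token."""
    try:
        signing_input, signature = token.rsplit(".", 1)
    except ValueError:  # no '.' separator at all
        return False
    expected = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(b64url(expected).encode("utf-8"), signature.encode("utf-8"))

SECRET = b"demo-shared-secret"  # hypothetical; both services would hold this

token = sign({"sub": "1234567890", "name": "John Doe"}, SECRET)
print(verify(token, SECRET))           # True
print(verify(token, b"wrong-secret"))  # False

# Swapping in a different payload while keeping the old signature is detected.
header_b64, _, signature_b64 = token.split(".")
forged_payload = b64url(json.dumps({"sub": "1234567890", "name": "Admin"}).encode("utf-8"))
forged = ".".join([header_b64, forged_payload, signature_b64])
print(verify(forged, SECRET))          # False
```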

The app server can validate the JWT without communicating with the user server

The benefit of using JWTs over opaque tokens is that the app service can verify the token on its own. The app service and user service share a secret (or, with a public/private key pair, the app service holds the user service's public key) that is used to verify the JWT passed in with the request. The services don't need to communicate with each other to verify the token. This keeps the system stateless because each request carries all the information the app server needs to determine a user's identity and permissions.

Conclusion

This article is an introduction to setting up the auth microservice we'll be building over the next few months. We explored the different solutions to implement authorization mechanisms over HTTP. The auth microservice we build will use these concepts as a foundation for building out the system.

I hope you found this article useful! Until next time!