Intro
In my previous blog post, I explained the concept of JWTs, how they're used, and the security issues that should be taken into consideration.
Today, I'll introduce you to revokable tokens, a fundamental concept for the secure usage of JSON Web Tokens. This tutorial is not designed to be skimmed and I encourage you to read all of it.
First of all though, please note that there are different session management flows available. Depending on the flow used, the methods of securing it vary. This post will focus on one I've worked with before and believe to be broadly applicable: tokens come in pairs, a short-lived access token and a long-lived refresh token.
Secondly, making JWTs revokable is a controversial topic throughout the dev community and I encourage you to evaluate other options like OAuth2 as well. That being said, JWTs can be made revokable quite easily as long as certain aspects are considered, which makes them suitable for user sessions.
Enjoy your read!
2021 - not a cheap sequel of 2020
Believe it or not, I started working on this post right after publishing my last one in July. I originally intended to release it in August, maybe September. However, due to the pandemic requiring me to work extra hours (I work in health care) and college applications consuming a lot of time, I haven't been able to allocate any time to working on my blog until recently.
Enough about me, everyone had a hard year and I've picked some GIFs from my favourite music videos of 2020. I really hope you enjoy them.
2021 will be our year - speaking it into existence rn...
The problem
For those of you who haven't read the first part of my series on JWTs, here's a brief explanation of the underlying issue:
The access token is used to authenticate "regular" requests, whereas the refresh token is needed to obtain a new token pair once the access token expires. Accordingly, the access token is short-lived (expires in ~1 hour) and the refresh token is long-lived (valid for up to 1 year).
Let's demonstrate the issue using two scenarios.
First scenario:
A malicious third party somehow gets hold of an access token. They're now able to illicitly make requests on behalf of the actual user until the access token expires. Even though access tokens are short-lived, quite some damage could still be done; however, the horror ends once the access token expires (provided the malicious party doesn't get hold of another access token).
Second scenario:
A malicious third party somehow gets hold of a refresh token. Now, if your tokens aren't revokable, that's a disaster. Literally a catastrophe. Why? The refresh token allows the malicious party to request a new token pair, i.e. both an access and a refresh token. This makes it possible to keep making illicit requests indefinitely. Sure, the access token expires after a short period of time; however, there's always a refresh token available to obtain a new token pair.
Great, the issues of unrevokable tokens should be clear by now. Identifying a problem is a major step towards the final solution; however, there's still a lot of work ahead of us.
What follows is the approach I used in my project. Please feel free to comment any suggestions and improvements down below!
The solution
Basics
In order for the solution to work out, certain conditions must be met:
- every generated token receives an id which is stored in a database and the token's payload
- tokens reference each other, i.e. the access token payload contains both its own as well as the refresh token's id
- per session, no more than one valid token pair may exist
- we keep track of revoked, i.e. blacklisted tokens
Those conditions will become clearer as you read on.
1 - Generating a new token pair
Every generated token receives its own unique id, i.e. a uuid, which is stored in both the token payload and a corresponding document in our database. A document in the generatedtokenpairs collection looks like this:
{
_id: "5fdb88db87f55c037f8afe1d",
uid: "5eea102c335f21032084721e",
sessionId: "5eb4554afdf50009bcca1597",
accessTokenId: "d754436a-890f-47d8-9562-3b7836f708b1",
refreshTokenId: "dd53c1cc-cea7-4b6c-bcdb-f4d58c635557",
createdAt: "2020-12-17T16:35:39.067+00:00",
}
Let's break it down:
- uid holds a reference to the user for which the tokens have been generated
- sessionId refers to the current session, contained in the token payload
- accessTokenId stores the uuid of the access token, which is contained in the token payload
- refreshTokenId stores the uuid of the refresh token, which is contained in the token payload
- createdAt contains a timestamp of when the tokens have been issued
We also want to ensure there's only one valid token pair per session. We do so by using the sessionId. Every time the user authenticates themselves using their email & password, a new session (sessions collection) should be created.
The session document could look like this:
{
_id: "5edd29f221a049082565befc",
uid: "5ea80f15c12e3f031a5b83ed", // reference to the user
deviceName: "Laptop Name",
deviceIdentifier: "1234507950246",
ipAddress: "yourIPAddress",
lastRefreshed: "2020-06-07T17:54:58.527+00:00", // updated every time a new token pair is issued for that session
createdAt: "2020-06-07T17:54:58.528+00:00", // date of original creation
}
Now, per session, there must not be more than one valid token pair.
Every time a refresh token is used, you should invalidate the current token pair prior to generating a new one. We do so by blacklisting them, which is explained below.
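To make this step more tangible, here's a minimal sketch of creating a session and recording a token pair for it. It is not the code from my project: the in-memory lists stand in for the sessions and generatedtokenpairs collections, and the function names create_session and issue_token_pair are made up for illustration.

```python
import uuid
from datetime import datetime, timezone

# In-memory stand-ins for the MongoDB collections (assumption for this sketch).
sessions = []
generated_token_pairs = []


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def create_session(uid, device_name, ip_address):
    """Create a session document when the user logs in with email & password."""
    session = {
        "_id": str(uuid.uuid4()),  # MongoDB would assign an ObjectId instead
        "uid": uid,
        "deviceName": device_name,
        "ipAddress": ip_address,
        "lastRefreshed": _now(),
        "createdAt": _now(),
    }
    sessions.append(session)
    return session


def issue_token_pair(session):
    """Record a fresh pair of token ids for the given session."""
    pair = {
        "uid": session["uid"],
        "sessionId": session["_id"],
        "accessTokenId": str(uuid.uuid4()),
        "refreshTokenId": str(uuid.uuid4()),
        "createdAt": _now(),
    }
    generated_token_pairs.append(pair)
    # lastRefreshed is updated every time a new pair is issued for the session
    session["lastRefreshed"] = pair["createdAt"]
    return pair


session = create_session("5ea80f15c12e3f031a5b83ed", "Laptop Name", "203.0.113.7")
pair = issue_token_pair(session)
print(pair["sessionId"] == session["_id"])  # True
```

The ids returned here are what would later be embedded into the signed tokens.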
2 - Token payloads
Our tokens are required to reference each other, i.e. store not only their own id but also the other token's id.
Therefore, the token payload looks something like this:
{
"exp": 1594397587, // NumericDate, i.e. seconds since the Unix epoch (2020-07-10T16:13:07Z)
"sub": "1313131313",
"name": "Taylor Swift",
"locale": "US",
// ... further user data here ...
"accessTokenId": "d754436a-890f-47d8-9562-3b7836f708b1",
"refreshTokenId": "dd53c1cc-cea7-4b6c-bcdb-f4d58c635557",
"tokenType": "access", // needed to identify this token's id, 'access' or 'refresh'
}
Aside from the token ids, we include the token type in every token's payload. This makes identifying the current token's id much easier.
We include both token ids as it reduces read-requests to the database whenever an entire token pair is blacklisted.
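As a rough sketch, the two payloads of a pair could be built like this. The helper names build_payloads and own_token_id as well as the lifetime constants are assumptions for illustration; signing the payloads is left to whatever JWT library you use.

```python
import time

# Lifetimes as described earlier in this post (assumed constants).
ACCESS_TTL_SECONDS = 60 * 60               # ~1 hour
REFRESH_TTL_SECONDS = 365 * 24 * 60 * 60   # up to 1 year


def build_payloads(user, session_id, access_token_id, refresh_token_id, now=None):
    """Return (access_payload, refresh_payload); both reference each other."""
    now = int(time.time()) if now is None else now
    shared = {
        "sub": user["id"],
        "sessionId": session_id,
        "accessTokenId": access_token_id,
        "refreshTokenId": refresh_token_id,
    }
    access = {**shared, "name": user["name"], "tokenType": "access",
              "exp": now + ACCESS_TTL_SECONDS}
    refresh = {**shared, "tokenType": "refresh",
               "exp": now + REFRESH_TTL_SECONDS}
    return access, refresh


def own_token_id(payload):
    """Use tokenType to pick the id that belongs to the presented token."""
    key = "accessTokenId" if payload["tokenType"] == "access" else "refreshTokenId"
    return payload[key]


access, refresh = build_payloads(
    {"id": "1313131313", "name": "Taylor Swift"},
    "5eb4554afdf50009bcca1597",
    "d754436a-890f-47d8-9562-3b7836f708b1",
    "dd53c1cc-cea7-4b6c-bcdb-f4d58c635557",
    now=1_600_000_000,
)
print(own_token_id(access))   # d754436a-890f-47d8-9562-3b7836f708b1
print(own_token_id(refresh))  # dd53c1cc-cea7-4b6c-bcdb-f4d58c635557
```

Because each payload carries both ids, either token is enough to blacklist the whole pair later on.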
3 - Blacklisting token pairs
Following the generatedtokenpairs and sessions collections, it's time to have a look at our third and last collection: blacklistedtokens. Its purpose is to store the ids of blacklisted tokens.
A blacklisted token document could look like this:
{
_id: "5edd3a4008cf610d0fa9fbab",
accessTokenId: "c6fccd22-344b-4d08-bece-dc823996d0a6", // id of the blacklisted access token
refreshTokenId: "dd53c1cc-cea7-4b6c-bcdb-f4d58c635557", // id of the blacklisted refresh token
uid: "5edd3755cd7b920cce4cfd24",
blacklistedAt: "2020-06-07T19:04:32.775+00:00"
}
To blacklist a token pair, all it takes is the creation of a new document inside the blacklistedtokens collection.
This works for both refresh and access tokens.
To comply with our requirement of having only one valid token pair per session, we simply blacklist the current token pair prior to sending out a new one. Instead of reading the corresponding token document, you can simply grab the token ids from the refresh token's payload.
BAAAM! That's a read-request you saved right there!
Another scenario in which you would blacklist tokens is a user logging out of all sessions, like we all did to stop an ex from leeching off our Netflix account.
You could easily do so by blacklisting all tokens of a user's sessions.
Whenever a token is now sent to the API, we check whether its id is contained in the blacklistedtokens collection. If it is, bad luck - the request is denied. Otherwise, the token isn't blacklisted and therefore valid.
No worries if you haven't fully grasped the concept yet. What follows is a summary of the entire authentication flow.
Summary of our new authentication/token flow
- The user initially signs in using e.g. email and password.
- A new session and token pair as well as the corresponding database entries are created; finally, the token pair is sent to the user.
- The user performs "regular" requests; with every request, we verify that the used token isn't blacklisted.
- The access token expires and the refresh token is used. Like any token our API receives, whether access or refresh, we check whether it's blacklisted. If it is, the request is denied and further measures are enforced (more on that below). Otherwise, the old tokens are blacklisted by inserting their ids into the blacklistedtokens collection. Then a new token pair (and the matching document in the generatedtokenpairs collection) is generated and sent out to the client.
- "Regular" requests continue with the new access token.
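The token swap from the flow above can be sketched as follows. This is an illustration under assumptions, not my production code: the set and list replace the MongoDB collections, swap_tokens and TokenRejected are invented names, and signature/expiry checks are assumed to have happened before the payload reaches this function.

```python
import uuid

# In-memory stand-ins for the MongoDB collections (assumptions for this sketch).
blacklisted_ids = set()       # ids stored in blacklistedtokens
generated_token_pairs = []    # documents in generatedtokenpairs


class TokenRejected(Exception):
    """Raised when a presented token must not be honoured."""


def swap_tokens(refresh_payload):
    """Exchange a refresh token for a new pair; each refresh token works once."""
    if refresh_payload["tokenType"] != "refresh":
        raise TokenRejected("an access token cannot be used for a token swap")
    if refresh_payload["refreshTokenId"] in blacklisted_ids:
        raise TokenRejected("refresh token is blacklisted")

    # Invalidate the old pair first; both ids come straight from the payload.
    blacklisted_ids.add(refresh_payload["accessTokenId"])
    blacklisted_ids.add(refresh_payload["refreshTokenId"])

    new_pair = {
        "uid": refresh_payload["sub"],
        "sessionId": refresh_payload["sessionId"],
        "accessTokenId": str(uuid.uuid4()),
        "refreshTokenId": str(uuid.uuid4()),
    }
    generated_token_pairs.append(new_pair)
    return new_pair


old_refresh = {
    "sub": "1313131313",
    "sessionId": "5eb4554afdf50009bcca1597",
    "accessTokenId": "d754436a-890f-47d8-9562-3b7836f708b1",
    "refreshTokenId": "dd53c1cc-cea7-4b6c-bcdb-f4d58c635557",
    "tokenType": "refresh",
}
new_pair = swap_tokens(old_refresh)  # first use succeeds
try:
    swap_tokens(old_refresh)         # second use of the same token is denied
    reused = True
except TokenRejected:
    reused = False
print(reused)  # False
```

Blacklisting before issuing keeps the "one valid pair per session" rule intact even if issuing the new pair fails halfway.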
I hope the concept is clearer now. Otherwise, please comment down below and I'll be happy to help!
Next, we'll learn how token theft can be detected, possible measures to enforce and what limitations there are.
Detection of token theft
Refresh Tokens
In our case, theft of refresh tokens is relatively straightforward to detect, assuming they're capable of generating exactly one new token pair, i.e. are usable only once.
Specifically, our approach blacklists the old token pair prior to generating a new one. The party requesting the token swap receives the new tokens and therefore wouldn't use the old ones to make requests. Another party, however, doesn't know about the issuing of new tokens and would continue using the old ones. As soon as a blacklisted token is used, the request is denied and there's a high probability token fraud took place.
The right thing to do now: log out the user by blacklisting all tokens of that user's sessions and monitor the situation (i.e. keep track of the token fraud attempts; if multiple fraud attacks are detected, require the password to be reset etc.)
That's why we need the generatedtokenpairs collection. Without it, we are unable to disband a user's sessions upon detection of token theft.
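A rough sketch of this reaction, assuming in-memory stand-ins for the three collections and made-up sample ids. The names check_token, revoke_all_sessions and fraud_attempts are hypothetical and only serve the illustration.

```python
from collections import Counter

# In-memory stand-ins with sample data (assumptions for this sketch).
sessions = [
    {"_id": "s1", "uid": "u1"},
    {"_id": "s2", "uid": "u1"},
    {"_id": "s3", "uid": "u2"},
]
generated_token_pairs = [
    {"uid": "u1", "sessionId": "s1", "accessTokenId": "a1", "refreshTokenId": "r1"},
    {"uid": "u1", "sessionId": "s2", "accessTokenId": "a2", "refreshTokenId": "r2"},
    {"uid": "u2", "sessionId": "s3", "accessTokenId": "a3", "refreshTokenId": "r3"},
]
blacklisted_ids = {"a0", "r0"}  # an old pair of u1 that was already swapped
fraud_attempts = Counter()      # per-user counter used for monitoring


def revoke_all_sessions(uid):
    """Blacklist every token issued to the user and drop their sessions."""
    for pair in generated_token_pairs:
        if pair["uid"] == uid:
            blacklisted_ids.update((pair["accessTokenId"], pair["refreshTokenId"]))
    sessions[:] = [s for s in sessions if s["uid"] != uid]


def check_token(payload):
    """Return True if the token may be used; treat blacklisted reuse as theft."""
    key = "accessTokenId" if payload["tokenType"] == "access" else "refreshTokenId"
    if payload[key] in blacklisted_ids:
        fraud_attempts[payload["sub"]] += 1
        revoke_all_sessions(payload["sub"])
        return False
    return True


stolen = {"sub": "u1", "tokenType": "refresh",
          "accessTokenId": "a0", "refreshTokenId": "r0"}
print(check_token(stolen))                 # False
print(sorted(s["_id"] for s in sessions))  # ['s3']
```

The counter is where further measures could hook in, e.g. requiring a password reset after repeated attempts.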
Unfortunately, things aren't as easy in the case of access tokens. Those are used multiple times which makes detection of token theft much harder.
Access Tokens
First of all, access tokens are blacklisted as well - if a revoked token is used, enforce the measures described above.
If the tokens aren't blacklisted though, access token fraud is almost impossible to detect unless your application is backed by a godlike AI like the ones Google or Facebook have. Theft can only be detected using heuristics, e.g. by detecting an unexpected change of IP addresses/browsers or unusual user behaviour. Unless you possess said AI, you should think carefully before implementing this approach, as it's highly unreliable and probably annoying for your users.
That is because it's hard to define absolute criteria for the mentioned factors. For example: how much "browser hopping" is allowed before the user is logged out? What makes a user's behaviour "unusual"?
Although it's easy to detect changes of IP-addresses, they can be manipulated quite easily and therefore aren't enough to rely on when detecting token fraud efficiently.
In a nutshell, detecting theft of access tokens is hard work as the above mentioned factors should be considered in context and holistically to accomplish high reliability.
Therefore, prior to performing critical actions like account termination or change of contact information, require the user to enter their password. Additionally, always confirm critical changes via email.
Advice
Unfortunately, there are still many scenarios in which token theft can't be detected or its detection is useless. Many of them are related directly to the user, e.g. when someone has a user's credentials, control over their PC, etc.
The possibilities are endless and there's no flawless way to secure an application. Cybersecurity is both complex and interesting, and I encourage you to pursue further reading. Although the approach explained in this post should be enough to point you in the right direction and enforce a substantial layer of security, it shouldn't be the only one you rely on. Other options could be 2FA, endorsing regular password changes etc.
If you've come to the conclusion that dealing with authentication is not something you want to worry about, I totally get that. Take a look at Auth0 - they take care of all things authentication and even offer protection using heuristic algorithms.
Performance
Let's have a look at the required workload of our database.
Initial Login:
- CREATE operation inside the sessions collection
- CREATE operation inside the generatedtokenpairs collection
Regular request (access token):
- READ operation inside the blacklistedtokens collection
Token swap (refresh token):
- READ operation inside the blacklistedtokens collection
- CREATE operation inside the blacklistedtokens collection
Upon detection of token theft:
- CREATE operations inside the blacklistedtokens collection; one for every active session of the user
- DELETE operations inside the sessions collection
Initial login requests, which represent a fraction of our total requests, require two create operations.
Now, regular requests, which make up the vast majority of requests, only require a single READ operation. That's not an issue at all, as long as the data is indexed properly (speaking from a MongoDB standpoint).
Token swap, which happens significantly less often, only requires a read and a create operation.
Token theft, which hopefully (almost certainly) rarely happens, requires one create operation per active session of the user plus the matching delete operations.
Every now and then, you'll want to make sure that documents of already expired tokens are removed from the token collections (generatedtokenpairs and blacklistedtokens).
That's it. The database overhead is perfectly acceptable, especially when considering further optimizations like caching blacklisted tokens (e.g. by using Redis).
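The periodic cleanup could look roughly like the sketch below. It assumes the refresh token lifetime of one year mentioned earlier and uses a plain list instead of a collection; purge_expired is a made-up helper name. The reasoning: once a document is older than the longest token lifetime, both of its tokens have expired anyway, so the document no longer needs to be kept.

```python
from datetime import datetime, timedelta, timezone

# Longest token lifetime in the system (assumption: refresh tokens, 1 year).
REFRESH_TOKEN_LIFETIME = timedelta(days=365)


def purge_expired(documents, timestamp_field, now=None):
    """Keep only documents whose tokens could still be unexpired."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - REFRESH_TOKEN_LIFETIME
    return [doc for doc in documents if doc[timestamp_field] >= cutoff]


now = datetime(2021, 1, 1, tzinfo=timezone.utc)
blacklisted_tokens = [
    {"refreshTokenId": "old", "blacklistedAt": datetime(2019, 6, 7, tzinfo=timezone.utc)},
    {"refreshTokenId": "recent", "blacklistedAt": datetime(2020, 6, 7, tzinfo=timezone.utc)},
]
blacklisted_tokens = purge_expired(blacklisted_tokens, "blacklistedAt", now)
print([doc["refreshTokenId"] for doc in blacklisted_tokens])  # ['recent']
```

If you're on MongoDB, a TTL index on the timestamp field is worth looking into, as it can take over this job without a separate cleanup task.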
Outro
Congrats, you now know how to make your JSON Web Tokens revokable! I sincerely hope you had a nice read and learned something new. As always, if you have any questions, just comment them and I'll try my best to help you out.
I would love to hear your feedback! Let me know what you liked or what could've been better with this tutorial - I always welcome constructive criticism!
In love with what you just read? Follow me for more content like this.
Sources
© GIFs from Giphy
Inspiration for the described approach by HackerNoon
Detection of token theft inspired by HackerNoon