Protecting PII Data With JWT
Use JWT tokens for fine-grained access control without embedding PII such as account numbers in them. Put non-PII identifiers in the token and perform a server-side lookup instead.
The Challenge
JWT tokens are widely used for securing APIs through authentication and authorization. When an API request arrives, the resource server decodes and verifies the JWT token, typically validating the signature for authentication and checking claims or scopes for authorization. For example, the server might use claims in the token to decide if the user can access a particular endpoint.
However, finer access control is often needed. For instance, when a request fetches a customer's bank account details, the server must ensure the user is accessing their own account, not someone else's.
A common approach is to include the user's account information in the JWT token claims or scopes, enabling the server to verify the account being accessed. While this works, it exposes Personally Identifiable Information (PII), such as account numbers, in the token. This exposure can violate data protection policies, because a token becomes readable by anyone who obtains it, for example when it is logged or copied from a browser, which can lead to data breaches.
Public Nature of a JWT Token
The payload of a JWT token must not be treated as secret or private data. Most JWT tokens are secured with a JSON Web Signature (JWS, Section 3 of RFC 7515): a digital signature or MAC computed over the header and payload that protects the token from tampering. A signature is not encryption. The issuer (generally the IDP) computes it with its private key (or, for HMAC algorithms such as HS256, with a secret shared with the resource server), while the header and payload are merely base64url encoded, so anyone holding the token can decode and read them. In other words, a JWT provides integrity and authenticity through signing rather than confidentiality through encryption. The resource server verifies the signature to confirm that the token came from the IDP and was not tampered with by any other entity. Encryption is not the purpose of a JWS-based token. (In modern systems, HTTP requests are sent over HTTPS, which encrypts the entire request in transit, including the token. That protects the token only on the wire, not once it is logged or stored, and is a separate topic.) Therefore, the payload of a JWT token based on JWS should never contain secrets or PII.
Here's a quick refresher on the structure of a JWT token. A well-formed JWT token consists of three base64url-encoded strings joined by a dot (.) separator. None of the three is encrypted; the third one (the JWT signature) is the only part that cannot be produced without the issuer's key.
- Header (Section 4 of RFC 7515): Contains metadata about the token including the cryptographic algorithms used to secure its contents
- JWT Payload: Contains a set of security claims that the resource server (the server hosting the API or application being protected) can verify and that hold meaning for the system being protected. These can be built-in claims defined by the JWT specification or custom claims created and configured by the IDP (the issuer of the token). See Section 4 of RFC 7519.
- JWT Signature: Contains the digital signature from the IDP and is used to verify that the token was not tampered with. Before a JWT token is used or stored, its signature must be verified.
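To make the difference between signing and encrypting concrete, here is a minimal sketch in Python that uses only the standard library. It assumes HS256 (an HMAC with a shared secret) purely to stay self-contained; an IDP typically signs with a private key using an asymmetric algorithm such as RS256, and production code should rely on a maintained JWT library rather than this hand-rolled version. The function names and the secret are illustrative, and the sketch does not check expiry or other claims.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    # JWTs use base64url without the trailing "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWTs strip before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature; nothing in it is encrypted."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the payload only if the signature matches; raise ValueError otherwise."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("token must have exactly three segments") from None
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(payload_b64))


# Illustrative usage with a made-up secret.
secret = b"demo-secret-not-for-production"
token = sign_hs256({"sub": "user-1"}, secret)
print(verify_hs256(token, secret))
```

Swapping in a different payload segment while keeping the old signature makes `verify_hs256` raise, which is the tamper protection the signature provides. The payload itself stays readable to anyone who holds the token.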
Solving the PII Problem in JWT
Given the challenge outlined above and the basic nature of a JWT token, let us consider a hypothetical example and try to solve it. Suppose a bank customer is attempting to fetch their bank account details, such as the account balance and transactions. During the workflow, the application calls a backend account API that is protected using JWT tokens. Let us also assume that the JWT tokens are issued by an enterprise-level IDP like Auth0. The IDP is integrated with the organization's LDAP system for identity federation and acts as the authoritative source for managing user identities and access.
When a user authenticates, the request goes to the IDP, which returns a JWT token in response. The JWT token contains an identifier for the user and some built-in or custom claims describing the user's access. If we want to grant access permissions at the granularity of a customer's account, a common but flawed way of achieving this is to embed the user's account numbers directly in the token.
Example
Here's an example of such a token.
Base64-Encoded Token
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyLCJzY29wZXMiOlsiYWNjb3VudHM6MTIzNDUsNDM3ODksOTA4NzUiXX0.r6bCsRn7ALxedEt8BZcqVc3YErCymtYHvSpPdsAnQaE
Decoded Token
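The header and payload of the token above can be recovered without any key. A minimal Python sketch, using only the standard library (the helper name `decode_segment` is illustrative), decodes them while deliberately skipping signature verification, which is exactly what anyone who obtains the token can do:

```python
import base64
import json

# The example token from this article, split across lines for readability.
TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIy"
    "LCJzY29wZXMiOlsiYWNjb3VudHM6MTIzNDUsNDM3ODksOTA4NzUiXX0"
    ".r6bCsRn7ALxedEt8BZcqVc3YErCymtYHvSpPdsAnQaE"
)


def decode_segment(segment: str) -> dict:
    # base64url segments in a JWT omit "=" padding, so restore it first.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


header_b64, payload_b64, _signature_b64 = TOKEN.split(".")
header = decode_segment(header_b64)
payload = decode_segment(payload_b64)

print(json.dumps(header, indent=2))
print(json.dumps(payload, indent=2))
```

The decoded payload carries the `sub`, `name`, and `iat` claims plus a `scopes` entry of `accounts:12345,43789,90875`, so the account numbers are readable by anyone holding the token.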
Notice that the payload embeds the account numbers directly, so the resource server can compare the account in the current HTTPS request with the accounts the token authorizes. This works as far as access control is concerned, but it puts us in violation of PII privacy compliance because it ignores the fact that the contents of a JWT token are publicly readable (the signature prevents tampering, not reading). In short, a token like this might introduce the following vulnerabilities into our system.
- Exposure of sensitive information:
- Embedding account numbers directly exposes Personally Identifiable Information (PII). If the token is logged, intercepted, or otherwise exposed (via a browser, for example), unauthorized parties can access the sensitive information.
- JWT tokens, even when signed, can be decoded by anyone, so anyone who obtains a token can see the embedded PII.
- Increased risk of token misuse:
- Logging: Tokens might be logged by clients, servers, or intermediate systems, in either encoded or decoded form. Account numbers and other PII present in the token can thus be exposed inadvertently through logs.
- Token leakage: During debugging and vendor support, tokens are sometimes shared in support tickets and similar channels. Account numbers and other PII can be exposed unintentionally in such cases.
- Regulatory compliance:
- Data protection laws: Many jurisdictions and countries have strict data protection laws. Embedding PII data in tokens might violate laws such as GDPR or CCPA, leading to potential legal and financial consequences.
Recommendations
To address the above vulnerabilities, follow these best practices.
- Use UUIDs or other non-PII identifiers as values of custom claims
- Instead of embedding account numbers, use UUID or another identifier that can be mapped to an actual account number from the database via the resource server hosting the protected API the user is trying to invoke.
- The basic principle is that when an account gets created in your transactional database it should get a UUID-like id assigned to it, which should also be synced into your IDP. The JWT token issued by the IDP then should contain this id instead of directly embedding the account number.
- Keep the token payload minimal
- A token travels with every HTTP request, usually in a header, and servers and proxies limit header size, so a growing payload will eventually hit a limit. Including accounts or other application-specific details directly in the token is therefore not only insecure but also inflexible. What will you do if one of your customers is an organization with thousands of accounts (for departments and sub-departments) at the bank? You would be forced to include a large amount of data in the token, bloating every HTTP request.
- Substitute direct embedding of sensitive data with server-side lookup
- This recommendation extends the first one above. If we always include an identifier (a UUID or something similar) instead of the actual data, and we make sure that this identifier is synced across the transactional database and the IDP, then the resource server can use it to fetch the user's accounts from the database after it has verified the token. The server can then compare the account(s) in the incoming request with the user's accounts from the database and validate the user's access permissions.
- Consider using JWE (RFC 7516), if embedding sensitive data is absolutely necessary
- If it is unavoidable and absolutely necessary to include sensitive information in a JWT, then we should use JWE to ensure that only authorized parties can decrypt and access the token payload. I believe this should be the last resort, as it introduces significant complexity and may also impact performance due to the extra encryption and decryption logic.
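The first and third recommendations above can be sketched together in Python. This is an illustration under stated assumptions, not a reference implementation: the claim name `account_ref`, the helper names, and the in-memory dictionary standing in for the transactional database are all hypothetical, and the claims are assumed to come from a token whose signature has already been verified.

```python
import uuid

# Hypothetical opaque references, assigned when accounts are created and
# synced to the IDP. They reveal nothing about the account numbers.
ALICE_REF = str(uuid.uuid4())
BOB_REF = str(uuid.uuid4())

# Hypothetical stand-in for the transactional database:
# opaque reference -> account numbers owned by that customer.
ACCOUNTS_BY_REF = {
    ALICE_REF: {"12345", "43789", "90875"},
    BOB_REF: {"55555"},
}


def lookup_accounts(account_ref: str) -> set:
    # In a real system this would be a database query on the resource server.
    return ACCOUNTS_BY_REF.get(account_ref, set())


def can_access(verified_claims: dict, requested_account: str) -> bool:
    """Allow the request only if the requested account belongs to the caller.

    `verified_claims` must come from a token whose signature was already checked.
    """
    account_ref = verified_claims.get("account_ref")
    if not isinstance(account_ref, str):
        return False  # claim missing or malformed: deny by default
    return requested_account in lookup_accounts(account_ref)


# The token now carries only the opaque reference, never an account number.
claims = {"sub": "1234567890", "account_ref": ALICE_REF}
print(can_access(claims, "12345"))
print(can_access(claims, "55555"))
```

Because the token holds a single opaque reference, its size stays constant no matter how many accounts the customer owns, and a leaked token exposes no account numbers.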
Conclusion
Including application-specific PII in JWT tokens to provide finer-grained access control works in principle, but it compromises the security of that PII and risks violating data protection laws such as GDPR and CCPA. Using non-PII identifiers in the token payload, and performing a server-side lookup of the sensitive data with these identifiers instead of embedding it in the token, mitigates these risks while providing equivalent access control at the desired level of granularity.