Implementing RSA in Python From Scratch
In this article, explore an easy-to-understand explanation of the mathematical background behind RSA cryptography algorithms.
Join the DZone community and get the full member experience.
Join For FreePlease note that it is essential for me to emphasize that the code and techniques presented here are intended solely for educational purposes and should never be employed in real-world applications without careful consideration and expert guidance.
At the same time, understanding the principles of RSA cryptography and exploring various implementations is valuable for educational purposes, and understanding how to code encryption methods, and building secure encryption systems for practical use requires adherence to industry standards, best practices, and thorough security assessments. An inadequate implementation or improper usage of cryptographic techniques can lead to severe vulnerabilities, jeopardizing the confidentiality and integrity of sensitive data.
I'm sure many programmers, particularly web developers have heard of the RSA cryptography system. RSA is an asymmetric cryptography system, meaning that one key is used for encryption and the other for decryption. I've seen a lot of articles explaining the general principles of asymmetric cryptography, but I have not seen any that give easy-to-understand explanations of the mathematical background behind these algorithms, so I decided to write this article.
Explanation of the Math Behind RSA
To start, as I've mentioned in the paragraph above, to transmit an RSA-encrypted message, you need 2 keys. One that can only encrypt and one that can decrypt. Let the encryption key be denoted as e
, the decryption key will be denoted as d
and the message will be denoted as m
. The key principle behind RSA is that in the following notion:
(m^e)^d ≡ m (mod n)
The difficulty of finding d
, knowing e
and n
can be very hard if these numbers are chosen properly (as will be demonstrated below).
But first, we need to introduce some new concepts in number theory.
a ≡ b (mod n)
means that there is an integerx
such thata + nx = b
. A more intuitive explanation is that the reminder ofa / n
equals the reminder ofb / n
.gcd(a, b)
is the greatest number that divides botha
andb
.lcm(a, b)
is the smallest number that is a multiple of botha
andb
.λ(n)
is a number such thatx^λ(n) ≡ 1 (mod n)
for anyx
such thatgcd(x, n) = 1
. From this you can conclude thatx^a ≡ x^b (mod n) if a ≡ b (mod λ(n))
Generating Keys
Let's make n = pq
where p
and q
are 2 prime numbers. Since λ(p) = p - 1
and λ(q) = q - 1
(lookup Fermat's little theorem proof for why), λ(n) = (p - 1) * (q - 1)
.
Now we have to solve ed ≡ 1 (mod λ(n))
. e
can be some prime number (usually 65537) and d can be calculated using extended Euclidian's algorithm (will be written and explained in the code section) from the equation ed + xλ(n) = gcd(e, λ(n))
(d
and x
are coefficients to be calculated).
pair (e, n)
is used for encryption ((m^e) % n
) and pair (d, n)
is used for decryption ((m^d) % n
)
After computing the keys, p and q can be thrown away. Notice that without p
and q
, finding λ(n)
would mean factorizing n
, which is not an easy problem to solve for values of n up to 2^2048, which are regularly used.
Implementing RSA in Python
First list procedures and their steps:
Keys Generation
- Find 2 random prime numbers,
p
andq
- Compute
n = p * q
andλ(n) = (p - 1) * (q - 1)
- Make
e
equal some prime number, e.g.e = 35537
- Compute
d
from equationed + λ(n)x = gcd(e, λ(n)) = 1
using Extended Euclidian Algorithm (from this point on we will call iteucalg
)
Encryption
- Divide the message into sections (of 256 bits if n is 2048-bit)
- Each section (denoted as
m
) is encrypted as(m^e) % n
Decryption
- Divide the message into sections, same as above
- Each section (denoted as
m
) is decrypted as(m^d) % n
Implementation of Extended Euclidian Algorithm
This algorithm relies on the fact that if x
divides both a
and b
, there will be a pair of coefficients (k, l)
such that:a * k + b * l = x
The algorithm finds these coefficients (and x
) by repeatedly substracting the lesser argument from the bigger one, until the lesser one becomes 0. Instead of repeatedly substracting, you can calculate how many times b
can be substracted from a
(k = a // b
) and then substratct b*k
. This optimization makes the algorithm run in O(log max(a, b)) which is a lot quicker.
Below is an implementation of the algorithm in Python:
def eucalg(a, b):
# make a the bigger one and b the lesser one
swapped = False
if a < b:
a, b = b, a
swapped = True
# ca and cb store current a and b in form of
# coefficients with initial a and b
# a' = ca[0] * a + ca[1] * b
# b' = cb[0] * a + cb[1] * b
ca = (1, 0) cb = (0, 1)
while b != 0:
# k denotes how many times number b
# can be substracted from a
k = a // b
# a <- b
# b <- a - b * k
# ca <- cb
# cb <- (ca[0] - k * cb[0], ca[1] - k * cb[1])
a, b, ca, cb = b, a-b*k, cb, (ca[0]-k*cb[0], ca[1]-k*cb[1])
if swapped:
return (ca[1], ca[0])
else:
return ca
Notice that the function above can produce negative numbers for coefficients, so in case that happens, all that we need to do is set d
to λ(n) - d
.
Implementing Fast Exponentiation With Modulo
Some might suggest just using (b ** e) % n
, but this approach is not very good time and memory-wise because b ** e
can be very big despite only the last 2000 bits or so being needed. The solution is to reimplement exponentiation by calculating modulo after every division.
The exponentiation implementation below has logarithmic time complexity. Instead of iterating from 1 to e
and multiplying r
(result) by b
, it iterates from the most significant bit of e
to the least significant bit of e
, and at each bit it does r = r*r + bit
. This works because if r
equals b^x
and you're appending a bit to the end of x, new x will be x * 2 + bit
.
def modpow(b, e, n):
# find length of e in bits
tst = 1
siz = 0
while e >= tst:
tst <<= 1
siz += 1
siz -= 1
# calculate the result
r = 1
for i in range(siz, -1, -1):
r = (r * r) % n
if (e >> i) & 1: r = (r * b) % n
return r
Key Generation, Encryption, and Decryption
Keys generation, encryption, and decryption have all been explained in the math section and the code below is simply the implementation of that.
def keysgen(p, q): n = p * q lambda_n = (p - 1) * (q - 1) e = 35537 d = eucalg(e, lambda_n)[0] if d < 0: d += lambda_n # both private and public key must have n stored with them return {'priv': (d, n), 'pub': (e, n)}
def numencrypt(m, pub): return modpow(m, pub[0], pub[1])
def numdecrypt(m, priv): return modpow(m, priv[0], priv[1])
Testing the Code We Have Until Now
I stored the code in a file named rsa.py and ran the following in a Python shell:
>>> import rsa >>> keys = rsa.keysgen(31337, 31357)
>>> keys {'priv': (720926705, 982634309), 'pub': (35537, 982634309)}
>>> priv = keys['priv']
>>> pub = keys['pub']
>>> msg = 80087 >>> enc = rsa.numencrypt(msg, priv)
>>> enc 34604568
>>> dec = rsa.numdecrypt(enc, pub)
>>> dec 80087
>>>
Conclusion
At the end of writing this article, I realized that the implementation of a usable Python RSA program is more complicated than I initially speculated. Because of that, I decided to split the topic up into multiple articles. With the code presented in this article, you have all the core parts of RSA written. But you still need a random prime generator and plaintext encryption (numencrypt
and numdecrypt
are suitable only for numbers m
smaller in size than n
). All these will be included in the next article that will be published shortly.
Published at DZone with permission of Traven West, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments