Cryptography Or: How I Learned to Stop Worrying, and Love AES

by Phillip Gawlowski on July 18, 2011


This guest post is by Phillip Gawlowski, who is living in the German wilderness of Oberberg near Cologne. Phillip spends his time writing Ruby as a hobby just for fun. He tries to make life a little easier for himself and for others when he is crazy enough to release his code as open source. He’s neither famous nor rich, but likes it that way (most of the time). He blogs his musings at his blog.

A friend gave you the plans for Dr. Blofeld’s newest Doomsday Device. Over the engine noise of his Aston Martin, he tells you: “Send this to offers@universal-exports.co.uk, and make sure it arrives there intact!”

All you have is a laptop, wonky Internet access, and Ruby. What to do?

AES For Safety, SHA2 For Integrity

You now have two goals:

  1. Make the Doomsday Device plans unreadable, and
  2. Ensure that the data has arrived at its destination without error.

Fortunately, Ruby provides an API to OpenSSL, a well-tested, widely used library and set of tools used for encryption of all kinds, and includes its own implementations of several cryptographic hashes.

In this article we will use AES for encryption and decryption, and SHA2 to hash data.

Using SHA2

Like many things, Ruby makes creating crypto-hashes easy:

require 'digest/sha2'
sha256 = Digest::SHA2.new(256)
sha256.digest("Bond, James Bond")

The argument to Digest::SHA2.new selects the bit length of the hash. Ruby’s Digest::SHA2 supports 256, 384, and 512 bits, better known as SHA256, SHA384, and SHA512. A longer hash takes more space and may take a little longer to compute, but it is also much more difficult to attack by brute force, with a rainbow table, or with other cryptanalysis.

Once we have our SHA2 object, we pass a String of data to its #digest method. It returns the hash of that data as a binary String. The #hexdigest method returns the same hash as a printable hexadecimal String.
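The sketch below uses only Ruby’s standard digest library. It shows the size difference between the binary and hex forms, and that the output depends only on the input.

```ruby
require 'digest/sha2'

sha256 = Digest::SHA2.new(256)
sha512 = Digest::SHA2.new(512)

raw = sha256.digest("Bond, James Bond")     # binary String, 32 bytes
hex = sha256.hexdigest("Bond, James Bond")  # printable String, 64 hex characters

puts raw.bytesize                                # 32
puts hex.length                                  # 64
puts sha512.digest("Bond, James Bond").bytesize  # 64

# Deterministic: the same input always gives the same hash,
# and the hex form is just the binary form written out in hex.
puts hex == raw.unpack1('H*')                    # true
```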

You can also call digest directly on the class, without creating an object first. This works for MD5 and SHA1, and also for the fixed-length SHA2 classes such as Digest::SHA256:

require 'digest/md5'
Digest::MD5.digest "Bond, James Bond"
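The class-level shortcut also exists for Digest::SHA256, and hexdigest can be called the same way. The small check below assumes only the standard library. It confirms that the shortcut gives the same result as the instance-based call.

```ruby
require 'digest/md5'
require 'digest/sha2'

message = "Bond, James Bond"

md5_hex    = Digest::MD5.hexdigest(message)     # 128-bit hash: 32 hex characters
sha256_hex = Digest::SHA256.hexdigest(message)  # 256-bit hash: 64 hex characters

# The class-level call matches the instance created with Digest::SHA2.new(256).
puts sha256_hex == Digest::SHA2.new(256).hexdigest(message)  # true
```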

The Advanced Encryption Standard

Theory

AES is a so-called symmetric-key block cipher. It operates on fixed-size chunks of data, called blocks, and applies the provided key to each block to produce encrypted or decrypted output. Using the same key for encryption and decryption is what makes the cipher symmetric. Asymmetric ciphers, by contrast, use different keys for encryption and decryption. Usually a public key, known to anyone, encrypts, and a private key, known only to the recipient, decrypts. SSH, SSL/TLS, and PGP rely on this kind of cryptography, typically combined with a symmetric cipher for the bulk of the data.

AES comes in three key sizes: 128, 192, and 256 bits. Just as with SHA2, you’ll find names like AES-128 or AES-256 used to describe the variant. Here, though, the number is the key length; the AES block size is always 128 bits.
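You can confirm this with OpenSSL itself. The sketch below asks each AES variant for its key length and block size. It uses CBC mode because OpenSSL reports a block size of 1 for stream-like modes such as CFB.

```ruby
require 'openssl'

# key_len and block_size are reported in bytes; multiply by 8 for bits.
sizes = %w[AES-128-CBC AES-192-CBC AES-256-CBC].map { |name|
  cipher = OpenSSL::Cipher.new(name)
  [name, { key_bits: cipher.key_len * 8, block_bits: cipher.block_size * 8 }]
}.to_h

sizes.each do |name, s|
  puts "#{name}: #{s[:key_bits]}-bit key, #{s[:block_bits]}-bit block"
end
# AES-128-CBC: 128-bit key, 128-bit block
# AES-192-CBC: 192-bit key, 128-bit block
# AES-256-CBC: 256-bit key, 128-bit block
```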

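The downside to applying a block cipher on its own is that every block is encrypted with the same key in the same way. That weakens the encryption: identical plaintext blocks produce identical ciphertext blocks, so patterns in the data stay visible. The solution is a so-called “mode of operation”. It mixes an initialization vector or the preceding blocks into each block’s encryption, so that the output becomes indistinguishable from noise.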

A full discussion of modes of operation and their strengths and weaknesses would go well beyond the scope of this article, however.
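Still, the core problem is easy to demonstrate. The sketch below encrypts two identical 16-byte blocks twice. The first run uses ECB, which applies AES to each block independently and is shown here only as a bad example. The second uses CBC, which chains blocks together with a random IV. The method name encrypt_with is a hypothetical helper.

```ruby
require 'openssl'

# Two identical 16-byte plaintext blocks.
plaintext = "YELLOW SUBMARINE" * 2

# Hypothetical helper: encrypt with AES-128 in the given mode.
def encrypt_with(mode, key, plaintext)
  cipher = OpenSSL::Cipher.new("AES-128-#{mode}")
  cipher.encrypt
  cipher.key = key
  cipher.random_iv unless mode == "ECB"  # ECB uses no IV
  cipher.update(plaintext) + cipher.final
end

key = OpenSSL::Random.random_bytes(16)
ecb = encrypt_with("ECB", key, plaintext)
cbc = encrypt_with("CBC", key, plaintext)

# ECB: identical plaintext blocks give identical ciphertext blocks.
puts ecb[0, 16] == ecb[16, 16]  # true
# CBC: chaining with the IV hides the repetition.
puts cbc[0, 16] == cbc[16, 16]  # false
```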

…And Practice

Now let’s take a look at Ruby’s encryption API:

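require 'openssl'
require 'digest/sha2'

payload = "Plans for Blofeld's newest Doomsday Device. This is top secret!"
sha256 = Digest::SHA2.new(256)
aes = OpenSSL::Cipher.new("AES-256-CFB")

aes.encrypt                              # must come before setting the key and IV
key = sha256.digest("Bond, James Bond")  # 32-byte (256-bit) key from a passphrase
aes.key = key
iv = aes.random_iv                       # 16 random bytes, also set on the cipher
encrypted_data = aes.update(payload) + aes.final

puts encrypted_data.unpack1('H*')        # the ciphertext is binary, so print it as hex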

Since Ruby’s OpenSSL API is pretty straightforward (and so is the OpenSSL API, if you would like to use OpenSSL in C code), we will only discuss what’s really important.

OpenSSL::Cipher.new("AES-256-CFB") sets up an AES object with a 256-bit key and the CFB (cipher feedback) mode of operation. To find out which ciphers are supported, call OpenSSL::Cipher.ciphers. It returns the names of all ciphers your OpenSSL build understands.
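For example, you can filter that list to see which AES-256 variants your build offers. The exact list depends on the OpenSSL version Ruby was compiled against, so treat the output as machine-specific.

```ruby
require 'openssl'

# All cipher names known to this OpenSSL build, narrowed to AES-256 variants.
aes256 = OpenSSL::Cipher.ciphers.grep(/\Aaes-256-/i).sort.uniq
puts aes256

# The mode used in this article should be among them.
puts aes256.any? { |name| name.casecmp?("aes-256-cfb") }
```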

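The iv variable stores our Initialization Vector (IV), random data that seeds the mode of operation. Because of it, encrypting the same plaintext twice with the same key gives different ciphertext, which (hopefully) is indistinguishable from noise. For AES the IV must be exactly one block, 16 bytes, long. It should come from a cryptographically strong random number generator such as Cipher#random_iv or SecureRandom; Kernel#rand is not suitable.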
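Here are two ways to get a suitable IV. The first is the one the example above uses. The second, SecureRandom, is an alternative not used elsewhere in this article. It works as long as you request iv_len bytes.

```ruby
require 'openssl'
require 'securerandom'

aes = OpenSSL::Cipher.new("AES-256-CFB")
aes.encrypt

# Option 1: OpenSSL generates the IV, sets it on the cipher, and returns it.
iv = aes.random_iv

# Option 2: any cryptographically strong source of iv_len bytes.
iv2 = SecureRandom.random_bytes(aes.iv_len)
aes.iv = iv2

puts aes.iv_len   # 16: one AES block, whatever the key size
puts iv.bytesize  # 16
```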

We also take advantage of SHA2’s 256-bit variant to turn a simple passphrase into a 256-bit key. AES-256 expects a key exactly 256 bits (32 bytes) long, and since creating a 256-bit key by hand is pretty difficult, we let the computer do the job. A single round of SHA-256 is a weak way to do this, though. In production you most likely want a random salt and many iterations, which is what a key derivation function such as PBKDF2 provides.
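A salted, iterated derivation with PBKDF2 might look like the sketch below, using OpenSSL::PKCS5.pbkdf2_hmac. The iteration count of 100,000 is an assumption to tune for your hardware. The salt is not secret and would be stored or sent along with the ciphertext.

```ruby
require 'openssl'

password   = "Bond, James Bond"
salt       = OpenSSL::Random.random_bytes(16)  # random; stored next to the ciphertext
iterations = 100_000                           # assumption: tune to your hardware
digest     = OpenSSL::Digest.new("SHA256")

# PBKDF2-HMAC-SHA256 stretches the password into a 32-byte AES-256 key.
key = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 32, digest)

# Same password, salt, and iteration count: same key.
same = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 32, digest)
puts key == same   # true
puts key.bytesize  # 32
```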

With the #encrypt and #decrypt methods, we put our AES object into the proper state. Behind the scenes, this initializes OpenSSL’s encryption engine. One of these two methods must be called before you set the key or IV, or call any other method!

Last but definitely not least, the #update and #final methods are where the encryption actually happens. The more data you have and the more complex the cipher, the longer this will take. #update processes the data you pass in, and you can call it repeatedly to stream large inputs. #final finishes the operation. In block-based modes such as CBC it adds padding to bring the last chunk up to the block size. Stream-like modes such as CFB need no padding.
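The sketch below compares the output sizes of #update and #final for a 20-byte message in CBC and CFB mode. It makes the difference in padding visible. encrypt_parts is a hypothetical helper name.

```ruby
require 'openssl'

plaintext = "x" * 20  # one full 16-byte block plus 4 bytes

# Hypothetical helper: return the outputs of #update and #final separately.
def encrypt_parts(name, plaintext)
  cipher = OpenSSL::Cipher.new(name)
  cipher.encrypt
  cipher.random_key  # generates a random key and sets it on the cipher
  cipher.random_iv
  [cipher.update(plaintext), cipher.final]
end

cbc_update, cbc_final = encrypt_parts("AES-256-CBC", plaintext)
cfb_update, cfb_final = encrypt_parts("AES-256-CFB", plaintext)

# CBC holds back the partial block; #final pads it to a full 16 bytes.
p [cbc_update.bytesize, cbc_final.bytesize]  # [16, 16]
# CFB behaves like a stream cipher: no padding, #final returns nothing.
p [cfb_update.bytesize, cfb_final.bytesize]  # [20, 0]
```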

In case you make a mistake, or want to do another round of encryption or decryption, the #reset method can reset a Cipher object.

Decryption works pretty much the same as encryption, except that we pass the encrypted data to the #update method:

aes.decrypt
aes.key = key
aes.iv = iv
puts aes.update(encrypted_data) + aes.final

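Note, however, that both the key and the IV must be the same as for encryption, and thus have to be stored or transmitted to the recipient of the encrypted data! The IV does not need to be kept secret and is commonly sent along with the ciphertext; the key must stay secret.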
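Here is the whole round trip as one self-contained sketch. It uses separate Cipher objects for sender and recipient, as would be the case on two different machines. The only shared values are the key and the IV.

```ruby
require 'openssl'
require 'digest/sha2'

payload = "Plans for Blofeld's newest Doomsday Device. This is top secret!"
key = Digest::SHA2.new(256).digest("Bond, James Bond")

# Sender
enc = OpenSSL::Cipher.new("AES-256-CFB")
enc.encrypt
enc.key = key
iv = enc.random_iv
encrypted_data = enc.update(payload) + enc.final

# Recipient: a fresh Cipher object with the same key and IV
dec = OpenSSL::Cipher.new("AES-256-CFB")
dec.decrypt
dec.key = key
dec.iv = iv
decrypted = dec.update(encrypted_data) + dec.final

puts decrypted == payload  # true
```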

Verifying Integrity

As we’ve already seen, a hash algorithm turns data of arbitrary length into a fixed-length string of bytes that is, for all practical purposes, unique to that input. This can serve for password storage or to derive stronger keys for encryption. Since the output of a hash algorithm is deterministic (the same input always gives the same output), it can also serve as an integrity check.

If you’ve downloaded a Linux distribution or other software, you have probably seen this already in the form of MD5 digests published next to the download, as on Ruby’s homepage. They let you verify that a download is complete and error-free.
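Checking a download against a published digest takes only a few lines. In the sketch below, a temporary file stands in for the download. The “published” checksum is computed locally, purely for illustration, and SHA-256 is used in place of MD5.

```ruby
require 'digest/sha2'
require 'tempfile'

contents = "pretend this is a downloaded tarball"

# Stand-in for the downloaded file.
download = Tempfile.new("example-download")
download.write(contents)
download.flush

# Normally copied from the download page; computed here for illustration.
published = Digest::SHA256.hexdigest(contents)

# Digest::SHA256.file reads the file and hashes its contents.
actual = Digest::SHA256.file(download.path).hexdigest
puts actual == published ? "download OK" : "download corrupted"  # download OK

download.close!
```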

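We will do the same with our encrypted data, as a poor man’s message authentication code (MAC), a cryptographic technique to ensure that a message has not been tampered with. Keep in mind that a plain hash only detects accidental corruption. Anyone who can alter the ciphertext can also recompute its hash, which is why a real MAC, such as HMAC, mixes in a secret key: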

poor_mans_mac = sha256.digest(encrypted_data)
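A keyed alternative is HMAC, available as OpenSSL::HMAC. The sketch below assumes a separate MAC key shared with the recipient, and random bytes stand in for the real ciphertext. The helper secure_equal? is hypothetical. It compares in constant time so that the time taken does not reveal how many bytes matched.

```ruby
require 'openssl'

# Assumptions: a MAC key separate from the encryption key, shared with the
# recipient, and random bytes standing in for the real ciphertext.
mac_key        = OpenSSL::Random.random_bytes(32)
encrypted_data = OpenSSL::Random.random_bytes(64)

# Sender: HMAC-SHA256 over the ciphertext, sent along with it.
mac = OpenSSL::HMAC.digest("SHA256", mac_key, encrypted_data)

# Hypothetical helper: constant-time comparison of two byte strings.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Recipient: recompute the HMAC over what arrived and compare.
puts secure_equal?(mac, OpenSSL::HMAC.digest("SHA256", mac_key, encrypted_data))  # true

# Flipping a single bit of the ciphertext changes the HMAC.
tampered = encrypted_data.dup
tampered.setbyte(0, tampered.getbyte(0) ^ 1)
puts secure_equal?(mac, OpenSSL::HMAC.digest("SHA256", mac_key, tampered))  # false
```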

Now all that’s left is to email the Doomsday Device plans to James’ employer, and to give them a call to pass on the IV and key.

Closing Remarks

Think of the Future

Security is not a state, it is a process. You should write your security-aware code in such a way that you don’t depend on a particular cryptographic algorithm. Ruby’s API (and OpenSSL’s own API) wrap encryption abstractly, so that you can swap out the algorithm you use at any time. This is also necessary for hashing algorithms: While there are no feasible attacks against SHA2 yet, the cryptanalysis only gets better over time, as the histories of MD5 and DES show.
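One way to stay independent of a particular algorithm is to record the cipher name next to each ciphertext. Old messages then stay readable after you switch to a newer cipher. The envelope format and the names seal and open_envelope below are hypothetical, a sketch of the idea rather than any standard.

```ruby
require 'openssl'

# Hypothetical envelope: the algorithm name travels with the ciphertext.
def seal(algorithm, key, plaintext)
  cipher = OpenSSL::Cipher.new(algorithm)
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  { algorithm: algorithm, iv: iv, data: cipher.update(plaintext) + cipher.final }
end

def open_envelope(envelope, key)
  cipher = OpenSSL::Cipher.new(envelope[:algorithm])  # pick the cipher from the envelope
  cipher.decrypt
  cipher.key = key
  cipher.iv = envelope[:iv]
  cipher.update(envelope[:data]) + cipher.final
end

key = OpenSSL::Random.random_bytes(32)  # both ciphers below take 32-byte keys
old_msg = seal("AES-256-CFB", key, "old secret")
new_msg = seal("AES-256-CBC", key, "new secret")

puts open_envelope(old_msg, key)  # old secret
puts open_envelope(new_msg, key)  # new secret
```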

Schneier’s Law

Schneier’s Law states that “any person can invent a security system so clever that she or he can’t think of how to break it.” This is why Ruby’s developers rely on OpenSSL, a widely tested and (in some variants!) certified cryptographic library, for encryption instead of writing their own.

A mistake in your implementation can compromise your data and your customers’ data, since so-called side-channel attacks, which exploit timing, power consumption, and similar effects rather than the algorithm itself, are routinely used to attack cryptography.

Encryption Does Not Mean You Are Safe

It is important, and I cannot stress this enough, that you do not store encrypted data and the keys to access it on the same machine (ideally, not even on the same network!). Likewise, do not do your encryption and decryption on the same machine that stores your encrypted data. Whole libraries have been filled with books on how to design a secure system, from hardware to software. Above all, security is a mindset, and you have to be properly paranoid to secure your data and access to it. If you deploy, or are about to deploy, security-relevant code, have it tested by outsiders sooner rather than later. Penetration testing is worth your while.

Asymmetric encryption was invented to solve one problem of symmetric encryption: with an asymmetric cipher, the secret key never has to be transmitted. However, asymmetric ciphers come with their own set of trade-offs (key trust and computational efficiency, among others).
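A common compromise is a hybrid scheme. RSA encrypts only a small random AES key, and AES then handles the bulk data. The sketch below uses OpenSSL::PKey::RSA with OAEP padding. The 2048-bit key size is an assumption, and the public key is round-tripped through PEM to mimic publishing it.

```ruby
require 'openssl'

# The recipient generates a key pair and publishes only the public half.
private_key = OpenSSL::PKey::RSA.new(2048)
public_key  = OpenSSL::PKey::RSA.new(private_key.public_key.to_pem)

# Anyone with the public key can encrypt a fresh AES key...
aes_key = OpenSSL::Random.random_bytes(32)
wrapped = public_key.public_encrypt(aes_key, OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)

# ...but only the private key can decrypt it. No secret was transmitted.
unwrapped = private_key.private_decrypt(wrapped, OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)
puts unwrapped == aes_key  # true
```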

The Safest Data is No Data

Just as the fastest code is no code at all, the safest data is data you never store. If you don’t absolutely, positively have to store something, don’t even bother with it. What you don’t have can’t be compromised.

Conclusion

This article is nothing but a superficial introduction to encryption in Ruby. There are dozens of standards and regulations that govern this vast topic. However, I have tried my best to give you, fellow Rubyists, enough knowledge about it to know which questions you should ask, which is, in the end, much more important than the code itself. Now go forth and hash, encrypt, and decrypt, and, above all, have fun doing it!

I hope you found this article valuable. Feel free to ask questions and give feedback in the comments section of this post. Thanks!


Anthony Lewis July 18, 2011 at 7:02 pm

I have to question your choice of initialization vector. Using “rand.to_s” will produce an IV that is strictly numeric. A better choice would be “aes.random_iv” or perhaps an AES128 hash of another passphrase.

Otherwise, this is a nice overview of cryptography in Ruby. Thanks for sharing.


Phillip Gawlowski July 19, 2011 at 12:00 am

There should be very little difference in using a purely numeric IV or a random stream of bytes as IV, *unless* your computer has cryptographic hardware of which OpenSSL can take advantage, like an external source of randomness to seed OpenSSL’s RNG (which is what random_iv uses in the end).

Though, it’s a good habit to get into, definitely! :)


Ivo July 23, 2011 at 8:19 pm

IIRC, the initialization vector should be generated by a cryptographically secure random number generator. rand() does not produce cryptographically secure sequences of numbers. Aes.random_iv is not better because it produces ‘bytes instead of numbers’, but because it uses a cryptographically secure random number generator. I hope.


Phillip Gawlowski July 23, 2011 at 8:35 pm

There is no such thing as a “cryptographically secure” RNG in software, since they all use some sort of algorithm to generate random numbers. That’s why an RNG, given the same seed, always produces the same sequence of numbers. If you want (or need!) true randomness, you must use a hardware RNG, which uses a chaotic process to get its numbers (for example by sampling the decay of an isotope).

If you have to follow some sort of encryption standard that OpenSSL is certified for, by all means use the functions OpenSSL provides. They’ve been peer reviewed and well tested, to ensure that they provide a safe and secure cryptographic implementation.

And the IV is used by a block cipher like AES to initiate the rotating cipher it will use to encrypt the blocks. Of course, the better the IV, the more secure is the encryption in the end. Mind, Ruby’s rand() function gets its seed values from an entropy provider of its host OS. If that is a hardware RNG, you get truly random numbers almost by default (and IIRC Ruby’s rand() uses a pretty well vetted algorithm to create its random numbers from those seeds).


Ivo July 24, 2011 at 3:09 pm

‘Cryptographically secure’ was the wrong term; I meant ‘cryptographically strong’. Even though all software PRNGs use algorithms, some are significantly more suited for cryptographic purposes than others. This is the difference between Java’s java.util.Random and java.security.SecureRandom; see http://download.oracle.com/javase/6/docs/api/java/security/SecureRandom.html. I don’t think Ruby’s rand() satisfies that requirement.

BTW, obtaining non-deterministic seeds is possible without specialized hardware: the arrival times of network packets or response times of disk drives (as these are directly related to variations in rotational speed) are suitable. See the RFC referenced in the Java documentation.


Crypto Guy August 4, 2011 at 11:17 am

Hi,

You should consider using the PBKDF2 function to generate your key material and SecureRandom for obtaining better random data for salting PBKDF2. Ruby has a pbkdf2 gem. And for MACs, use a HMAC function (it’s more secure).

Here’s a PBKDF2-based version of your encryption example (requiring the pbkdf2 gem):

require 'openssl'
require 'pbkdf2'
require 'securerandom'

# First, a PBKDF2-based salted+iterated key and IV generator function:
def get_key_iv(salt, password, key_len, iv_len, iterations=4096)
  data = PBKDF2.new do |p|
    p.hash_function = "sha256"
    p.password = password
    p.salt = salt
    p.iterations = iterations
    p.key_length = key_len + iv_len
  end.bin_string
  [ data[0, key_len], data[key_len, iv_len] ]
end

payload = "Plans for Blofeld's newest Doomsday Device. This is top secret!"
password = "Bond, James Bond"
aes = OpenSSL::Cipher.new("AES-256-CFB")
aes.encrypt
salt = SecureRandom.random_bytes(aes.key_len + aes.iv_len)
key, iv = get_key_iv(salt, password, aes.key_len, aes.iv_len)
aes.key = key
aes.iv = iv
encrypted_data = aes.update(payload) + aes.final

puts "Encrypted binary data as a hex string:"
puts encrypted_data.unpack('H*')[0]
puts

# To decrypt, the only things you need to know are the salt and the password:
aes = OpenSSL::Cipher.new("AES-256-CFB")
aes.decrypt
key, iv = get_key_iv(salt, password, aes.key_len, aes.iv_len)
aes.key = key
aes.iv = iv
decrypted_data = aes.update(encrypted_data) + aes.final

puts "Decrypted data:"
puts decrypted_data.inspect
puts

# Make sure the decrypted data matches the original plaintext:
raise "Decryption failed" if decrypted_data != payload


nilanjan February 13, 2013 at 12:25 pm

if decrypt were a different program, wouldn’t you need to pass the key_len and iv_len to that program?


Paul Annesley September 15, 2011 at 9:54 am

“We also take advantage of SHA2’s 356 bit variant” … s/356/256/ I think :)


Phillip Gawlowski September 15, 2011 at 11:16 am

Of course! Thank you, and I fixed it. :)


Stephen Touset November 2, 2012 at 5:43 am

The approach demonstrated here has several significant weaknesses.

1. The key should be stretched with PBKDF2 instead of a single round of SHA-256, or it should be generated randomly.
2. The IV for CBC *must* be generated by a much stronger random number generator.
3. The MAC should actually be a keyed HMAC.
4. The MAC *must* be compared using a constant-time comparison algorithm.

To fix:

```
key = OpenSSL::PKCS5.pbkdf2_hmac('x', 'y', 128_000, 128, 'SHA256') # or
key = aes.random_key
iv = aes.random_iv

hmac = OpenSSL::HMAC.digest('SHA256', key, plaintext)

def secure_compare(hmac1, hmac2)
  # short-circuit if the lengths are unequal
  return false if hmac1.bytesize != hmac2.bytesize
  left = hmac1.bytes.to_a
  right = hmac2.bytes.to_a
  result = 0

  left.length.times do |i|
    result |= left[i] ^ right[i]
  end

  result == 0
end
```

Failure to do any of these can compromise the plaintext.


Stephen Touset November 2, 2012 at 5:45 am

Although, ideally, your encryption key and your HMAC key should be unique from one another. An easy way to achieve this is to generate a key of double the required length, and use the first half for encryption and the latter half for generating HMACs.


Alper Akgun February 9, 2013 at 8:14 pm

Hi Phillip, thanks for the nice blog. I guess, AES-128, 256 refers to the key size; and for AES block size is always 128bits., see here: “…you’ll find AES-128, or AES-256 being used to describe the particular block size that can be used.”
