All posts in this series
- Understanding Cryptographic Terms - Cryptography 101
- All About Certificates - The Anatomy of a Certificate
- Which Types of Certificates Exist and How are They Created?
- What is the Subject Alternative Name and Why is it Important for VPN?
- Why Are Certificates Used for VPN? Best Practice Authentication
Cryptographic certificates are ubiquitous on the modern internet and almost every user has heard of them, yet surprisingly few know what certificates actually are, how they work, what you can do with them, why they are so important, and how to use them properly.
With more and more VPN protocols supporting or even fully relying on certificates, and certificates certainly playing an increasingly important role in VPN security, it's time to close these knowledge gaps and talk about certificates in all their glory.
Cryptography 101
Before we can delve into the wonderful world of certificates, we first need to understand some basic cryptographic terms and operations, but don't worry, it will be far less technical than you might fear and there will be barely any math involved. Let's get started!
Keys
When people talk about cryptography, they often talk about "keys", but what exactly is a key? A lot of people confuse keys with passwords, but a password is not a key. A key is a sequence of random-looking bytes that is used to initialize a cryptographic function before performing a cryptographic operation with that function.
A cryptographic function that encrypts/decrypts data is called a cipher. Most ciphers require keys of a specific length. For example, the AES cipher requires keys that have an exact length of 16 bytes (AES-128), 24 bytes (AES-192) or 32 bytes (AES-256). Some ciphers also accept a range of lengths, e.g. Blowfish accepts keys from 4 bytes to 56 bytes. And there are also a few ciphers like RC4 which accept keys of almost any length (RC4 takes anything from 1 byte to 256 bytes).
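To make the idea of "random-looking bytes of a specific length" concrete, here is a minimal Python sketch that generates random keys in the three AES sizes listed above. The helper name `generate_key` and the lookup table are made up for this illustration; only `secrets.token_bytes` is a real standard library call.

```python
import secrets

# AES key sizes in bytes, as listed in the text above.
AES_KEY_SIZES = {"AES-128": 16, "AES-192": 24, "AES-256": 32}


def generate_key(cipher_name: str) -> bytes:
    """Return a fresh random key of the length the given AES variant requires.

    Hypothetical helper for illustration only.
    """
    return secrets.token_bytes(AES_KEY_SIZES[cipher_name])


for name in AES_KEY_SIZES:
    key = generate_key(name)
    # The hex output differs on every run, because the key is random.
    print(name, len(key), "bytes:", key.hex())
```

A key like this cannot be reproduced later, so it has to be stored somewhere safe.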
The longer the key, the harder it is to break the encryption, right? Well, not quite. If you are trying to break the encryption by guessing the key (a so-called brute force attack), the longer the key, the more guesses will be required. Yet if there is a more sophisticated attack than just guessing raw keys, the key length may matter far less, or not at all.
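How quickly the number of guesses grows with key length is easy to underestimate. The following sketch does the arithmetic for a pure brute force attack. The attacker speed of one trillion guesses per second is an assumption picked for illustration, not a measured figure.

```python
# Assumed attacker speed, purely illustrative.
GUESSES_PER_SECOND = 10**12
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def years_to_try_all_keys(key_bytes: int) -> float:
    """Years needed to try every possible key of the given length."""
    keyspace = 2 ** (8 * key_bytes)  # every extra bit doubles the keyspace
    return keyspace / GUESSES_PER_SECOND / SECONDS_PER_YEAR


for key_bytes in (4, 8, 16, 32):
    years = years_to_try_all_keys(key_bytes)
    print(f"{key_bytes:2d}-byte key: about {years:.2e} years")
```

Every additional byte multiplies the effort by 256, which is why a 16 byte key is already far out of reach for raw guessing, as long as no smarter attack exists.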
So how are keys generated in the first place? If a key is just a random value, it must be stored somewhere, as it cannot be reproduced otherwise. A common alternative is to derive the key from a password: the password is fed into a key derivation function, which turns a password of any length, typically consisting only of human-readable characters, into a key of random-looking bytes of a specific length.
A simple key derivation function is a hash function, like SHA-2. The downside of a hash function is that it is relatively fast. This is good if you need to calculate a lot of keys, but it's bad when people try to break encryption by guessing the password rather than guessing the key directly, as the faster the key derivation, the more passwords they can try per second. That's why there are more advanced key derivation functions like PBKDF2 or Argon2 that are designed to be slow (and, in the case of Argon2, also to require a lot of memory). Deriving a key then takes some time (and maybe other resources), which isn't an issue for a legitimate user, as we are still talking about a second or two, but it makes the life of key crackers a lot harder.
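The difference between a fast hash and a deliberately slow key derivation function can be sketched with Python's standard `hashlib` module. The password, the random salt, and the iteration count below are assumptions chosen for this example (the text above does not discuss salts or specific parameters), so treat it as an illustration rather than a recommendation.

```python
import hashlib
import secrets
import time

password = b"correct horse battery staple"  # example password, an assumption

# Fast: a single SHA-256 pass turns the password into 32 random-looking bytes.
start = time.perf_counter()
fast_key = hashlib.sha256(password).digest()
fast_seconds = time.perf_counter() - start

# Slow by design: PBKDF2 repeats an HMAC many times.
# Salt and iteration count are illustrative assumptions.
salt = secrets.token_bytes(16)
ITERATIONS = 200_000
start = time.perf_counter()
slow_key = hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS, dklen=32)
slow_seconds = time.perf_counter() - start

print(f"SHA-256: {len(fast_key)} byte key in {fast_seconds:.6f} s")
print(f"PBKDF2:  {len(slow_key)} byte key in {slow_seconds:.6f} s")
```

Both results are usable as a 32 byte key, but an attacker guessing passwords has to pay the PBKDF2 cost for every single guess.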
Symmetric and Asymmetric Encryption
When people think about encryption, they usually think of symmetric encryption. Symmetric encryption works by feeding a key into a cipher, followed by data, to obtain encrypted data. To decrypt that data again, the cipher is used in reverse with the same key. In that case the key works like an actual key: the same key that locks the door is also able to unlock it, you just need to use it in reverse (turning it in the opposite direction as you did when locking the door). You use a password to encrypt a file and you enter it again to decrypt the file. Simple enough.
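Python's standard library ships no real cipher such as AES, so the following is a deliberately simple toy cipher, invented for this illustration and not secure for real use. It only demonstrates the symmetric property: the same key that encrypts the data also decrypts it. In this toy, "using the cipher in reverse" happens to be the very same XOR operation.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` pseudo-random bytes (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice restores the data."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


# Derive a 32 byte key from a password with a simple hash, as described earlier.
key = hashlib.sha256(b"my password").digest()

plaintext = b"Meet me at noon."
ciphertext = toy_cipher(key, plaintext)
decrypted = toy_cipher(key, ciphertext)

print(ciphertext.hex())
print(decrypted)
```

Anyone who tries to decrypt with a different key just gets different garbage back.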
Yet the far more interesting kind of encryption is asymmetric encryption. Instead of a single key, you have a key pair. You use one key to encrypt data and the other key to decrypt it again. How can this work? It's hard to find an analogy in everyday life, but consider the following:
You have a stack of 52 playing cards inside a machine. Only the top card of that stack is accessible from outside the machine. The machine also has a keypad to type in numbers, and the number typed in is the number of card rotations the machine will then perform. A card rotation is when the machine takes the bottom card of the stack and places it on the top.
To make the secret top card inaccessible, you just enter some random number, e.g. 30. Now the machine performs 30 rotations and the secret card is buried somewhere in the middle of the stack. To get access to the secret card, it must be moved back to the top. But note: the machine can only rotate in one direction, so you cannot just rotate it 30 times backwards. Instead you need to rotate it another 22 times forward (30 + 22 = 52, one full trip through the stack) to get the secret card back to the top. Note how the key for encryption (30) was different from the key for decryption (22).
If you don't know how big the stack is, you cannot know how many times you need to rotate it to get the secret card back to the top, not even if you know how many rotations were performed to hide the secret card. Of course, this doesn't sound like a hard problem to solve: just rotate the stack by one card, check the top, and repeat until you find the secret card. That sounds like a reasonable approach, unless the stack has more cards than our entire solar system has atoms; in that case you are totally lost unless you know the exact number of rotations required.
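The card machine is easy to simulate. The sketch below models the stack as a Python `deque` with the top card at index 0. The card names and the `rotate` helper are made up for the example.

```python
from collections import deque

STACK_SIZE = 52
SECRET = "secret card"


def rotate(stack: deque, times: int) -> None:
    """Move the bottom card to the top, `times` times (one direction only)."""
    stack.rotate(times)  # deque.rotate(n) moves the last n items to the front


# Index 0 is the top of the stack; the secret card starts on top.
stack = deque([SECRET] + [f"card {i}" for i in range(1, STACK_SIZE)])

encrypt_key = 30
decrypt_key = STACK_SIZE - encrypt_key  # 22, only computable if you know the stack size

rotate(stack, encrypt_key)
hidden_top = stack[0]    # some other card, the secret is buried in the stack

rotate(stack, decrypt_key)
revealed_top = stack[0]  # the secret card is back on top

print(hidden_top, "|", revealed_top)
```

The decryption key depends on the stack size, which plays the role of the secret that makes the two keys belong together.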
A common asymmetric cipher is RSA. While AES may use a 16-byte key to encrypt data, RSA typically uses a 256-byte (2048-bit) key as of today. It is based on prime numbers and the fact that it's easy to multiply two large primes, but, as far as we know, there is no efficient way to factor the resulting product back into its primes on a classical computer; this may change with quantum computing.
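To see the key pair idea with real numbers, here is textbook RSA with tiny primes. The values 61 and 53 are a common classroom example and are chosen here as an assumption; real keys use primes hundreds of digits long plus a padding scheme, both of which this sketch leaves out. It needs Python 3.8 or newer for the modular inverse via `pow`.

```python
# Toy RSA with tiny primes, for illustration only.
p, q = 61, 53
n = p * q                  # 3233, part of both keys
phi = (p - 1) * (q - 1)    # 3120, only computable if you know p and q

e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt a number smaller than n with the public key (e, n)."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Decrypt with the private key (d, n)."""
    return pow(ciphertext, d, n)


message = 65
ciphertext = encrypt(message)
print(ciphertext, decrypt(ciphertext))
```

Multiplying 61 by 53 is trivial; recovering 61 and 53 from 3233 is the factoring problem. With numbers this small it takes an instant, with 2048-bit numbers no efficient classical method is known.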
An alternative to RSA that requires smaller keys is the family of cryptographic schemes based on elliptic curves. These interpret data as coordinates on a curve and then perform calculations that are relatively easy in one direction but pretty much impossible the other way round, which leads to the same situation as a card stack that can only be rotated one way but not in reverse.
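The "easy one way, hard the other way" property can be sketched on a tiny elliptic curve. The curve y² = x³ + 2x + 2 over the integers modulo 17 with starting point (5, 1) is a classroom-sized example chosen for this illustration; real curves use numbers with roughly 77 digits or more. All function names are made up, and Python 3.8 or newer is assumed.

```python
# Toy curve y^2 = x^3 + A*x + B (mod P_MOD), illustrative parameters only.
P_MOD, A, B = 17, 2, 2
G = (5, 1)  # starting point on the curve; None represents the point at infinity


def add(p1, p2):
    """Add two points on the curve."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # a point plus its mirror image gives the point at infinity
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    slope %= P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def multiply(k, point):
    """Easy direction: compute k * point with double-and-add."""
    result, addend = None, point
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def brute_force(target):
    """Hard direction: find k by trying G, 2G, 3G, ... one step at a time."""
    k, current = 1, G
    while current != target:
        k, current = k + 1, add(current, G)
    return k


private_key = 13
public_key = multiply(private_key, G)
print(public_key, brute_force(public_key))
```

On this toy curve the brute force search finishes immediately because there are only 19 points. On a real curve the easy direction still takes a few hundred steps, while the search would have to walk through an astronomically large number of points.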
Signatures
One huge advantage of asymmetric encryption is that it can be used not only to encrypt data but also to sign data. The idea of a signature is to add something to data that can ensure two things: first, that the data was created by a certain person, machine or organization, as only that entity is able to create the correct signature attached to the data; and second, that the data has not been manipulated after it has been signed. Whoever receives the data must somehow be able to verify that both expectations are still true.
To sign data, you first need a key pair. One of the two keys is named the public key and it is publicly distributed. Everybody in the entire world may get a copy of that public key. The more this key is distributed, the better. The other key is named the private key and as the name implies, it is kept private.
To sign a piece of data, first a cryptographic hash of the data is calculated, e.g. using SHA-2. This hash is then encrypted with the private key and the result is attached to the data. Done. To verify the signature is correct, just calculate a hash of the data yourself, decrypt the attached hash with the public key (which you can get access to as it is publicly distributed) and if both hashes match, the signature is valid.
How does this fulfill our two expectations above?
1) Only when encrypted with the right private key, can the data be correctly decrypted with the distributed public key. If you know that a certain public key is Alice's and you receive data that can be correctly decrypted with Alice's public key, you know for sure that this data was encrypted with Alice's private key – otherwise it would not correctly decrypt. And since only Alice has access to that private key, you know for sure that it was Alice who created that signature. And if Alice only signs data she created herself, you know that this data was created by Alice, no doubt.
2) In case anyone has manipulated the data, the hash you calculate will no longer match the decrypted hash that was attached to the data. And whoever manipulated the data cannot update the encrypted hash: they can surely calculate a new hash value, but they cannot encrypt it correctly, as they have no access to Alice's private key. So if the hashes do match, the data has not been altered since Alice signed it.
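The sign and verify steps described above can be sketched with toy RSA numbers. The two primes below are well-known Mersenne primes, picked only because they are easy to write down; they would be a terrible choice for a real key. Real signature schemes also add padding, which is left out here, and the hash is reduced modulo n so it fits the toy key. All names are made up for the example, and Python 3.8 or newer is assumed.

```python
import hashlib

# Toy key pair for "Alice" (illustrative assumptions, not secure).
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                            # part of both keys
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, known only to Alice


def hash_as_int(data: bytes) -> int:
    """SHA-256 hash of the data as a number that fits the toy key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """'Encrypt' the hash with the private key."""
    return pow(hash_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public key and compare the hashes."""
    return pow(signature, E, N) == hash_as_int(data)


document = b"Pay Bob 10 dollars."
signature = sign(document)

print(verify(document, signature))                  # untouched data
print(verify(b"Pay Bob 1000 dollars.", signature))  # manipulated data
```

The first check is expected to succeed and the second to fail, because the manipulated text hashes to a different value and the attacker cannot produce a matching signature without the private exponent.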
Signatures as Proof of Identity
Signatures can be used to protect data (ensuring data integrity) but they can also be used as proof of identity. If you are talking to someone in a chat who claims to be Alice but you want to make sure you are really talking to Alice, how would you do that?
Assuming you have the public key of Alice and you know for sure that this really is the public key of Alice, you can just do the following:
Make up some random sentence in your mind, send this sentence via chat to whoever you are talking to, and ask that person to sign it and send back the signature.
If this signature can be validated using the public key of Alice, the person you are talking to is in possession of the private key of Alice and the only person who should have access to that key is Alice herself, so you know for sure that you are indeed talking to Alice.
That is unless Alice has signed that very same sentence in the past and the person has a copy of that signature, which is the reason why the sentence must be either very random and unpredictable or chosen in such a way that it's very unlikely to have ever been signed before, like "Today is <date> <time> and I hereby confirm to Bob that I am indeed Alice".
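The whole exchange, including the replay problem, can be sketched as follows. The toy RSA key uses two Mersenne primes purely for readability and is not secure; the function names and the use of `secrets.token_hex` to make the sentence unpredictable are assumptions made for this example. Python 3.8 or newer is assumed.

```python
import hashlib
import secrets

# Toy RSA key pair for "Alice" (illustrative only, not secure).
P, Q = 2**127 - 1, 2**89 - 1
N, E = P * Q, 65537                  # public key, known to Bob
D = pow(E, -1, (P - 1) * (Q - 1))    # private key, known only to Alice


def hash_as_int(text: str) -> int:
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % N


def alice_signs(text: str) -> int:
    return pow(hash_as_int(text), D, N)


def bob_verifies(text: str, signature: int) -> bool:
    return pow(signature, E, N) == hash_as_int(text)


def new_challenge() -> str:
    """A sentence that has almost certainly never been signed before."""
    return "I hereby confirm to Bob that I am Alice: " + secrets.token_hex(16)


# Bob sends a fresh challenge, Alice signs it, Bob checks the response.
challenge = new_challenge()
response = alice_signs(challenge)
print(bob_verifies(challenge, response))

# An impostor who recorded that old response cannot reuse it for a new challenge.
next_challenge = new_challenge()
print(bob_verifies(next_challenge, response))
```

The first check is expected to pass and the second to fail, which is exactly why the sentence has to be unpredictable: a recorded signature is only worth something if the same sentence is ever asked for again.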
Next up: The Anatomy of a Certificate
In this post, we clarified some important cryptographic terms and had a look at basic cryptographic operations. Now it's time to build upon that and start to actually look at certificates in detail.
The next post covers the anatomy of a certificate – including its purpose, contents, how it is formatted, and how it is stored in files.