Skip to content
This repository has been archived by the owner on May 1, 2021. It is now read-only.

Encryption

Harley Wiltzer edited this page Oct 26, 2017 · 13 revisions

Table of Contents

News

The forward onion protocol was simulated today, and it looks like it works! You can try it by pulling the crypto branch and running $ python stealth/test.py. You will see the initial onion packet and the how its layers are peeled off until the final plaintext message is obtained. This simulation does not include any networking however, it is merely a proof of concept to demonstrate that the encryption-layering in the forward onion is working properly.


Dependencies

Some systems may not have some python modules that are being used to implement cryptography installed (thankfully it looks like the Trottier machines do). Here's a list of dependencies:

  • PyCrypto
    • Used to implement AES and to generate random numbers
    • Install by # pip install pycrypto
    • Alternatively, some linux distros have packages for it:
      • Gentoo: # emerge dev-python/pycrypto
      • Debian-based: # apt-get install python-crypto
      • Arch Linux: # pacman -S python-crypto or with this PKGBUILD from the AUR

Overview of the Encryption Process

This section aims to provide an overview of how cryptography is used to protect sensitive data in the onion network at a design level. First, the whole process will be described abstractly, and then each component will be discussed in further detail.

Cryptography Primer

I thought it might be helpful to include a very brief primer on cryptography here, in case people aren't familiar with the vocabulary and concepts that are described in this section. If you have a general understanding of how asymmetric and symmetric cryptography work, this section won't be very interesting.

Cryptography is the science of scrambling data in such a way that only the intended receiver of the data can read it. Encryption is the process of taking some regular data (called plaintext) and some key (usually in the form of a bit string) and scrambling it, yielding incomprehensible data (called the ciphertext). Decryption is the inverse process, which takes some ciphertext and a key and outputs the corresponding plaintext. Encryption algorithms rely on the fact that decryption is easy given the key, but extremely difficult without it. There are two main types of cryptography, that differ vastly in the way that keys are used (and therefore may have different use cases), which are Symmetric Cryptography and Asymmetric Cryptography. Both of these will be used in our project.

Symmetric Cryptography

With Symmetric Cryptography, the same key is used both for encryption and decryption. Therefore, both the sender and receiver must share a secret key. An example of a Symmetric Cryptography scheme is the Caesar Cipher, which shifts every letter in the plaintext by a constant amount (the key). However, the Caesar Cipher clearly isn't very secure. The cipher that will be used for Symmetric Cryptography in this project is called AES. Symmetric cryptography is useful when both the sender and receiver can or need to share a key, which may occur frequently in bi-directional data transfer. However, a considerable issue with Symmetric Cryptography is that it may be difficult to share the secret key securely.

Asymmetric Crytography

Asymmetric Cryptography solves Symmetric Cryptography's problem of sharing a secret key by using two keys per party involved in the communication. Each communicator has a public key and a private key (which are generated together). As the names suggest, public keys can be shared freely and private keys must remain secret. Only the communicator knows his own private key. In Asymmetric Cryptography, the sender encrypts his message with the receiver's public key, and then the receiver must decrypt the ciphertext with his private key. This eliminates the need to have a secret key shared between multiple communicators. Therefore, Asymmetric Cryptography can (and will) be used to share secret symmetric keys. However, Asymmetric Cryptography has its shortcomings as well, especially in the context of this project. For example, when it comes time for the exit node to send data back, he would require the originator's public key to transmit the data back securely. This isn't good, because no one is supposed to know who the originator is!

High-Level View of Encryption in the Onion Network

An onion network is made up of several onion nodes and a directory node. The directory node knows about all onion nodes in the network, meaning it knows their respective IP addresses (or domain names) and public keys. The person sending a message through the onion network is called the Originator, and the last node in the onion network that transmits the raw message to its destination is called the exit node. The onion network must satisfy some conditions:

  1. No node may know who the originator is
  2. Each node only knows the previous node and the next node in the path
  3. No node may deduce the full path

Here's how communicating through the onion network works:

  1. The Originator contacts the Directory Node and receives an ordered list of 3 (domain, public key) pairs, corresponding to the onion nodes in the communication path that the Directory Node has chosen.
  2. The Originator generates 3 random keys that will be used for symmetric cryptography and stores them.
  3. The Originator communicates these random keys with the nodes in the communication path via the circuit establishment protocol.
  4. The Originator creates the initial onion, by layering symmetric encryption with the keys generated in step 3 in reverse order on the message he wants to send. This is done in such a way that each node in the network decrypts once and sends the output along, and when the exit node decrypts the output will be the plaintext message.
  5. After each node decrypts their layer and sends the message along, they wait for a response.
  6. Upon receiving a response, a node encrypts the response with his symmetric key and sends the output back to the node that is connected to (the previous node in the path).
  7. When the Originator receives a response, he decrypts with each symmetric key successively to receive the plaintext response.

Steps 4 and 5 will be referred to as the forward onion, and steps 6 and 7 will be referred to as the backward onion. This processes will be discussed in further detail below.

Lastly, the format of messages should be discussed. Currently, I made a JSONMessage class to handle this, but it's actually not very useful or necessary. Eventually it will be replaced with a few functions. Messages contain data aside from the message itself, such as the address of the next node and the message type. So, it was decided to represent messages with JSON to make parsing them convenient. Messages look like this:

'{
"data": "Your message goes here",
"addr": "Address of next node (or of destination in the case of the innermost layer)",
"type": "Message type, to determine how to handle the message"
}'

Right now, message types are being represented by an enum, but this is useless since it needs to be converted into a string anyway... so that will be changed. The available types right now are "Onion", for passing layered ciphers through the network, and "Data", for passing the message to a destination outside the network. In the near future, we will have other types (like "Establish" perhaps, to distinguish the messages being passed in the establish protocol). The use of JSON is very convenient, partly because Python's json library can convert JSON strings to a Python dictionary. Here's an example:

import json
msg = '{"data": "Hello, World", "addr": "node 1", "type": "Onion"}'
d = json.loads(msg) # Transforms the msg JSON to a dictionary
data = d["data"] # Gets the value associated with "data", namely "Hello, World"
d["addr"] = "mcgill.ca"
msg = json.dumps(d) # Transforms dict to a JSON string
print(msg) # Should show '{"data": "Hello, World", "addr": "mcgill.ca", "type": "Onion"}'

Forward Onion

The Forward Onion process is responsible for securely transmitting the Originator's message to its destination. This section will outline the implementation details of this process.

The Originator starts by creating a JSON message. For example, say the Originator wants to send "GET index.html" to "http://www.google.ca", the Originator creates:

'{
"data": "GET index.html",
"addr": "http://www.google.ca",
"type": "Data"
}'

This message is then encrypted with Node 3's (the exit node's) symmetric key. However, this presents an interesting problem, because for AES to function securely, it requires a new 16-byte initialization vector for every sent message. This initialization vector must be known by both the encrypter and the decrypter. To take care of this, some extra data must be stored by both the Originator and the Onion Nodes. Upon creation, the Originator creates an array of pseudo-random number generators, and uses the symmetric keys of the onion nodes as seeds. Similarly, each Onion Node has a pseudo-random number generator that is seeded with its own symmetric key. When the Originator encrypts, he generates a new random number with the generator associated to the node whose symmetric key is being used to encrypt, and uses that as the initialization vector. When the Onion Node receives the message, he generates a random number with his generator (which will be the same random number generated by the Originator, because the generators had the same seed), and uses that as the initialization vector. Now the message may be communicated securely.

At this point, the Originator has encrypted the original message with the exit node's symmetric key. Therefore, only the exit node can see this message. The Originator then creates another message with this ciphertext in the "data" field, the address of Node 3 in the "addr" field, and "Onion" in the "type field", and encrypts with Node 2's symmetric key. This process is then repeated for Node 1, and the corresponding ciphertext is sent to Node 1. And so begins the Forward Onion.

When an Onion Node receives a message, its job is to remove a layer of encryption and send the message along to the next node in the path. Say some node receives a message 'asdfhjalskjhsflkj342987yfasdkjbtkouqhfljasjhf...'. That's what ciphertext looks like (at least for the sake of this wiki page). The Onion Node decrypts with his symmetric key, and sees something that may look like:

'{
"data": "sdfhlkasdfjghlsdkjhlasdkjghsldkjhgsdlkjhsgkjhsdlkjgdsASGKJAHSD...",
"addr": "Address of Node 2",
"type": "Onion"
}'

The node now knows the address of the next node in the path, and can send what he finds in "data" to that address. However, this poses yet another issue: each time a node decrypts, the ciphertext that it sends to the next node shrinks! This isn't surprising, because the whole concept being used here is that of layered encryption, where the layers must be removed sequentially. However, the shrinking ciphertext size of a message going through a path can be a great security concern. Ideally, all ciphertext messages should have the same length, to avoid length attacks. But how is this possible? The common Onion Node can only send ciphertext to the next node, so it cannot just pad it arbitrarily if the next node is to decrypt it! This problem was solved by providing each node with the symmetric key of the node before it in the path. To maintain the condition that no node can determine who the Originator is, it will be necessary to give the Originator a symmetric key of his own. Now, each node in the network can add some padding and encrypt with its own symmetric key before sending the ciphertext over to the next node. The node's job upon receiving a message now looks like this:

  1. Decrypt with previous node's symmetric key
  2. Remove padding from ciphertext
  3. Decrypt with own symmetric key
  4. Extract data from the JSON message
  5. Add padding to the message
  6. Encrypt with own symmetric key
  7. Send ciphertext to the next node

Now it is necessary to determine how the padding will be computed and removed. This was kept simple for the sake of this project, and a more secure design can probably be concocted if there's more time. Firstly, it is desired for all messages to be of the same size. This will require a maximum message size to be enforced. The amount of padding necessary, P, will be NM - D, where N is the amount of nodes in the path (likely 3), M is the maximum message size, and D is the size of the decrypted "data" field. So, P random characters will be generated, and then "" will be appended to that. This padding is prepended to the decrypted "data" field. Removing the padding is trivial because it just requires finding the "" and truncating everything before it. Note that since the padding is encrypted, the structure of the padding should not be visible to an attacker. However, the appearance of the "***" might cause a slight known-plaintext vulnerability, but for the sake of this project it's probably secure enough.

Although a solution to padding ciphertexts to a fixed length was found, it's implementation is still not perfectly sound. This is because adding the padding involves adding extra encryption/decryption steps. Since the intermediate nodes must encrypt now (as opposed to the Originator like before), the random number generators that we currently have cannot generate initialization vectors for the padding encryption! This is solved by adding another random number generator at the initiator (seeded with the last 8 bytes of its symmetric key) and adding two more random number generators at each Onion Node. Note that two additional random number generators are needed, one to establish initialization vectors for the encryption from the previous node, and one to establish initialization vectors for the encryption with the next node. These random number generators are seeded with the last 8 bytes of the previous node's symmetric key and the last eight bytes of the current nodes symmetric key, respectively. This way, all random number generator seeds can be determined solely by the symmetric keys passed to each node, so no new data is needed. Now, the initialization vector can be computed at each node similarly to how it was done for encrypting the data.

At this stage, the onion can be reliably transmitted throughout the communication path. However, when the exit node receives a message, it will pad and encrypt the data before sending it to the destination. This isn't good, because then the destination won't know how to read the message! To take this into account, the "type" field of the innermost layer was set to "Data" rather than "Onion". This way, the exit node will know that it's an exit node, and will skip the padding and encryption steps. Although this concedes to informing the exit node that it is the exit node, this isn't likely to be an issue: the exit node can easily determine that it's an exit node in most cases simply by checking the address to which it is sending a message.

Backward Onion

Thankfully, the backward onion processes is significantly simpler than the forward onion. After doing their job in the forward onion, each node waits for a response. Upon getting the response, an Onion Node can send the response back to the previous node in the circuit (because it maintains a TCP connection to the node that sent it data during the forward onion). However, this doesn't quite work because it would involve passing the response message in plain text back to the Originator. Therefore, in the backward onion phase, Onion Nodes encrypt with their symmetric key (using the same random number generator for the initialization vector that was used to decrypt data in the forward onion) and pass the ciphertext back to the previous node. So, no nodes decrypt anything in the backward onion, they merely encrypt and propagate the ciphertext backward. When the originator receives the response, all he must do is decrypt with each of the nodes' symmetric keys to recover the plaintext response.


The rest of this wiki section will outline how the encryption functions work. It will be organized in a top-down manner, where the sections closer to the top of the page describe more abstract implements.

Implementation Details and API

Onioning

The implementation of the onion creation and management routines may be found in stealth/onion.py.

Originator

The Originator object is used to handle the creation of the onion, setting up shared symmetric keys, and decrypting the response.

Member Variables

  • key: A byte string representing the Originator's symmetric key
  • path: An array of (address, symmetric key) tuples representing the addresses and symmetric keys of the nodes in the communication path
  • gens: An array of random number generators used to generate random shared initialization vectors with the nodes in the communication path
  • rng: A random number generator used to generate random initialization vectors for the first node in the path to decrypt padding

Methods

create_onion(self, msg, dst)
  • msg: The plaintext message that the originator wants to send
  • dst: The address of the intended receiver of the message
  • returns a JSONMessage object destined for the first node in the path with the encryption-layered message.

The create_onion() function applies layered encryption with the symmetric keys of the nodes in the message's path.

add_layer(self, msg, node, first)
  • msg: The message string to include in the layer (the contents associated with "data")
  • node: The index in the path of the node that can decrypt the new layer
  • first: A boolean to distinguish the innermost layer (changes the value of "type")

The add_layer() function applies a new layer of encryption on top of the msg parameter. create_onion() iteratively calls this method on its output.

OnionNode

The OnionNode will be used for each circuit that a particular onion node is involved in, to peel off layers of encryption and to encrypt and propagate responses back towards the Originator.

Member Variables

  • key: A byte string representing the Onion Node's symmetric key
  • prevKey: A byte string representing the symmetric key of the previous node in the communication path
  • rng: A random number generator shared with the Originator to generate random initialization vectors for decrypting data in the forward onion and encrypting data in the backward onion
  • prevrng: A random number generator shared with the previous node in the communication path to generate random initialization vectors for decrypting the padding added by the previous node
  • nextrng: A random number generator shared with the next node in the communication path to generate random initialization vectors for encrypting padding before sending it to the next node

Methods

set_key(self, key)
  • key: A symmetric key string

This function sets the symmetric key of an OnionNode. It also sets the seed of the random number generator that it will use to encrypt the padding that it adds on to a message after peeling off a layer.

set_prev_key(self, key)
  • key: A symmetric key string

This function sets the value of prevKey, the symmetric key of the previous node in the circuit. It also sets the seed for the random number generator used to generate the initialization vector for decrypting the padding added by the previous node.

set_seed(self, seed)
  • seed: A byte string used as a seed for a random number generator

This function sets the seed for the random number generator that will be used to generate random initialization vectors for encrypting and decrypting message data.

peel_layer(self, cipher)
  • cipher: Ciphertext string that is sent to an onion node
  • returns a dictionary representing the decrypted data. This data was a previously-encrypted JSONMessage, so it has the address of the next node in the path and the encrypted data to send to it.

The peel_layer() function removes a layer of encryption and provides the OnionNode with enough information to send the message over to the next OnionNode in the path. It also handles padding what it decrypts and encrypting the result to securely pass fixed-with ciphers.

Encrypting

The implementation of encryption abstractions, specifically for automating AES encryption, may be found in stealth/stealth.py. Its interface is shown below.

get_random_key(nbytes)

  • nbytes: An integer containing the length of the key to be returned
  • returns a random symmetric key of length nbytes

AESProdigy

The AESProdigy class is used to abstract the process of AES encryption.

Member Variables

  • key: A byte string representing the symmetric key to be used (16 bytes long)
  • iv: A byte string representing the initialization vector to be used (16 bytes long). See Forward Onion

Methods

init(self, key, iv)
  • key: The symmetric key to be used for encryption and/or decryption. It is a byte string of 16 bytes.
  • iv: The initialization vector used for the encryption and decryption. It is a byte string of 16 bytes.
encrypt(self, message)
  • message: Plaintext string to be encrypted
  • returns a base64-encoded ciphertext of message, encrypted with self.key
decrypt(self, cipher)
  • cipher: Ciphertext string to be decrypted
  • returns the decrypted ciphertext, which was decrypted with self.key

Testing

Unit tests are being written in the stealth/test.py file. They can be run automatically with $ python stealth/test.py. The tests currently are:

  • test_forward_onion() Generates a large, random message and proceeds to carry out a forward onion process with synthetic nodes in an array. Test passes when the exit node decrypts and finds that the plaintext is equivalent to the randomly-generated message.

Additionally, stealth/forwardonion.py allows you to enter your own messages and destination addresses and watch the forward onion process.