vault-ipfs

A Vault plugin for data and access management on IPFS.

  • Fine-grained controls for authorizing reading and writing asymmetrically encrypted IPFS data.
  • TLS security from client to Vault, then encryption of data in transit and at rest within IPFS.
  • Audit trails to indicate who attempted to access IPFS Merkle forest data using Vault as a proxy.

This plugin was inspired by Vault's native asymmetric transit encryption capability and its core key-value store. Writing data through the plugin proceeds as follows:

  1. Plaintext is sent to Vault to be encrypted,
  2. Once it's encrypted, Vault uploads the data to IPFS,
  3. IPFS returns the data's Content Identifier hash to Vault,
  4. Vault stores the hash in its KV store for the IPFS mount, and returns the hash to the client for reference.

The client can then discard the hash or record it, but Vault keeps the hash in a catalogue for the mount; at any time, an operator can determine which hashes on IPFS are being managed using a given mount.
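As a concrete sketch of that flow, an upload through the CLI might look like the following, assuming the plugin is mounted at ipfs/ and accepts a base64-encoded data parameter (the mount point and parameter name are assumptions; see the API and Encoding sections below):

% # Hypothetical upload: Vault encrypts the payload, adds it to IPFS,
% # and returns the resulting CID.
% base64 < notes.txt | vault write ipfs/data data=-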

When a client wants Vault-encrypted data from IPFS,

  1. The client requests the encrypted object by its hash through Vault,
  2. Vault retrieves the requested IPFS object from an IPFS gateway,
  3. The plugin decrypts the node's data using the mount's decryption key and returns the decrypted data to the client.
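A minimal sketch of the read side from the CLI, where the ipfs/ mount path and the data response field are again assumptions, using the docs CID introduced in the next section:

% # Hypothetical read: fetch a managed object by CID and decode its payload.
% vault read -field=data \
    ipfs/object/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme | base64 --decode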

Clever Merkle Forest Joke

IPFS was designed to store files, so its use at a company would presumably require role-based access restriction. Consider the IPFS docs' UnixFS directory hash /ipfs/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG:

% ipfs ls QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
# hash                                         size name
QmZTR5bcpQD7cFgTorqxZDYaew1Wqgfbd2ud9QqGPAkK2V 1688 about
QmYCvbfNbCwFR45HiNP45rwJgvatpiW38D961L5qAhUM5Y 200  contact
QmY5heUM5qgRubMDD1og9fhCPA6QdkMp3QCwd4s7gJsyE7 322  help
QmdncfsVm2h5Kqq9hPmU7oAVX2zTSVP3L869tgTbPYnsha 1728 quick-start
QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB 1102 readme
QmTumTjvcYCAvRRwQ8sDRxh8ezmrcr88YFU7iYNroGGTBZ 1027 security-notes

Also of note:

  • The public IPFS Merkle forest is immense, and provisioning policies for individual nodes and links of IPFS Merkle trees would lead to complex and unmaintainable policies.
  • The data within the UnixFS nodes may not be plaintext, but for a UnixFS tree to be human-friendly the names of links must be plaintext.

To allow clients to read the IPFS docs tree through Vault, read and list capability could be provisioned to the initial DAG, and separate stanzas provisioned to all linked objects explicitly. The discrete paths approach using path "ipfs/object/<hash>" stanzas for each CID in the tree would total 7 stanzas to provision complete read-only access. If a role needs access to multiple trees, its policy would quickly spiral out of control.
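For illustration, the first two of those seven stanzas might look like this, with CIDs taken from the listing above; the remaining five would follow the same pattern:

path "ipfs/object/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG" {
  capabilities = ["read", "list"]
}
path "ipfs/object/QmZTR5bcpQD7cFgTorqxZDYaew1Wqgfbd2ud9QqGPAkK2V" {
  capabilities = ["read", "list"]
}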

This plugin supports IPFS's DHT link resolution functionality. With IPFS, a request for /ipfs/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme is resolved by the IPFS network to a request for the readme link's CID. Therefore, a maintainable solution for administrators is to provision read and list on the root node of the tree and use globbing to provision the rest of the tree implicitly: path "ipfs/object/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/*".

Globbed CID policy paths allow clients to list the root's links and read the data beneath it as far down as the Merkle tree extends; access is confined to that one tree. If required, the deny capability can be used to restrict tree nodes ad hoc. Using the readme link example specifically, the globbed path allows implicit read and list access to /ipfs/object/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme without requiring an explicit policy grant for the readme's direct /ipfs/object/QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB path.

IPNS and DNSLink

IPNS and DNSLink are two well-known methods of assigning mutable references to immutable IPFS CIDs. However, IPNS uses the ID of the IPFS node itself to publish a single tree, and DNSLink extends that by having human-readable DNS TXT records point to an IPFS entry. Using either on top of a KV store/proxy would have drawbacks.

Warnings

  • Inter-IPFS node communication uses custom encryption that has not yet been audited. The mechanism is neither SSL nor TLS, but there is community discussion around implementing TLS 1.3.
  • This plugin can't record encryption/decryption attempts made using a backend's encryption keys once those keys are removed from Vault.
  • It's impossible to rotate encryption keys for Vault-managed IPFS data in the traditional sense: you can re-encrypt objects' data with a new key, but objects encrypted with the old key(s) may remain on IPFS forever*.
  • File and directory names are uploaded to IPFS in plaintext. If a DAG's hash is known and the DAG is on the public IPFS network, it can be queried through any IPFS node and its full tree can be discovered.

Policies

Building on the Merkle tree isolation and versioning explanations above, here are policy examples of what's possible:

// Explicitly forbid using Vault as a proxy to read unmanaged IPFS objects.
path "ipfs/object/*" {
  capabilities = ["deny"]
}

// Allow uploading data to IPFS.
path "ipfs/data" {
  capabilities = ["create"]
}

// Allow traversing the managed objects' DAG trees' metadata. A policy like this
// can be used for client-initiated garbage collection of outdated objects'
// version references.
path "ipfs/metadata" {
  capabilities = ["list"]
}
path "ipfs/metadata/*" {
  capabilities = ["list"]
}

// Allow changing references of managed objects, but only if the DAGs are part
// of the network and Vault-managed already (`create` is not allowed).
path "ipfs/data/*" {
  capabilities = ["update"]
}

// Allow listing links of a specific DAG without allowing decryption of the
// object data.
path "ipfs/data/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG" {
  capabilities = ["list"]
}

// Same as above, but allow enumeration of a specific DAG's entire Merkle tree.
// The policy allows listing of linked DAGs' links, and their links, etc.
path "ipfs/data/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/*" {
  capabilities = ["list"]
}

// Allow reading a specific link of a DAG. Reading managed data using an
// alternate path, such as from `/ipfs/data/<readme-hash>`, would be implicitly
// disallowed.
path "ipfs/data/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme" {
  capabilities = ["read"]
}

// Allow access to a full managed DAG's Merkle tree, but forbid access to a DAG
// linked within it. The referenced link cannot be accessed (read and decrypted)
// through Vault unless permissions are circumvented with access to another tree
// that contains the node or by a policy on the DAG's direct hash.
path "ipfs/object/Qmb8wsGZNXt5VXZh1pEmYynjB6Euqpq3HYyeAdw2vScTkQ/*" {
  capabilities = ["read", "list"]
}
path "ipfs/object/Qmb8wsGZNXt5VXZh1pEmYynjB6Euqpq3HYyeAdw2vScTkQ/838 - Incident/*" {
  capabilities = ["deny"]
}
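Stanzas like the ones above live in an ordinary policy file. Assuming they are saved locally, attaching them to a named policy uses the standard Vault CLI (both names here are placeholders):

% vault policy write ipfs-docs-readonly ipfs-docs-readonly.hcl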

Caveats

Deletion

Deleting content from The Permanent Web is complicated, but theoretically possible. To summarize, if a node not under your influence pins or replicates an object, you can't force it to take the object down. However, if all nodes that replicated the object decide to unpin it, garbage collect it, or go offline, the object will eventually fade from IPFS.

When the IPFS mesh fails to retrieve data using an object's CID, the CID's content could be considered "deleted", although the ledger would prove something existed.

Object Size

Although Vault won't store the IPFS object data itself, it still needs to process the full Data payload to encrypt or decrypt it. Vault's TCP listeners are configured to deny payloads above 32MB by default to help mitigate denial-of-service attacks. The maximum size can be adjusted per-listener.

The plugin's API does not support pulling a full Merkle tree in a single request, but if individual DAGs requested through Vault surpass 32MB in size, the listener's max_request_size parameter would need to be adjusted.
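For example, a standard TCP listener stanza in the Vault server configuration raising the limit to 64MB (the value is in bytes; 33554432, i.e. 32MB, is the default):

listener "tcp" {
  address = "127.0.0.1:8200"
  # Double the default request size limit.
  max_request_size = 67108864
}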

Encoding

Vault is not meant to process binary data, only key-value pairs. For the sake of consistency, data must be base64-encoded before being posted to Vault and base64-decoded after being read.
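In shell terms, the round trip is as follows; the encoded form is what gets posted to and read from Vault:

% # Encode before POSTing to Vault; decode after reading back.
% base64 < image.png > image.b64
% base64 --decode < image.b64 > image.png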

API

The API was designed to resemble Vault's Key-Value V2 secrets engine API. It does not account for all of IPFS's capabilities and is biased toward the newer directions the IPFS community has taken; IPFS's own API is very full-featured but not yet stable.

Set IPFS Configuration

This path sets the configuration for the IPFS backend.

Method  Path          Produces
POST    /ipfs/config  204 (application/json)

Read IPFS Configuration

This path retrieves the current configuration for the IPFS backend at the given path.

Method  Path          Produces
GET     /ipfs/config  200 (application/json)
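For example, with curl against a running Vault, assuming the standard VAULT_ADDR and VAULT_TOKEN environment variables and the default ipfs mount path:

% curl --header "X-Vault-Token: $VAULT_TOKEN" \
    "$VAULT_ADDR/v1/ipfs/config"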

Get Object

Retrieve an object directly from IPFS.

Method  Path                         Produces
GET     /ipfs/object/:hash(/:link)   200 (application/json)

Parameters

  • hash (string: <required>) - Hash of content to retrieve.
  • link (string: <optional>) - Link of the desired hash's DAG to return.
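A sample request against the docs DAG from earlier, fetching its readme link (the shape of the response body is not documented here):

% curl --header "X-Vault-Token: $VAULT_TOKEN" \
    "$VAULT_ADDR/v1/ipfs/object/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme"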

List Object Links

Retrieve the list of Links of an object DAG directly from IPFS.

Method  Path                         Produces
LIST    /ipfs/object/:hash(/:link)   200 (application/json)

Parameters

  • hash (string: <required>) - Hash of the content whose DAG links should be retrieved.
  • link (string: <optional>) - Link of the desired hash's DAG to query.
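A sample request, again against the docs DAG; Vault exposes LIST as a custom HTTP verb, mirrored here with curl's --request flag:

% curl --header "X-Vault-Token: $VAULT_TOKEN" \
    --request LIST \
    "$VAULT_ADDR/v1/ipfs/object/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"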

Links

License
