```
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: b87c046b3ed94a6cbe635c90384352b9
tags: 645f666f9bcd5a90fca523b33c5a78b7
```

## Contact

- Join us on `#cartography` on the [Lyft OSS Slack](https://join.slack.com/t/lyftoss/shared_invite/enQtOTYzODg5OTQwNDE2LTFiYjgwZWM3NTNhMTFkZjc4Y2IxOTI4NTdiNTdhNjQ4M2Q5NTIzMjVjOWI4NmVlNjRiZmU2YzA5NTc3MmFjYTQ).

## Community Meeting

Talk to us and see what we're working on at our [monthly community meeting](https://calendar.google.com/calendar/embed?src=lyft.com_p10o6ceuiieq9sqcn1ef61v1io%40group.calendar.google.com&ctz=America%2FLos_Angeles).
- Meeting minutes are [here](https://docs.google.com/document/d/1VyRKmB0dpX185I15BmNJZpfAJ_Ooobwz0U1WIhjDxvw).
- Recorded videos are posted [here](https://www.youtube.com/playlist?list=PLMga2YJvAGzidUWJB_fnG7EHI4wsDDsE1).
- Our current project road map is [here](https://docs.google.com/document/d/18MOsGI-isFvag1fGk718Aht7wQPueWd4SqOI9KapBa8/edit#heading=h.15nsmgmjaaml).

# Cartography Developer Guide

## Running the source code

This document assumes familiarity with Python dev practices such as using [virtualenvs](https://packaging.python.org/guides/installing-using-pip-and-virtualenv/).

1. **Run Neo4j**

    Follow the [Install Steps](../install.html) to get Neo4j running locally. You can use either Docker or a native install.

1. **Install Python 3.10**

1. **Clone the source code**

    Run `cd {path-where-you-want-your-source-code}`, then get the source code with `git clone https://github.com/lyft/cartography.git`.

1. **Perform an editable install of the cartography source code**

    Run `cd cartography` and then `pip install -e .` (yes, actually type the period into the command line) to install Cartography from source into the current venv.

1. **Run from source**

    After this finishes, you should be able to run Cartography from source with `cartography --neo4j-uri bolt://localhost:7687`. Any changes to the source code in `{path-where-you-want-your-source-code}/cartography` are now locally testable by running `cartography` from the command line.

## Automated testing

1. **Install test requirements**

    `pip install -r test-requirements.txt`

1. **(OPTIONAL) Set up environment variables for integration tests**

    The integration tests expect Neo4j to be running locally, listening on the default ports, and with auth disabled.

    To run the integration tests against a different Neo4j instance, set the following environment variable:

    `export NEO4J_URL="bolt://<your_neo4j_host>:<your_neo4j_bolt_port>"`

1. **Run tests using `make`**
    - `make test_lint` runs [pre-commit](https://pre-commit.com) linting against the codebase.
    - `make test_unit` runs the unit test suite.

    ⚠️ Important! The commands below will **DELETE ALL NODES** on your local Neo4j instance as part of our testing procedure. Only run them if you are OK with this. ⚠️

    - `make test_integration` runs the integration test suite.
      For more granular testing, you can invoke `pytest` directly:
      - `pytest ./tests/integration/cartography/intel/aws/test_iam.py`
      - `pytest ./tests/integration/cartography/intel/aws/test_iam.py::test_load_groups`
      - `pytest -k test_load_groups`
    - `make test` can be used to run all of the above.
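
As a rough illustration of the `NEO4J_URL` override described above, here is how a test helper could pick the Neo4j URL. This is a minimal sketch, not cartography's actual test setup. `resolve_neo4j_url` is a hypothetical name, and the fallback assumes a local Neo4j listening on the default Bolt port 7687 (the same URL used in the "Run from source" step).

```python
import os
from typing import Mapping, Optional

# Assumed default: a local Neo4j on the standard Bolt port, as in the steps above.
DEFAULT_NEO4J_URL = "bolt://localhost:7687"


def resolve_neo4j_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: use NEO4J_URL if it is set, else the local default."""
    if env is None:
        env = os.environ
    url = env.get("NEO4J_URL", "").strip()
    # An empty or whitespace-only value is treated the same as "unset".
    return url or DEFAULT_NEO4J_URL


if __name__ == "__main__":
    print(resolve_neo4j_url())
```

Passing the environment mapping in explicitly makes the helper easy to test without touching the real process environment.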

## Implementing custom sync commands

By default, cartography will try to sync every intel module included as part of the default sync. If you're not using certain intel modules, you can create a custom sync script and invoke it using the cartography CLI. For example, if you're only interested in the AWS intel module, you can create a sync script, `custom_sync.py`, that looks like this:

```python
import sys

from cartography import cli
from cartography import sync
from cartography.intel import aws
from cartography.intel import create_indexes


def build_custom_sync():
    # Only run the stages we care about: index creation first, then AWS ingestion.
    s = sync.Sync()
    s.add_stages([
        ('create-indexes', create_indexes.run),
        ('aws', aws.start_aws_ingestion),
    ])
    return s


def main(argv):
    return cli.CLI(build_custom_sync(), prog='cartography').main(argv)


if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
```

This script can then be invoked with `python custom_sync.py`. It has all the features of the cartography CLI while including only the intel modules you are specifically interested in. For example:

```
cartography$ python custom_sync.py
INFO:cartography.sync:Starting sync with update tag '1569022981'
INFO:cartography.sync:Starting sync stage 'create-indexes'
INFO:cartography.intel.create_indexes:Creating indexes for cartography node types.
INFO:cartography.sync:Finishing sync stage 'create-indexes'
INFO:cartography.sync:Starting sync stage 'aws'
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
...
```
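
The key idea in `build_custom_sync()` is that a sync is an ordered list of named stages, each a callable that runs after the previous one finishes, as the log above shows. The toy class below sketches that idea in isolation so it can run without Neo4j or cloud credentials. `ToySync`, `build_toy_sync`, and the shared `config` dict are hypothetical stand-ins, not cartography's `sync.Sync`, which also handles things like the update tag and CLI configuration.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("toy_sync")

# A stage is a (name, callable) pair; the callable receives a shared config dict.
Stage = Tuple[str, Callable[[dict], None]]


class ToySync:
    """Hypothetical stand-in illustrating ordered sync stages."""

    def __init__(self) -> None:
        self._stages: List[Stage] = []

    def add_stages(self, stages: List[Stage]) -> None:
        # Stages are appended in the order given and later run in that order.
        self._stages.extend(stages)

    def run(self, config: dict) -> List[str]:
        completed = []
        for name, func in self._stages:
            logger.info("Starting sync stage '%s'", name)
            func(config)
            logger.info("Finishing sync stage '%s'", name)
            completed.append(name)
        return completed


def build_toy_sync(calls: List[str]) -> ToySync:
    # Each toy stage just records that it ran.
    s = ToySync()
    s.add_stages([
        ("create-indexes", lambda cfg: calls.append("indexes")),
        ("aws", lambda cfg: calls.append("aws")),
    ])
    return s


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(build_toy_sync([]).run({}))  # ['create-indexes', 'aws']
```

Dropping a module from a custom sync then amounts to leaving its entry out of the list passed to `add_stages`.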

## dev.Dockerfile

We include a dev.Dockerfile that can help streamline common dev tasks. It differs from the main Dockerfile in that:

1. It is strictly intended for dev purposes.
1. It performs an editable install of the cartography source code and test requirements.
1. It does not define a docker entrypoint. This lets you run a custom sync script instead of just the main `cartography` command.

To use it, build dev.Dockerfile and start the docker-compose `dev` profile:
```bash
cd /path/to/cartography/repo
docker build -t lyft/cartography-dev -f dev.Dockerfile .
docker-compose --profile dev up -d
```

Once the image is built, you can do a few useful things with it.

### Dev with docker-compose

#### Run the full test suite

```bash
docker-compose run cartography-dev make test_lint
docker-compose run cartography-dev make test_unit
docker-compose run cartography-dev make test_integration

# or run all of the above at once
docker-compose run cartography-dev make test
```

#### Run a [custom sync script](#implementing-custom-sync-commands)

```bash
docker-compose run cartography-dev python custom_sync.py
```

#### Run the cartography CLI

```bash
docker-compose run cartography-dev cartography --help
```

### Equivalent manual docker commands

If you prefer not to use docker-compose, or it doesn't work for you for any reason, here are the equivalent manual docker commands for the previous scenarios:

#### Run unit tests with dev.Dockerfile

```bash
docker run --rm lyft/cartography-dev make test_unit
```

This is a simple command because it doesn't require any volume mounts or docker networking.

#### Run the linter with dev.Dockerfile

```bash
docker run --rm \
  -v $(pwd):/var/cartography \
  -v $(pwd)/.cache/pre-commit:/var/cartography/.cache/pre-commit \
  lyft/cartography-dev \
  make test_lint
```

The volume mounts let pre-commit, running inside the container, edit source files on the host machine. They also let pre-commit's cached state persist on the host, so it doesn't need to update itself every time you run it.

#### Run integration tests with dev.Dockerfile

First, run a Neo4j container. If the `cartography-network` Docker network doesn't exist yet, create it with `docker network create cartography-network`.
```bash
docker run \
  --publish=7474:7474 \
  --publish=7687:7687 \
  --network cartography-network \
  -v data:/data \
  --name cartography-neo4j \
  --env=NEO4J_AUTH=none \
  neo4j:4.4-community
```

Then call the integration test suite like this:
```bash
docker run --rm \
  --network cartography-network \
  -e NEO4J_URL=bolt://cartography-neo4j:7687 \
  lyft/cartography-dev \
  make test_integration
```

Note that we set the `NEO4J_URL` env var so that the integration tests can reach the Neo4j container by its name, `cartography-neo4j`, on the shared network.

#### Run the full test suite with dev.Dockerfile

Bring up a Neo4j container:
```bash
docker run \
  --publish=7474:7474 \
  --publish=7687:7687 \
  --network cartography-network \
  -v data:/data \
  --name cartography-neo4j \
  --env=NEO4J_AUTH=none \
  neo4j:4.4-community
```

Then run the full test suite, specifying all the necessary volumes, the network, and env vars:
```bash
docker run --rm \
  -v $(pwd):/var/cartography \
  -v $(pwd)/.cache/pre-commit:/var/cartography/.cache/pre-commit \
  --network cartography-network \
  -e NEO4J_URL=bolt://cartography-neo4j:7687 \
  lyft/cartography-dev \
  make test
```

#### Run a [custom sync script](#implementing-custom-sync-commands) with dev.Dockerfile

```bash
docker run --rm lyft/cartography-dev python custom_sync.py
```

#### Run the cartography CLI with dev.Dockerfile

```bash
docker run --rm lyft/cartography-dev cartography --help
```

## How to write a new intel module
See [here](writing-intel-modules.html).

.. toctree::

   developer-guide
   writing-analysis-jobs
   writing-intel-modules

# How to extend Cartography with Analysis Jobs

## Overview
In a nutshell, Analysis Jobs let you add your own customizations to Cartography by writing Neo4j queries. This lets you add powerful enhancements to your data without writing Python code.

### The stages
A cartography sync has three stages. First we create database indexes, next we ingest assets via intel modules, and finally we can run Analysis Jobs on the database (see [cartography.sync.build\_default\_sync()](https://github.com/lyft/cartography/blob/master/cartography/sync.py)). This tutorial focuses on Analysis Jobs.

### How to run
Each Analysis Job is a JSON file containing a list of Neo4j statements that are run in order. To run Analysis Jobs, set the `--analysis-job-directory` parameter in your call to `cartography` to the folder path of your jobs. The order of statements within a single job is preserved, but we don't guarantee the order in which jobs are executed.
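
To make the job-file layout concrete, here is a rough sketch of how a runner could read a directory of job files. It keeps each job's statements in file order but makes no promise about the order of jobs. `load_analysis_jobs` is a hypothetical helper, not cartography's loader. The sketch assumes each file has the `name` and `statements` keys used in the example job later in this tutorial, with a `query` string in each statement.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_analysis_jobs(job_dir: str) -> Dict[str, List[str]]:
    """Hypothetical loader: map each job's name to its queries, in statement order."""
    jobs: Dict[str, List[str]] = {}
    # glob() yields files in arbitrary order, mirroring the "no job order" caveat.
    for path in Path(job_dir).glob("*.json"):
        with path.open(encoding="utf-8") as f:
            job = json.load(f)
        # Statements within a job must keep their order, so the list is copied as-is.
        jobs[job.get("name", path.stem)] = [stmt["query"] for stmt in job["statements"]]
    return jobs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        example = {"name": "demo", "statements": [{"query": "RETURN 1"}, {"query": "RETURN 2"}]}
        Path(tmp, "demo.json").write_text(json.dumps(example), encoding="utf-8")
        print(load_analysis_jobs(tmp))  # {'demo': ['RETURN 1', 'RETURN 2']}
```

Because job order isn't guaranteed, each job should be written so it doesn't depend on another job having run first.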

## Example job: which of my EC2 instances is accessible to any host on the internet?
The easiest way to learn how to write an Analysis Job is through an example. One of the Analysis Jobs that we've included by default in Cartography's source tree is [cartography/data/jobs/analysis/aws_ec2_asset_exposure.json](https://github.com/lyft/cartography/blob/master/cartography/data/jobs/analysis/aws_ec2_asset_exposure.json). This tutorial covers only the EC2 instance part of that job, but after reading it you should be able to understand the other steps in that file.

### Our goal
After ingesting all our AWS data, we want to explicitly mark EC2 instances that are accessible from the public internet, which is useful to know for anyone running an internet service. If any internet-open nodes are found, the job adds the attribute `exposed_internet = true` to each of them. This way we can easily query for these assets later on and take remediation action if needed.

But how do we make this determination, and how should we structure the job?

### The logic in plain English
We can use the following facts to tell if an EC2 instance is open to the internet. If either one holds, the instance is exposed:

1. The EC2 instance is a member of a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet.
2. The EC2 instance has a network interface that is connected to a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet.

The graph created by Cartography's sync process already has this information. We just need to run a few queries to mark the matching instances with `exposed_internet = true`. This example is fairly complex, but we hope it exposes enough Neo4j concepts to help you write your own queries.
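
Before writing any Cypher, it can help to check the logic against a tiny hand-built model. The Python sketch below encodes the two facts over plain dictionaries. The data layout (`group_inbound_cidrs`, `instance_groups`, `instance_interfaces`, `interface_groups`) and all IDs are entirely hypothetical and much simpler than Cartography's real graph.

```python
from typing import Dict, Set

OPEN_CIDR = "0.0.0.0/0"

# Hypothetical toy data: CIDRs allowed by each security group's inbound rules.
group_inbound_cidrs: Dict[str, Set[str]] = {
    "sg-open": {OPEN_CIDR},
    "sg-office": {"203.0.113.0/24"},
}
# Security groups each instance belongs to directly (fact 1).
instance_groups: Dict[str, Set[str]] = {
    "i-direct": {"sg-open"},
    "i-via-eni": {"sg-office"},
    "i-private": {"sg-office"},
}
# Network interfaces per instance, and each interface's security groups (fact 2).
instance_interfaces: Dict[str, Set[str]] = {"i-via-eni": {"eni-1"}, "i-private": {"eni-2"}}
interface_groups: Dict[str, Set[str]] = {"eni-1": {"sg-open"}, "eni-2": {"sg-office"}}


def group_is_open(group: str) -> bool:
    return OPEN_CIDR in group_inbound_cidrs.get(group, set())


def is_exposed(instance: str) -> bool:
    # Fact 1: the instance is a member of an internet-open security group.
    if any(group_is_open(g) for g in instance_groups.get(instance, set())):
        return True
    # Fact 2: one of the instance's network interfaces is in such a group.
    return any(
        group_is_open(g)
        for eni in instance_interfaces.get(instance, set())
        for g in interface_groups.get(eni, set())
    )


if __name__ == "__main__":
    for inst in sorted(instance_groups):
        print(inst, is_exposed(inst))
```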

### Translating the plain-English logic into Neo4j's Cypher syntax
We can take the ideas above and use Cypher's declarative syntax to "sketch" out this graph path.

1. _The EC2 instance is a member of a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet._

    In Cypher, this is:

    ```
    MATCH
      (:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound)
      -[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup)
      <-[:MEMBER_OF_EC2_SECURITY_GROUP]-(instance:EC2Instance)

    SET instance.exposed_internet = true,
        instance.exposed_internet_type = coalesce(instance.exposed_internet_type, []) + 'direct';
    ```

    In the `SET` clause we add `exposed_internet = true` to the instance. We also add an `exposed_internet_type` field to record what type of internet exposure has occurred. `coalesce` returns the first of its arguments that is not null (see the [documentation for `coalesce`](https://neo4j.com/docs/cypher-manual/current/functions/scalar/#functions-coalesce)). In English, this last part says "add `direct` to the list of ways this instance is exposed to the internet, starting from an empty list if there isn't one yet".

2. _The EC2 instance has a network interface that is connected to a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet._

    This is the same as the previous query except for the last line of the `MATCH` pattern:

    ```
    MATCH
      (:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound)
      -[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup)
      <-[:NETWORK_INTERFACE*..2]-(instance:EC2Instance)

    SET instance.exposed_internet = true,
        instance.exposed_internet_type = coalesce(instance.exposed_internet_type, []) + 'direct';
    ```

    The `*..2` operator means "within 2 hops". We use it here as a shortcut because there are a few more relationships between NetworkInterfaces and EC2SecurityGroups that we can skip over.
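
To build intuition for what a bounded variable-length pattern such as `*..2` matches, here is a small Python sketch. It collects every node reachable from a start node in one or two hops along relationships of the allowed types. It models only the hop limit and the type filter on a made-up graph. It is not how Neo4j evaluates patterns, and the node names and the `NEXT`/`OTHER` relationship types are illustrative.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Toy graph: node -> list of (relationship type, target node), outgoing only.
Graph = Dict[str, List[Tuple[str, str]]]

TOY_GRAPH: Graph = {
    "a": [("NEXT", "b"), ("OTHER", "x")],
    "b": [("NEXT", "c")],
    "c": [("NEXT", "d")],
}


def reachable_within(graph: Graph, start: str, rel_types: Set[str], max_hops: int) -> Set[str]:
    """Nodes reachable from start in 1..max_hops hops over the given relationship types."""
    found: Set[str] = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # don't expand past the hop limit
        for rel, target in graph.get(node, []):
            if rel in rel_types and target not in seen:
                seen.add(target)
                found.add(target)
                queue.append((target, depth + 1))
    return found


if __name__ == "__main__":
    print(sorted(reachable_within(TOY_GRAPH, "a", {"NEXT"}, max_hops=2)))  # ['b', 'c']
```

Node `d` is three hops away, so a two-hop limit excludes it. Node `x` is skipped because its relationship type isn't allowed. Passing a set of types also lets you experiment with patterns that accept more than one relationship type.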

Finally, notice that (1) and (2) are similar enough that we can merge them like this:

```
MATCH
  (:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound)
  -[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup)
  <-[:MEMBER_OF_EC2_SECURITY_GROUP|NETWORK_INTERFACE*..2]-(instance:EC2Instance)

SET instance.exposed_internet = true,
    instance.exposed_internet_type = coalesce(instance.exposed_internet_type, []) + 'direct';
```

Kinda neat, right? The `|` in the relationship pattern means "either of these relationship types", so this single query covers both cases.

### The skeleton of an Analysis Job
Now that we know what we want to do on a sync, how should we structure the Analysis Job? Here is the basic skeleton that we recommend.

#### Clean up first, then update
In general, the first statement(s) should be a "clean-up phase" that removes custom attributes or relationships that you may have added in a previous run. This ensures that whatever labels you add on the current run are up to date and not stale. The statements after the clean-up phase then perform the matching and attribute updates described in the previous section.
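
The `SET` clause used throughout this tutorial shows why clean-up has to come first. `coalesce(instance.exposed_internet_type, []) + 'direct'` appends to whatever list is already there. The Python sketch below mimics that expression on a plain dict standing in for a node's properties. It is an illustration only, not Neo4j code. Running the update twice without clean-up duplicates `'direct'`, while cleaning up first keeps the result stable.

```python
from typing import Any, Dict

Node = Dict[str, Any]  # toy stand-in for a node's property map


def mark_exposed(node: Node) -> None:
    # Mimics: SET n.exposed_internet = true,
    #         n.exposed_internet_type = coalesce(n.exposed_internet_type, []) + 'direct'
    existing = node.get("exposed_internet_type")
    node["exposed_internet"] = True
    node["exposed_internet_type"] = (existing if existing is not None else []) + ["direct"]


def clean_up(node: Node) -> None:
    # Mimics: REMOVE n.exposed_internet, n.exposed_internet_type
    node.pop("exposed_internet", None)
    node.pop("exposed_internet_type", None)


if __name__ == "__main__":
    no_cleanup: Node = {}
    for _ in range(2):  # two syncs, no clean-up phase
        mark_exposed(no_cleanup)
    print(no_cleanup["exposed_internet_type"])  # ['direct', 'direct']

    with_cleanup: Node = {}
    for _ in range(2):  # two syncs, clean-up first
        clean_up(with_cleanup)
        mark_exposed(with_cleanup)
    print(with_cleanup["exposed_internet_type"])  # ['direct']
```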

**Here's our final result:**

```
{
  "name": "AWS asset internet exposure",
  "statements": [
    {
      "__comment__": "This is a clean-up statement to remove custom attributes",
      "query": "MATCH (n) WHERE n.exposed_internet IS NOT NULL AND any(label IN labels(n) WHERE label IN ['AutoScalingGroup', 'EC2Instance', 'LoadBalancer']) WITH n LIMIT $LIMIT_SIZE REMOVE n.exposed_internet, n.exposed_internet_type RETURN COUNT(*) as TotalCompleted",
      "iterative": true,
      "iterationsize": 1000
    },
    {
      "__comment__": "This is our analysis logic as described in the section above",
      "query": "MATCH (:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound)-[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup)<-[:MEMBER_OF_EC2_SECURITY_GROUP|NETWORK_INTERFACE*..2]-(instance:EC2Instance) SET instance.exposed_internet = true, instance.exposed_internet_type = coalesce(instance.exposed_internet_type, []) + 'direct'",
      "iterative": false
    }
  ]
}
```

Setting a statement as `"iterative": true` means that we run the query repeatedly, processing up to `iterationsize` entries at a time. The query itself does the batching. The clean-up statement caps each run with `WITH n LIMIT $LIMIT_SIZE` and returns `COUNT(*) as TotalCompleted`. Batching helps with queries that touch large numbers of records, so that Neo4j doesn't get overwhelmed. The analysis statement isn't written in batches like this, so it runs once with `"iterative": false`.
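
One way to picture an iterative statement is as a loop: run the query with a batch limit, and repeat until a batch processes nothing. The sketch below models that loop in plain Python against an in-memory list. `run_iteratively` and `make_cleanup_batch` are hypothetical names, and `run_batch` stands in for the clean-up query and its `TotalCompleted` count. It illustrates the batching pattern, not cartography's actual runner.

```python
from typing import Callable, List


def run_iteratively(run_batch: Callable[[int], int], iterationsize: int) -> int:
    """Call run_batch(limit) until it reports zero processed items; return the total."""
    total = 0
    while True:
        completed = run_batch(iterationsize)
        if completed == 0:
            break
        total += completed
    return total


def make_cleanup_batch(nodes: List[dict]) -> Callable[[int], int]:
    """Stand-in for the clean-up query: strip the properties from up to `limit` nodes."""
    def run_batch(limit: int) -> int:
        targets = [n for n in nodes if "exposed_internet" in n][:limit]  # WITH n LIMIT ...
        for n in targets:
            n.pop("exposed_internet", None)       # REMOVE n.exposed_internet
            n.pop("exposed_internet_type", None)  # REMOVE n.exposed_internet_type
        return len(targets)  # like RETURN COUNT(*) as TotalCompleted
    return run_batch


if __name__ == "__main__":
    nodes = [{"exposed_internet": True} for _ in range(2500)]
    print(run_iteratively(make_cleanup_batch(nodes), iterationsize=1000))  # 2500
```

The loop ends only because each batch removes the property that the query matches on, so the set of matches shrinks with every batch. A statement whose matches don't shrink as it runs is a poor fit for this pattern.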

Now we can enjoy the fruits of our labor and query for internet exposure:

![internet-exposure-query](../images/exposed-internet.png)

## Recap
As shown, you create an Analysis Job by putting together a list of `statements`, which are essentially Neo4j queries. In general, each job should first clean up the custom attributes added by a previous run, and then perform the match and update steps to add the custom attributes back again. This ensures that your data is up to date.