Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
0 parents
commit d717f31
Showing 278 changed files with 104,740 additions and 0 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Sphinx build info version 1 | ||
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||
config: 20c30e1e72d3581b39cb4679c80b9b22 | ||
tags: 645f666f9bcd5a90fca523b33c5a78b7 |
Binary file not shown.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
## Contact | ||
|
||
- Join us on `#cartography` on the [Lyft OSS Slack](https://join.slack.com/t/lyftoss/shared_invite/enQtOTYzODg5OTQwNDE2LTFiYjgwZWM3NTNhMTFkZjc4Y2IxOTI4NTdiNTdhNjQ4M2Q5NTIzMjVjOWI4NmVlNjRiZmU2YzA5NTc3MmFjYTQ). | ||
|
||
## Community Meeting | ||
|
||
Talk to us and see what we're working on at our [monthly community meeting](https://calendar.google.com/calendar/embed?src=lyft.com_p10o6ceuiieq9sqcn1ef61v1io%40group.calendar.google.com&ctz=America%2FLos_Angeles). | ||
- Meeting minutes are [here](https://docs.google.com/document/d/1VyRKmB0dpX185I15BmNJZpfAJ_Ooobwz0U1WIhjDxvw). | ||
- Recorded videos are posted [here](https://www.youtube.com/playlist?list=PLMga2YJvAGzidUWJB_fnG7EHI4wsDDsE1). | ||
- Our current project road map is [here](https://docs.google.com/document/d/18MOsGI-isFvag1fGk718Aht7wQPueWd4SqOI9KapBa8/edit#heading=h.15nsmgmjaaml). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
# Cartography Developer Guide | ||
|
||
## Testing | ||
|
||
_If you'd like to test using Docker and Docker Compose, see [here](testing-with-docker.html)_ | ||
|
||
### Running from source | ||
|
||
1. **Install** | ||
|
||
Follow steps 1 and 2 in [Installation](../install.html#cartography-installation). Ensure that you have JVM 11 installed and Neo4j Community Edition 4.4 is running on your local machine. | ||
|
||
2. **Clone the source code** | ||
|
||
Run `cd {path-where-you-want-your-source-code}`, then get the source code with `git clone https://github.com/lyft/cartography.git`. | ||
|
||
3. **Install from source** | ||
|
||
Run `cd cartography` and then `pip install -e .` (yes, actually type the period into the command line) to install Cartography from source. | ||
|
||
ℹ️ You may find it beneficial to use Python [virtualenvs](https://packaging.python.org/guides/installing-using-pip-and-virtualenv/) (or the [virtualenvwrapper](https://virtualenvwrapper.readthedocs.io/en/latest/command_ref.html#managing-environments)) so that packages installed via `pip` are easier to manage. | ||
|
||
4. **Run from source** | ||
|
||
After this finishes you should be able to run Cartography from source with `cartography --neo4j-uri <uri for your neo4j instance; usually bolt://localhost:7687>`. Any changes to the source code in `{path-where-you-want-your-source-code}/cartography` are now locally testable by running `cartography` from the command line. | ||
|
||
### Manually testing individual intel modules | ||
|
||
After completing the section above, you are now able to manually test intel modules. | ||
|
||
1. **If needed, comment out unnecessary lines** | ||
|
||
See `cartography.intel.aws._sync_one_account()` [here](https://github.com/lyft/cartography/blob/master/cartography/intel/aws/__init__.py). This function syncs different AWS objects with your Neo4j instance. Comment out the lines for the objects that you don't want to test. | ||
|
||
For example, IAM can take a long time to ingest so if you're testing an intel module that doesn't require IAM nodes to already exist in the graph, then you can comment out all of the `iam.sync_*` lines. | ||
|
||
2. Save your changes and run `cartography` from a terminal as you normally would. | ||
|
||
### Automated testing | ||
|
||
1. **Install test requirements** | ||
|
||
`pip install -r test-requirements.txt` | ||
|
||
2. **(OPTIONAL) Set up environment variables for integration tests** | ||
|
||
The integration tests expect Neo4j to be running locally, listening on default ports, with auth disabled. | ||
|
||
To disable auth, set `dbms.security.auth_enabled=false` in your `neo4j.conf` file. Additional details are on [neo4j.com](https://neo4j.com/docs/operations-manual/current/authentication-authorization/enable/). | ||
|
||
To run the integration tests on a specific Neo4j instance, add the following environment variable: | ||
|
||
`export "NEO4J_URL=<your_neo4j_instance_bolt_url:your_neo4j_instance_port>"` | ||
|
||
3. **Run tests using `make`** | ||
- `make test_lint` can be used to run [pre-commit](https://pre-commit.com) linting against the codebase. We use pre-commit to standardize linting across our codebases at Lyft. | ||
- `make test_unit` can be used to run the unit test suite. | ||
|
||
⚠️ Important! The below commands will **DELETE ALL NODES** on your local Neo4j instance as part of our testing procedure. Only run any of the below commands if you are ok with this. ⚠️ | ||
|
||
- `make test_integration` can be used to run the integration test suite. | ||
  For more granular testing, you can invoke `pytest` directly: | ||
    - `pytest ./tests/integration/cartography/intel/aws/test_iam.py` | ||
    - `pytest ./tests/integration/cartography/intel/aws/test_iam.py::test_load_groups` | ||
- `make test` can be used to run all of the above. | ||
|
||
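As a minimal sketch of the `NEO4J_URL` override described in step 2: the helper below falls back to the local default when the variable is unset. The function name `resolve_neo4j_url` and the fallback constant are assumptions made for illustration; this guide does not show how the test suite itself reads the variable.

```python
import os

# Assumed default: the local bolt URI mentioned in "Run from source".
DEFAULT_NEO4J_URL = "bolt://localhost:7687"


def resolve_neo4j_url(environ=None):
    """Return NEO4J_URL if it is set and non-empty, else the local default.

    `environ` is injectable so the helper can be exercised without
    touching the real process environment.
    """
    env = os.environ if environ is None else environ
    return env.get("NEO4J_URL") or DEFAULT_NEO4J_URL


if __name__ == "__main__":
    print(resolve_neo4j_url())
```

Keeping the lookup in one place makes it easy to see which instance the integration tests will wipe before you run them.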
## Implementing custom sync commands | ||
|
||
By default, cartography will try to sync every intel module included as part of the default sync. If you're not using certain intel modules you can create a custom sync script and invoke it using the cartography CLI. For example, if you're only interested in the AWS intel module you can create a sync script, `custom_sync.py`, that looks like this: | ||
|
||
```python | ||
from cartography import cli | ||
from cartography import sync | ||
from cartography.intel import aws | ||
from cartography.intel import create_indexes | ||
|
||
def build_custom_sync(): | ||
    s = sync.Sync() | ||
    s.add_stages([ | ||
        ('create-indexes', create_indexes.run), | ||
        ('aws', aws.start_aws_ingestion), | ||
    ]) | ||
    return s | ||
|
||
def main(argv): | ||
    return cli.CLI(build_custom_sync(), prog='cartography').main(argv) | ||
|
||
if __name__ == '__main__': | ||
    import sys | ||
    sys.exit(main(sys.argv[1:])) | ||
``` | ||
|
||
You can then invoke the script with `python custom_sync.py`. It has all the features of the cartography CLI while including only the intel modules you are interested in. For example: | ||
|
||
``` | ||
cartography$ python custom_sync.py | ||
INFO:cartography.sync:Starting sync with update tag '1569022981' | ||
INFO:cartography.sync:Starting sync stage 'create-indexes' | ||
INFO:cartography.intel.create_indexes:Creating indexes for cartography node types. | ||
INFO:cartography.sync:Finishing sync stage 'create-indexes' | ||
INFO:cartography.sync:Starting sync stage 'aws' | ||
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials | ||
... | ||
``` | ||
|
||
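To make the idea of ordered sync stages concrete, here is a toy model of what the custom sync script relies on: a list of named stages that run in the order they were added. `MiniSync` and the two stage functions are hypothetical stand-ins written for this sketch; they are not cartography's `Sync` class, and the real stage callables take different arguments.

```python
class MiniSync:
    """Toy stand-in for a sync object: an ordered list of named stages."""

    def __init__(self):
        self._stages = []

    def add_stages(self, stages):
        # Each stage is a (name, callable) pair, appended in the given order.
        for name, func in stages:
            self._stages.append((name, func))

    def run(self, config):
        # Run every stage in insertion order and collect its result.
        results = []
        for name, func in self._stages:
            print(f"Starting sync stage '{name}'")
            results.append((name, func(config)))
            print(f"Finishing sync stage '{name}'")
        return results


def fake_create_indexes(config):
    # Hypothetical stage: pretend to create indexes.
    return "indexes-created"


def fake_aws_ingestion(config):
    # Hypothetical stage: pretend to ingest AWS data for this update tag.
    return f"aws-synced-{config['update_tag']}"


mini = MiniSync()
mini.add_stages([
    ("create-indexes", fake_create_indexes),
    ("aws", fake_aws_ingestion),
])
ran = mini.run({"update_tag": 1569022981})
```

The point of the sketch is only the ordering: leaving a stage out of the list is what keeps that intel module from running.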
## How to write a new intel module | ||
See [here](writing-intel-modules.html). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
.. toctree:: | ||
|
||
   developer-guide | ||
   writing-analysis-jobs | ||
   writing-intel-modules | ||
   testing-with-docker |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Testing with docker | ||
|
||
## Using the included docker-compose support | ||
|
||
### Usage | ||
|
||
```bash | ||
docker build -t lyft/cartography . | ||
docker-compose up -d | ||
docker-compose run cartography ... | ||
``` | ||
|
||
### Configuration | ||
|
||
Configuration is possible via the `.compose` directory, which is | ||
git ignored. Neo4j config, logs, etc. are located at `.compose/neo4j/...` | ||
|
||
Configuration for cartography itself should be passed in through | ||
environment variables, using the docker-compose format `-e VARIABLE -e VARIABLE` | ||
|
||
AWS credentials can be bind mounted in using volumes. TODO: document correct | ||
bind mount format for docker-compose run. | ||
|
||
### Notes | ||
|
||
* On initial start of the compose stack, it's necessary to | ||
change the neo4j user's password through the neo4j UI. | ||
* Neither the docker image nor the docker-compose file defines an | ||
entrypoint, so it's necessary to pass in the command being run. This | ||
also makes it possible to run a custom sync script, rather than only | ||
cartography. | ||
|
||
### Example | ||
|
||
```bash | ||
# Temporarily disable bash command history | ||
set +o history | ||
# See the cartography github configuration intel module docs | ||
export GITHUB_KEY=BASE64ENCODEDKEY | ||
# You need to set this after starting neo4j once, and resetting | ||
# the default neo4j password, which is neo4j | ||
export NEO4j_PASSWORD=... | ||
# Reenable bash command history | ||
set -o history | ||
# Start cartography dependencies | ||
docker-compose up -d | ||
# Run cartography | ||
docker-compose run -e GITHUB_KEY -e NEO4j_PASSWORD cartography cartography --github-config-env-var GITHUB_KEY --neo4j-uri bolt://neo4j:7687 --neo4j-password-env-var NEO4j_PASSWORD --neo4j-user neo4j | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
# How to extend Cartography with Analysis Jobs | ||
|
||
## Overview | ||
In a nutshell, Analysis Jobs let you add your own customizations to Cartography by writing Neo4j queries. This helps you add powerful enhancements to your data without the need to write Python code. | ||
|
||
### The stages | ||
There are 3 stages to a cartography sync. First we create database indexes, next we ingest assets via intel modules, and finally we can run Analysis Jobs on the database (see [cartography.sync.build\_default\_sync()](https://github.com/lyft/cartography/blob/master/cartography/sync.py)). This tutorial focuses on Analysis Jobs. | ||
|
||
### How to run | ||
Each Analysis Job is a JSON file with a list of Neo4j statements which get run in order. To run Analysis Jobs, in your call to `cartography`, set the `--analysis-job-directory` parameter to the folder path of your jobs. Although the order of statements within a single job is preserved, we don't guarantee the order in which jobs are executed. | ||
|
||
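As a rough illustration of the "list of statements run in order" structure, the sketch below parses a job file and returns its queries in file order. The field names `name`, `statements`, and `query` are taken from the example job later in this tutorial; the helper `load_job_statements` and the two sample queries are hypothetical and are not part of cartography.

```python
import json


def load_job_statements(job_text):
    """Parse an Analysis Job document and return (name, queries in order)."""
    job = json.loads(job_text)
    if "name" not in job or not isinstance(job.get("statements"), list):
        raise ValueError("a job needs a 'name' and a 'statements' list")
    queries = []
    for index, statement in enumerate(job["statements"]):
        if "query" not in statement:
            raise ValueError(f"statement {index} has no 'query'")
        queries.append(statement["query"])
    return job["name"], queries


# Hypothetical two-statement job, used only to exercise the helper.
EXAMPLE_JOB = """
{
  "name": "Example job",
  "statements": [
    {"query": "MATCH (n:EC2Instance) RETURN count(n)"},
    {"query": "MATCH (n:LoadBalancer) RETURN count(n)"}
  ]
}
"""

job_name, job_queries = load_job_statements(EXAMPLE_JOB)
```

A check like this can catch a malformed job file before you point `--analysis-job-directory` at it.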
## Example job: which of my EC2 instances is accessible to any host on the internet? | ||
The easiest way to learn how to write an Analysis Job is through an example. One of the Analysis Jobs that we've included by default in Cartography's source tree is [cartography/data/jobs/analysis/aws_ec2_asset_exposure.json](https://github.com/lyft/cartography/blob/master/cartography/data/jobs/analysis/aws_ec2_asset_exposure.json). This tutorial covers only the EC2 instance part of that job, but after reading this you should be able to understand the other steps in that file. | ||
|
||
### Our goal | ||
After ingesting all our AWS data, we want to explicitly mark EC2 instances that are accessible to the public internet - a useful thing to know for anyone running an internet service. If any internet-open nodes are found, the job will add an attribute `exposed_internet = True` to the node. This way we can easily query to find the assets later on and take remediation action if needed. | ||
|
||
But how do we make this determination, and how should we structure the job? | ||
|
||
### The logic in plain English | ||
We can use the following facts to tell if an EC2 instance is open to the internet: | ||
|
||
1. The EC2 instance is a member of a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet. | ||
2. The EC2 instance has a network interface that is connected to a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet. | ||
|
||
The graph created by Cartography's sync process already has this information for us; we just need to run a few queries to properly mark it with `exposed_internet = True`. This example is complex, but we hope it exposes enough Neo4j concepts to help you write your own queries. | ||
|
||
### Translating the plain-English logic into Neo4j's Cypher syntax | ||
We can take the ideas above and use Cypher's declarative syntax to "sketch" out this graph path. | ||
|
||
1. _The EC2 instance is a member of a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet._ | ||
|
||
In Cypher, this is | ||
|
||
``` | ||
MATCH | ||
(:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound) | ||
-[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup) | ||
<-[:MEMBER_OF_EC2_SECURITY_GROUP]-(instance:EC2Instance) | ||
|
||
SET instance.exposed_internet = true, | ||
instance.exposed_internet_type = coalesce(instance.exposed_internet_type , []) + 'direct'; | ||
``` | ||
|
||
In the `SET` clause we add `exposed_internet = True` to the instance. We also add a field for `exposed_internet_type` to denote what type of internet exposure has occurred here. You can read the [documentation for `coalesce`](https://neo4j.com/docs/cypher-manual/current/functions/scalar/#functions-coalesce), but in English this last part says "add `direct` to the list of ways this instance is exposed to the internet". | ||
|
||
2. _The EC2 instance has a network interface that is connected to a Security Group that has an IP Rule applied to it that allows inbound traffic from the 0.0.0.0/0 subnet._ | ||
|
||
This is the same as the previous query except for the final line of the `MATCH` pattern: | ||
|
||
``` | ||
MATCH | ||
(:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound) | ||
-[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup) | ||
<-[:NETWORK_INTERFACE*..2]-(instance:EC2Instance) | ||
|
||
SET instance.exposed_internet = true, | ||
instance.exposed_internet_type = coalesce(instance.exposed_internet_type , []) + 'direct'; | ||
``` | ||
|
||
The `*..2` operator means "within 2 hops". We use this here as a shortcut because there are a few more relationships between NetworkInterfaces and EC2SecurityGroups that we can skip over. | ||
|
||
Finally, notice that (1) and (2) are similar enough that we can actually merge them like this: | ||
|
||
``` | ||
MATCH | ||
(:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound) | ||
-[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup) | ||
<-[:MEMBER_OF_EC2_SECURITY_GROUP|NETWORK_INTERFACE*..2]-(instance:EC2Instance) | ||
|
||
SET instance.exposed_internet = true, | ||
instance.exposed_internet_type = coalesce(instance.exposed_internet_type , []) + 'direct'; | ||
``` | ||
|
||
Kinda neat, right? | ||
|
||
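The `coalesce(instance.exposed_internet_type, []) + 'direct'` expression used in the queries above can be mimicked in plain Python to see what it does to the property. This is only an analogy for the list semantics, not something cartography runs; the function name is made up for the sketch and `'elb'` is a hypothetical second exposure type.

```python
def append_exposure_type(existing, new_type):
    """Mimic coalesce(existing, []) + new_type.

    If the property is missing (None), start from an empty list;
    otherwise append to a copy of the existing list.
    """
    current = existing if existing is not None else []
    return current + [new_type]


first_run = append_exposure_type(None, "direct")
with_other = append_exposure_type(["elb"], "direct")
second_run = append_exposure_type(first_run, "direct")
```

Note that `second_run` contains `direct` twice: appending is not idempotent, which is one reason a job should remove these attributes before setting them again.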
### The skeleton of an Analysis Job | ||
Now that we know what we want to do on a sync, how should we structure the Analysis Job? Here is the basic skeleton that we recommend. | ||
|
||
#### Clean up first, then update | ||
In general, the first statement(s) should be a "clean-up phase" that removes custom attributes or relationships that you may have added in a previous run. This ensures that whatever attributes or relationships you add on the current run are up to date and not stale. The statements after the clean-up phase then perform the matching and attribute updates described in the previous section. | ||
|
||
**Here's our final result:** | ||
|
||
``` | ||
{ | ||
  "name": "AWS asset internet exposure", | ||
  "statements": [ | ||
    { | ||
      "__comment__": "This is a clean-up statement to remove custom attributes", | ||
      "query": "MATCH (n) WHERE n.exposed_internet IS NOT NULL AND any(label IN labels(n) WHERE label IN ['AutoScalingGroup', 'EC2Instance', 'LoadBalancer']) WITH n LIMIT $LIMIT_SIZE REMOVE n.exposed_internet, n.exposed_internet_type RETURN COUNT(*) as TotalCompleted", | ||
      "iterative": true, | ||
      "iterationsize": 1000 | ||
    }, | ||
    { | ||
      "__comment__": "This is our analysis logic as described in the section above", | ||
      "query": "MATCH (:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->(:IpPermissionInbound)-[:MEMBER_OF_EC2_SECURITY_GROUP]->(group:EC2SecurityGroup)<-[:MEMBER_OF_EC2_SECURITY_GROUP|NETWORK_INTERFACE*..2]-(instance:EC2Instance) SET instance.exposed_internet = true, instance.exposed_internet_type = coalesce(instance.exposed_internet_type , []) + 'direct'", | ||
      "iterative": true, | ||
      "iterationsize": 100 | ||
    } | ||
  ] | ||
} | ||
``` | ||
|
||
Setting a statement as `iterative: true` means that we will run this query on `#{iterationsize}` entries at a time. This can be helpful for queries that return large numbers of records so that Neo4j doesn't get too angry. | ||
|
||
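One way to picture an `iterative` statement is as a loop that re-runs the same query with `$LIMIT_SIZE` set to `iterationsize` until the query reports that nothing was left to process. The sketch below simulates that against an in-memory list instead of Neo4j. The stopping rule (stop when `TotalCompleted` is 0) is an assumption inferred from the clean-up query's `RETURN COUNT(*) as TotalCompleted`, and all names here are hypothetical rather than cartography's own runner.

```python
def run_iteratively(run_batch, iteration_size):
    """Call run_batch(limit) repeatedly until it reports 0 completed rows.

    Returns the total number of rows processed across all batches.
    """
    total = 0
    while True:
        completed = run_batch(iteration_size)
        if completed == 0:
            break
        total += completed
    return total


# In-memory stand-in for nodes that carry the custom attribute.
nodes = [{"exposed_internet": True} for _ in range(2500)]


def cleanup_batch(limit):
    """Remove the attribute from at most `limit` nodes; return how many."""
    batch = [n for n in nodes if "exposed_internet" in n][:limit]
    for node in batch:
        del node["exposed_internet"]
    return len(batch)


total_cleaned = run_iteratively(cleanup_batch, 1000)
```

With 2500 nodes and a batch size of 1000, the loop does three batches that change data (1000, 1000, 500) and a fourth that finds nothing and ends the iteration.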
Now we can enjoy the fruits of our labor and query for internet exposure: | ||
|
||
![internet-exposure-query](../images/exposed-internet.png) | ||
|
||
## Recap | ||
As shown, you create an Analysis Job by putting together a list of `statements` (which are essentially Neo4j queries). In general, each job should first clean up the custom attributes added by a previous run, and then perform the match and update steps to add the custom attributes back again. This ensures that your data is up to date. |