NOTE: This is still work in progress. Please come back in a few days!
This proof of concept implements the following streaming architecture:
The architecture above implements an app that does the following:
- Ingests events from two sources: `server` and `client`.
- Translates the user IDs in all incoming events into user names.
- Delivers all `client` events to a table called `polku_poc.client` in Redshift.
- Delivers all `server` events to a table called `polku_poc.server` in Redshift.
- Logs a warning when a user is named for the first time.
- Forwards all warning log events to Slack.
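The behaviour listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code deployed by this PoC: the event shape (a dict with a `user_id` key), the `user_names` lookup, the `process_event` function and the in-memory `tables` are all hypothetical, and `CollectingHandler` merely stands in for the component that forwards warnings to Slack.

```python
import logging

logger = logging.getLogger("polku_poc_sketch")


class CollectingHandler(logging.Handler):
    """Hypothetical stand-in for a Slack forwarder: keeps WARNING+ messages."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def process_event(event, source, user_names, seen_users, tables):
    """Translate the user ID, warn on first naming, and route by source."""
    user_id = event["user_id"]
    # Add the user name to a copy of the event (assumed event shape).
    enriched = dict(event, user_name=user_names.get(user_id, "unknown"))
    if user_id in user_names and user_id not in seen_users:
        seen_users.add(user_id)
        logger.warning("User %s named for the first time: %s",
                       user_id, enriched["user_name"])
    # Lists stand in for the Redshift tables polku_poc.client / polku_poc.server.
    tables["polku_poc." + source].append(enriched)
    return enriched


handler = CollectingHandler()
logger.addHandler(handler)

user_names = {"u1": "alice"}  # hypothetical ID -> name lookup
seen_users = set()
tables = {"polku_poc.client": [], "polku_poc.server": []}

process_event({"user_id": "u1", "action": "click"}, "client",
              user_names, seen_users, tables)
process_event({"user_id": "u1", "action": "login"}, "server",
              user_names, seen_users, tables)
```

In this sketch only the first event for `u1` produces a warning, so `handler.messages` ends up with a single entry while each table receives one enriched event.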
You should have configured the AWS CLI with a profile that has sufficient permissions to deploy all the components of this infrastructure environment. You will also need an S3 bucket to which that same CLI profile has read and write access.
This PoC uses the Python 3.6 runtime in AWS Lambda, so to run the tests below you will need that Python version installed on your local machine. I recommend using pyenv to manage and install multiple Python versions on your system. With pyenv, installing the required Python version is as easy as:
pyenv install 3.6.0b2
make install
. .env/bin/activate
humilis configure --local
The following environment variables are needed to deploy all the features of this PoC:
variable | description |
---|---|
HUMILIS_BUCKET | An S3 bucket for deployment artifacts |
HUMILIS_AWS_REGION | The AWS region, e.g. eu-west-1 |
REDSHIFT_HOST | The hostname of your Redshift cluster master node |
REDSHIFT_PORT | The port where the Redshift master node is listening |
REDSHIFT_DB | The name of the Redshift database |
REDSHIFT_USER | The Redshift username |
REDSHIFT_PWD | The Redshift password |
SENTRY_DSN | The [Sentry][sentry] DSN |
SLACK_TOKEN | The token to access Slack's web API |
SLACK_CHANNEL | The name of the channel where messages will be posted |
Note that you will need to manually create the `HUMILIS_BUCKET` S3 bucket before attempting to deploy this Polku PoC.
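Before deploying, it can help to check that the variables in the table are actually set. The following is a minimal sketch, not part of this project: the `missing_variables` helper is hypothetical, and treating every variable in the table as mandatory is an assumption (some may only be needed for specific features).

```python
import os

# Names taken from the table above; treating all of them as required
# is an assumption of this sketch.
REQUIRED_VARIABLES = [
    "HUMILIS_BUCKET", "HUMILIS_AWS_REGION", "REDSHIFT_HOST", "REDSHIFT_PORT",
    "REDSHIFT_DB", "REDSHIFT_USER", "REDSHIFT_PWD", "SENTRY_DSN",
    "SLACK_TOKEN", "SLACK_CHANNEL",
]


def missing_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


missing = missing_variables()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Passing a plain dict instead of relying on `os.environ` makes the check easy to exercise without touching your real environment.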
I have extracted the most important deployment parameters into a [parameters.yaml.j2](./parameters.yaml.j2) file. The comments embedded in that file briefly explain the purpose of each parameter. You can edit the deployment parameters as you see fit. Then:
polkupoc --stage DEV apply
The command above will deploy to a stage named `DEV`. You can have multiple parallel (identical) deployments by using a different deployment stage for each.
Once the deployment has completed, you will find the deployment outputs (things such as the name of the S3 bucket where events are delivered) in a file called `polkupoc-[STAGE]-outputs.yaml`.
Unit Test:
make test
Integration Test:
make testi
There is one last step you need to take to have a fully functional app: you need to create the target tables in Redshift so that Firehose can deliver the relevant events to them. You do that by editing the models in `polku_poc/models/polkupoc.py` and then using Alembic to generate a migration script for you:
polkupoc --stage DEV alembic -- revision --autogenerate
Check that the migration script is correct, then apply the migration:
polkupoc --stage DEV alembic -- upgrade head
If you have questions, bug reports, suggestions, etc. please create an issue on the GitHub project page.