This repo contains the individual services of an automated data collection tool.
- car_feed is configurable with only 6 parameters and a mapping
- car_feed can parse car data items from a website in parallel
- car_feed exposes an iterator over a REST API
- car_feed can store cars in a cache on command
- car_feed can pick up where it last left off when hydrating the cache
- car_feed can be focused on a different car make/model
- car_learning can supply dataframes of cars over a REST API
- car_feed to supply raw car objects
- car_mapper: a separate microservice to handle translating raw source to a car object
- Decompose HTML source into a dataframe of nodes and meta info
- Feed the HTML dataframe into a neural network to predict which node contains the target field
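The HTML decomposition step could look roughly like the sketch below. It is not taken from this repo: the names `NodeCollector` and `decompose`, the column names (`tag`, `depth`, `css_class`, `text`) and the sample listing are all made up for illustration, and it uses only the Python standard library. The result is a list of row dicts, one per node, which `pandas.DataFrame(rows)` would turn into a dataframe.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NodeCollector(HTMLParser):
    """Hypothetical helper: collect one row of meta info per HTML element."""

    def __init__(self):
        super().__init__()
        self.rows = []   # one dict per element, in document order
        self._open = []  # stack of indices into self.rows for open elements

    def handle_starttag(self, tag, attrs):
        self.rows.append({
            "tag": tag,
            "depth": len(self._open),
            "css_class": dict(attrs).get("class") or "",
            "text": "",
        })
        if tag not in VOID_TAGS:
            self._open.append(len(self.rows) - 1)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, tolerating unclosed tags.
        for i in range(len(self._open) - 1, -1, -1):
            if self.rows[self._open[i]]["tag"] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Attach non-whitespace text to the innermost open element.
        text = data.strip()
        if text and self._open:
            row = self.rows[self._open[-1]]
            row["text"] = (row["text"] + " " + text).strip()


def decompose(html):
    """Return a list of node rows for the given HTML source."""
    parser = NodeCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # Made-up sample markup, not a real listing.
    sample = """
    <div class="listing">
      <h2 class="title">Ford Focus</h2>
      <span class="price">4,500</span>
    </div>
    """
    for row in decompose(sample):
        print(row)
```

Each row keeps enough context (tag, nesting depth, class) for a model to score nodes individually; which features the network actually needs is an open design choice.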
To run the docker-compose development setup on an EC2 instance:
- Run the following to create an instance:
  docker-machine create --driver amazonec2 --amazonec2-instance-type t2.medium --amazonec2-region us-west-1 car
- Ensure that it is running:
  docker-machine ls
- Clone the repo on the instance, then point the local docker client at it:
  docker-machine ssh car git clone https://github.com/rorymcstay/feed_platform.git
  eval $(docker-machine env car)
- To get IP details:
  docker-machine inspect car
- To stop and remove the running instance:
  docker-machine kill car
  docker-machine rm car
Configuration is stored in the directory:
cd /Users/rorymcstay/.docker/machine/machines/
ssh -i "id_rsa" [email protected]
These commands were taken from the docker documentation