- What is the distribution of value for a particular make, model, trim, history, and miles driven?
- How long does it take for a great car at a great price to be sold by a dealer?
- Given a particular make, model, and trim, can you alert a particular user when a car enters the market at the price point that they want?
- More questions pending.
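The alerting question above can be sketched as a simple filter over scraped listings. The field names (`make`, `model`, `trim`, `price`) are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Listing:
    make: str
    model: str
    trim: str
    price: int  # asking price in USD

def matching_alerts(listings, make, model, trim, max_price):
    """Return listings matching a user's saved search at or below their price point."""
    return [
        l for l in listings
        if l.make == make and l.model == model and l.trim == trim
        and l.price <= max_price
    ]

listings = [
    Listing("Honda", "Civic", "EX", 18500),
    Listing("Honda", "Civic", "EX", 22000),
    Listing("Honda", "Civic", "Sport", 17900),
]
hits = matching_alerts(listings, "Honda", "Civic", "EX", 19000)
print([l.price for l in hits])  # -> [18500]
```

In a real alerting pipeline this check would run against freshly scraped rows on each ETL pass, notifying the user when `hits` is non-empty.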
You can run the data-scraping portion of this project from your local Windows computer for a particular make and model as follows.
Windows:
- Check that you have Python version 3.11 or greater (type `py --version`) and the GitHub CLI (type `gh --version`) available on the command line.
- From the command line, change directory to the location where you want the repo: `cd /PATH/TO/REPO/`
- Clone the repository: `gh repo clone vcavanna/bs_linkedin_src`
- From the root directory of the project, initialize the virtual environment: `py -m venv venv`
- Activate the virtual environment: `.\venv\Scripts\activate`
- If using Visual Studio Code: open the command palette with `Ctrl + Shift + P`, type "Python: Select Interpreter", and select the interpreter in the venv directory at the project root (`.\venv\Scripts\python.exe`)
- Install the dependencies (note that `pip` takes space-separated package names, no commas): `py -m pip install bs4 requests`
- Run the scraper from the `edmunds_etl` directory:
  ```
  cd edmunds_etl
  py edmunds_scraper.py
  ```
- You should see a CSV file appear in the root directory of the project containing all of the used cars in the Edmunds database for that make and model.
- At this point, the workflow would continue with the `upload_to_s3.py` script. However, for that you would need AWS credentials.
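Once the CSV exists, it can be inspected with the standard library alone. The sample data and column names below (`price`, `miles`) are illustrative assumptions, since the actual header depends on the scraper:

```python
import csv
import io
import statistics

# Illustrative sample standing in for the scraper's CSV output;
# the real file name and its columns may differ.
sample_csv = """make,model,trim,price,miles
Honda,Civic,EX,18500,42000
Honda,Civic,EX,21000,18000
Honda,Civic,Sport,17900,55000
"""

# With the real file you would use: open("output.csv", newline="")
rows = list(csv.DictReader(io.StringIO(sample_csv)))
prices = [int(r["price"]) for r in rows]
print(len(rows), statistics.median(prices))  # 3 listings, median price 18500
```

This kind of quick summary is also a starting point for the value-distribution question at the top of this README.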
To understand how to contribute, please read the CONTRIBUTE page.
- You might not need AWS access at all. One to-do item on this project is to decouple Redshift database calls from the rest of the project (i.e., define an interface with both a Redshift implementation and a local database implementation), so keep an eye out for contributing in that way.
- If you do need to contribute to the AWS side, contact me. I'll set up a single sign-on MFA account so you can add to the Redshift and S3 aspects of the project.
- See the tutorials and guides section below for resources that have been helpful for learning Redshift, S3, Lambda, etc.
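The decoupling idea mentioned above could look roughly like the following sketch: an abstract store interface with a local SQLite implementation standing in for Redshift. All names here are hypothetical; the project has not defined this interface yet.

```python
import sqlite3
from abc import ABC, abstractmethod

class ListingStore(ABC):
    """Hypothetical interface separating storage from the rest of the ETL."""

    @abstractmethod
    def insert_listing(self, make: str, model: str, price: int) -> None: ...

    @abstractmethod
    def count_listings(self) -> int: ...

class LocalSQLiteStore(ListingStore):
    """Local implementation so contributors can develop without AWS credentials."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS listings (make TEXT, model TEXT, price INTEGER)"
        )

    def insert_listing(self, make, model, price):
        self.conn.execute("INSERT INTO listings VALUES (?, ?, ?)", (make, model, price))
        self.conn.commit()

    def count_listings(self):
        return self.conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]

# A RedshiftStore implementing the same interface would isolate the AWS-specific code.
store = LocalSQLiteStore()
store.insert_listing("Honda", "Civic", 18500)
print(store.count_listings())  # -> 1
```

With this split, the scraper and ETL code would depend only on `ListingStore`, and the Redshift implementation becomes an interchangeable backend rather than a hard requirement for contributors.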