The following project was created as part of Joe Brady's capstone for his Bachelor's Degree in Computer Science at Western Governors University (WGU).
Car Sales Price Prediction Engine is a web application that will help a fictional car sales company set prices for the used vehicles they have acquired and wish to sell. The web application uses machine learning and a predictive algorithm to estimate the sales price of a given car based on the mileage. This includes descriptive methods (graphs) to help the client understand the data and a predictive method (a linear regression graph) to help them make educated price-setting decisions.
The goal of this project and application is to decrease the time employees spend manually estimating appropriate prices, and, ultimately, help the company increase revenue. Historically, this fictional company has struggled to set appropriate prices for the cars they sell, leading to a loss in revenue. This project will help them to understand the market value of the cars they acquire so they can set appropriate prices to maximize their profit.
The project has the following three descriptive methods (graphs):
- A scatter plot comparing the mileage to the sales price of each car.
- A bar graph that shows the number of cars sold at various price ranges.
- A pie chart showing the percent of cars sold at the price ranges used in the bar graph.
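The bar graph and pie chart both depend on grouping cars into price ranges. A minimal sketch of that grouping is shown here. The function name `countByPriceRange`, the $8,000 bucket width, and the number of buckets are assumptions made for illustration, not values taken from the application's source.

```typescript
// Hypothetical helper: counts how many cars fall into each fixed-width price range.
// The bucket width (8000) and bucket count (5) are illustrative assumptions.
interface PriceBucket {
  label: string; // e.g. "$0 - $8,000"
  count: number; // number of cars whose price falls in this range
}

function countByPriceRange(prices: number[], width = 8000, buckets = 5): PriceBucket[] {
  const result: PriceBucket[] = [];
  for (let i = 0; i < buckets; i++) {
    const low = i * width;
    const isLast = i === buckets - 1;
    // The last bucket is open-ended and collects every price above its lower bound.
    const label = isLast
      ? `$${low.toLocaleString("en-US")}+`
      : `$${low.toLocaleString("en-US")} - $${(low + width).toLocaleString("en-US")}`;
    result.push({ label, count: 0 });
  }
  for (const price of prices) {
    if (!Number.isFinite(price) || price < 0) continue; // skip invalid values
    const index = Math.min(Math.floor(price / width), buckets - 1);
    result[index].count += 1;
  }
  return result;
}
```

The resulting counts can feed a bar graph directly, and dividing each count by the total gives the shares for a pie chart.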
The project has one predictive method: a scatter plot with a regression line. The regression line is generated using machine learning and lets employees estimate a car's sales price from its mileage.
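To illustrate what the regression line represents, here is a minimal sketch that fits a straight line to mileage and price pairs. The application itself uses TensorFlow.js; this sketch swaps in a closed-form least-squares fit so that it runs without any dependencies. The names `fitLine` and `predictPrice` are hypothetical.

```typescript
// Illustrative only: ordinary least squares in plain TypeScript,
// not the TensorFlow.js training code used by the application.
interface Point {
  mileage: number;
  price: number;
}

interface Line {
  slope: number;
  intercept: number;
}

function fitLine(points: Point[]): Line {
  const n = points.length;
  if (n < 2) throw new Error("Need at least two points to fit a line");
  const meanX = points.reduce((sum, p) => sum + p.mileage, 0) / n;
  const meanY = points.reduce((sum, p) => sum + p.price, 0) / n;
  let sxy = 0; // sum of (x - meanX) * (y - meanY)
  let sxx = 0; // sum of (x - meanX)^2
  for (const p of points) {
    sxy += (p.mileage - meanX) * (p.price - meanY);
    sxx += (p.mileage - meanX) ** 2;
  }
  if (sxx === 0) throw new Error("All mileages are identical; the slope is undefined");
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Estimates a price for a given mileage from a fitted line.
function predictPrice(line: Line, mileage: number): number {
  return line.intercept + line.slope * mileage;
}
```

Given a fitted line, `predictPrice(line, 60000)` returns the estimated price at 60,000 miles.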
This project is a web application written with the following technologies:
- JavaScript/TypeScript
- Node.js v16.13.2 LTS
- TensorFlow.js (Node.js library used for machine learning)
- React.js (JavaScript library for building user interfaces)
- Next.js (Full-stack React.js framework)
- Tailwind CSS (Utility-first CSS framework)
- Chart.js (JavaScript charting library)
The environment used and target platform for this application is Microsoft Windows.
The data is stored in a CSV file that contains all the data points. The CSV is read into the system and cleaned, and unit conversions are applied as necessary. The dataset was found on Kaggle.com at the following URL:
https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho
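The read, clean, and convert steps could be sketched as follows. This is an illustration under stated assumptions, not the application's actual loader: the column names `km_driven` and `selling_price`, the kilometers-to-miles conversion, and the optional `priceRate` currency multiplier are all assumed, and the parser splits on plain commas without handling quoted fields.

```typescript
// Hypothetical loader sketch. Column names and conversions are assumptions.
interface CarRecord {
  mileage: number; // miles
  price: number; // price after applying priceRate
}

const KM_TO_MILES = 0.621371;

function parseCars(
  csv: string,
  distanceColumn = "km_driven",
  priceColumn = "selling_price",
  priceRate = 1 // assumed currency multiplier; 1 leaves prices unchanged
): CarRecord[] {
  const lines = csv.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const header = lines[0].split(",").map((name) => name.trim());
  const distanceIndex = header.indexOf(distanceColumn);
  const priceIndex = header.indexOf(priceColumn);
  if (distanceIndex === -1 || priceIndex === -1) {
    throw new Error("Expected columns not found in CSV header");
  }
  const records: CarRecord[] = [];
  for (const line of lines.slice(1)) {
    const cells = line.split(","); // naive split: quoted commas are not supported
    const km = Number(cells[distanceIndex]);
    const price = Number(cells[priceIndex]);
    // Cleaning: drop rows with missing, non-numeric, or non-positive values.
    if (!Number.isFinite(km) || !Number.isFinite(price) || km <= 0 || price <= 0) continue;
    records.push({ mileage: km * KM_TO_MILES, price: price * priceRate });
  }
  return records;
}
```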
Before you can install and run the web application, you must first install the following:
- Node.js 16
- Yarn 1
Install Node by downloading and running the Node.js 16 LTS installer from this website: https://nodejs.org/en/
Once installed, confirm that the correct version of Node is installed by running the following command in a terminal/command prompt window:
node --version
After verifying that Node.js 16 is installed correctly, run the following command to install Yarn:
npm install --global yarn
Verify that the correct version of Yarn is installed by running the following command:
yarn --version
After verifying that Node.js and Yarn are installed, you are ready to install the application. In a terminal window, navigate to a folder where you'd like to install the application and clone this repository:
git clone https://github.com/JoebradyDev/jb-capstone.git
Once the repository is cloned, using a terminal window, navigate to the directory where the application was downloaded:
cd jb-capstone
In the application root directory, install the application's dependencies with Yarn by running this command:
yarn install
Once dependencies have been installed, you are ready to run the application. Run the following command to launch the application server:
yarn dev
Known Bug: Note that you cannot use "yarn start" or "npm start" to start the application. This will result in the charts not loading, so you must use the "yarn dev" command as mentioned above.
After the server has started, open a browser and navigate to the following location to view the application:
http://localhost:3000
After starting the server and navigating to http://localhost:3000, you will be brought to a login page. Please enter the following credentials to log into the application:
- Username: "john.doe"
- Password: "wgu1234$"
If you need to log out of the application, you can click the "Logout" button at the top right of the screen. Note that you will also be automatically logged out after an hour of inactivity.
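The one-hour inactivity rule amounts to comparing the time of the last user activity against the current time. A minimal sketch is shown here; the function name and the millisecond timestamps are assumptions for illustration and do not describe how the application actually tracks sessions.

```typescript
// Hypothetical check: has the session been idle for longer than the timeout?
const ONE_HOUR_MS = 60 * 60 * 1000;

function isSessionExpired(
  lastActivityMs: number, // timestamp of the last user action, in milliseconds
  nowMs: number, // current timestamp, in milliseconds
  timeoutMs = ONE_HOUR_MS
): boolean {
  return nowMs - lastActivityMs >= timeoutMs;
}
```

For example, `isSessionExpired(lastSeen, Date.now())` could be evaluated on each request or on a timer to decide whether to redirect to the login page.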
When you log in, you are immediately brought to the Dashboard page. This has a welcome message and a description of each page of the application.
On the left side of the screen there is a sidebar with links you can use to navigate to each page of the application.
The sidebar contains the following items:
- Dashboard (Descriptions of/links to each page)
- Mileage vs. Price (Scatter plot of the data)
- Cars Sold by Price (Bar graph of cars sold by price)
- Percent Sold by Price (Pie chart of cars sold by price)
- Price Prediction (Regression line chart/scatter plot)
Clicking any sidebar link other than Dashboard brings you to the corresponding chart page. Note that, at this point, the chart is not yet loaded.
For each of the charts, there is a default set of chart actions (at the bottom right of each chart), which includes:
- Clear Data (Removes data from the chart)
- Load Data (Loads relevant data into the chart)
After clicking the "Load Data" button, each chart will be filled with data. Note that each chart is color-coded and labeled with the appropriate data.
For the "Price Prediction" chart, there is one additional action:
- Run Prediction (Runs the machine learning algorithm and loads the result into the chart)
Note that the machine learning algorithm takes some time to run.
The "Percent Sold by Price" pie chart has a bonus feature that allows the user to select price ranges and recalculate the percentages in real time.
Price ranges are toggled by clicking them in the legend at the top of the pie chart. Clicking a price range crosses it off the legend and removes it from the chart. To add the price range back, click the crossed-off item; it will become uncrossed and reappear in the chart.
This is how the chart looks when the user selects to only look at cars sold with a price range between $0 and $8,000.
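The arithmetic behind the recalculated percentages can be sketched as follows: hidden ranges are dropped, and each remaining range's share is computed against the new total. Chart.js handles the legend toggling itself, so this is only an illustration of the calculation; the names `RangeCount` and `visiblePercentages` are hypothetical.

```typescript
// Illustrative recalculation of pie-chart percentages after hiding some ranges.
interface RangeCount {
  label: string;
  count: number;
}

interface RangePercent {
  label: string;
  percent: number; // share of the visible total, from 0 to 100
}

function visiblePercentages(ranges: RangeCount[], hidden: Set<string>): RangePercent[] {
  const visible = ranges.filter((range) => !hidden.has(range.label));
  const total = visible.reduce((sum, range) => sum + range.count, 0);
  return visible.map((range) => ({
    label: range.label,
    // Guard against dividing by zero when every visible range is empty.
    percent: total === 0 ? 0 : (range.count / total) * 100,
  }));
}
```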
The "Mileage vs. Price" and "Price Prediction" scatter plot charts show tooltips when you hover over each data point.
If there are multiple data points of cars at the same mileage and price, these are repeated in the tooltip.
For the "Price Prediction" chart, the user can toggle whether to display the data points or the regression line.
The data points or the regression line are hidden by clicking the "Actual" or "Predicted" items in the legend. Clicking an item in the legend once will cross it off and remove the corresponding data points or regression line from the chart. Clicking it again will bring them back.
This is how the "Price Prediction" chart looks when the "Actual" data points have been hidden.
- This application can only be run in development mode due to a bug involving the integration of Chart.js 3 with Next.js.
- It would be more appropriate to use logarithmic regression for this dataset, but linear regression was used because it is simpler and was sufficient for the project requirements.
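For comparison, one common reading of logarithmic regression fits price = a + b · ln(mileage), which is a least-squares line fit on the logarithm of the mileage. The sketch below is a hedged illustration of that idea in plain TypeScript; it is not part of the application, and the names `fitLogCurve` and `predictFromLogCurve` are hypothetical.

```typescript
// Illustrative logarithmic fit: price = a + b * ln(mileage).
// Assumes every mileage is strictly positive, since ln(x) is undefined otherwise.
interface LogCurve {
  a: number; // price when ln(mileage) is 0, i.e. at mileage 1
  b: number; // change in price per unit increase in ln(mileage)
}

function fitLogCurve(mileages: number[], prices: number[]): LogCurve {
  if (mileages.length !== prices.length || mileages.length < 2) {
    throw new Error("Need two or more matching mileage/price pairs");
  }
  if (mileages.some((m) => m <= 0)) throw new Error("Mileages must be positive");
  const logs = mileages.map((m) => Math.log(m));
  const n = logs.length;
  const meanX = logs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = prices.reduce((sum, y) => sum + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (logs[i] - meanX) * (prices[i] - meanY);
    sxx += (logs[i] - meanX) ** 2;
  }
  if (sxx === 0) throw new Error("All mileages are identical; the fit is undefined");
  const b = sxy / sxx;
  return { a: meanY - b * meanX, b };
}

function predictFromLogCurve(curve: LogCurve, mileage: number): number {
  return curve.a + curve.b * Math.log(mileage);
}
```

Compared with a straight line, this curve drops steeply at low mileage and flattens out at high mileage.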