
Use reviews to calculate similarity between Airbnb listings


bdferris642/airbnb_insight


I built Yourbnb, an NLP-powered web app that mines Airbnb reviews to find listings that align with user preferences.

Web address: www.yourbnb.xyz
Link to Google Slides presentation: https://bit.ly/2qcMbVI

Approach: (1) Train a Latent Dirichlet Allocation (LDA) topic model on Airbnb reviews of listings in NYC.
(2) For each listing, compute a vector representing the proportion in which each topic is discussed in that listing's reviews.
Save the Airbnb listing data and the LDA model as pickles in the directory: 'airbnb_insight/Flask/MVP/'
(3) Validate the model by checking that the topics discussed in reviews of the same listing are more similar to one another than to the topics discussed in reviews of different listings.
(4) Deploy a Flask web app (hosted on AWS EC2) that takes a user's text input, applies the same LDA model to it, and ranks listings by how similar their topic vectors are to the topics in the user's input.
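
The validation idea in step (3) can be illustrated with a small, self-contained sketch. It assumes each review already has a topic-proportion vector from the LDA model and uses cosine similarity as the comparison metric. The function names, the data layout, and the toy vectors are all made up for illustration; the notebook may use a different metric or structure.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean(values):
    """Arithmetic mean of a list, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def within_between_similarity(review_topics):
    """Compare review pairs from the same listing with pairs from different listings.

    review_topics maps listing_id -> list of per-review topic vectors
    (a hypothetical layout, not taken from the notebook).
    Returns (mean within-listing similarity, mean between-listing similarity).
    """
    labelled = [(lid, vec) for lid, vecs in review_topics.items() for vec in vecs]
    within, between = [], []
    for (id_a, vec_a), (id_b, vec_b) in combinations(labelled, 2):
        target = within if id_a == id_b else between
        target.append(cosine(vec_a, vec_b))
    return mean(within), mean(between)


# Invented toy data: two listings, two reviews each, three topics.
toy = {
    "listing_a": [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
    "listing_b": [[0.1, 0.1, 0.8], [0.2, 0.1, 0.7]],
}
within, between = within_between_similarity(toy)
print(f"within={within:.3f} between={between:.3f}")
```

If the topic model captures something listing-specific, the within-listing mean should come out higher than the between-listing mean, as it does for the toy data here.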

Code for steps (1) - (3) is included in the Jupyter notebook 'airbnb_insight/Topic Analysis.ipynb'
Code for the Flask app is in: 'airbnb_insight/Flask/MVP/'
• Python code for the app is: 'airbnb_insight/Flask/MVP/views.py'
• Helper functions for this app are located in: 'airbnb_insight/Flask/MVP/a_Model.py'
• Input and Output pages: 'airbnb_insight/Flask/MVP/templates/input.html' and 'airbnb_insight/Flask/MVP/templates/output.html'
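
As a rough illustration of the ranking the app performs in step (4), the sketch below scores each listing's topic vector against a topic vector inferred from the user's input. The helper name `rank_listings`, the use of cosine similarity, and the sample vectors are assumptions for illustration, not code taken from 'a_Model.py' or 'views.py'.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_listings(user_topics, listing_topics, top_n=5):
    """Return up to top_n (listing_id, score) pairs, most similar first.

    user_topics: topic proportions inferred from the user's text.
    listing_topics: dict mapping listing_id -> that listing's topic proportions.
    Both are hypothetical inputs standing in for the LDA model's output.
    """
    scored = [(lid, cosine(user_topics, vec)) for lid, vec in listing_topics.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Invented example: three listings described over three topics.
listing_topics = {
    "101": [0.7, 0.2, 0.1],
    "102": [0.1, 0.1, 0.8],
    "103": [0.3, 0.4, 0.3],
}
user_topics = [0.6, 0.3, 0.1]
ranked = rank_listings(user_topics, listing_topics, top_n=2)
for listing_id, score in ranked:
    print(listing_id, round(score, 3))
```

In a Flask view, a list like `ranked` could be passed to the output template for display.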

Airbnb listing and review data were downloaded from http://insideairbnb.com/get-the-data.html
Reviews: reviews.csv.gz under the 'New York City, New York, United States' heading
Listings: listings.csv.gz under the 'New York City, New York, United States' heading
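
A minimal sketch of reading the gzipped reviews file and grouping review text by listing, using only the standard library. It assumes the file has 'listing_id' and 'comments' columns; check the header of the file you download, since column names may differ between releases. The function name and the tiny sample file written in the demo are made up for illustration.

```python
import csv
import gzip
import os
import tempfile
from collections import defaultdict


def load_reviews_by_listing(path):
    """Group non-empty review texts by listing id.

    Assumes a gzipped CSV with 'listing_id' and 'comments' columns.
    """
    reviews = defaultdict(list)
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            text = (row.get("comments") or "").strip()
            if text:  # skip reviews with no text
                reviews[row["listing_id"]].append(text)
    return dict(reviews)


# Demo on a tiny invented file so the sketch runs without the real download.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "reviews.csv.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["listing_id", "comments"])
        writer.writerow(["1", "Quiet street, close to the subway."])
        writer.writerow(["1", ""])
        writer.writerow(["2", "Great rooftop view."])
    sample = load_reviews_by_listing(sample_path)

print(sample)
```

For the real data, call `load_reviews_by_listing` with the path to the downloaded reviews.csv.gz.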

This app has not yet been optimized for running on someone else's machine.
