blitzapurv/Smart-Product-Review-Classification-tool

Sentiment-Analysis-Bucketing

Categorizing product reviews into appropriate categories.

This solution classifies raw reviews into categories according to a predefined bucketing scheme.

The process is broadly divided into two steps:

Raw Reviews

⇓ 1

Entity Sentiment Table

⇓ 2

Final Output with Categories and Sub-Categories

  • Step 1: Use the GCP Natural Language API for Entity Sentiment Analysis and Sentiment Analysis.

    • Entity Sentiment Analysis gets entities and their corresponding salience score, sentiment score, and sentiment magnitude.
    • Sentiment Analysis gets the overall sentiment score of the reviews.
    • Uploading the resultant data to BigQuery.
  • Step 2: Determining the respective categories of each review.

    • First, a general Level 1 dictionary is used to map the entities directly to their Level 1 categories.
    • Second, a more general Level 2 dictionary is used along with the GloVe representation of words (100 dimensions). Closeness to the cluster centroid of each category is used to determine the Level 1 category for the remaining reviews.
    • A threshold is fixed based on the within-category deviation of distances of words in each cluster. The threshold metric can be min distance, max distance, standard deviation, or mean absolute deviation.
    • Next, for the reviews where no entities were detected, POS tagging is used to extract nouns, adjectives, and verbs from the review. A combined vector representation of the extracted words is used to determine the category, again by measuring closeness to the cluster centroids.
    • Finally, the resultant data is uploaded to a BigQuery table.
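
The centroid and threshold logic in Step 2 can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names, the toy 3-dimensional vectors (standing in for 100-dimensional GloVe vectors), the example category names, the use of Euclidean distance, and the choice of max distance as the threshold metric are all assumptions made for the example.

```python
import numpy as np

def build_clusters(category_words, embeddings):
    """Return {category: (centroid, threshold)} for each category.

    The threshold used here is the max distance of a member word from its
    centroid (one of the metrics listed above; the choice is an assumption).
    """
    clusters = {}
    for category, words in category_words.items():
        vecs = np.array([embeddings[w] for w in words if w in embeddings], dtype=float)
        if len(vecs) == 0:
            continue  # no known words for this category
        center = vecs.mean(axis=0)
        dists = np.linalg.norm(vecs - center, axis=1)  # Euclidean distance (assumed)
        clusters[category] = (center, dists.max())
    return clusters

def assign_category(words, embeddings, clusters):
    """Average the vectors of `words` and return the closest category whose
    threshold is satisfied, or None if no category is close enough."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    query = np.mean(np.array(vecs, dtype=float), axis=0)
    best, best_dist = None, float("inf")
    for category, (center, threshold) in clusters.items():
        dist = np.linalg.norm(query - center)
        if dist <= threshold and dist < best_dist:
            best, best_dist = category, dist
    return best

# Toy embeddings standing in for GloVe vectors (hypothetical values).
embeddings = {
    "motor": [1.0, 0.0, 0.0],
    "blade": [0.8, 0.2, 0.0],
    "speed": [0.9, 0.1, 0.0],
    "courier": [0.0, 1.0, 0.0],
    "packaging": [0.0, 0.8, 0.2],
    "banana": [0.0, 0.0, 5.0],
}
# Hypothetical Level 2 dictionary: category -> representative words.
category_words = {
    "Product Quality": ["motor", "blade"],
    "Delivery": ["courier", "packaging"],
}

clusters = build_clusters(category_words, embeddings)
single_entity = assign_category(["speed"], embeddings, clusters)
combined_words = assign_category(["courier", "packaging"], embeddings, clusters)
too_far = assign_category(["banana"], embeddings, clusters)
```

The same `assign_category` call covers both cases described above: a single detected entity, and a combined vector built from the nouns, adjectives, and verbs extracted by POS tagging. A word that lies outside every category's threshold is left uncategorized.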

Fields in Raw Reviews:

  • Review_ID: Unique Review ID
  • Comment: Review
  • View: Title / Description
  • Date: Date of Review
  • Platform: Platform, e.g. Amazon, Flipkart, etc.
  • Product_Category: Type of product, e.g. fan, grinder, sewing machine, etc.

Fields in Sentiment Analysis Table:

  • Review_ID: Unique Review ID
  • Reviews: Review
  • Overall_Sentiment_Score: Overall sentiment score of review
  • Overall_Sentiment_Magnitude: Overall sentiment magnitude
  • View: Title / Description
  • Product_Category: Type of Product
  • Entity: Entity
  • Salience: Salience Score corresponding to entity
  • Sentiment_Score: Sentiment Score corresponding to entity
  • Sentiment_Magnitude: Sentiment Magnitude corresponding to entity
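
As a rough sketch of how Step 1 output could be shaped into this table, the function below flattens one review into one row per detected entity. It is an assumption-laden illustration: `flatten_review` is a hypothetical helper, the input dictionaries mimic the JSON shape of the Natural Language API responses (`entities[].name`, `entities[].salience`, `entities[].sentiment`, `documentSentiment`), the raw `Comment` field is assumed to become `Reviews`, and reviews without entities are assumed to be kept as a single row with empty entity fields so that Step 2 can still process them.

```python
def flatten_review(review, entity_response, sentiment_response):
    """Build Sentiment Analysis Table rows for one raw review.

    `entity_response` and `sentiment_response` are plain dicts shaped like the
    JSON returned by entity sentiment analysis and sentiment analysis.
    """
    doc = sentiment_response.get("documentSentiment", {})
    base = {
        "Review_ID": review["Review_ID"],
        "Reviews": review["Comment"],  # assumed mapping: Comment -> Reviews
        "Overall_Sentiment_Score": doc.get("score"),
        "Overall_Sentiment_Magnitude": doc.get("magnitude"),
        "View": review["View"],
        "Product_Category": review["Product_Category"],
    }
    entities = entity_response.get("entities", [])
    if not entities:
        # Assumption: keep the review so it can be categorized later.
        return [{**base, "Entity": None, "Salience": None,
                 "Sentiment_Score": None, "Sentiment_Magnitude": None}]
    rows = []
    for entity in entities:
        sentiment = entity.get("sentiment", {})
        rows.append({
            **base,
            "Entity": entity.get("name"),
            "Salience": entity.get("salience"),
            "Sentiment_Score": sentiment.get("score"),
            "Sentiment_Magnitude": sentiment.get("magnitude"),
        })
    return rows

# Hypothetical example data.
example_review = {
    "Review_ID": "R1",
    "Comment": "The motor is powerful but the packaging was damaged.",
    "View": "Description",
    "Product_Category": "grinder",
}
example_entities = {"entities": [
    {"name": "motor", "salience": 0.6, "sentiment": {"score": 0.8, "magnitude": 0.8}},
    {"name": "packaging", "salience": 0.4, "sentiment": {"score": -0.7, "magnitude": 0.7}},
]}
example_sentiment = {"documentSentiment": {"score": 0.1, "magnitude": 1.5}}

example_rows = flatten_review(example_review, example_entities, example_sentiment)
```

Rows in this shape can then be loaded into BigQuery with whichever client the project uses; that upload step is left out here.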

Fields in Final Output Table (an individual table is created for each product type in Product_Category):

  • Review_ID: Unique Review ID
  • Reviews: Review
  • Overall_Sentiment_Score: Overall sentiment score of review
  • Overall_Sentiment_Magnitude: Overall sentiment magnitude
  • View: Title / Description
  • Product_Category: Type of Product
  • Entity: Entity
  • Salience: Salience Score corresponding to entity
  • Sentiment_Score: Sentiment Score corresponding to entity
  • Sentiment_Magnitude: Sentiment Magnitude corresponding to entity
  • Category: Level 1 Category estimated by the process.

This process makes use of a configuration file, a Google Sheet named sentiment_params.gsheet, which is accessed and modified using the Google Sheets API (link to g-sheet). Each row of the sheet has the following fields:

  • Task ID: Unique task ID.
  • Client: Client / Project Name.
  • Flag: 1 to run, 0 to not run for a given Task ID.
  • File Type: File type of Mapping Sheet file.
  • Raw Reviews Key File: Credential JSON file to access the project.
  • Raw Reviews Input: Raw Review table path.
  • Sentiment Analysis Key File: Credential JSON file to access the project.
  • Sentiment Analysis Project ID: Project ID.
  • Sentiment Analysis Dataset: Dataset.
  • Sentiment Analysis Table: Table name.
  • NL API Key File: Key file to use for NL API.
  • Mapping Sheet: Mapping sheet ID.
  • Output Key File: Credential JSON file to access the project.
  • Output Project ID: Project ID.
  • Output Dataset: Dataset.
  • Output Table Suffix: Output Table Suffix.
  • From Email: In case there is an error in uploading the files, an email is sent from this email id.
  • From Email Password: Password for From_Email.
  • To Email: In case there is an error in uploading the files, an email is sent to this email id.
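
To illustrate how the Flag column might drive a run, the sketch below selects the tasks to execute from configuration rows that have already been read from the sheet. The helper name `active_tasks` and the row format (one dict per row, keyed by the column names above) are assumptions; cell values read from a spreadsheet often arrive as strings, so the flag is compared as text.

```python
def active_tasks(config_rows):
    """Return only the configuration rows whose Flag is 1.

    Accepts the flag as either the integer 1 or the string "1".
    """
    return [
        row for row in config_rows
        if str(row.get("Flag", "")).strip() == "1"
    ]

# Hypothetical rows as they might look after reading the sheet.
config_rows = [
    {"Task ID": "T1", "Client": "Client A", "Flag": "1"},
    {"Task ID": "T2", "Client": "Client B", "Flag": "0"},
    {"Task ID": "T3", "Client": "Client C", "Flag": 1},
    {"Task ID": "T4", "Client": "Client D"},  # missing flag: treated as not run
]

tasks_to_run = [row["Task ID"] for row in active_tasks(config_rows)]
```

Treating a missing or blank Flag as "do not run" is a conservative choice made for this example, so that an incomplete row never triggers a task by accident.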
