Image Folder Organizer using Google Cloud Vision OCR (Python)
- Setting Up Google Cloud: https://cloud.google.com/vision/docs/setup
- Setting Up Google Cloud Vision API: https://cloud.google.com/vision/docs/labels
- Setting Up sklearn: https://scikit-learn.org/stable/install.html
- Setting Up PyDictionary: https://pypi.org/project/PyDictionary/
- Setting Up nltk (For Lemmatization): https://www.nltk.org/
This is the response JSON returned by Google Cloud Vision after a label-detection request for a picture. Note that score and topicality currently come back with the same value, which is a known issue (https://issuetracker.google.com/issues/117855698?pli=1).
```json
{
  "responses": [
    {
      "labelAnnotations": [
        {
          "mid": "/m/0199g",
          "description": "Bicycle",
          "score": 0.96705616,
          "topicality": 0.96705616
        },
        {
          "mid": "/m/0h9mv",
          "description": "Tire",
          "score": 0.9641615,
          "topicality": 0.9641615
        }
      ]
    }
  ]
}
```
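For illustration, the top-3 labels by topicality can be extracted from a response like the one above with plain Python. This is a sketch over the raw JSON shape; the project itself reads these fields from the Vision client's response objects instead:

```python
# Sample response in the shape returned by Vision API label detection
response = {
    "responses": [
        {
            "labelAnnotations": [
                {"mid": "/m/0199g", "description": "Bicycle",
                 "score": 0.96705616, "topicality": 0.96705616},
                {"mid": "/m/0h9mv", "description": "Tire",
                 "score": 0.9641615, "topicality": 0.9641615},
            ]
        }
    ]
}

def top_labels(response, n=3):
    """Return the n labels with the highest topicality score."""
    annotations = response["responses"][0]["labelAnnotations"]
    ranked = sorted(annotations, key=lambda a: a["topicality"], reverse=True)
    return [(a["description"], a["topicality"]) for a in ranked[:n]]

print(top_labels(response))  # [('Bicycle', 0.96705616), ('Tire', 0.9641615)]
```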
- tmp: Folder that contains the pictures before clustering
- results: Folder that contains the result pictures
- pictures: Folder that contains the pictures after clustering
- csv: Folder that contains the CSV files of the dataframes used in the project
- FolderCreater.py: Creates a folder for each cluster and moves each image into the folder matching its cluster result
- KMeanClustering.py: Uses scikit-learn's KMeans to create 5 clusters of images
- Lemmatization.py: Uses the nltk WordNetLemmatizer to preprocess (lemmatize) each word (e.g. computer, computing, computerize -> compute)
- Main.py: Creates the dataframe using the labels from the Google Vision API
- OCR.py: Opens the connection to Google Cloud and runs Google Vision API labeling
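The folder-creation step can be sketched roughly as below. This is a minimal illustration, not the actual code in FolderCreater.py: the function name, arguments, and the filename-to-cluster mapping format are my own.

```python
import os
import shutil

def move_to_clusters(src_dir, dst_dir, assignments):
    """Create one folder per cluster under dst_dir and move each image
    from src_dir into the folder of its assigned cluster.

    assignments maps image filename -> cluster number, e.g. {"cat.jpg": 2}.
    """
    for filename, cluster in assignments.items():
        cluster_dir = os.path.join(dst_dir, f"cluster_{cluster}")
        os.makedirs(cluster_dir, exist_ok=True)  # idempotent folder creation
        shutil.move(os.path.join(src_dir, filename),
                    os.path.join(cluster_dir, filename))
```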
In this project, I mainly used the Google Cloud Vision API to extract labels for each picture. The response JSON format is described above, and more detail can be found at https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#EntityAnnotation.
Then, I mainly used
label.description
label.topicality
to get each label's text and its topicality score for the picture. Due to the timing issue, I only used the top 3 labels by topicality score when creating the dataframe (more in Issues).
Using the labels from the Google Vision API, I used PyDictionary, a Python library that provides word definitions, to create an information document for each image.
From the definition documents, I applied lemmatization and TF-IDF to create a vector score for each image. Then, using the vector scores, I ran the K-means algorithm to create the clusters of images.
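The TF-IDF plus K-means pipeline can be sketched as follows. This is a simplified stand-in (it skips the lemmatization step, uses toy documents rather than real PyDictionary output, and uses k=2 instead of the project's k=5), assuming scikit-learn is installed:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# One "definition document" per image (toy examples, not real PyDictionary output)
documents = [
    "bicycle tire wheel vehicle road",
    "tire wheel rubber vehicle",
    "cat animal fur pet whisker",
    "dog animal fur pet tail",
]

# TF-IDF turns each document into a weighted term vector
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(documents)

# K-means groups the vectors into k clusters (the project uses k=5)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf)
print(labels)  # one cluster id per document
```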
Information about PyDictionary can be found at: https://pypi.org/project/PyDictionary/
Information about nltk lemmatization can be found at: https://www.nltk.org/_modules/nltk/stem/wordnet.html
Information about sklearn KMeans can be found at: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
Information about sklearn TfidfVectorizer can be found at: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
The result dataframe before clustering:
The result dataframe after clustering:
You can find the csv file under resources/csv
Right now, the folders are named with the cluster number. In the future, I will try to extract the main concept words from each cluster and use them as the folder name.
One issue the project currently has is timing. Even though the code's own processing time is relatively low (~2 seconds total), waiting for the API results takes about 3-5 minutes for 12 pictures, so the total clustering time will grow as the number of pictures increases.
Another issue is that the Google Cloud Vision API I am using is the free trial version, which means I cannot use the project after August. Moreover, since it is the free trial version, there is a limit on the number of pictures that can be processed for labeling. Pricing can be found at https://cloud.google.com/vision/pricing.
In a later version, I will drop PyDictionary and use only the labels and their topicality scores for clustering (maybe using cosine similarity): https://en.wikipedia.org/wiki/Cosine_similarity
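As a reference for that future direction, cosine similarity between two topicality vectors can be computed directly. The vectors below are made-up examples over a hypothetical shared label vocabulary:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Topicality vectors over a shared label vocabulary, e.g. [Bicycle, Tire, Cat]
image_a = [0.97, 0.96, 0.0]
image_b = [0.90, 0.88, 0.0]
image_c = [0.0, 0.0, 0.95]

print(cosine_similarity(image_a, image_b))  # close to 1.0: similar images
print(cosine_similarity(image_a, image_c))  # 0.0: no shared labels
```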
- Document Clustering using Sklearn: https://romg2.github.io/mlguide/03_%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-%EC%99%84%EB%B2%BD%EA%B0%80%EC%9D%B4%EB%93%9C-08.-%ED%85%8D%EC%8A%A4%ED%8A%B8%EB%B6%84%EC%84%9D-%EB%AC%B8%EC%84%9C-%EA%B5%B0%EC%A7%91%ED%99%94/
- Lemmatization: https://en.wikipedia.org/wiki/Lemmatisation
- TF-IDF Example: https://en.wikipedia.org/wiki/Tf%E2%80%93idf