GitHub - vatsalgupta13/PyTesseract-assisted-OCR-of-Medical-Documents: This project makes use of cv2 and pytesseract to enable users to perform OCR on any given document. There are three separate py files. One is for pre-processing images to make it fit for the second python file: handwritten prescription.py. The third is an independent py file for extracting text from printed documents.

vatsalgupta13 / PyTesseract-assisted-OCR-of-Medical-Documents Public

Notifications You must be signed in to change notification settings
Fork 1
Star 1

This project makes use of cv2 and pytesseract to enable users to perform OCR on any given document. There are three separate py files. One is for pre-processing images to make it fit for the second python file: handwritten prescription.py. The third is an independent py file for extracting text from printed documents.

1 star 1 fork Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
FromDocument.py		FromDocument.py
Handwritten Prescription.py		Handwritten Prescription.py
Image_preprocessing.py		Image_preprocessing.py
exampledoc.docx		exampledoc.docx
image1.jpeg		image1.jpeg
image2.jpeg		image2.jpeg
image3.jpeg		image3.jpeg
image4.jpeg		image4.jpeg
medicalreport.pdf		medicalreport.pdf
ocr.py		ocr.py
readme.txt		readme.txt
test-600.png		test-600.png
trialimg.jpg		trialimg.jpg

Repository files navigation

Assuming you have a windows system:

1) Donwload anaconda with python 3.8 from https://www.anaconda.com/products/individual.
Anaconda will install the latest version of spyder and jupyter notebook.
2) Next open spyder separately to see its working. Use a simple print('Hello World') command to see if its working.
3) Open anaconda prompt (search using search bar) in windows
4) Install all the required packages using the pip installer (anaconda comes with pip installer so no issues)
5) use pip install $package_name$ to install all the required modules:
pip install pandas
pip install numpy
pip install opencv-python
pip install os-sys
pip install Pillow==2.2.2
pip install jsonlib
pip install apyio
pip install matplotlib
pip install seaborn
pip install requests
pip install python-docx

* Some of these modules might already be preinstalled - None the less run all these commands to make sure you have everything you need.

6) Install tesseract using windows installer available at: https://github.com/UB-Mannheim/tesseract/wiki. Install for all users
Note the default tesseract installation path:
Default installation path will most likely be : 'C:\Program Files\Tesseract-OCR\tesseract.exe' or 'C:\Users\USER\AppData\Local\Tesseract-OCR'
It may change so please check the installation path.
7) pip install pytesseract in anaconda prompt
8) Close and reopen spyder
9) In ocr.py - in 7th line: if the path of your tesseract.exe is different then change it.
I have set the path as 'C:\Program Files\Tesseract-OCR\tesseract.exe' since that's the installation path for my
It should
10) In spyder, set your working directory as the 'Optical Recognition' folder
11) Run the code, if it says that a particular module is not installed, then go to anaconda prompt and install that module and restart spyder
12) Note that everytime you start spyder, you need to set your working directory.
13) The result will be written to a file called "result.txt"
14) Kindly check the result for "image4" and "trialimg".
15) You can do so by replacing the file name in the code.