DESCRIPTION
A simple captcha breaker. It has been tested with only one type of numeric captcha image (found in the "Resources > in" folder). It extracts a 4-digit string from an image using Pytesseract.
INSTALLATION
- Download code.
- Create a virtual environment and activate it.
- Install the dependencies: pip install -r requirements.txt
USAGE
- Place the captcha image, in any format accepted by Pillow, in "Resources > in".
- In the root folder, open "captcha_solver.py" and change the "file_name" to the name of the captcha image uploaded in step 1.
- Run it!
- The function returns a 4-digit string.
OTHER SETTINGS
- captcha_solver.py
  - avg_deviation: in order to reach absolute contrast, each pixel is compared with the average brightness of the whole image. If it is lower than or equal to the sum of the average and the user-defined avg_deviation, the pixel is converted to pure black (pixel value of 0). Otherwise, it is converted to pure white (pixel value of 255). This setting is increased by 1 for each unsuccessful iteration of the captcha breaker, that is, whenever the length of the returned value does not match captcha_text_length (see below). The avg_deviation also dictates how many iterations the captcha breaker should attempt before giving up.
  - u_filter: short for scipy's "uniform filter", which smooths the image by replacing each pixel with the average brightness of its neighbourhood, so that isolated pixels take on the brightness of the majority of the pixels that surround them. The u_filter value specifies the size of that neighbourhood, in pixels.
  - captcha_text_length: the number of digits in the captcha.
- Logger > __init__.py, line 6: logger.setLevel(logging.WARN)
  - This sets the minimum log level that is printed to the terminal when the code is run. To see all logs, change this to logger.setLevel(logging.INFO).
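The avg_deviation thresholding and retry behaviour described under captcha_solver.py can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the image is a plain list of rows of grayscale values, the OCR step is passed in as a hypothetical read_text callable (standing in for the Pytesseract call), max_attempts is an assumed parameter, and the u_filter smoothing step is left out.

```python
def binarize(pixels, avg_deviation):
    """Map each grayscale pixel to 0 or 255 around the image's mean brightness."""
    flat = [p for row in pixels for p in row]
    # Threshold = average brightness of the whole image + avg_deviation.
    threshold = sum(flat) / len(flat) + avg_deviation
    # Pixels at or below the threshold become black, the rest white.
    return [[0 if p <= threshold else 255 for p in row] for row in pixels]


def solve(pixels, read_text, avg_deviation=0, captcha_text_length=4,
          max_attempts=10):
    """Retry with avg_deviation raised by 1 per attempt.

    read_text and max_attempts are hypothetical: read_text stands in for
    the OCR call, max_attempts for however the real code limits retries.
    """
    for attempt in range(max_attempts):
        text = read_text(binarize(pixels, avg_deviation + attempt))
        # Accept the result only if it has the expected number of digits.
        if len(text) == captcha_text_length and text.isdigit():
            return text
    return None  # gave up without a result of the expected length
```

Keeping the OCR step as a parameter makes the loop easy to try out with a stub before wiring in a real OCR call.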
FURTHER DEVELOPMENT
- Instead of changing only avg_deviation in each iteration, ideally it would cover every possible combination of avg_deviation and u_filter until a solution is found.
- This code relies on Pytesseract to work. Ideally it would use its own trained model instead.
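The first idea above, trying every combination of avg_deviation and u_filter, could look something like the sketch below. It is only an outline under assumptions: attempt is a hypothetical callable that runs one captcha-breaking pass with the given settings and returns the text it read, and the value ranges are left to the caller.

```python
from itertools import product


def grid_search(attempt, avg_deviations, u_filters, captcha_text_length=4):
    """Try every (avg_deviation, u_filter) pair until one yields a valid result.

    attempt is a hypothetical callable: attempt(avg_deviation, u_filter) -> str.
    Returns (text, avg_deviation, u_filter) for the first success, else None.
    """
    for avg_deviation, u_filter in product(avg_deviations, u_filters):
        text = attempt(avg_deviation, u_filter)
        # A result counts as valid if it has the expected number of digits.
        if text and len(text) == captcha_text_length and text.isdigit():
            return text, avg_deviation, u_filter
    return None
```

For example, grid_search(attempt, range(0, 20), (3, 5, 7)) would try 60 combinations in the worst case, so the ranges are worth keeping small.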
LICENSE