The model size was increased.
Test accuracy: 87.5%.
Production accuracy: 82%.
There is a certain bias in the data the model was trained, validated, and tested on. The labels were produced by humans, so captchas that humans were not able to recognize are missing from the dataset. The model will therefore always have some limitations.
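As a rough illustration of how this labeling bias could relate to the gap between test and production accuracy, the sketch below treats production traffic as a mix of human-readable captchas (where the model scores its test accuracy) and unreadable ones. The function names and the assumption that the model gets every unreadable captcha wrong are hypothetical and chosen only for the arithmetic. Other factors may also contribute to the gap, so the result is an upper-bound style estimate, not a measurement.

```python
def production_accuracy(readable_acc: float, hard_fraction: float,
                        hard_acc: float = 0.0) -> float:
    """Blend accuracy on human-readable captchas with accuracy on unreadable ones."""
    return (1.0 - hard_fraction) * readable_acc + hard_fraction * hard_acc


def implied_hard_fraction(readable_acc: float, prod_acc: float,
                          hard_acc: float = 0.0) -> float:
    """Solve the blend above for the share of unreadable captchas."""
    return (readable_acc - prod_acc) / (readable_acc - hard_acc)


test_acc = 0.875  # reported test accuracy
prod_acc = 0.82   # reported production accuracy

# Assumption: the model fails on every captcha humans could not label (hard_acc = 0).
hard_fraction = implied_hard_fraction(test_acc, prod_acc)
print(f"Implied share of unreadable captchas: {hard_fraction:.1%}")  # 6.3%
```

Under these assumptions, roughly 6% of production captchas being unreadable to humans would be enough to account for the whole drop from 87.5% to 82%.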