Reading Captchas – Putting everything together

Hello my lovelies!

Finally! We now have all the puzzle pieces to put together a function which can read a captcha and tell us the four letters!

Saving the model

One thing we did not do in the last post was save the model. Of course, if we want to reuse the model again and again, we need to store it somewhere, and we want to keep the trained weights along with the architecture.

# Saving the model stores the architecture, the weights,
# the training configuration and the state of the optimizer
model.save('captcha_model.h5')

# forget the model, we will load it again from the file later
del model

Creating a captcha reading function

For this step it makes sense to open up a new script, especially since we deleted the model at the end of the last step anyway.

Import the tools

This also means that we have to import the necessary tools again.

import os
import os.path
import glob
import pickle
import cv2
import numpy as np
from imutils import paths
#Sample size with resample from scikit-learn
from sklearn.utils import resample
from sklearn import preprocessing

from keras.models import Sequential
from keras.models import model_from_json
from keras.layers import Dense

Load the model

First we create a function which loads the previously saved model, so we can use it later in our captcha reading function. Since we saved the complete model, load_model returns a compiled model that is ready to use right away. There are other ways to do this; have a look at the Keras FAQ page if you are interested in learning more.

def loadModel(captcha_model):
    from keras.models import load_model

    # returns a compiled model
    # identical to the previous one
    model = load_model(captcha_model)
    return model

Label conversion

When we created the labels to pass on to Keras we did this in two steps:

Firstly, we transformed the letters into numerical values from 1 to 26.

Secondly, we did the OneHotEncoding for these numerical values.

This means two things: we need to translate the numerical values back into letters, and the OneHotEncoding might make things difficult for us. We will see during the process how tricky it actually is.
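
As a reminder, the encoding in the training post looked roughly like this (a minimal sketch with made-up example letters, not the actual training code):

import numpy as np
from sklearn.preprocessing import OneHotEncoder

alphabet = [chr(c) for c in range(ord('a'), ord('z') + 1)]
# step 1: map the letters to the numbers 1 to 26
dic = dict(zip(alphabet, range(1, len(alphabet) + 1)))

letters = ['c', 'a', 'q']                      # example labels from the letter folders
numeric = np.array([[dic[l]] for l in letters])

# step 2: one-hot encode the numbers
# (newer scikit-learn versions call the argument sparse_output)
encoder = OneHotEncoder(sparse=False)
one_hot = encoder.fit_transform(numeric)
print(one_hot.shape)  # (3, 3): one column per value the encoder has actually seen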

For now, I am defining this function for the conversion.

def convertLabels(labelsArr):
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
               'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
               'v', 'w', 'x', 'y', 'z']

    # map between numbers and letters (dicRev goes from number back to letter)
    dic = dict(zip(alphabet, list(range(1,len(alphabet)+1))))
    dicRev = dict(zip(list(range(1,len(alphabet)+1)),alphabet))
    letter_labels = [dicRev[int(x)] for x in labelsArr]
    return letter_labels
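
A quick check that the mapping does what we expect, assuming the classes run from 1 to 26 just like in training:

print(convertLabels([1, 17, 26]))  # ['a', 'q', 'z']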

Read actual Captchas

First I define all the variables I will need over and over again, for example the path to the Keras model.

#set the paths for some example captchas to test
captcha_ex = r'...\Einfach\0.1039887523084.jpg'
captcha_ex_1 = r'...\Einfach\0.1500734389565.jpg'
captcha_model = r'...\captcha_model.h5'

Now on to the core function…

def readCaptchaFromFile(image_file, captcha_model):
    #Load the model for the captchas
    model = loadModel(captcha_model)
    #Load the image and convert it to greyscale  
    im = cv2.imread(image_file)
    gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

    #Add some extra padding around the image
    gray = cv2.copyMakeBorder(gray, 8, 8, 8, 8, cv2.BORDER_REPLICATE)

    letters = []
    #convert the image to only black and white (using the threshold)
    ret, thresh = cv2.threshold(gray, 90, 200, cv2.THRESH_BINARY_INV)
    ret, thresh2 = cv2.threshold(thresh, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    thresh3 = cv2.threshold(thresh2, 0, 255, cv2.THRESH_BINARY_INV)[1]

    #find the contours in the image
    #(careful: cv2.findContours returns three values in OpenCV 3.x,
    # but two values in OpenCV 2.4 and 4.x, as assumed here)
    contours, hierarchy = cv2.findContours(thresh3.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

    letter_image_regions = []

    for contour in contours:        
        x, y, w, h = cv2.boundingRect(contour)

        if w / h > 2:
            # This contour is too wide to be a single letter
            # Split it in half into two letter regions
            half_width = int(w / 2)
            letter_image_regions.append((x, y, half_width, h))
            letter_image_regions.append((x + half_width, y, half_width, h))
        else:
            # This is a normal letter by itself
            letter_image_regions.append((x, y, w, h))
    if len(letter_image_regions) >= 4:
        letter_image_regions = sorted(letter_image_regions, key=lambda x: x[0])
        # Store each letter as a single image
        for letter_bounding_box in letter_image_regions:
            x, y, w, h = letter_bounding_box
            letter_image = thresh2[y - 2:y + h + 2, x - 2:x + w + 2]
            #the network expects all pictures to have the same size, so we resize each letter to 60x60 pixels
            letter_image = cv2.resize(letter_image, dsize=(60, 60), interpolation=cv2.INTER_CUBIC)
            # Add a dimension to the image to process it in Keras later
            letter_image = np.expand_dims(letter_image, axis=2)          
            #append the letters
            letters.append(letter_image)
        #predict the classes with the previously loaded model
        captcha_letters = model.predict_classes(np.array(letters))
        #convert numerical classes to letter
        print(captcha_letters)
        print(convertLabels(captcha_letters))
    #show the image, so the user can check the prediction
    cv2.namedWindow("Display window", cv2.WINDOW_AUTOSIZE)
    cv2.imshow("Display window", im)
    cv2.waitKey(0)
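
Calling the function on the example captchas defined above pops up the image and prints both the raw classes and the converted letters:

readCaptchaFromFile(captcha_ex, captcha_model)
readCaptchaFromFile(captcha_ex_1, captcha_model)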

Something is off

As previously mentioned, the label conversion might be tricky. And in fact, it is not that easy. For one test captcha I got the following results.

Clearly, the label conversion is off. Remember that we did not have all 26 letters of the alphabet in our captcha dataset. My guess is that the OneHotEncoding scrambled the letters because of this.
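
To illustrate that guess (a hedged sketch, not the actual training code): if the encoder only ever saw a subset of the 26 numbers, the one-hot column index no longer matches the position in the alphabet.

import numpy as np
from sklearn.preprocessing import OneHotEncoder

seen = np.array([[3], [17], [26]])           # only 'c', 'q' and 'z' in the dataset
enc = OneHotEncoder(sparse=False).fit(seen)
print(enc.transform(np.array([[17]])))       # [[0. 1. 0.]] -> argmax is 1, not 17!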

Label correction

I have been looking at the problem for about an hour now and I could not really find a mistake in the OneHotEncoding. Instead, I think I made a logical error when I created the label function. In Python all counting starts at 0, so when Keras predicts the letter “a” it gives us 0 as a result. Or at least this is what I have seen so far, so I changed the label function to this:

def convertLabels(labelsArr):
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
               'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
               'v', 'w', 'x', 'y', 'z']

    # map between numbers and letters (dicRev goes from number back to letter)
    dic = dict(zip(alphabet, list(range(1,len(alphabet)+1))))
    dicRev = dict(zip(list(range(1,len(alphabet)+1)),alphabet))
    letter_labels = [dicRev[int(x) + 1] for x in labelsArr]  #shift by one: class 0 is 'a'
    return letter_labels
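
A quick sanity check of the corrected mapping:

print(convertLabels([0, 1, 25]))  # ['a', 'b', 'z']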

But still, most of the captchas return the wrong four letters… So I am going one level deeper now: I will create a read function for the single letters we used to train the network and check whether my model is really as good as I thought. Yes, I could have done this earlier, but I let myself be blinded a bit by the nice charts we created in the last post. This is what I am using for the check:

#Test the model and the labels on the single letters
letter_folder = r'...\Einfach_letters\b'
#creates a list with the path for each picture
image_files = glob.glob(os.path.join(letter_folder, '*.png'))

def readLetterFromFile(image_file, captcha_model):
    #Load the model for the captchas
    model = loadModel(captcha_model)
    #Load the image and convert it to greyscale  
    im = cv2.imread(image_file)
    gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

    #Add some extra padding around the image
    gray = cv2.copyMakeBorder(gray, 8, 8, 8, 8, cv2.BORDER_REPLICATE)

    letters = []
    #convert the image to only black and white (using the threshold)
    ret, thresh = cv2.threshold(gray, 90, 200, cv2.THRESH_BINARY_INV)
    ret, thresh2 = cv2.threshold(thresh, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

    #no contour detection needed here, the image already contains a single letter
    letter_image = thresh2
    letter_image = cv2.resize(letter_image, dsize=(60, 60), interpolation=cv2.INTER_CUBIC)

    # Add a dimension to the image to process it in Keras later
    letter_image = np.expand_dims(letter_image, axis=2)          
    #append the letters
    letters.append(letter_image)
    #predict the classes with the previously loaded model
    captcha_letters = model.predict_classes(np.array(letters))
    #convert numerical classes to letter
    print(captcha_letters)
    print(convertLabels(captcha_letters))
    #show the image, so the user can check the prediction
    cv2.namedWindow("Display window", cv2.WINDOW_AUTOSIZE)
    cv2.imshow("Display window", im)
    cv2.waitKey(0)
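
And then I run the check over a handful of the letter images (note that the model is reloaded for every single image here, which is slow but fine for a quick look):

for image_file in image_files[:5]:
    readLetterFromFile(image_file, captcha_model)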

I checked it for the letter “a” and the letter “b”, and the performance is anything but overwhelming. Somehow Keras's favourite letter is “q”.

I was wondering whether I simply did not have enough training data. I only had 110 images per letter to train the network. Maybe it would make a difference if I took 500 per letter?
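
The resample import at the top hints at one way to get there; a minimal sketch (upsampling a letter's file list with replacement, so some images simply appear more than once):

from sklearn.utils import resample

more_files = resample(image_files, replace=True, n_samples=500, random_state=42)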

I will go back now to the step where we trained the network and will report back 🙂

Retrained network

A little while has passed and I have to say, making the dataset bigger did not really help much. Right now I feel like the network did not learn anything, but at the same time the accuracy and loss figures look so good… so I am a bit confused here. If you see my mistake, please let me know down below in the comments.
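
One debugging idea (a sketch; drop it into readLetterFromFile right after the letters list is built): look at the raw probabilities instead of just the predicted class. If the network is very confident about the wrong letter, the problem is more likely in the training labels; if all probabilities are similarly low, the inputs at prediction time may not look like the inputs at training time.

probs = model.predict(np.array(letters))
print(probs.max(axis=1))     # how confident is each prediction?
print(probs.argmax(axis=1))  # the same classes predict_classes returns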

For the sake of it, I will just finish this post, take a break, and then maybe figure out why the network does not read captchas as well as it should.

Clean up your code

Last but not least, if you have finished your project (even if it is in this state), go through the code and clean it up. That means deleting all the commented-out lines you no longer need, checking whether your comments still make sense, and so on. Ideally you do this on the fly; still, if I have time, I like to go through everything one more time at the end. Unfortunately, I am not a very good role model when it comes to cleaning up my code and often “forget” this part due to time pressure.

Finishing words

Even after this not soooo successful ending, I hope you learned something and were able to see that not everything works out as planned. So be patient with yourself, take a deep breath and just look at the problem again 🙂

Thank you for reading!

Best, Blondie
