Let's Learn a Little About Computer Vision via Sudoku!

Written by Brian Roepke | Dec 14, 2024 6:00:00 PM

Introduction

This whole thing started out as a fun little experiment to write another puzzle solver, similar to the Wordle solver I wrote about recently. Sudoku is a perfect computer-solvable problem: at its core, it's a simple iterative search for uniqueness. There are probably thousands of examples out there, so while I will touch on how I ended up solving the puzzle, I want to focus on the Machine Learning (ML) and Artificial Intelligence (AI) approach I took to the game. I thought, let's add Computer Vision (CV) and Optical Character Recognition (OCR) into the mix: you upload an image of the puzzle, the machine reads it, and then it solves the rest from there. This turned out to be an awesome learning experience that I would love to walk you through!

By now we're all probably familiar with what Sudoku is, so let’s dive into the process of getting the digits out of the image!

Step 1: Preprocessing the Image

The first step in solving the puzzle is processing the uploaded image to make it suitable for OCR. This involves several key steps in computer vision. For the first couple of parts, I used OpenCV, probably the most popular Python-based CV library, which has tons of tools suitable for applications like this one, as well as license plate recognition, document scanning, and more.

Before we look for the numbers, the image goes through three steps to get it into a format that works best for CV: converting it to grayscale, blurring it to make it more consistent, and finally turning it into pure black and white.

Convert to Grayscale

Converting images to grayscale simplifies the process of finding important features like edges, shapes, and patterns. This process eliminates a lot of extra data in the image by reducing the image to a single color channel (black to white) versus three (RGB).

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Apply Gaussian Blur

Gaussian blur smooths the image, making it cleaner and more consistent. It helps CV by suppressing the tiny details and noise that disrupt edge detection. Imagine trying to find edges amid a scattering of dots; Gaussian blur softens the dots so the real edges stand out.

blurred = cv2.GaussianBlur(gray, (3, 3), 0)

Adaptive Thresholding

Adaptive thresholding is a smart way to convert images to black and white. Instead of applying one global cutoff, it picks a threshold for each pixel based on its local neighborhood, which keeps grid lines and digits clear even when the lighting is uneven. The result is a binary image of the high-contrast elements, which is crucial for tasks like removing grid lines and reading the puzzle.

thresh = cv2.adaptiveThreshold(
    blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2
)

Step 2: Removing the Grid Lines

Sudoku grids typically have thick black lines that confuse OCR systems. I removed the grid lines using what are known as morphological operations in CV. Morphological operations help us remove small objects or narrow connections while preserving larger structures.

Structuring elements are small matrices used in morphological operations that define the neighborhood of pixels analyzed. Their shape and size determine the features detected or removed. The method below uses a horizontal kernel that detects horizontal lines and a vertical kernel that detects vertical lines.

Tip: If you run the code provided, you can play with the kernel length (40 here) and see how it affects which grid lines are found.

horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))

horizontal_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
vertical_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)

Finally, we combine the identified lines and subtract them from the image.

grid_lines = cv2.add(horizontal_lines, vertical_lines)
grid_removed = cv2.subtract(thresh, grid_lines)

By this stage, the image is stripped of grid lines, leaving only the digits. Let's take a look at the final image and how all of our processing helped. You can see a little noise left from the grid lines; further optimizing parameters could help us completely remove them, but I found that this was sufficient.

Step 3: Extracting Individual Cells

With the cleaned image, the next step was to isolate the 9x9 grid and extract each cell. This part is pretty straightforward. I divided the grid into 81 cells based on the image dimensions.

cell_size = grid_size // 9
for row in range(9):
    for col in range(9):
        x_start, y_start = col * cell_size, row * cell_size
        x_end, y_end = x_start + cell_size, y_start + cell_size
        cell = grid_removed[y_start:y_end, x_start:x_end]
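Here is a self-contained sketch of the same slicing that collects the cells into a list. It assumes the image has already been cropped to a square containing just the puzzle; the 450-pixel size and the single marked pixel are hypothetical values chosen so the result is easy to check.

```python
import numpy as np

# Hypothetical stand-in for the grid-removed image: a 450x450 square.
grid_size = 450
grid_removed = np.zeros((grid_size, grid_size), dtype=np.uint8)
grid_removed[60, 110] = 255  # marker pixel at y=60, x=110

cell_size = grid_size // 9
cells = []
for row in range(9):
    for col in range(9):
        x_start, y_start = col * cell_size, row * cell_size
        x_end, y_end = x_start + cell_size, y_start + cell_size
        # NumPy indexes rows (y) first, then columns (x).
        cells.append(grid_removed[y_start:y_end, x_start:x_end])

print(len(cells), cells[0].shape)
# The marker lands in row 1, column 2, which is index 1 * 9 + 2 in the list.
print(cells[1 * 9 + 2][10, 10])
```

The y-before-x ordering in the slice is the easiest thing to get backwards here, which is why the marker pixel is a handy sanity check.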

Next, I used a couple of other CV utilities to add a little extra padding to each cell as well as resize them all to make them consistent. The padding ensures that the number is isolated, and the size of the image ensures consistency in character recognition. You could imagine that if you added too much border here and then resized the image, the number in the cell would end up being very small; don't add too much.

cell = cv2.copyMakeBorder(cell, 5, 5, 5, 5, cv2.BORDER_CONSTANT, value=0)
cell = cv2.resize(cell, (50, 50))

Step 4: Recognizing Digits with OCR

Recognizing the digits is the core of the solution. I ended up using Tesseract OCR, which plays a pivotal role in this step. Tesseract is an open-source optical character recognition (OCR) engine that translates visual information (images of text) into machine-readable characters.

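We can fine-tune it for Sudoku’s specific use case: recognizing single digits (1–9) within individual grid cells. This is achieved by configuring Tesseract with a few parameters: --psm 10 treats each image as a single character, and tessedit_char_whitelist limits the output to the digits 1 through 9: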

custom_config = r"--oem 3 --psm 10 -c tessedit_char_whitelist=123456789"
text = pytesseract.image_to_string(cell, config=custom_config).strip()

Each prepared cell is passed to Tesseract for character recognition. If Tesseract returns nothing for a cell, or returns something that is not a digit, the cell stays marked as empty (0) in the Sudoku grid.

# Parse OCR result
if text.isdigit():
    sudoku_grid[row, col] = int(text)
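One thing to watch with the check above: isdigit() is also true for multi-digit strings, so a noisy cell read as "11" would be stored as 11. A slightly stricter variant might look like the sketch below. The helper name parse_cell_text is hypothetical and not part of the original app.

```python
def parse_cell_text(text: str) -> int:
    """Map raw OCR output for one cell to a digit 1-9, or 0 for an empty cell."""
    text = text.strip()
    # Accept exactly one character in 1-9; anything else (blank, "11", "0",
    # stray symbols) is treated as an empty cell.
    if len(text) == 1 and text in "123456789":
        return int(text)
    return 0


# Simulated OCR outputs, so this runs without Tesseract installed.
for raw in ["5\n", "", "11", "0", " 7 "]:
    print(repr(raw), "->", parse_cell_text(raw))
```

Treating a doubtful read as empty seems like the safer default here, since the solver can fill in a missing digit but cannot recover from a wrong one.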

Let's take a look at the final results. If we check out each line, we can see it did pretty well. Not perfect but not bad. The first two lines are perfect, but the third line is missing the 1 that comes after the 2. Line six is also missing a 1, and line seven is missing the first 3. With a little more tuning, we should be able to get this perfect.

Step 5: Solving the Sudoku

As I said at the beginning, I will keep this section light since this is a very commonly solved problem. I used a backtracking algorithm to solve the puzzle. This tries different possibilities and “backs up” when it hits a dead end. It starts by searching for the first empty cell and tries placing a number from 1 to 9 in it, checking if it’s valid. A number is valid if it doesn’t conflict with existing numbers in the row, column, or 3x3 box. The code for this is surprisingly simple!

def solve_sudoku(board):
    for row in range(9):
        for col in range(9):
            if board[row, col] == 0:
                for num in range(1, 10):
                    if is_valid(board, row, col, num):
                        board[row, col] = num
                        if solve_sudoku(board):
                            return True
                        board[row, col] = 0
                return False
    return True
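The solver calls is_valid, which isn't shown above. Here is one way it could be written, following the row, column, and 3x3 box rule described earlier. This is a sketch, not necessarily the version in the repo, and it assumes the board is a 9x9 NumPy array with 0 for empty cells. The sample puzzle is just a commonly used example grid.

```python
import numpy as np


def is_valid(board, row, col, num):
    """Return True if num can go at (row, col) without a conflict."""
    if num in board[row, :] or num in board[:, col]:
        return False
    # Top-left corner of the 3x3 box containing (row, col).
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    return num not in board[box_row:box_row + 3, box_col:box_col + 3]


def solve_sudoku(board):
    """Backtracking solver; fills board in place and returns True on success."""
    for row in range(9):
        for col in range(9):
            if board[row, col] == 0:
                for num in range(1, 10):
                    if is_valid(board, row, col, num):
                        board[row, col] = num
                        if solve_sudoku(board):
                            return True
                        board[row, col] = 0  # undo and try the next number
                return False  # no number fits here, so back up
    return True  # no empty cells left


board = np.array([
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
])

solved = solve_sudoku(board)
print(solved)
print(board)
```

Note that the solver trusts its input. If OCR misreads a digit, the puzzle may have no solution, and solve_sudoku will return False after exhausting the possibilities.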

Deployment

As with all the apps I've been writing lately, I used Streamlit since it's really quick to develop a user interface with it. It's not the most elegant, but it allowed me to prove out this whole method.

For deployment, I set up a couple of options beyond running it locally, and there are a few interesting things in them. The first is a Dockerfile to deploy the app to a container, and the second is deploying it to Streamlit's Community Cloud. Normally, we simply list Python dependencies in the requirements.txt file, but I learned that you can also list Linux packages in a packages.txt file, and they will be installed when your app is deployed. This was critical for my app since Tesseract OCR is not a Python package but rather a Linux package.
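Because the Python wrapper depends on a separately installed program, it can help to check for it at startup and fail with a clear message. The sketch below is one way to do that using only the standard library. The helper name tesseract_available is hypothetical, and it assumes the executable is called tesseract and lives on the PATH.

```python
import shutil


def tesseract_available(binary_name: str = "tesseract") -> bool:
    """Return True if the Tesseract executable can be found on PATH."""
    return shutil.which(binary_name) is not None


if tesseract_available():
    print("Tesseract found; OCR should work.")
else:
    print("Tesseract not found; install the system package before running OCR.")
```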

If you would like to try this out yourself, you can see it hosted on the Streamlit Community Cloud, but I would also encourage you to check out the code and try to run it yourself. Everything is documented in the README on GitHub.

My Lessons Learned

Preprocessing is Key: Proper image preprocessing dramatically improves OCR accuracy. It took a lot of iteration to get this process right, or even close, and the current version still isn't perfect. This makes me really appreciate how much effort must go into professional solutions to make them highly accurate!

Choosing the Right OCR/CV Technology: I tried three different OCR methods in order to find the best one. I first tried EasyOCR, a simple Python package, but it only captured about 60% of the numbers in the grid. I then used the solution above, Google's Tesseract OCR, which requires an additional installation that caused version dependency issues. I also experimented with a TensorFlow CNN based on a model pre-trained on the popular handwritten digits dataset. This was both overkill and the worst performing of them all since we're not using handwritten digits! Long story short, experimentation for your use case is key!

Conclusion

Building a Sudoku solver using Computer Vision and OCR was challenging yet fun. I wanted to share my thoughts on this project and the importance of preprocessing and choosing the right tools for the job. While the solution isn’t perfect, it shows how combining these technologies can automate complex tasks. This project taught me a lot about image processing and OCR, and it showed me that experimentation and iteration are key to creating effective solutions. I hope this inspires you to explore similar projects and push the boundaries of what’s possible with technology.
