Over centuries of cultural development, humans have produced a great variety of foot coverings. Those for regular, everyday activities are usually called shoes. Among these so-called shoes, boots stand out for their ruggedness and the protection they give the wearer’s feet. But the most armored foot covering of all, unrivaled in its reinforcement, is the steel-toed boot.
Imagine now what would happen if one were to apply reinforcement learning to the game of tic-tac-toe…
Tic-tac-steel-toe (TTST for short) is a hastily coded Python implementation of MENACE, the 1960 self-learning, tic-tac-toe-playing “machine” originally developed by Donald Michie.
It might be a stretch, but I’m convinced MENACE and, by proxy, TTST meet the minimum criteria to be classified as reinforcement learning systems. Whatever, I’m putting “Reinforcement Learning Engineer” on my CV anyway.
There are many different MENACE explanations and implementations on the internet; some even go so far as to build it the old-school way, with matchboxes and colored beads. Some of my favorites are Numberphile’s MENACE video and Matt Scroggs’ many blog posts on the topic.
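If you have never seen the matchbox trick, here is a rough sketch of the idea in Python. To be clear, this is not TTST's actual code: the class and function names are made up for illustration, and the bead numbers (4 beads per move to start, +3 for a win, +1 for a draw, -1 for a loss) follow the commonly described scheme and should be read as assumptions. Refilling an empty box is also my simplification.

```python
import random

# Hypothetical names and constants; an illustrative sketch, not TTST's code.
INITIAL_BEADS = 4                              # assumed starting beads per move
REWARD = {"win": 3, "draw": 1, "loss": -1}     # assumed bead adjustments


def new_matchbox(board):
    """Create a box with one pile of beads per empty cell of a 9-char board."""
    return {i: INITIAL_BEADS for i, cell in enumerate(board) if cell == " "}


class Menace:
    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.boxes = {}      # board string -> {move index: bead count}
        self.history = []    # (board, move) pairs played in the current game

    def choose(self, board):
        """Draw a 'bead' at random: moves with more beads are more likely."""
        box = self.boxes.setdefault(board, new_matchbox(board))
        if sum(box.values()) == 0:
            # Simplification: refill an empty box instead of resigning.
            box.update(new_matchbox(board))
        moves = list(box)
        weights = [box[m] for m in moves]
        move = self.rng.choices(moves, weights=weights)[0]
        self.history.append((board, move))
        return move

    def learn(self, outcome):
        """Reinforce (or punish) every move played in the finished game."""
        delta = REWARD[outcome]
        for board, move in self.history:
            box = self.boxes[board]
            box[move] = max(0, box[move] + delta)
        self.history.clear()
```

Call `choose(board)` on each of the machine's turns and `learn("win")`, `learn("draw")` or `learn("loss")` once the game ends. The reward nudging future move probabilities is the whole "reinforcement" part.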
So why did I bother making yet another one? Mostly to claim authorship over the “tic-tac-steel-toe” pun, but also to exercise my Python skills and to try out some blazing-fast Python+Rust integrations (more on this in a future post).
But mostly for the pun.
Check out TTST on GitHub.