How do Data Scientists cheat in Wordle?

See inside for the ultimate best guess

Feb 07, 2022

I saw three posts today about Wordle strategies, which is a sign that I need to write about it as well. Now that everyone has churned out of the game and it has been acquired by the New York Times, I feel it’s OK to share what I came up with. It also provides an insight into how Data Scientists think about real-life problems. Don’t worry; the article won’t be (too) technical.

Edit: Coincidentally on the same day 3Blue1Brown (Grant Sanderson) released a video on his channel about exactly the same topic with more technical details. As always with his content it’s worth watching: The mathematically optimal Wordle strategy.

What is Wordle?

You and everyone else in the world get the same single 5-letter English word every day (which I will call the “target” word). You can make six guesses, each of which must be a real 5-letter English word (which I will call “guess” words). After every guess, you get back feedback on which letters you guessed right (which I will call “clues” and depict with colourful squares): green for a letter in the right position, yellow for a letter that is in the word but in a different position, and white for a letter that is not in the word. If you figure out the word in six tries, you win, and you can share a little image showing the game board on your favourite social media platforms. I won’t explain it further; see for yourself: https://www.powerlanguage.co.uk/wordle/.

My first experience

I was introduced to the game by my dear friend and ex-colleague Noa Tamir (@noatamir) from Candy Crush. She shared a little colourful image with a link in it and some positive comments. Because her choices in gaming used to be excellent, I decided to take a look.

Being terrible at word games (I blame not being a native speaker, but I am terrible at Hungarian ones as well), I failed, and I immediately thought about how I could cheat.

What makes Wordle cheatable (for Data Scientists)?

Three key components make the game hard to play but easy to cheat:

There are limited options for target/guess words: After 5 minutes of searching, I had a list of 5000.
Simple fixed rules: If you knew the target word, you could easily play out any number of games.
Short game, explorable universe: 5000 potential starting words and 6 steps make the game “relatively” small (for a computer). You can potentially play all games.

So what is a good strategy? The power of inverse thinking

First of all, you are not guessing the target word. You are eliminating words from a long fixed list.

When the first letter is an “A”, of course, you start listing 5 letter words beginning with an “A” in your head, but at the same time, you stop thinking about words beginning with “B” or “C” or “D” and so on. If the last letter is “S”, you forget about everything else apart from plurals. After each guess, the list of potential “target” words shrinks.

Of course, this is not how humans think, but we are not going to make humans think, are we?

So we need to find the best strategy to eliminate words as fast as possible until we end up with only a single choice, which will be the “target” word.

To find that, we will look at how a single step works: You have a “target” word and a “guess” word, and together they determine exactly what the “clue” will be (the 5 colours you get back, e.g., 🟩⬜⬜🟨🟩 ). Each colouring then determines which “target” words are still eligible.
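A single step like this can be sketched in a few lines of Python. This is only a minimal sketch, not the code behind this article: the function name `clue` and the colour constants are made up for illustration, and the handling of repeated letters (greens are settled first, then yellows are limited by the letters still unmatched in the target) is an assumption about how the game scores them.

```python
from collections import Counter

GREEN, YELLOW, GREY = "🟩", "🟨", "⬜"

def clue(guess: str, target: str) -> str:
    """Return the row of coloured squares for one guess against one target."""
    guess, target = guess.lower(), target.lower()
    # Target letters that are not already matched in the right position
    unmatched = Counter(t for g, t in zip(guess, target) if g != t)
    squares = []
    for g, t in zip(guess, target):
        if g == t:
            squares.append(GREEN)
        elif unmatched[g] > 0:
            # Letter exists elsewhere in the target and is not used up yet
            squares.append(YELLOW)
            unmatched[g] -= 1
        else:
            squares.append(GREY)
    return "".join(squares)

print(clue("hello", "trill"))  # ⬜⬜🟨🟩⬜
print(clue("hello", "freak"))  # ⬜🟨⬜⬜⬜
```

Because the rules are this simple, the computer can produce the clue for any guess and target pair without ever opening the game.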

Now you “just” need to find the “guess” that leaves the fewest words, and Bob’s your uncle (I kid you not, this is a phrase in England).

But there is a problem: This only works if you know the target word, but then what’s the point at all? Well, after a lengthy setup, this is where the Data Scientists shine.

Instead of one game, we will play _ALL_ games.

Let’s look at an example

Let’s imagine we play Wordle on 5000 different days, each day with a different “target” word, but we start each day with the same “guess” word (for example, HELLO). Each day we get a “clue” back. The clues differ from day to day because they depend on the “target” word, but for a fixed “guess” word each target produces exactly one clue. For example, for today: ⬜⬜🟨🟩⬜ .

Let’s group the target words into buckets based on their “clues”. For example, for ⬜⬜🟨🟩⬜ we will have the following words in the bucket:

trill, still, skill, krill, twill, swill, skull, stall, spill, grill, drill, frill, slily, scull, quill, idyll, small, slyly, laxly, and that’s it. This was a really lucky guess. After just one guess, we are down to only 19 candidates!!!

On a different day (let’s say the “target” word is FREAK), the first guess HELLO would give you the clue ⬜🟨⬜⬜⬜ (one yellow for the misplaced “E”) and the bucket:

rates, tires, dares, tries, saner, sired, rites, dries, aired, cares, rides, dates, aides and 1000!!! more words. There are a lot of words with “E”. That’s why I said we were lucky: HELLO only worked that well on that very day.

A good strategy should work every time, not just when the stars align. So what are the best words? The ones that work for all “target” words, the ones that create a lot of small buckets distributing the 5000 target words equally.

Then, after making a guess, we are left with only one of these buckets.
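The bucketing idea can be illustrated with a short Python sketch. The names `clue` and `buckets` and the six-word sample are hypothetical, chosen only for this example; the real word list has about 5000 entries, and the repeated-letter rule is the same assumption as before (greens first, then yellows).

```python
from collections import Counter, defaultdict

def clue(guess, target):
    """Colour pattern for a guess (assumed rule: greens first, then yellows)."""
    unmatched = Counter(t for g, t in zip(guess, target) if g != t)
    squares = []
    for g, t in zip(guess, target):
        if g == t:
            squares.append("🟩")
        elif unmatched[g] > 0:
            squares.append("🟨")
            unmatched[g] -= 1
        else:
            squares.append("⬜")
    return "".join(squares)

def buckets(guess, targets):
    """Group target words by the clue they would produce for this guess."""
    groups = defaultdict(list)
    for target in targets:
        groups[clue(guess, target)].append(target)
    return dict(groups)

# A tiny hand-picked sample, not the real 5000-word list
sample = ["trill", "skill", "stall", "freak", "rates", "tires"]
for pattern, words in buckets("hello", sample).items():
    print(pattern, words)
```

With this sample, HELLO splits the six words into two buckets of three. A better first guess would spread them over more, smaller buckets.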

Enter the Entropy

Luckily, this property is a well-defined and well-understood mathematical concept called “Entropy”. I won’t go into the details, but in general, it measures “messiness”. A guess word gets the lowest score when all the target words generate the same clue, leaving all 5000 target words in a single bucket. It gets the highest score when each clue has roughly the same number of target words in its bucket.
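For the curious, here is a minimal Python sketch of that score, assuming the usual Shannon entropy measured in bits: each bucket contributes p * log2(1/p), where p is the share of target words that land in it. The function name `entropy` and the bucket sizes below are made up for illustration.

```python
import math

def entropy(bucket_sizes):
    """Shannon entropy (in bits) of a split of target words into buckets."""
    total = sum(bucket_sizes)
    return sum(n / total * math.log2(total / n) for n in bucket_sizes if n > 0)

print(entropy([5000]))            # everything in one bucket: 0.0
print(entropy([625] * 8))         # eight equal buckets: 3.0
print(entropy([4993] + [1] * 7))  # eight very uneven buckets: close to 0
```

The same number of buckets can score very differently: what matters is how evenly the target words are spread across them.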

And here is my strategy:

Select the word from the candidates with the highest entropy at each turn.

Time to write some code

Did I say at the beginning to imagine playing Wordle for 5000 days, each time starting with the same guess? And then to repeat this 5000 times, once for every guess word? For 25 million days? That would be 68 thousand years!!! OK, I can make 6 guesses a day, so it’s only a bit more than 11 thousand… But still, no one has time for that.

And that’s where the simple rules come into the picture. With some programming and some looping, I can run those 25 million games on my trusty old MacBook in about 6 minutes.
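The looping itself is nothing fancy. Below is a hedged sketch of what such a brute-force run could look like, not the original program: every guess is scored against every target, and the guesses are ranked by entropy. The helper names (`clue`, `guess_entropy`, `rank_guesses`) and the seven-word stand-in list are assumptions made for this example.

```python
import math
from collections import Counter

def clue(guess, target):
    """Colour pattern for a guess (assumed rule: greens first, then yellows)."""
    unmatched = Counter(t for g, t in zip(guess, target) if g != t)
    squares = []
    for g, t in zip(guess, target):
        if g == t:
            squares.append("🟩")
        elif unmatched[g] > 0:
            squares.append("🟨")
            unmatched[g] -= 1
        else:
            squares.append("⬜")
    return "".join(squares)

def guess_entropy(guess, targets):
    """Entropy (in bits) of the buckets this guess creates over all targets."""
    sizes = Counter(clue(guess, t) for t in targets).values()
    total = len(targets)
    return sum(n / total * math.log2(total / n) for n in sizes)

def rank_guesses(guesses, targets):
    """Score every guess against every target: the brute-force double loop."""
    scores = {g: guess_entropy(g, targets) for g in guesses}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Stand-in list; the real run would use the full list of about 5000 words
words = ["skill", "swill", "spill", "quill", "trill", "stall", "small"]
for word, score in rank_guesses(words, words):
    print(f"{word}: {score:.2f} bits")
```

On the full list this is the "play all games" run: one call to `clue` per guess and target pair, which is slow for a human and quick for a laptop.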

So I can calculate the entropy for every guess word and play the game like the following:

Start with the highest entropy guess word (that surely won’t be HELLO, but for now we will just go with it)
Write it into the Wordle game
Read the clue: ⬜⬜🟨🟩⬜ and write it into the program
The program finds the bucket of target words that are still valid. In our case: trill, still, skill, krill, twill, swill, skull, stall, spill, grill, drill, frill, slily, scull, quill, idyll, small, slyly, laxly.
Find the word with the highest entropy in the remaining words: TRILL
Write it into the Wordle game
Read the clue and write it into the program: ⬜⬜🟩🟩🟩
Next list: skill, swill, spill, quill
Pick the top one: SKILL
And bang: that was the correct guess in 3 steps
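The filtering part of this loop can be sketched as follows. Again, this is an illustration under assumptions rather than the actual program: `still_valid` and `next_guess` are hypothetical helper names, ties in entropy are broken by list order, and the repeated-letter rule is assumed as greens first, then yellows. The bucket is the 19-word list from the HELLO example.

```python
import math
from collections import Counter

def clue(guess, target):
    """Colour pattern for a guess (assumed rule: greens first, then yellows)."""
    unmatched = Counter(t for g, t in zip(guess, target) if g != t)
    squares = []
    for g, t in zip(guess, target):
        if g == t:
            squares.append("🟩")
        elif unmatched[g] > 0:
            squares.append("🟨")
            unmatched[g] -= 1
        else:
            squares.append("⬜")
    return "".join(squares)

def still_valid(candidates, guess, observed):
    """Keep only the targets that would have produced the clue we saw."""
    return [w for w in candidates if clue(guess, w) == observed]

def next_guess(candidates):
    """Pick the remaining candidate with the highest entropy (ties: first one)."""
    def score(g):
        sizes = Counter(clue(g, t) for t in candidates).values()
        total = len(candidates)
        return sum(n / total * math.log2(total / n) for n in sizes)
    return max(candidates, key=score)

bucket = ["trill", "still", "skill", "krill", "twill", "swill", "skull",
          "stall", "spill", "grill", "drill", "frill", "slily", "scull",
          "quill", "idyll", "small", "slyly", "laxly"]

left = still_valid(bucket, "trill", "⬜⬜🟩🟩🟩")
print(left)  # ['skill', 'swill', 'spill', 'quill']
print(next_guess(left))
```

Each turn is the same two moves: throw away every word that contradicts the clue, then pick the best splitter among what is left.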

And the magic word is …

Of course, there must be an overall winner, the ultimate first guess, the one that helps you the most and reveals the most information about the target word. And that word is:

RATES

It makes sense: it has the two most frequent vowels and the most frequent consonant. It ends with an “S”, so plurals are covered, and it starts with an “R”, so all the words beginning with “RE-” (like reply, rerun, reset) are covered. All the words will fall into neat, distinct buckets, and each bucket will have numerous elements in it.

So how would today’s game flow? (I open a “Private Tab” that hides my identity from Wordle so I can play again.)

First guess: RATES
Clue: ⬜⬜⬜⬜🟨
- Massive amount of information: no “R”, no “A”, no “T”, no “E”. What words are left anyway???
Remaining candidates: spoil, spiny, shiny, slink, noisy + 143 other
- You can see that many more remained than with HELLO, but still far fewer than 5000. And don’t forget that we were extremely fortunate with HELLO, while RATES works every time.
Second guess: SPOIL
Clue: 🟩⬜⬜🟨🟩
Remaining candidates: skill, swill, shill, sibyl
Third guess: SKILL
And you already know the answer.

And that’s it!

What’s the takeaway? Apart from DSes cheating even in trivial online games? Many real-life problems can be turned into a program that can be brute-forced to find better strategies than humans guessing around.

And of course, the first guess should always be RATES…

Thank you for reading, and please share it if you think others would be interested as well.
