I originally worked through this problem live on stream, [which I've uploaded as an optional companion video](https://www.youtube.com/watch?v=SOuWVu1Zy44). The fundamental idea though, is you have some sort of board made of letters, and you're checking if that board _contains_ the word, for the following definition of "contains":

 1. You may start at any valid (non-space) position.
 2. You may move to any adjacent non-diagonal position.
 3. You cannot traverse invalid spaces. No gap-hopping!
 4. You cannot traverse a position twice.

If you can spell out the word by this series of traversals, the board contains the word!

All of these examples rely on a suite of common tools for testing and benchmarking:

```python
# suite.py

import string
import random
import timeit

PROPERTY_TEST_ITERATIONS = 80000
BOARD = """

  EU XKUJEN D
 DJAEN DE NMMN
 UREQWM K VEB
  DEBE XD MMM

"""

def manual_tests(fn):
    """
    A few basic sanity tests (and the intention behind them) sufficient to verify
    the reference implementation, and cheaply catch any egregious problems with
    more adventurous/optimized implementations.
    """
    print("Manually testing...")
    board = BOARD
    assert fn(board, ""), "Empty"
    assert not fn(board, "HHHHH"), "Characters that aren't present"
    assert fn(board, "U"), "A character that is present"
    assert fn(board, "VEB"), "Linear left-to-right"
    assert fn(board, "BEV"), "Linear right-to-left"
    assert fn(board, "XNWE"), "Vertical"
    assert fn(board, "XNWEBEE"), "Multiple direction changes"
    assert not fn(board, "JRJRJRJR"), "Cannot re-traverse previously used positions"
    assert not fn(board, "JREAJ"), "Cycling"
    assert not fn(board, "DENMMN"), "Gap-crossing"

def property_tests(fn):
    """
    Compare this implementation to the reference implementation for many random
    inputs, and fail on any input where the two give different answers.
    """
    print("Property testing...")
    import reference
    board = BOARD
    letters = string.ascii_uppercase
    for _ in range(PROPERTY_TEST_ITERATIONS):
        word_size = random.randint(0, 30)
        word = "".join(random.choice(letters) for _ in range(word_size))
        assert fn(board, word) == reference.board_contains(board, word), word

def benchmark(fn):
    """
    Time to complete one million executions with the same board and word.

    This is subject to a lot of the usual flaws of microbenchmarks, but since
    there isn't really a real-world use case for these word searches to answer
    meaningful questions like "how often are boards reused?", this is probably
    fine as a naive approximation, which a wise developer will take with a
    grain of salt.
    """
    print("Benchmarking...")
    bench_time = timeit.timeit("fn(board, 'XKUJED')", globals={
        "fn": fn,
        "board": BOARD,
    })
    print(f"Benchmarking took {bench_time}s")

def full_suite(fn):
    manual_tests(fn)
    property_tests(fn)
    benchmark(fn)
```

## Reference Implementation

This is the reference implementation known to be correct. It's easy to verify with manual tests. The property tests against itself are a bit silly and tautological, but it gives us a performance and correctness baseline to compare other options.

It takes about 34s on my machine.

```python
# reference.py

def _board_contains(pos_to_char, remaining, starting_from):
    if not remaining:
        return True
    if starting_from not in pos_to_char:
        return False
    if pos_to_char[starting_from] != remaining[0]:
        return False

    # At this point, we know that remaining[0] matches starting_from
    x, y = starting_from
    neighbors = [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]
    
    new_mapping = pos_to_char.copy()
    new_mapping.pop(starting_from)
    return any(
        _board_contains(new_mapping, remaining[1:], pos)
        for pos in neighbors
    )

def board_contains(board: str, word: str) -> bool:
    pos_to_char = {}
    for (y, line) in enumerate(board.split("\n")):
        for (x, char) in enumerate(line):
            if char != " ":
                pos = (x, y)
                pos_to_char[pos] = char

    return any(_board_contains(pos_to_char, word, pos) for pos in pos_to_char.keys())

if __name__ == "__main__":
    import suite
    suite.full_suite(board_contains)
```

This can be improved a bit further by memoizing some up-front work, specifically the construction of `pos_to_char`:

```python
# ref_with_memo.py

from functools import lru_cache

def _board_contains(pos_to_char: dict, remaining: str, starting_from):
    if not remaining:
        return True
    if starting_from not in pos_to_char:
        return False
    if pos_to_char[starting_from] != remaining[0]:
        return False

    # At this point, we know that remaining[0] matches starting_from
    x, y = starting_from
    neighbors = [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]
    
    new_mapping = pos_to_char.copy()
    new_mapping.pop(starting_from)
    return any(
        _board_contains(new_mapping, remaining[1:], pos)
        for pos in neighbors
    )

@lru_cache
def create_mapping(board: str) -> dict:
    pos_to_char = {}
    for (y, line) in enumerate(board.split("\n")):
        for (x, char) in enumerate(line):
            if char != " ":
                pos = (x, y)
                pos_to_char[pos] = char
    return pos_to_char

def board_contains(board: str, word: str) -> bool:
    pos_to_char = create_mapping(board)
    return any(_board_contains(pos_to_char, word, pos) for pos in pos_to_char.keys())

if __name__ == "__main__":
    import suite
    suite.full_suite(board_contains)
```

Which comes in at about 20s, so about 41% reduction in execution time vs the pure reference implementation, while working _almost_ the exact same way.

## Cached indexing

The performance improvement of the LRU cache made me think: "gosh, it'd probably be useful to have an `index_board(board, x, y)` function at the heart of everything." The results weren't so impressive though. The benchmark regressed to the 60+s mark and the code became painfully inelegant. If you walk through it, you can kind of see where there's a bit of work that's not being reused as well as other solutions, which feels bad on a cognitive structure level and has performance consequences.

```python
# lru_indexing.py

from functools import lru_cache

@lru_cache
def split_of(board:str):
    return board.split("\n")

@lru_cache
def index_board(board: str, x: int, y: int): # char or None
    lines = split_of(board)
    if y in range(len(lines)):
        if x in range(len(lines[y])):
            return lines[y][x]

def _board_contains(board: str, remaining: str, starting_from, visited):
    if not remaining:
        return True
    if index_board(board, *starting_from) is not remaining[0]:
        return False

    # At this point, we know that remaining[0] matches starting_from
    x, y = starting_from
    neighbors = [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]
    
    visited = visited.copy()
    visited.add(starting_from)
    return any(
        _board_contains(board, remaining[1:], pos, visited)
        for pos in neighbors
        if pos not in visited
    )

def board_contains(board: str, word: str) -> bool:
    all_positions = set()
    for y, line in enumerate(split_of(board)):
        all_positions.update((x,y) for x in range(len(line)))

    return any(_board_contains(board, word, pos, set()) for pos in all_positions)

if __name__ == "__main__":
    import suite
    suite.full_suite(board_contains)
```

## Graph-theoretical approach

Now this was an incredible benchmark boost, dropping down to 13s! And through a very elegant approach. People often assume that those goals are at odds, but usually there's a symmetry between them (up to a point), and this is a nice demonstration of what I mean.

This approach probably does the most up-front work, building a network of graph traversals, starting from a "root node" (represented by `None`) and associating each node with the places you could go next (and which character that node is associated with). This up-front work is highly cacheable across executions, static during a search, and allows the search algorithm (where we spend most of our time) to be pretty fast and simple. It's no wonder this is clean and fast.

```python
# graph.py

from functools import lru_cache

def adjacent(item, x, y):
    ix, iy, _ = item
    return (ix,iy) in [
        (x+1,y),
        (x-1,y),
        (x,y+1),
        (x,y-1),
    ]

@lru_cache
def board_to_network(board: str) -> dict:
    root = []
    for (y, line) in enumerate(board.split("\n")):
        for (x, char) in enumerate(line):
            if char != " ":
                root.append( (x,y,char) )

    network = { None: root }
    for x,y,_ in root:
        network[(x,y)] = [
            item for item in root
            if adjacent(item, x, y)
        ]
    return network

def explore(network, word, start_from=None, visited=()):
    if word == "":
        return True
    next_visited = (*visited, start_from)
    return any(
        explore(network, word[1:], start_from=(x,y), visited=next_visited)
        for x,y,c in network[start_from]
        if c == word[0] and (x,y) not in visited
    )

def board_contains(board: str, word:str) -> bool:
    return explore(board_to_network(board), word)

if __name__ == "__main__":
    import suite
    suite.full_suite(board_contains)
```

I really would like to take this as a starting point, and try a recursion-free implementation of the actual search algorithm using a list as a stack. I think that would be faster while still being totally clear, since you'd avoid the function call overhead that can really penalize pure Python code in CPU-bound tasks. I can't take full credit for that idea, though, as it's something that a viewer pointed out as an approach during the stream.

# Compare them all on your machine!

This script will run each solution and give you a full list of times. Be aware that this will take several minutes to complete, end-to-end, as property testing and benchmarking are both intentionally somewhat expensive operations, especially for less efficient solutions.

```bash
#!/bin/bash
# run.sh

for solution in reference ref_with_memo lru_indexing graph; do
  echo "=> SOLUTION: $solution"
  python3 ./${solution}.py
done
```

Which, for me, produced:
```
=> SOLUTION: reference
Manually testing...
Property testing...
Benchmarking...
Benchmarking took 34.45780002500396s
=> SOLUTION: ref_with_memo
Manually testing...
Property testing...
Benchmarking...
Benchmarking took 20.912656756932847s
=> SOLUTION: lru_indexing
Manually testing...
Property testing...
Benchmarking...
Benchmarking took 68.61691900889855s
=> SOLUTION: graph
Manually testing...
Property testing...
Benchmarking...
Benchmarking took 12.597340798005462s
```