---
title: Wordsearch in Python
visibility: public
glu:
- bash
- bash run.sh
- python3 graph.py
---

I originally worked through this problem live on stream, [which I've uploaded as an optional companion video](https://www.youtube.com/watch?v=SOuWVu1Zy44).

The fundamental idea, though, is that you have some sort of board made of letters, and you're checking whether that board _contains_ a word, for the following definition of "contains":

1. You may start at any valid (non-space) position.
2. You may move to any adjacent non-diagonal position.
3. You cannot traverse invalid spaces. No gap-hopping!
4. You cannot traverse a position twice.

If you can spell out the word by this series of traversals, the board contains the word!

All of these examples rely on a suite of common tools for testing and benchmarking:

```python
# suite.py
import string
import random
import timeit

PROPERTY_TEST_ITERATIONS = 80000

BOARD = """
EU     XKUJEN
D  DJAEN  DE NMMN
   UREQWM  K
VEB DEBE
XD MMM
"""


def manual_tests(fn):
    """
    A few basic sanity tests (and the intention behind them) sufficient to
    verify the reference implementation, and cheaply catch any egregious
    problems with more adventurous/optimized implementations.
    """
    print("Manually testing...")
    board = BOARD
    assert fn(board, ""), "Empty"
    assert not fn(board, "HHHHH"), "Characters that aren't present"
    assert fn(board, "U"), "A character that is present"
    assert fn(board, "VEB"), "Linear left-to-right"
    assert fn(board, "BEV"), "Linear right-to-left"
    assert fn(board, "XNWE"), "Vertical"
    assert fn(board, "XNWEBEE"), "Multiple direction changes"
    assert not fn(board, "JRJRJRJR"), "Cannot re-traverse previously used positions"
    assert not fn(board, "JREAJ"), "Cycling"
    assert not fn(board, "DENMMN"), "Gap-crossing"


def property_tests(fn):
    """
    Compare this implementation to the reference implementation for many
    random inputs, and fail on any input where the two give different answers.
    """
    print("Property testing...")
    import reference
    board = BOARD
    letters = string.ascii_uppercase
    for _ in range(PROPERTY_TEST_ITERATIONS):
        word_size = random.randint(0, 30)
        word = "".join(random.choice(letters) for _ in range(word_size))
        assert fn(board, word) == reference.board_contains(board, word), word


def benchmark(fn):
    """
    Time to complete one million executions with the same board and word.

    This is subject to a lot of the usual flaws of microbenchmarks, but since
    there isn't really a real-world use case for these word searches that would
    answer meaningful questions like "how often are boards reused?", this is
    probably fine as a naive approximation, which a wise developer will take
    with a grain of salt.
    """
    print("Benchmarking...")
    bench_time = timeit.timeit("fn(board, 'XKUJED')", globals={
        "fn": fn,
        "board": BOARD,
    })
    print(f"Benchmarking took {bench_time}s")


def full_suite(fn):
    manual_tests(fn)
    property_tests(fn)
    benchmark(fn)
```

## Reference Implementation

This is the reference implementation known to be correct. It's easy to verify with manual tests. The property tests against itself are a bit silly and tautological, but it gives us a performance and correctness baseline to compare the other options against. It takes about 34s on my machine.
```python
# reference.py

def _board_contains(pos_to_char, remaining, starting_from):
    if not remaining:
        return True
    if starting_from not in pos_to_char:
        return False
    if pos_to_char[starting_from] != remaining[0]:
        return False
    # At this point, we know that remaining[0] matches starting_from
    x, y = starting_from
    neighbors = [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]
    new_mapping = pos_to_char.copy()
    new_mapping.pop(starting_from)
    return any(
        _board_contains(new_mapping, remaining[1:], pos)
        for pos in neighbors
    )


def board_contains(board: str, word: str) -> bool:
    pos_to_char = {}
    for (y, line) in enumerate(board.split("\n")):
        for (x, char) in enumerate(line):
            if char != " ":
                pos = (x, y)
                pos_to_char[pos] = char
    return any(_board_contains(pos_to_char, word, pos)
               for pos in pos_to_char.keys())


if __name__ == "__main__":
    import suite
    suite.full_suite(board_contains)
```

This can be improved a bit further by memoizing some up-front work, specifically the construction of `pos_to_char`:

```python
# ref_with_memo.py
from functools import lru_cache


def _board_contains(pos_to_char: dict, remaining: str, starting_from):
    if not remaining:
        return True
    if starting_from not in pos_to_char:
        return False
    if pos_to_char[starting_from] != remaining[0]:
        return False
    # At this point, we know that remaining[0] matches starting_from
    x, y = starting_from
    neighbors = [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]
    new_mapping = pos_to_char.copy()
    new_mapping.pop(starting_from)
    return any(
        _board_contains(new_mapping, remaining[1:], pos)
        for pos in neighbors
    )


@lru_cache
def create_mapping(board: str) -> dict:
    pos_to_char = {}
    for (y, line) in enumerate(board.split("\n")):
        for (x, char) in enumerate(line):
            if char != " ":
                pos = (x, y)
                pos_to_char[pos] = char
    return pos_to_char


def board_contains(board: str, word: str) -> bool:
    pos_to_char = create_mapping(board)
    return any(_board_contains(pos_to_char, word, pos)
               for pos in pos_to_char.keys())


if __name__ == "__main__":
    import suite
    suite.full_suite(board_contains)
```

Which comes in at about 20s, so roughly a 39% reduction in execution time versus the pure reference implementation, while working _almost_ exactly the same way.

## Cached indexing

The performance improvement of the LRU cache made me think: "gosh, it'd probably be useful to have an `index_board(board, x, y)` function at the heart of everything."

The results weren't so impressive, though. The benchmark regressed to the 60+s mark, and the code became painfully inelegant. If you walk through it, you can see where some work isn't reused as effectively as in the other solutions, which feels bad at a cognitive-structure level and has performance consequences.

```python
# lru_indexing.py
from functools import lru_cache


@lru_cache
def split_of(board: str):
    return board.split("\n")


@lru_cache
def index_board(board: str, x: int, y: int):
    # char or None
    lines = split_of(board)
    if y in range(len(lines)):
        if x in range(len(lines[y])):
            return lines[y][x]


def _board_contains(board: str, remaining: str, starting_from, visited):
    if not remaining:
        return True
    if index_board(board, *starting_from) != remaining[0]:
        return False
    # At this point, we know that remaining[0] matches starting_from
    x, y = starting_from
    neighbors = [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]
    visited = visited.copy()
    visited.add(starting_from)
    return any(
        _board_contains(board, remaining[1:], pos, visited)
        for pos in neighbors
        if pos not in visited
    )


def board_contains(board: str, word: str) -> bool:
    all_positions = set()
    for y, line in enumerate(split_of(board)):
        all_positions.update((x,y) for x in range(len(line)))
    return any(_board_contains(board, word, pos, set())
               for pos in all_positions)


if __name__ == "__main__":
    import suite
    suite.full_suite(board_contains)
```

## Graph-theoretical approach

Now this was an incredible benchmark boost, dropping down to 13s! And through a very elegant approach.
People often assume that performance and elegance are at odds, but usually there's a symmetry between them (up to a point), and this is a nice demonstration of what I mean.

This approach probably does the most up-front work, building a network of graph traversals starting from a "root node" (represented by `None`) and associating each node with the places you could go next (and the character each of those is associated with). This up-front work is highly cacheable across executions, static during a search, and allows the search algorithm (where we spend most of our time) to be pretty fast and simple. It's no wonder this is clean and fast.

```python
# graph.py
from functools import lru_cache


def adjacent(item, x, y):
    ix, iy, _ = item
    return (ix, iy) in [
        (x+1,y),
        (x-1,y),
        (x,y+1),
        (x,y-1),
    ]


@lru_cache
def board_to_network(board: str) -> dict:
    root = []
    for (y, line) in enumerate(board.split("\n")):
        for (x, char) in enumerate(line):
            if char != " ":
                root.append( (x,y,char) )
    network = { None: root }
    for x,y,_ in root:
        network[(x,y)] = [
            item for item in root
            if adjacent(item, x, y)
        ]
    return network


def explore(network, word, start_from=None, visited=()):
    if word == "":
        return True
    next_visited = (*visited, start_from)
    return any(
        explore(network, word[1:], start_from=(x,y), visited=next_visited)
        for x,y,c in network[start_from]
        if c == word[0] and (x,y) not in visited
    )


def board_contains(board: str, word: str) -> bool:
    return explore(board_to_network(board), word)


if __name__ == "__main__":
    import suite
    suite.full_suite(board_contains)
```

I really would like to take this as a starting point and try a recursion-free implementation of the actual search algorithm, using a list as a stack. I think that would be faster while still being totally clear, since you'd avoid the function-call overhead that can really penalize pure Python code in CPU-bound tasks. I can't take full credit for that idea, though, as it's something a viewer pointed out as an approach during the stream.
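As an unbenchmarked sketch of my own (not the author's implementation), a recursion-free `explore` might look something like this, assuming the `network` dict shape that `board_to_network` in graph.py produces, where each node maps to a list of `(x, y, char)` entries:

```python
# iterative_graph.py -- hypothetical sketch: the same depth-first search as
# graph.py's explore(), but driven by an explicit list-as-stack instead of
# recursion.

def explore_iterative(network, word):
    if word == "":
        return True
    # Each stack entry is (node, index of the next character to match,
    # positions already used on this path).
    stack = [(None, 0, ())]
    while stack:
        node, i, visited = stack.pop()
        for x, y, c in network[node]:
            if c == word[i] and (x, y) not in visited:
                if i + 1 == len(word):
                    return True
                stack.append(((x, y), i + 1, (*visited, (x, y))))
    return False
```

Whether this actually beats the recursive version would need a run through `suite.full_suite`; the tuple copying per pushed path is a cost the recursive version pays too, so the win, if any, comes from dropping the per-step function call.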
## Compare them all on your machine!

This script will run each solution and give you a full list of times. Be aware that this will take several minutes to complete end-to-end, as property testing and benchmarking are both intentionally somewhat expensive operations, especially for the less efficient solutions.

```bash
#!/bin/bash
# run.sh

for solution in reference ref_with_memo lru_indexing graph; do
    echo "=> SOLUTION: $solution"
    python3 ./${solution}.py
done
```

Which, for me, produced:

```
=> SOLUTION: reference
Manually testing...
Property testing...
Benchmarking...
Benchmarking took 34.45780002500396s
=> SOLUTION: ref_with_memo
Manually testing...
Property testing...
Benchmarking...
Benchmarking took 20.912656756932847s
=> SOLUTION: lru_indexing
Manually testing...
Property testing...
Benchmarking...
Benchmarking took 68.61691900889855s
=> SOLUTION: graph
Manually testing...
Property testing...
Benchmarking...
Benchmarking took 12.597340798005462s
```
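If several minutes is too long while experimenting, `timeit.timeit` accepts a `number` argument to run fewer than its default of one million iterations. A hypothetical trimmed-down variant of `suite.benchmark` (the names here are my own, not part of the suite) might look like:

```python
# quick_bench.py -- hypothetical quick-look variant of suite.benchmark,
# timing fewer iterations and reporting a per-call figure so runs with
# different iteration counts stay comparable.
import timeit


def quick_benchmark(fn, board, word="XKUJED", number=10_000):
    # timeit defaults to number=1_000_000; pass a smaller count explicitly.
    elapsed = timeit.timeit(
        "fn(board, word)",
        globals={"fn": fn, "board": board, "word": word},
        number=number,
    )
    print(f"{number} calls took {elapsed:.3f}s (~{elapsed / number * 1e6:.1f}us/call)")
    return elapsed
```

The per-call figure is only a rough proxy for the full benchmark numbers above, since fewer iterations give caches and the interpreter less time to warm up.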