Maven (Scrabble)

Maven is an artificial intelligence Scrabble player, created by Brian Sheppard. It has been used in official licensed Hasbro Scrabble games.

Algorithms

Game phases

Maven's gameplay is sub-divided into three phases: The "mid-game" phase, the "pre-endgame" phase, and the "endgame" phase.

The "mid-game" phase lasts from the beginning of the game up until there are nine or fewer tiles left in the bag. The program uses a rapid algorithm to find all possible plays from the given rack, and then part of the program called the "kibitzer" uses simple heuristics to sort them into rough order of quality. The most promising moves are then evaluated by "simming", in which the program simulates the random drawing of tiles, plays forward a set number of plays, and compares the points spread of the moves' outcomes. By simulating thousands of random drawings, the program can give a very accurate quantitative evaluation of the different plays. (While a Monte Carlo search, Maven does not use Monte Carlo tree search because it evaluates game trees only 2-ply deep, rather than playing out to the end of the game, and does not reallocate rollouts to more promising branches for deeper exploration; in reinforcement learning terminology, the Maven search strategy might be considered "truncated Monte Carlo simulation". A true MCTS strategy is unnecessary because the endgame can be solved. The shallow search is because the Maven author argues[1] that, due to the fast turnover of letters in one's bag, it is typically not useful to look more than 2-ply deep, because if one instead looked deeper, e.g. 4-ply, the variance of rewards will be larger and the simulations will take several times longer, while only helping in a few exotic situations: "We maintain that if it requires an extreme situation like CACIQUE to see the value of a four-ply simulation then they are not worth doing." As the board value can be evaluated with very high accuracy in Scrabble, unlike games such as Go, deeper simulations are unlikely to change the initial evaluation.)

The "pre-endgame" phase works in almost the same way as the "mid-game" phase, except that it is designed to attempt to yield a good end-game situation.

The "endgame" phase takes over as soon as there are no tiles left in the bag. In two-player games, this means that the players can now deduce from the initial letter distribution the exact tiles on each other's racks. Maven uses the B-star search algorithm to analyze the game tree during the endgame phase.

Move generation

Maven has used several algorithms for move generation, but the one that has stuck is the DAWG algorithm. The GADDAG algorithm is faster, but a DAWG for North American English is only 0.5 MB, compared to about 2.5 MB for a GADDAG. That makes a significant difference for download games, whereas the speed advantage is not important. (Note that unimportant does not mean that the difference is small, merely that users cannot tell the difference. The GADDAG is perhaps twice as fast, but both algorithms are fast enough.)

Rack evaluation

The first (1986) version of Maven used a set of about 100 patterns to value racks. Every single tile had a value (27 patterns). Each duplicate had a value (22 patterns). There were patterns for triplicates and quads for letters that have enough representation in the bag. Finally, the QU combination was a pattern.

Soon after the first version, Maven acquired rack evaluation terms for vowel/consonant balance and Q/U distribution. Vowel/consonant balance was a table lookup based on the count of vowels and consonants left in the rack. Q/U distribution varied the values of Q and U using a table lookup indexed by how many of each remained in the bag.

Shortly thereafter, Maven acquired a tile duplication evaluator. The idea was to vary a rack depending on the chance of drawing duplicates. For example, A is generally better than I as a tile, but if there are 7 A's and only 2 I's left in the bag, then maybe we should prefer to keep the I.

Parameter fitting was accomplished by tuning the values to predict the total of future scores. Nowadays this would be called Temporal Difference Learning.

This rack evaluation design was original to Maven. It was very successful in competing with the human champions of the day.

The design was later extended by other researchers. Mark Watkins championed what he called "tile synergy patterns." These are combination like ADES which form the basis of many high-scoring words. This is a natural extension of the design, which does significantly improve play. Maven's pattern set gradually expanded from the base set of 100 to well over 400.

Maven has since switched to a completely different architecture, proposed by John O'Laughlin and implemented in Quackle. This is the "exhaustive" architecture, where the program has a different rack evaluation parameter for each of the 3 million possible combinations of 0 to 7 tiles. With the advances in computer power over the last decade, it has become possible to tune such large parameter sets.

The downside of using an exhaustive approach is that Maven lost the ability to vary evaluations as a function of the tiles that remained in the bag. The point is that the exhaustive rack evaluator does not have terms that relate a rack's value to the possible draws from the bag.

Maven's version of exhaustive rack evaluation has added that ability. In Maven, each rack has its own liner evaluator, where the value of that rack varies as a function of the chance of drawing a duplicate, the chance of drawing a vowel, and the chance of drawing Q and U. This system has 5 parameters per rack, for about 15 million parameters in all.

Simulation

The great human champion Ron Tiekert had studied Scrabble by playing out individual positions dozens of times, and tabulating results. He suggested that with Maven's speed, it should be possible to automate that process in overnight runs. Brian Sheppard named this process "simulation", though it goes by the name "rollout" in backgammon, and "playout" in Go.

The process was to select N candidate moves using the score+rack heuristic. Then play out those moves hundreds or thousands of times to see which candidate performs best. You can vary the depth of the playout to suit your purpose; play two or four moves ahead to get a good idea about point differential, or play to the end of the game to measure winning chances.

By the mid-1990s, computers had become fast enough that Maven used simulation to choose moves in competitive games under tournament time controls. Algorithmic improvements were important to scaling simulation for this purpose. The most important innovation was to vary the number of trials given to candidates so that more successful candidates receive more effort. It was also helpful to control the racks so that all candidate moves were sampled against the same, unbiased distribution.

Analysis of games played by Maven's simulation engine suggest that Maven has surpassed the skill level of human champions.

Endgame

Strong play in Scrabble endgames is much harder than it looks. In theory, endgames are a game of perfect information, so the Alpha-beta pruning algorithm should work. But in practice Alpha Beta works badly on Scrabble.

The problem with Alpha Beta is that some Scrabble endgames require 14 moves to play out, and it is not possible to search that deeply. This is not merely a theoretical possibility. When one player is "stuck" with a tile, then playing out all remaining tiles is impossible. In that situation the optimal strategy for both sides is usually to play one tile on each turn.

Maven uses a different approach. The B* search algorithm is a selective-depth, progressive-widening algorithm that guarantees to find optimal solutions to two-player games when one can compute upper and lower bounds on the values of each position.

It turns out that it is possible to estimate upper and lower bounds on endgame positions. These bounds are correct (that is, the true value lies within the interval) for the overwhelming majority of positions. Since B* is reasonably robust in the presence of a small percentage of error in the bounds, Maven can solve endgames that other approaches cannot.

A further refinement makes Maven's endgame solutions asymptotically optimal even in the presence of errors. When the B* search terminates with a proof that one move is best, and there is still time remaining, then Maven widens its estimates by 1 point and searches again. These re-searches are usually very quick, because the tree from the previous search is still largely valid. Repeated use of this policy will progressively identify errors, starting with the smallest (and presumably most likely) errors.

Exhaustive pre-endgame

When only 1 or 2 tiles remain in the bag ("PEG-1" or "PEG-2"), it is possible to perform exhaustive searches of the state space.

The case of a PEG-1 is important, because almost half of all games pass through that state. Maven can play out such states exhaustively in almost all cases. That is, for all legal moves Maven can play out the resulting endgames (up to 8 for each legal move), and calculate which side will win the game in each case. Because there are some situations (e.g., two blanks, stuck-with-Q) that require extra effort, the calculation is performed progressively. That is, Maven expands its analysis first where the decision is close and relevant.

In a PEG-2 it is normally not possible to exhaustively examine all move sequences, so Maven goes as far as it can in the time available.

One feature of these low-tile situations is that it is very hard to safely prune the list of legal moves. For example, the optimal play is ranked behind more than 50 other moves by the score+rack heuristic more than 1% of the time.

This policy does not produce play that is theoretically perfect, because it is impossible to know what the true initial distribution of unseen tiles should be. Assuming a uniform distribution does well, and it is possible to calculate inferences about unseen tiles that marginally improves on that assumption.

Another limitation is that Maven is not addressing the "hidden information" aspect of such situations. That is, in theory there are situations where players maximize expectation by randomly choosing moves according to a probability distribution. Maven is choosing pure strategies at each node.

References

  1. pg14, Sheppard 2002
  • Sheppard, Brian (2002). "World-championship-caliber Scrabble". Artificial Intelligence. 134 (1–2): 241–275. doi:10.1016/S0004-3702(01)00166-7.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.