{\huge The Crown Game Affair}
What constitutes evidence of cheating? \vspace{.2in}
Faye Dunaway is an Academy Award-winning actress who co-starred with the late Steve McQueen in the 1968 movie ``The Thomas Crown Affair.'' She plays a freelance insurance fraud investigator, Vicki Anderson, who believes that millionaire playboy Thomas Crown is guilty of instigating a \$2.6 million bank heist, but falls in love with him anyway. The most famous scene in the movie shows her both defeating and seducing Crown in a game of chess.
Today I write about the difficulty of detecting fraud at chess, and the nature of statistical evidence.
The New York Times this morning joins several chess media outlets covering the allegations against a Bulgarian player who was searched during a tournament in Croatia last month. When we mentioned this case in our ``Predictions and Principles'' post earlier this month, I had the issue of principles regarding statistical evidence high in my mind, and this is reflected in the formal report I have now written as an exemplar. It accompanies a cover letter to the Association of Chess Professionals, raising the issues of what to do when there is no physical or observational evidence but the statistical evidence is apparently strong, and of who should have oversight of standards and procedures for statistical tests.
Dunaway also had a small role in the 1999 remake, in which Crown again escapes uncaught but with a different endgame. Crown is played by Pierce Brosnan of James Bond fame. There is a James Bond quality to current speculation about possible cheating methods, from embedded chips to special reflective glasses, among items considered at the end of this 70-minute video by Bulgarian master Tiger Lilov, which was also covered by ChessBase.com. But none of this speculation is accompanied by any evidence. The real action may lie not with the kind of gadgeteers who interact with M or Q or even Miss Moneypenny, but rather with the actuaries down below who track the numbers.
Arbiter By Numbers
Cheating and gamesmanship at chess are nothing new---only the possible source of illegal information during games has changed from `animal' and `vegetable' to `mineral.' The following lyrics from the ``Arbiter's Song'' from the 1986 musical ``Chess'' come from actual incidents at championship matches before then.
\noindent If you're thinking of the kind of things---
That we've seen in the past:
Chanting gurus, walkie-talkies, walkouts, hypnotists,
Tempers, fists---
Not so fast.
But now there are calls for directors and judges and arbiters at chess tournaments and matches to take high-tech measures against possible cheating even while the games are going on. One limitation of testing moves in the manner needed by my model is that the games must be analyzed so as to evaluate all reasonable move choices with equal thoroughness, which takes a fast processor core more time than the game itself took to play.
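To make concrete what ``equal thoroughness'' demands, here is a minimal sketch of full-width analysis, assuming the python-chess library and a UCI engine binary named ``stockfish'' on the local path. The depth and the device of setting MultiPV to the number of legal moves are illustrative assumptions, not the actual settings of my model.

\begin{verbatim}
# A minimal sketch of full-width position analysis (illustrative
# settings, not the actual parameters of my model).
import chess
import chess.engine

def analyze_all_moves(fen, depth=20):
    """Score every legal move in a position with equal thoroughness."""
    board = chess.Board(fen)
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        # Setting MultiPV to the number of legal moves makes the engine
        # search each candidate move, not only its single favorite line.
        infos = engine.analyse(board,
                               chess.engine.Limit(depth=depth),
                               multipv=board.legal_moves.count())
    # Each info carries a principal variation and a score for its move.
    return [(info["pv"][0], info["score"].relative) for info in infos]
\end{verbatim}

Repeating such a search over every position of a game shows why the analysis takes longer than the game itself did: the engine must spend on each inferior candidate the depth it would normally reserve for its top choice alone.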
Still, the need for move-analysis tests is recognized by many. Indeed the very first comment posted on the Dec. 30 breaking story came from British master Leonard Barden, who has been chess columnist of the Guardian newspaper for 54 years---and of the Financial Times for a mere 34 years. Barden put the present issues plainly:
Either (1) Borislav Ivanov is probably the first adult (as opposed to a junior talent) with a confirmed low rating ever to achieve a 2600+ GM norm performance in an event of nine rounds or more... or (2) [he] is the first player ever to successfully cheat at a major tournament over multiple rounds without the cheating mechanism being detected.
Here 2600 is a chess rating that usually distinguishes a ``strong grandmaster,'' while my own rating near 2400 is typical of the lesser title of ``International Master,'' and Ivanov's pre-tournament rating of 2227 is near the 2200 floor for being called any kind of master. Although Magnus Carlsen recently broke Garry Kasparov's all-time rating record to reach 2861, my program for ``Intrinsic Ratings'' clocked Ivanov's performance in the range 3089--3258, depending on which games and moves are counted according to supplementary information in the case---all higher than any figure I have enumerated for Carlsen and his two closest pursuers in previous posts.
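For perspective on what these numbers mean, the standard Elo formula gives the expected score per game of a player rated $R$ facing an opponent rated $R'$ as
\[
  E \;=\; \frac{1}{1 + 10^{(R' - R)/400}}.
\]
Taking Ivanov's pre-tournament $R = 2227$ against a $2600$-rated grandmaster gives $E = 1/(1 + 10^{373/400}) \approx 0.10$, about one point in ten games---which is what makes a sustained 2600+ performance over nine rounds from such a rating so extraordinary.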
Barden bears with him the memory of a British prodigy born the same year as he was, coincidentally named Gordon Thomas Crown, who died of illness in 1947 shortly after defeating Soviet grandmaster Alexander Kotov in one of two games during a ``summit match'' between Britain and the USSR. He continued:
There are no examples known of devices successfully transmitting chess moves in competitive play via contact lenses, the skin, the brain or other such concepts ... [T]he cheating mechanism in this case remains unexplained.
That's why it is important that somebody with access to Houdini or another top program examines the nine [games] with a program. Such a program check of the games may help to establish whether the player used computer assistance.
Thus all the tech-talk takes a back seat to simple numbers. The question remains: are they enough?
The Issue
It took me two days to run the main test with two top programs, Rybka 3 and Houdini 3, to run several supporting analyses, and then to run my statistical analyzer. Writing the report took another week, however, as I felt responsible also for articulating how this kind of evidence should be evaluated, and for spelling out scientific particulars of due process. The drift of reactions to others' early scattershot tests also shifted my aim from the originally-advised one of writing my conclusions briefly and simply for chess players, to writing for experts in statistical fields---and for a student audience such as in a seminar I am running this coming term.
My report gives examples addressing when and why and how odds of ``a million to one'' should be treated differently from ``a thousand to one.'' The latter typifies my results in some cases where there was also physical or observational evidence, but here there is as yet none. Here is a different example to the same effect.
Mark Crowther of London has provided an incredible service called The Week In Chess (TWIC), which collects for free download several thousand games played in tournaments over the preceding week. The current week, TWIC 948, has games by over a thousand players---1,010 to be exact---typically 4--6 per player for a weekend tournament, up to 9 for an all-week event such as the Zadar Open itself. If one were to dredge all their games, one would expect to find a statistical deviation that would translate to 1,000--1 odds against some kind of ``null hypothesis'' about cheating. Clearly Inspector Javert should have left the other characters in Les Misérables alone and taken up statistics. The fear of players being fingered this way is referenced by today's New York Times article:
If every out-of-the-ordinary performance is questioned, bad feelings could permanently mar the way professional players approach chess.
Hence my policy has been that such statistical results have meaning only when there is evidence against the player that is independent both of performance and of others' move-match tests with computers.
With results citing million-to-one odds, however, the considerations are different---at least for chess. To find such a deviation arising from natural causes, one would need to dredge 20 years of TWIC---and the indefatigable Crowther has just started his 20th year.
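The arithmetic behind this is simple multiple-testing logic. If each of the roughly $1{,}000$ players in one week's TWIC is screened at a significance threshold of $1/1{,}000$, the chance that at least one innocent player trips it is
\[
  1 - \left(1 - \tfrac{1}{1000}\right)^{1000} \;\approx\; 1 - e^{-1} \;\approx\; 0.63.
\]
At a threshold of one in a million, by the same calculation, on the order of a million player-results are needed before a false positive becomes likely---and 20 years of TWIC, at roughly 50 issues of $1{,}000$ players each per year, is just about a million.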
A second fail-safe is that my tests are not simply correlated with quality of performance. My co-author Guy Haworth---who gave me multiple rounds of heroically detailed feedback that helped the report achieve clarity and fairness---alerted me to discussion of a similarly mercurial performance by Scottish master Alan Tate, also in Croatia, in 2010. I ran Tate's games through a screening test and found only 51% move-matching, compared to figures near 70% in the present case. Indeed, Tate's defeated opponents had higher concordance with the computer in those games. My tests have also returned negative results; my letter notes that in two major international Open tournaments they were determinative for awarding a delayed prize.
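The screening statistic itself is easy to state: the fraction of a player's moves that coincide with the engine's first choice. Here is a minimal sketch of such a computation, again assuming python-chess and a ``stockfish'' binary; an actual test would fix the engine version and settings and would exclude opening-book moves and forced positions, details this illustration glosses over.

\begin{verbatim}
# A minimal sketch of a move-matching screen (illustrative only; a real
# test fixes engine settings and skips book moves and forced positions).
import chess
import chess.engine
import chess.pgn

def match_rate(pgn_path, player, depth=15):
    """Fraction of `player`'s moves equal to the engine's top choice.
    Assumes `player` is White or Black in every game in the file."""
    matches, total = 0, 0
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        with open(pgn_path) as f:
            while (game := chess.pgn.read_game(f)) is not None:
                color = (chess.WHITE
                         if game.headers.get("White") == player
                         else chess.BLACK)
                board = game.board()
                for move in game.mainline_moves():
                    if board.turn == color:
                        best = engine.play(
                            board, chess.engine.Limit(depth=depth)).move
                        matches += (move == best)
                        total += 1
                    board.push(move)
    return matches / total if total else 0.0
\end{verbatim}

A gap like 51% versus 70% on such a raw statistic is the kind of difference the fuller model then weighs against how forcing the positions were and how close the alternative moves ranked.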
Thus I claim specific value for my tests beyond being a metric of performance, which buttresses my point in asking the chess world: what shall we do about all this?
The Letter
In a series of fortunate events after breaking his leg playing soccer, Grandmaster Bartlomiej Macieja of Poland traveled into the city to meet me during MFCS 2011, became co-author of a paper with me and Haworth, married and fathered a child, was hired as a coach by the University of Texas at Brownsville, and became General Secretary of the ACP---not all in that order. Hence it was logical to address my letter to him as well as to the ACP President, Grandmaster Emil Sutovsky of Israel. Here are some excerpts:
\bigskip I pose two questions, of which at least the first should be an immediate concern of ACP in conjunction with FIDE and national organizations. The second is a deeper issue that I believe needs consultation with experts in statistics and computer sciences, and with representatives of bodies in other fields that have established protocols for using evidentiary statistics in fraud detection and arbitration.
... The point of approaching ACP is to determine how [...] contexts and rules should be set for chess. The goals, shared by Haworth and others I have discussed this with, include: ...
More simply than my letter states, I share the worry of many that a few cases of people ``being clever'' may ruin much pleasure. This extends to accusations I believe have been ill-informed, such as the one noted in the introduction to my ``Fidelity'' public site. (The data files behind my results are kept private; whether to open them is another hard question.) I hope that certain little details in my report, such as getting clear positive results despite there being ten consecutive non-matches in one game and seven in another, will be noticed and deter others from trying to be ``cleverer.''
Open Questions
What cases of statistical evidence in your field may best inform this one?