This page begins with testing a specific allegation that Eugen Varshavsky cheated with confederates transmitting moves during the 2006 World Open, and specifically that this cheating occurred during his Round 7 upset of GM Ilya Smirin. (I had thought the allegation included his having a receiver in his hat, but none was found on inspection.) Varshavsky at the time had a USCF rating of 2160, which compared to Smirin's rating above 2600 would leave Varshavsky expecting to score under 5% in the long run, especially with the Black pieces as in this game (a quick check of this figure appears below). This topic, accusation, and game were featured in the 4/8/07 NY Times Chess Column (now behind the subscriber-only curtain) by Dylan Loeb McClain. That column also reported on the March 2007 Chess Life story by Jon Jacobs on anti-cheating efforts (including this site). Primary sources on this particular controversy include:
The 4/8 McClain column was also linked in this 4/8/07 post in the Susan Polgar chess blog. For human-computer similarity testing, the operative words in McClain's article are:
"...After Varshavsky won, Larry Christiansen, a grandmaster, found that the last 25 moves matched those chosen by a commercially available computer program called Shredder. With no copy of Shredder, I compared Black's moves with those suggested by Fritz 9. From 14 ... a5 (the first move that varied from what had previously been played) to the end, Fritz agreed with 34 out of 44 moves."
(This continues the uniform pattern that results of scientific experiments are reported in the chess world with no provision of data, methodology, logs, reports, or anything else to permit reproducibility of tests by others. These scientific fundamentals are overlooked amid the need for due process, with persons directly named and reputations involved. Neither the ply depth of testing, nor the mode of testing (single-line or multi-line, or the "retrograde" game-analysis modes in the Fritz GUI itself), nor even the version of Shredder, has been given by any source I've seen on this story, which is still reverberating after 9+ months. This site attempts to remedy these omissions: you can dispute my methods, but at least they're reviewable!)
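As a quick sanity check on the "under 5%" figure quoted at the top, here is the standard Elo expected-score formula in a few lines of Python. The raw formula gives roughly 4-7% over this rating gap; landing under 5% presumably also reflects the Black pieces and the USCF-vs-FIDE scale difference, neither of which the bare formula captures.

    # Standard Elo expectation: E = 1 / (1 + 10^((R_opp - R_player)/400)).
    def expected_score(r_player, r_opponent):
        return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_player) / 400.0))

    print(expected_score(2160, 2600))   # ~0.074
    print(expected_score(2160, 2700))   # ~0.043, if Smirin is nearer 2700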
Long test file of
Smirin-Varshavsky, 0-1, round 7 of the 2006 World Open, with
Shredder 9.1 and the basic methodology used on the
Corus 2007 testing page.
Tabulated results in
this file.
These results show 23 of the last 25 moves matching, and 17 of the
previous 19 at some high ply depth as well, plus 1 tie.
Testing with Shredder 10, which was released 6 weeks prior to the game,
is in progress. Larry Christiansen tells me he used Shredder
Classic/Solid, a prior version, running for about a minute per
move to (at least) ply depth 10. Queries and testing are also in
progress on whether differences between Shredder versions are as
large as those between Fritz 9 and 10 shown in Corus 2007
rounds 2 and 3 here.
The lone move that no one reports as a match to "Shredder"
is 29...Rf8?!, which appears to give away Black's advantage and is
not near the top 10. However, the long test file
has a materially relevant
hypothetical explanation even for this non-match: in single-line
mode, the move 29...Qa7 is initially preferred, until at 16 ply
Shredder (9.1) uncovers a "surprising" big swing to White's advantage,
after 2-1/2 minutes of running on my 2GHz laptop. It then takes
Shredder almost 25 minutes to resolve the resulting confusion
about the best move, which is too long for real-time advice to a
player (assuming the confederates' hardware was not greatly superior
to my laptop). So they may have had to say "play a move" as matters
shook out.
Bottom line: The results substantially confirm the
testing by GM Christiansen and the above reports of it (except for
commenters in the Chess Ninja items ascribing the "25 matches"
to Fritz 9), and further indicate a consistent narrative of
cheating during the entire game.
Most to the point, many of the matches are in close
situations, in contrast to Topalov-Kramnik Elista 2006 game 2.
Our statistical calculations, when finalized, should show
high information gain, and we expect the results to meet court
standards of statistical evidence of improbability under the null
hypothesis (of no cheating); a minimal sketch of the kind of tail
computation involved appears below.
In other words, if this is not a "smoking gun", nothing is...
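For concreteness, here is a minimal sketch of the binomial tail computation such a claim rests on. The baseline probability p0 that an unassisted master's move matches the engine is the crux: the values below are placeholder assumptions, not figures established on this page, and the finalized calculations will have to estimate p0 from data.

    from math import comb

    def binom_tail(n, k, p):
        # P(X >= k) for X ~ Binomial(n, p): the chance of k or more
        # engine matches in n moves arising without consultation.
        return sum(comb(n, j) * p**j * (1 - p)**(n - j)
                   for j in range(k, n + 1))

    # 23 matches in the last 25 moves, under two placeholder baselines:
    print(binom_tail(25, 23, 0.55))   # ~7e-5
    print(binom_tail(25, 23, 0.65))   # ~2e-3 -- very sensitive to p0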
Long test with Fritz 9,
with tabulated results
here. There is close but not perfect
agreement with the match rate reported by McClain.
Inspection of the results leads us to believe that the formal
statistical testing will show both significant evidence of
collusion with strong programs in general and a significant
difference between this and the Shredder (9.1)
test results.
(NEW, 5/24/07)
Long test file of Bartholomew-Varshavsky,
Round 5 from the same tournament, with tabulated results
in this file.
These results also show a high information gain (one way to
formalize that notion is sketched below), with only 4
clear non-matches out of 48 moves and 28 significant matches
(plus 9 matches on clearly forced moves and 7 unclear/partial matches).
Note that this test was conducted entirely after the conclusions
on this page from the game with Smirin were written, and hence
independently confirms the preliminary finding of significant
evidence of collusion.
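As one way to see what "information gain" could mean here, the sketch below tallies a log-likelihood ratio, in bits, between a consultation hypothesis H1 and an independent-play hypothesis H0. The match probabilities p1 and p0 are illustrative assumptions only, and the forced and unclear moves are simply left out of the tally.

    from math import log2

    def evidence_bits(n_match, n_miss, p1, p0):
        # Log-likelihood ratio of H1 (consulting the engine, match
        # probability p1) over H0 (independent play, match probability p0),
        # treating the tested moves as independent trials.
        return n_match * log2(p1 / p0) + n_miss * log2((1 - p1) / (1 - p0))

    # 28 significant matches vs. 4 clear non-matches, as tabulated above;
    # p1 = 0.90 and p0 = 0.55 are made-up values for illustration.
    print(evidence_bits(28, 4, 0.90, 0.55))   # ~11 bits in favor of H1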
Current working consensus in the chess world, reflected by
National Director Steve Immitt at the end
of (the move-match section of) the March 2007 Chess Life article, is
that match-rate statistics must be accompanied by some other primary
evidence---such as physical or eyewitness evidence.
This site supports this policy.
One temporary reason is the preliminary
state of both the theory and the gathering of necessary data.
A second, permanent, reason is
Littlewood's Law.
Here this "Law" says that if you play 1,000 games, chances are at
least one of them will match a given engine in a way that in isolation
would be deemed to have a less than 1-in-a-thousand chance of happening
without collusion. Hence other factors that in court cases go under
the headings of "motive" and
"probable cause" must be brought into play. In this case
the game was distinguished by being against a top player in a
big-money event. Some evidence of odd behavior is given in the
Chess Ninja blog comments linked above, but nothing physically concrete.
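The arithmetic behind this reading of Littlewood's Law is a one-liner: even with a per-game false-positive chance of 1 in 1,000, a thousand independent games give roughly a 63% chance that at least one fires.

    p_single = 1.0 / 1000                       # per-game false-positive rate
    p_at_least_one = 1 - (1 - p_single) ** 1000
    print(p_at_least_one)                       # ~0.632, about 1 - 1/e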
Finally, a public appeal for help doing the testing in a
scientifically rigorous manner. The only automated/scriptable modes of
testing currently provided for chess engines (in the Fritz/ChessBase GUI)
work in reverse from the end of the game, and they preserve hash
evaluations of later positions in the main line that tangibly affect
evaluations of the position currently being tested, in ways not
available to prospective cheaters.
In the March 2007 Chess Life cover story
I am quoted as requesting greater
"scriptability" of commercial chess engines (as could be
provided in a revision to the
UCI standard), but until then, realistic tests
require manual operation
over an hours-long timeframe similar to that of the actual game and
activity they are modeling; a sketch of the kind of harness I mean
appears below. Tough for one busy prof to do, but
lovers of chess who have the discipline to do science faithfully can
really help out.
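To make concrete what such a script could look like, here is a minimal sketch of a forward-replay harness over the bare UCI protocol. It re-searches each position from a cold state ("ucinewgame", which engines generally treat as a signal to discard prior search information), so no evaluations of later game positions leak backward. The engine path, depth, and move list are placeholders.

    import subprocess

    ENGINE = "./engine"   # placeholder path to any UCI engine binary

    def send(p, cmd):
        p.stdin.write(cmd + "\n"); p.stdin.flush()

    def wait_for(p, token):
        while True:
            line = p.stdout.readline().strip()
            if line.startswith(token):
                return line

    def engine_choice(p, moves_so_far, depth):
        # Search the position afresh, so that (unlike the GUIs' reverse
        # modes) no hash entries from later positions influence the result.
        send(p, "ucinewgame")
        send(p, "isready"); wait_for(p, "readyok")
        send(p, "position startpos moves " + " ".join(moves_so_far))
        send(p, "go depth %d" % depth)
        return wait_for(p, "bestmove").split()[1]

    p = subprocess.Popen([ENGINE], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True)
    send(p, "uci"); wait_for(p, "uciok")

    game = ["d2d4", "g8f6"]           # ...full game in coordinate notation
    for i in range(1, len(game), 2):  # test the suspected (Black) player's moves
        pick = engine_choice(p, game[:i], depth=12)
        print("move %d: played %s, engine %s, match=%s"
              % (i, game[i], pick, pick == game[i]))
    send(p, "quit")

A time-based limit ("go movetime 60000" for a minute per move) would model the reported testing conditions more closely than a fixed depth.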