Evidence against widespread cheating in Titled Tuesday competitions

The anti-cheating team at Chess.com has begun releasing evidence pertaining to allegations voiced by the former world champion Vladimir Kramnik and some others that a tangible fraction of players in their flagship Titled Tuesday competitions cheat---and regularly so. Their first report, released today, April 25, focuses on the performances by the lower-rated player ("underdog") in games.

First, I broadly concur with these findings. Moreover, I have reproduced them within my own framework. One point of commonality is that we base "underdog" on FIDE ratings, not Chess.com's own Blitz ratings. This is to avoid the supposition that players may have inflated their Chess.com ratings via cheating, so they would not appear as underdogs in as many games. An earlier analysis last November by Dorian Quelle, a PhD student in applied mathematics at the University of Zurich, shows that the FIDE Standard rating is reasonably close to the Chess.com Blitz rating in predicting results of games (see this section), before reaching its own conclusion of no-rampant-cheating from overall game results. My own framework avoids various issues with predicting underdogs' game results---such as those highlighted by FIDE statistician Jeff Sonas---by using only raw metrics of the quality of the moves made by these players. These are combined into a "Raw Outlier Index" (ROI) on a standardized scale where 50.00 is the expectation given one's rating and the standard deviation is 5.00.

This same index is used for over-the-board chess, for which there is common agreement that widespread cheating does not occur. Over large OTB populations---with formulas trained separately for Classical, Rapid, and Blitz time controls---the ROI scores conform to the normal distribution with mean 50.00 and standard deviation 5.00. As such, scores above 60 are expected with a little more than 2% frequency and it takes individual scores toward or above 70 to raise any spectre of cheating. That said, if a nonnegligible fraction of players are cheating, one should expect to see some daylight between the overall observed mean ROI score and the 50.00 mark. Instead, the results are:

The results below 50 may occur because the average-centipawn-loss component of the ROI metrics tends to punish the loser of a game disproportionately, and underdogs lose more often than they win. Or perhaps underdogs play slightly worse, rather than "rise to the challenge." In any event, that is the opposite of what would happen for cheating underdogs. The flashing fact is:

None of these numbers is above 50, let alone with daylight above 50.
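To make the scale concrete, here is a minimal sketch of the arithmetic behind the ROI thresholds, assuming only what is stated above: scores are normally distributed with mean 50.00 and standard deviation 5.00. The function names, the 10% cheating fraction, the 10-point boost, and the sample size of 10,000 are hypothetical illustrations, not figures from the analysis, and the standard-error formula assumes independent scores.

```python
import math

ROI_MEAN = 50.0  # expected ROI given one's rating (from the text)
ROI_SD = 5.0     # standard deviation of the ROI scale (from the text)


def tail_frequency(threshold, mean=ROI_MEAN, sd=ROI_SD):
    """Probability that a normally distributed score exceeds `threshold`."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mean_with_cheaters(fraction, boost, mean=ROI_MEAN):
    """Population mean if `fraction` of players average `boost` points higher.

    Both arguments are hypothetical inputs chosen for illustration.
    """
    return mean + fraction * boost


def standard_error(n_scores, sd=ROI_SD):
    """Standard error of the mean of `n_scores` independent scores."""
    return sd / math.sqrt(n_scores)


if __name__ == "__main__":
    # Scores above 60 are two standard deviations out: about 2.3%.
    print(f"P(ROI > 60) = {tail_frequency(60):.4%}")
    # Scores above 70 are four standard deviations out.
    print(f"P(ROI > 70) = {tail_frequency(70):.6%}")
    # Hypothetical: 10% of players gaining 10 ROI points on average.
    print(f"Mean with cheaters = {mean_with_cheaters(0.10, 10.0):.2f}")
    # Hypothetical sample size, for comparison with the shift above.
    print(f"Standard error (n=10000) = {standard_error(10_000):.3f}")
```

Under these illustrative numbers the cheating subpopulation would lift the mean a full point above 50, many times the standard error of a large sample. That is the kind of "daylight" the observed averages fail to show.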

This does not claim a complete absence of cheating: the error bars on the averages are in multiple hundredths, and the effect of a handful of disqualified players stays within that margin. There is likely some mild systematic skew. On the matter of skew, it should be noted that the formulas make several suppositions:

  1. The formulas are trained entirely on over-the-board blitz chess played only in 2019---before the pandemic.
  2. They are based only on FIDE Classical ratings.
  3. For players born in the year 2000 and later whose growth in official ratings was stunted by the lack of FIDE-rated chess during the pandemic, I adjust based on expected growth curves (or not, as above).
  4. I subtract 60 Elo to account for the difference between the G/3+2" time control used in the World Blitz Championship and most other OTB blitz events (on which the formulas were trained) and the G/3+1" control of Titled Tuesday.
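Suppositions 2 through 4 amount to simple arithmetic on a player's rating before the formulas are applied. The following is a rough sketch under stated assumptions, not the actual implementation: the function name and parameters are hypothetical, and the growth adjustment is passed in as a plain number because the growth curves themselves are not given here.

```python
TIME_CONTROL_OFFSET = 60   # Elo subtracted for G/3+1" vs. G/3+2" (supposition 4)
GROWTH_CUTOFF_YEAR = 2000  # birth year from which supposition 3 applies


def effective_rating(fide_classical, birth_year, growth_adjustment=0.0):
    """Rating fed to the formulas, starting from the FIDE Classical rating.

    `growth_adjustment` is a hypothetical input standing in for the
    expected-growth-curve correction; leaving it at 0.0 corresponds to
    running the analysis without that adjustment.
    """
    rating = float(fide_classical)
    if birth_year >= GROWTH_CUTOFF_YEAR:
        # Only players born in 2000 or later receive the growth adjustment.
        rating += growth_adjustment
    # Every player gets the time-control offset.
    return rating - TIME_CONTROL_OFFSET


if __name__ == "__main__":
    # Older player: the growth adjustment is ignored.
    print(effective_rating(2500, 1990, growth_adjustment=40))
    # Younger player with a hypothetical 50-point growth adjustment.
    print(effective_rating(2400, 2005, growth_adjustment=50))
```

Writing it this way makes the point of the next paragraph easy to see: an error in any one of these terms shifts every affected player's expectation in the same direction, so it would show up as a systematic offset in the mean ROI.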

That said, if you believe there is significant skew from one of these suppositions being wrong---or from systematic cheating in Titled Tuesdays---then you must believe it is magically almost entirely offset by opposite skew from one or more of the other suppositions also being wrong. The success of analogous suppositions in gauging Rapid and Classical chess argues further that all four of them are correct. It is famously hard to "prove a negative"---that is, to disprove the allegation that an individual result is a positive case of cheating. Allegations of mass positivity, however, can be rejected on this evidence.

My raw data are shared identically with officials of both Chess.com and FIDE, but owing to sensitivity regarding individual players are not released publicly. I have no financial or otherwise-fiduciary relationship with Chess.com, beyond the limited sharing of data (such as occurred during the pandemic and the Niemann case). As with Dorian Quelle, this represents confirmation by an independent researcher, using an independent methodology. The results were obtained using computer time freely given for research by the University at Buffalo Center for Computational Research (CCR).