Tests run with Rybka 3 to depths 13 (comparable to other engines' depth 16) and 17 (=20), Rybka 2.2 (the free one) to depth 17, Deep Junior 10.1 UCI to depths 18 and 21, HIARCS 12.1 to depth 15, and Toga II 1.4 beta7 (privately made for KWR) to depth 18. Run using single cores of a quad-core Dell XPS 420 with Intel 9400 2.66GHz processor, under 64-bit Vista with 64-bit Rybkas, Junior, and HIARCS, and 32-bit Toga II. Using single cores gives better reproducibility and makes my machine approximate a typical 32-bit dual-core laptop. CCRL Crafty 19.17 BH 32-bit benchmark for 1 core on my PC: 29 sec. (48 sec. is par). "Spreads" run by Rybka 3 to depth 13 (20-PV) and Toga II 1.4b7 to depth 18 (10-PV).

Based on the Rybka data, regressed against the games in the title events since San Luis 2005, my model "predicts" 8.35 matches to Rybka from the 10 Black moves, and 7.12/10 for White. Since both values are well above the actual rate of 57% for all moves in the database, this indicates that this particular game had a relatively forcing character---for both players, and especially for Black. The prediction should be adjusted downward for a 2600 rather than a 2700+ player, but not by much. Clips of output from my statistical program (not ready for release) are at the end.

CONCLUSION: GM Mamedyarov didn't understand that in this particular game a high match-rate for the opponent would not be unusual, and he made an all-but-accusation without checking the relevant facts. Saying in his letter, "The next moves [12 thru 21] from him were given as first choice by Rybka, which quickly allowed him to win the game." indicates /confirmation bias/: he took his test result as confirming his belief that his opponent's absences (to smoke outside, eyewitnesses say) were to get computer moves. Better awareness of what my website has boldfaced as the MAIN STATISTICAL PRINCIPLE since Oct. 2006 is needed to prevent these damaging accusations.
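The point of the MAIN STATISTICAL PRINCIPLE can be shown with a toy calculation: the expected match count depends on the character of the game's moves, not on a flat database-wide average. Here is a minimal sketch, treating each move as an independent trial with its own probability of matching the engine; the probabilities below are invented purely for illustration, while the real model fits such quantities by regression on thousands of analyzed reference-game moves.

```python
# Toy sketch: a game's character sets the match-rate baseline.
# Each move is modeled as an independent Bernoulli trial with its own
# (hypothetical) probability of matching the engine's first choice.

def predicted_matches(match_probs):
    """Expected number of engine matches and its standard deviation."""
    mean = sum(match_probs)
    var = sum(p * (1 - p) for p in match_probs)
    return mean, var ** 0.5

# Forcing game: most moves are near-obligatory, so each probability is high.
forcing = [0.95, 0.98, 0.90, 0.97, 0.85, 0.96, 0.93, 0.99, 0.90, 0.92]
# Quiet game: several reasonable options per turn, so probabilities are lower.
quiet = [0.55, 0.60, 0.50, 0.65, 0.45, 0.58, 0.52, 0.60, 0.48, 0.57]

for label, probs in (("forcing", forcing), ("quiet", quiet)):
    mean, sd = predicted_matches(probs)
    print(f"{label}: expect {mean:.2f}/10 matches (stdev {sd:.2f})")
```

With the invented numbers above, the forcing game's baseline is over 9 matches out of 10, so 9 or 10 matches there is unremarkable, while the same count in the quiet game would be far outside expectation.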
[Event "Aeroflot"]
[Site "Moscow"]
[Date "2009.02.22"]
[Round "6"]
[White "Mamedyarov"]
[Black "Kurnosov"]
[Result "0-1"]
[ECO "D70"]
[WhiteElo "2724"]
[BlackElo "2602"]
[Annotator ",Microsoft"]
[PlyCount "42"]
[EventDate "2009.02.23"]
[SourceDate "2009.02.23"]

1. d4 Nf6 2. c4 g6 3. f3 d5 4. cxd5 Nxd5 5. e4 Nb6 6. Nc3 Bg7 7. Be3 O-O 8. Qd2 Nc6
9. O-O-O   First move out of Deep Junior 10.1 UCI's book.
9...f5 10. h4 10...fxe4 11. h5 11...gxh5

12. d5   "Surprised" Black; coat behaviour noted here, so treat as end of theory.
    Engine     Move       Eval   Spread of other options in 10- or 20-line mode
    Rybka3d13: 12.Rxh5    +0.32
    Rybka3d17: 12.Rxh5    +0.23  No changes at any depth
    Ryb2.2d17: 12.Rxh5     0.00  No changes; Rybka 2.2 shows all depths regardless of match
    DJ10.1d18: MATCH      +0.42
    DJ10.1d21: MATCH      +0.37  Deep Junior shows no changes in Arena, and takes *ages* for depth 21
    HI12.1d15: 12.Rxh5    +0.25
    TogaIId18: 12.Rxh5    +0.20

12...Ne5
    Rybka3d13: IK-MATCH   +0.06  12...Na5 0.13 worse, next 0.68 worse.
    Rybka3d17: IK-MATCH    0.00  No changes
    Ryb2.2d17: IK-MATCH   -0.29  12...Na5 at depths 5-11
    DJ10.1d18: 12...Na5   +0.72
    DJ10.1d21: IK-MATCH   +0.55
    HI12.1d15: IK-MATCH   -0.29
    TogaIId18: IK-MATCH   -0.27

13. Bh6
    Rybka3d13: MATCH      +0.06
    Rybka3d17: MATCH       0.00  13.Bd4 at depths 9-13 in this run
    Ryb2.2d17: 13.Qc2     -0.26  13.Bh6 at depths 3-5 and 12-14, 13.Bd4 at depth 6
    DJ10.1d18: MATCH      +0.49
    DJ10.1d21: MATCH      +0.53
    HI12.1d15: MATCH      -0.06
    TogaIId18: MATCH      -0.32

13...Nec4
    Rybka3d13: IK-MATCH    0.00  13...Nbc4 0.42 worse, 13...Bxh6 0.57 worse
    Rybka3d17: IK-MATCH    0.00  No changes (depth 2 doesn't count)
    Ryb2.2d17: IK-MATCH   -0.15  No changes
    DJ10.1d18: 13...Bxh6  +0.49
    DJ10.1d21: 13...Bxh6  +0.56
    HI12.1d15: IK-MATCH   -0.24
    TogaIId18: IK-MATCH   -0.42

14. Qg5
    Rybka3d13: MATCH      -0.13
    Rybka3d17: MATCH      -0.05
    Ryb2.2d17: MATCH      -0.15  No changes
    DJ10.1d18: 14.Bxc4    +0.39
    DJ10.1d21: 14.Bxc4    +0.23
    HI12.1d15: MATCH      -0.25
    TogaIId18: MATCH      -0.43

14...Rf7
    Rybka3d13: IK-MATCH   -0.13  Only-move
    Rybka3d17: IK-MATCH   -0.05  No changes, instant (didn't clear hash for remaining moves)
    Ryb2.2d17: IK-MATCH   -0.25  No changes
    DJ10.1d18: IK-MATCH   +0.58
    DJ10.1d21: IK-MATCH   +0.43  As for d18, the eval is upticked from the previous ply
    HI12.1d15: IK-MATCH   -0.25
    TogaIId18: IK-MATCH   -0.28

15. Bxc4
    Rybka3d13: 15.Bxg7    -0.20
    Rybka3d17: 15.Bxg7    -0.05  No changes, only depths 15-17 shown
    Ryb2.2d17: MATCH (!)  -0.23  15.Bxg7 at depths 9-11
    DJ10.1d18: 15.Bxg7    +0.36
    DJ10.1d21: 15.Nxe4(!) +0.26
    HI12.1d15: 15.Bxg7    -0.29
    TogaIId18: 15.Bxg7    -0.41

15...Nxc4
    Rybka3d13: IK-MATCH   -0.11  15...Qd6 0.42 worse, nothing else.
    Rybka3d17: IK-MATCH   -0.12  No changes, all depths shown since after non-match
    Ryb2.2d17: IK-MATCH   -0.24  No changes
    DJ10.1d18: 15...Qd6   +0.20
    DJ10.1d21: 15...Qd6   +0.07
    HI12.1d15: IK-MATCH   -0.35
    TogaIId18: IK-MATCH   -0.49

16. Rd4   Actual end of theory.
    Rybka3d13: 16.Bxg7    -0.11
    Rybka3d17: 16.Bxg7    -0.12  No changes, depths 15-17 shown
    Ryb2.2d17: 16.Bxg7    -0.22  16.d6 at depths 4-6
    DJ10.1d18: 16.Bxg7    +0.29
    DJ10.1d21: 16.Bxg7    +0.04
    HI12.1d15: 16.Bxg7    -0.56
    TogaIId18: 16.Bxg7    -0.31

16...Qd6
    Rybka3d13: 16...Nxb2  -0.99  16...Qd6 0.10 worse, 16...Nd6 0.56 worse, then > 1.00
    Rybka3d17: 16...Nxb2  -1.07  No changes, all depths shown
    Ryb2.2d17: 16...Nxb2  -0.57  No changes, except 16...Qd6 at depths 2-4
    DJ10.1d18: IK-MATCH   -0.21
    DJ10.1d21: IK-MATCH   -0.42
    HI12.1d15: 16...Nxb2  -1.14
    TogaIId18: 16...Nxb2  -0.90

17. Bxg7
    Rybka3d13: MATCH      -0.72
    Rybka3d17: MATCH      -0.99  No changes since depth 4
    Ryb2.2d17: MATCH      -0.60  No changes
    DJ10.1d18: MATCH      -0.53
    DJ10.1d21: MATCH      -0.85
    HI12.1d15: MATCH      -1.08
    TogaIId18: MATCH      -1.02

17...Rxg7
    Rybka3d13: IK-MATCH   -0.72  Only-move
    Rybka3d17: IK-MATCH   -0.99  No changes, depths 16-17 shown
    Ryb2.2d17: IK-MATCH   -0.63  No changes
    DJ10.1d18: IK-MATCH   -0.53
    DJ10.1d21: IK-MATCH   -0.80
    HI12.1d15: IK-MATCH   -1.07
    TogaIId18: IK-MATCH   -0.89

18. Qxh5
    Rybka3d13: MATCH      -0.72
    Rybka3d17: MATCH      -0.99  No changes, depths 16-17 shown
    Ryb2.2d17: MATCH      -0.76  No changes
    DJ10.1d18: MATCH      -0.83
    DJ10.1d21: MATCH      -0.90
    HI12.1d15: MATCH      -1.03
    TogaIId18: MATCH      -0.98

18...Qf4+
    Rybka3d13: IK-MATCH   -0.97  Only-move
    Rybka3d17: IK-MATCH   -0.99  No changes, depths 16-17 shown
    Ryb2.2d17: IK-MATCH   -0.77  No changes
    DJ10.1d18: IK-MATCH   -0.65
    DJ10.1d21: IK-MATCH   -0.71
    HI12.1d15: IK-MATCH   -1.10
    TogaIId18: IK-MATCH   -0.97

19. Kb1
    Rybka3d13: 19.Kc2     -0.97
    Rybka3d17: 19.Kc2     -0.99  No changes, depths 16-17 shown
    Ryb2.2d17: 19.Kc2     -0.77  No changes
    DJ10.1d18: MATCH      -0.95
    DJ10.1d21: MATCH      -1.01
    HI12.1d15: 19.Kc2     -1.10
    TogaIId18: 19.Kc2     -0.94

19...Bf5
    Rybka3d13: IK-MATCH   -1.23  19...Nd6 0.94 worse, 19...Ne3 1.36 worse
    Rybka3d17: IK-MATCH   -1.26  No changes, all depths shown
    Ryb2.2d17: IK-MATCH   -0.88  No changes
    DJ10.1d18: IK-MATCH   -0.76
    DJ10.1d21: IK-MATCH   -0.80
    HI12.1d15: IK-MATCH   -1.35
    TogaIId18: IK-MATCH   -1.14

20. fxe4
    Rybka3d13: 20.Ne2     -1.23
    Rybka3d17: 20.Ne2     -1.43  No changes, depths 15-17 shown
    Ryb2.2d17: 20.Ne2     -1.09  20.fxe4 at depths 4-6
    DJ10.1d18: MATCH      -1.06
    DJ10.1d21: MATCH      -0.90
    HI12.1d15: MATCH      -1.50
    TogaIId18: 20.Ne2     -1.18

20...Bg4
    Rybka3d13: IK-MATCH   -1.45  20...Qf2 0.98 worse, 20...Qf1+ 1.01, rest are =
    Rybka3d17: IK-MATCH   -1.72  No changes, all depths shown
    Ryb2.2d17: IK-MATCH   -1.31  No changes
    DJ10.1d18: IK-MATCH   -0.79
    DJ10.1d21: IK-MATCH   -0.74
    HI12.1d15: IK-MATCH   -1.65
    TogaIId18: IK-MATCH   -1.40

21. Nge2
    Rybka3d13: 21.Qh6     -1.45
    Rybka3d17: 21.Qh6     -1.77  No changes, depths 15-17 shown
    Ryb2.2d17: 21.Qh6     -1.20  No changes
    DJ10.1d18: 21.Qh6     -1.04
    DJ10.1d21: 21.Qh6     -0.91
    HI12.1d15: 21.Qh6     -1.60
    TogaIId18: 21.Qh6     -1.36

21...Qd2 0-1
    Rybka3d13: IK-MATCH   -1.98  21...Qe3 1.83 worse, so like only-move.
    Rybka3d17: IK-MATCH   -2.34  No changes, all depths shown
    Ryb2.2d17: IK-MATCH   -1.61  No changes
    DJ10.1d18: IK-MATCH   -1.89
    DJ10.1d21: IK-MATCH   -2.04
    HI12.1d15: IK-MATCH   -2.21
    TogaIId18: IK-MATCH   -1.85

From the 10 Black moves, I expect my model to predict about 8.7 matches when I run the full context data through my big program, for *any* 2700+ human GM or similar-quality engine.

Totals for Black:
    Rybka3d13: 9
    Rybka3d17: 9
    Ryb2.2d17: 9
    DJ10.1d18: 7
    DJ10.1d21: 8
    HI12.1d15: 9
    TogaIId18: 9

Rybka 3 at depth 17 agrees with Rybka 3 at depth 13 (separate run) on all White and Black moves, although when run with a clean hash at White's move 13 it prefers a different move at depth 13. In runs on many other test games, Rybka 3 and 2.3.2a show many fewer mind-changes than any other engine I've tested. Rybka 2.2 at depth 17 agrees on all Black moves, but disagrees on two of White's.

Times for Rybka 3 to reach depth 17 on Black's moves, all but the first 2 with the help of saved hash: 9:14, 6:49, 0:03, 6:51, 19:48, 0:49, 7:57, 3:08, 8:32, 1:44. Total 64:45, so still only half of what a supposed accomplice may have been able to spend in the 2-1/2 to 3-hr. timeframe of the game, but pretty close. Since times for individual moves are not recorded, it's hard to judge this anyway---partly why I use a fixed-depth standard. Rybka 2.2 is faster: 3:50, 1:48, 1:25, 0:26, 1:52, 0:29, 1:45, 1:04, 1:06, 0:18, but its reported depths are most often said to undercount by 2, not 3.

****

The main extra ingredient in my *prediction model* is analysis not just of what is the "best" move, but of all of the top 20 or so options at each turn.
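The right way to read the match totals above is as deviations from the model's prediction, not as raw percentages. A minimal sketch of that check, using the predicted mean of 8.35 matches and standard deviation of 1.12 quoted in this report (the helper name z_score is mine, not my program's):

```python
# Sanity check on the Black match totals, using the report's own numbers:
# the model predicts 8.35 first-move matches (stdev 1.12) over Kurnosov's
# 10 moves. An observed count is judged by its z-score against that
# game-specific prediction, not against the database-wide 57% base rate.

def z_score(observed, predicted, stdev):
    """Standardized deviation of an observed match count."""
    return (observed - predicted) / stdev

z = z_score(9, 8.35, 1.12)   # Rybka 3 matched 9 of Black's 10 moves
print(f"z = {z:.2f}")        # comfortably inside the two-sigma range
```

Nine matches out of ten comes out roughly half a standard deviation above expectation: no statistical red flag at all for this game.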
The 20-line analysis is in the file Mam-KurAeroflot2009R3d13XX.log, and when fed into my model it leads to a prediction (by regression against nearly 10,000 similarly-analyzed moves in recent super-tournaments and World Championship matches) that any player of 2700+ calibre would achieve 8.35 matches from 10 to Rybka's moves on average. My program prints these full breakdowns for each player.

For Kurnosov:
-----------------------
From 10 total turns (10.00 weighted), it is consistent to expect the level of players sampled to have the following stats:

Unwtd. first-move matches:  8.35, stdev 1.12, 83.47%
  Two-sigma range: 6.11--10.58, 61.12%--105.82%
Unwtd. eq.top-move matches: 8.35, stdev 1.12, 83.47%
  Two-sigma range: 6.11--10.58, 61.12%--105.82%

Predicted Unwtd. index frequencies of played moves:
  0: 8.35, 83.47%; two-sigma range 6.11--10.58, 61.12%--105.82%
  1: 1.21, 12.12%; two-sigma range -0.74--3.16, -7.36%--31.61%
  2: 0.25,  2.53%; two-sigma range -0.73--1.23, -7.28%--12.34%
  ...[trace continues for the rest]

Histogram of indexes of played moves, capping blunder deltas at 4.00:
  0: 9, 90.00%, wtd. 9.00, 90.00%; wtd. mean delta = 0.00
  1: 1, 10.00%, wtd. 1.00, 10.00%; wtd. mean delta = 1.07
  ...

Finally, expected and actual falloff, with blunder cap as set at 4.00:
  Expected scaled weighted: 1.17, 0.117 per scaled, wtd. move.
  Actual scaled weighted:   0.10, 0.010 per scaled, wtd. move.
------------------------

The 0.10 is for the one mis-match, 16...Qd6 rather than 16...Nxb2.

For Mamedyarov:
-----------------------
From 10 total turns (10.00 weighted), it is consistent to expect the level of players sampled to have the following stats:

Unwtd. first-move matches:  7.12, stdev 1.40, 71.22%
  Two-sigma range: 4.31--9.93, 43.14%--99.30%
Unwtd. eq.top-move matches: 7.12, stdev 1.40, 71.22%
  Two-sigma range: 4.31--9.93, 43.14%--99.30%

Predicted Unwtd. index frequencies of played moves:
  0: 7.12, 71.22%; two-sigma range 4.31--9.93, 43.14%--99.30%
  1: 1.90, 19.03%; two-sigma range -0.54--4.35, -5.43%--43.50%
  2: 0.48,  4.80%; two-sigma range -0.85--1.81, -8.45%--18.05%
  ...

Histogram of indexes of played moves, capping blunder deltas at 4.00:
  0: 4, 40.00%, wtd. 4.00, 40.00%; wtd. mean delta = 0.00
  1: 3, 30.00%, wtd. 3.00, 30.00%; wtd. mean delta = 0.35
  2: 1, 10.00%, wtd. 1.00, 10.00%; wtd. mean delta = 1.29
  3: 1, 10.00%, wtd. 1.00, 10.00%; wtd. mean delta = 1.29
  4: 0,  0.00%, wtd. 0.00,  0.00%; wtd. mean delta = 1.54
  5: 0,  0.00%, wtd. 0.00,  0.00%; wtd. mean delta = 1.20
  6: 1, 10.00%, wtd. 1.00, 10.00%; wtd. mean delta = 1.28
  ...

Finally, expected and actual falloff, with blunder cap as set at 4.00:
  Expected scaled weighted: 1.04, 0.104 per scaled, wtd. move.
  Actual scaled weighted:   2.67, 0.267 per scaled, wtd. move.
-------------------------

Quite a dropoff, but what do you expect when White loses in 21 moves?
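The "falloff" statistic in the clips above can be approximated as follows. This is a simplified sketch, not my actual program: it ignores the per-move weighting and scaling (all weights in this game are 1.00 anyway) and just sums how much worse each played move was than the engine's first choice, with each deficit capped at 4.00 so a single blunder cannot dominate. The deltas used are Kurnosov's from the Rybka 3 depth-13 run: nine exact matches and the one choice rated 0.10 worse, 16...Qd6.

```python
# Simplified falloff: total evaluation conceded versus the engine's first
# choice, with per-move deficits capped. A sketch of the idea only; the
# real program also applies move weights and scaling ("scaled, wtd.").

def falloff(deltas, cap=4.00):
    """Return (total capped falloff, average per move)."""
    capped = [min(d, cap) for d in deltas]
    total = sum(capped)
    return total, total / len(capped)

# Kurnosov vs. Rybka 3 depth 13: nine matches (delta 0.00) plus
# 16...Qd6, which Rybka rated 0.10 worse than 16...Nxb2.
total, per_move = falloff([0.00] * 9 + [0.10])
print(f"actual falloff: {total:.2f}, {per_move:.3f} per move")
# prints "actual falloff: 0.10, 0.010 per move", agreeing with the clip
```

Running the same computation on Mamedyarov's move deltas would reproduce the much larger 2.67 actual falloff shown in his breakdown, the signature of a game lost quickly.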