lichess.org

How insanely strong has Stockfish become this year? (TCEC)

The Stockfish fork for lichess is kept in sync with official Stockfish, but it might be a few Elo weaker than the original because of the changes that enable it to play variants. These changes do not alter its behaviour in standard chess, but they might slow Stockfish down by a few percent, which is equivalent to a few Elo points.
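To put "a few percent slower is a few Elo" into numbers, here is a rough sketch. It assumes that playing strength grows roughly logarithmically with search speed, and `elo_per_doubling` is a hypothetical parameter, not a measured value for any Stockfish version. The real figure depends on time control and hardware, so treat the result as an order-of-magnitude estimate only.

```python
import math


def elo_change_from_speed(speed_ratio, elo_per_doubling=50.0):
    """Estimate the Elo change caused by a change in search speed.

    speed_ratio: new speed divided by old speed (0.97 means 3% slower).
    elo_per_doubling: ASSUMED Elo gain for doubling the speed; this is a
    hypothetical rule-of-thumb value, not a benchmark result.
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return elo_per_doubling * math.log2(speed_ratio)


if __name__ == "__main__":
    for slowdown_percent in (1, 3, 5):
        ratio = 1.0 - slowdown_percent / 100.0
        change = elo_change_from_speed(ratio)
        print(f"{slowdown_percent}% slower: about {change:.1f} Elo")
```

Under that assumption a slowdown of a few percent costs only a handful of Elo points, which is far less than the effect of the time control.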
I don't think variants slow SF down. It might not be optimized for them, but when it's not playing variants the code for them just never gets called.
I'm pretty sure SF level 8 is still severely handicapped, otherwise it would be way stronger (though it might just have a fraction of a second per move).
But the analysis SF is full strength and I think up to date.

Also, handicap games aren't useful for rating comparison: 1. we simply don't know how big a rating difference a handicap is equivalent to at such high levels, and 2. engines don't play "very well" in those games since they aren't optimized for them. E.g. they will trade pieces even in material odds games, where a human would not (to keep material on the board).
The implementation of variants can slow down stockfish, e.g. because of if-statements that check whether the variant code should be called and because of additional class member objects that have to be calculated. However, benchmark tests show that the difference has to be very small. I just mentioned it for the sake of completeness. I think a few rating points (at most) do not really matter except for engine competitions. As you said, the time control is much more important.

I agree that handicap games do not give very accurate results, but they at least allow one to calculate an approximate lower bound. It takes many more games without handicap to get the same amount of information, because the rating differences are so big.

So basically there are 3 methods with 3 different problems:
- match against strong engine: large statistical uncertainty due to the rating gap
- handicap match against strong engine: uncertainty due to the effect of the handicap
- match against weak engine, strong engine vs. weak engine extrapolation: extrapolation is inaccurate
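The first point (large statistical uncertainty due to the rating gap) can be illustrated with a small sketch. It uses the standard Elo expected-score formula and makes two simplifying assumptions that are not from this thread: every game is an independent win/loss trial (draws are ignored), and the error on the measured rating difference is approximated with the delta method. The function names are made up for this example.

```python
import math

ELO_SCALE = 400.0 / math.log(10)  # converts log-odds to Elo points


def expected_score(diff):
    """Expected score of the stronger side for a rating difference `diff`."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_standard_error(diff, games):
    """Approximate standard error (in Elo) of the measured difference.

    ASSUMPTION: independent win/loss games without draws, so the score
    follows a binomial distribution; the Elo error follows from the
    delta method applied to diff = 400 * log10(p / (1 - p)).
    """
    p = expected_score(diff)
    return ELO_SCALE / math.sqrt(games * p * (1.0 - p))


def games_needed(diff, target_se):
    """Number of games needed to reach a given standard error in Elo."""
    p = expected_score(diff)
    return math.ceil((ELO_SCALE / target_se) ** 2 / (p * (1.0 - p)))


if __name__ == "__main__":
    print("diff  expected  SE after 100 games  games for SE of 30")
    for diff in (0, 200, 400, 600):
        print(
            f"{diff:4d}  {expected_score(diff):8.3f}  "
            f"{elo_standard_error(diff, 100):18.1f}  "
            f"{games_needed(diff, 30):18d}"
        )
```

Under these assumptions the number of games needed for a fixed precision grows quickly with the rating gap, because the weaker side scores so rarely that each game carries little information.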
Oh, I didn't know that. Sounds weird. Why not simply use different program versions for different variants, so that each program can only play its own variant? But I guess it really doesn't matter.
@MoistvonLipwig, I wrote "Also read some sf code is used/stolen for houdini."

You misinterpreted what I poorly said. Houdini has used some of SF's code. I heard that somewhere but can't cite it, so it's unreliable anyway.
@MoistvonLipwig: Before compiling you can choose (in the Makefile) which variants it should be able to play, and the code of variants that are not chosen is ignored by the preprocessor during compilation, so that would be no problem. However, since having several different engines on the servers would complicate matters, it is preferred to compile one version that supports all variants.

@Rairden: Of course, they try to improve Houdini by implementing ideas that work well in other engines. Sometimes it will work, sometimes it won't. When stockfish developers get to know something interesting about Komodo's or Houdini's algorithm, they do not ignore it either. There is a difference between creating a clone of an engine and using only a few ideas from it.
According to the Computer Chess Rating List (CCRL) at http://www.computerchess.org.uk/ccrl/404/ the version of Stockfish (160716 64-bit) used at the TCEC tournament is currently rated **only** 3445! While this rating is not quite 3500, I think it is probably an accurate representation of its current playing strength.
Sure, 3445 is accurate. But that is 3445 CCRL rating points, and there is no formula to transform CCRL rating points into human Elo. The only thing one can do is compare e.g. to old Fritz versions, which still played man vs. machine matches, but as pointed out in #9 that doesn't give accurate human ratings.
If we were to try to transform it to Elo, it would probably be below 3445, because in engine vs. engine matches the rating difference is generally "inflated", i.e. the stronger engine will score more points than it should according to the skill difference, so the rating difference is bigger than it should be. (That is because Carlsen on a bad day can lose twice to Naiditsch, but Stockfish even on a bad day usually won't lose twice to a much weaker engine since it's, well, a machine ;))
