
The xG of Chess: Shark Points

<Comment deleted by user>
I don't think it is about replacing those curves, but maybe about finding what is missing in them, so that we have more analytical measures than just one number. It seems the odds could carry other types of information from the board besides material-imbalance notions, however remote the conversions that make such a signal detectable in the SF search trees.

This is how human players came to think that SF, even a while ago, was actually "understanding" positional concepts, when really it was that, deep enough, it could find some material-imbalance signal. So as humans we do have positional forecasters that work earlier (shallower) than where SF could detect them. But what about advantage that accumulates without being cashed in for a while, i.e. depth to conversion: does that matter? I would say so, as I suspect the ramp in score amplitude between SF11 and SF15+ might have been about hyperparameters docking NNue onto shallower leaf evaluations (NNue approximates deeper searches, by an order of magnitude I would say). How do we make deeper conversions comparable with shallower conversions? (Yes, exaggerating, but looking at the classical evaluation component matrix, I would not be far off about which component dominates in parameter space.)

So what happens over those stretches of tension, or other placement changes between positions that do not consume material (or alter the material count): is there still something happening, some evolution of the odds based on other signals from the board?

Also, with the idea of postponing conversion comes the notion of the depth of anything. Those conversion curves: are they about a position at any depth, or about any game containing the same position at any depth, with the outcome proportions then taken over all those games?

Are there point-cloud versions of those curves? I don't seem to recall any. The quantity Pwin I understand to be a ranged value.
I do not understand the exact nature of, or the differences between, "eval", "expected score", "max/min adverse excursion" and finally "shark points", besides other misunderstandings. But I thought I understood something from the text. The figure with the graph seems to have no legend for those concepts, which might be why some of us do not follow yet.

From the graphs below the network figure, it does seem that Pwin is the same as expected score.
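To make my own reading explicit (these are the standard definitions, not necessarily the blog's): the expected score averages the three outcome values, and the chessprogramming page linked further down maps a pawn advantage to a win percentage with a logistic curve:

```latex
% Expected score: outcomes are win = 1, draw = 1/2, loss = 0, so
E = P_{\text{win}} + \tfrac{1}{2}\,P_{\text{draw}}
% Logistic pawn-advantage model from the chessprogramming wiki,
% with p the advantage in pawns:
W(p) = \frac{1}{1 + 10^{-p/4}}
```

If Pwin in the figures folds draws in as half-points, then Pwin and expected score coincide, which would explain the graphs.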

I think there is a semantic discussion to be had about assigning one value (or more) to a position as a forecast of outcomes in terms of odds: a proportion in general, and possibly a single outcome under perfect play. But what would odds mean then? If there are many best moves, they all still lead to the same outcome under best play from the same position; anything else is not best play.

The fact is that chess outcomes for individual games take only 3 ordered but non-numerical values, and yet we are juggling a single number we call the odds of winning, without any depth dependency.

That talk about sharking around an existing advantage without converting it yet introduces the notion of non-converted advantage contributing to the odds. Could we have dimensions other than our material counts as visible or analytical measures, however deep those may go? I see a pedagogical interest in separating the information into visible things, measurable independently, not just a one-number score. I may have misunderstood through my own curiosity preferences, but that is what I have found lacking for a while in how we use analytical tools, or engines, for chess. 42 is not satisfying; we need the question. And I think this blog is pushing on that aspect: what is a score?

I deleted a bunch of posts for this partial version. I wanted to focus on that core question of carefully defining eval versus score versus odds. Then maybe I could understand the max/min adverse excursion. I think I have it all wrong: not the sharking I thought, and the shark point obviously not seen. Either I understood the concept before the figure and then not the measure, or neither.

However: a very interesting premise, the shark-analogy distraction aside. I actually prefer the chess version, it is enough for me, but analogies, when understood, do help. If this is about considering more analytical measure "dimensions" for any position, I find it great. In any case, even if I am completely off, I had a nice rambling journey. I might dump all the stuff I deleted elsewhere, for its own sake, not as related to this blog, if I am not sharking in the right direction.

Equations, and definitions with equations of the objects being plotted, would be nice, even if they are not easy to input as text here. I prefer symbolic equations to numerical examples: the interpretation from symbols to concepts is more direct, and centralized in one visual chunk of the page, like my chess board, which I zoom to 50% to reduce my lazy eye's wandering (not really...).

The football figure, indicating some information processing between the objects, would then be more palatable and shareable.
Then the figure would not just have to be swallowed blindly; some critical thinking could happen in the reader's mind, which might provide helpful comments here. I am a bit at the same stage as the previous commenter: finally, equations. As I think the ideas are worth it.

Warning: this is patchwork. I went back to edit some paragraphs up there, because I can't help my verbosity, until exhaustion. Now.
An alternative interpretation: I saw something about wasted points, i.e. delaying conversion being bad, not an alternative path.
So using delay before cashing in on an advantage is a waste of advantage.

> Ding had a win in the position as Qxd8 Nxc1? is met by Qg5!! Ding played Qxe2 and the game eventually ended in a draw.

Ding did not cash in. So too much sharking around?

Nope, I am lost. I tried. But a nice first half, which I thought I got. And possibly I have written an alternative blog here, arguing the opposite of the lesson to be learned. Don't be a shark? Or be a shark? Which one, the blood-bathing, dillydallying one?

Maybe the box-and-arrow figure could show the operations on the arrows between the concepts. It might help.
How would you do it, @Toadofsky? Can it be inferred from client information and the normal server cp values, or does it need server-side database normalization?
If creating an inaccuracy helps you reach a better position, then it cannot be an inaccuracy. The same goes for an error or a blunder. Sometimes you need to give in order to get more. But how much is an engine willing to give to win?
Depth is not everything, but it helps to create a 100%-accuracy game. Especially at depth 50.

By move 13... Qb8: the value dropped to +0.0 at depth 50 and remained there.
If only we knew which piece had priority for gaining a better position. Maybe in this game it would have been better to find a pruned or undetected inaccuracy before move 13.

github.com/official-stockfish/Stockfish/discussions/4754
github.com/official-stockfish/Stockfish/commit/ad2aa8c06f438de8b8bb7b7c8726430e3f2a5685
github.com/official-stockfish/Stockfish/issues/4155
www.chessprogramming.org/Pawn_Advantage,_Win_Percentage,_and_Elo
@Toadofsky said in #14:
> A less popular name for the same concept: blunder/mistake count. This is why I strongly recommended Lichess normalize blunder/mistake/inaccuracy based upon win% even before the Stockfish team had the same idea:
> thechessmind.substack.com/p/news-you-can-use-stockfish-normalizes

I think SF is doing the same conversion, for reasons I can't understand yet, to do with how they train their NNue (what is the target data matrix or vector; what objective function is the NNue output minimizing: another engine's search score, or the outcome?).

In spite of that unknown or unclear part, I understand the normalization SF16 uses to be conceptually needed, given the hike in amplitude from SF12 to SF15+. Or maybe they are gearing toward a more agnostic, stable scoring range, no matter what the internal scoring model is. It could still be that the simple evaluation is, mathematically, the same frozen-components classical leaf evaluation, and that they are gaining leaf-evaluation quality improvements as per SF12, but with position input training sets from Leela's data...
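For reference, my summary of the linked discussion and commit, so treat it as hedged: the SF16 normalization rescales the internal eval so that +1.00 means a 50% win rate under the team's reference test conditions, and the WDL model behind it is a logistic in the eval. A minimal sketch of that shape (the constants here are placeholders, not the real fitted values, which Stockfish also makes depend on material/ply):

```python
import math

# Hedged sketch of the shape of Stockfish's win-rate model as I read the linked
# discussion: a logistic in the eval v, with fitted parameters a and b.
# a = 100 encodes the normalization "eval +1.00 pawn <=> 50% win rate";
# b controls how fast the odds saturate. Both are placeholders here.
def win_rate(v_cp: float, a: float = 100.0, b: float = 60.0) -> float:
    """Win probability for the side to move, given a normalized centipawn eval."""
    return 1.0 / (1.0 + math.exp((a - v_cp) / b))

print(win_rate(100.0))  # 0.5 by construction of the normalization
```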

I still have to dig through the new wiki on the NNue part to find the big picture of its machine-learning information flow (what goes in and what comes out, just for the NNue building block).

But normalization to the same range as the odds does not mean that SF's internal scoring model is based on odds. It may be that we have a reversal of the situation from when LC0 had to conform to the UCI standards born of exhaustive engine-tournament formats, which demanded cps. So LC0 had to convert odds to cps: same kind of curve, same kind of information loss too (I claim, or suspect, that cps give a partial view of the chess world anyway, and that we might need to look at the point-cloud data behind those curves systematically). Can anyone explain how those curves relate a position score to game-outcome ratios? It seems we only ever see the averaged or estimated curves, rarely the variation (statistics, not chess; word hell).
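On that odds-to-cps direction: if you take a logistic curve like the one above at face value, the inversion is one line (illustrative only; LC0's actual mapping has varied across versions):

```latex
% Inverting W = 1/(1 + 10^{-cp/400}) to recover centipawns from an expected score E:
cp(E) = -400 \, \log_{10}\!\left(\frac{1}{E} - 1\right)
```

E = 0.5 gives cp = 0, and the cps blow up as E approaches 0 or 1, which is exactly where the information-loss complaint bites.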

But it would allow SF to adopt a more stable, function-image-based language. We still need our human chess feedback, though: material count remains a valuable rule-of-thumb system that we can wield without an engine. I think there is more to learn from the engine's point of view (and I insist we should keep its human-crafted origin, and hence its point-of-view-ness, in mind... Elo should not blind critical thinking, anyone reading this?).

So back to the lichess mistake/blunder few-bins error-classification function. I thought it was already using such a conversion curve, translating cp differences into their effect on the current position's odds, hence accounting for when the game nears high odds**.
So the set of mistakes to review over a whole game would not be every outcome-agnostic cp-amplitude error, but a selected digest subset (usually already enough work for one game), aware of the whole-game outcome odds along the way, thereby modulating "learn from your mistakes" with the long-term value of those mistakes.
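This is what I mean by an odds-aware filter; a minimal sketch, assuming the winning-chances formula published on the lichess accuracy page (that 0.00368208 constant is theirs; the 0.1/0.2/0.3 bin thresholds are my illustrative guesses, not necessarily lila's exact values):

```python
import math

def winning_chances(cp: float) -> float:
    """Lichess-style winning chances in [-1, 1] from a centipawn eval
    (constant from the lichess accuracy documentation)."""
    return 2.0 / (1.0 + math.exp(-0.00368208 * cp)) - 1.0

def judge(cp_before: float, cp_after: float) -> str:
    """Bin a move by its drop in winning chances, not by raw cp difference.
    The thresholds are illustrative, not lichess's exact values."""
    delta = winning_chances(cp_before) - winning_chances(cp_after)
    if delta >= 0.3:
        return "blunder"
    if delta >= 0.2:
        return "mistake"
    if delta >= 0.1:
        return "inaccuracy"
    return "ok"

print(judge(0, -200))    # "blunder": near equality, 200cp swings the odds a lot
print(judge(900, 700))   # "ok": already winning, the same 200cp barely moves them
```

That last pair is the whole point: the same cp error gets a different label depending on where the game stands in odds terms.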

So I am not sure I understand what added concern you are proposing for the lichess error-filter function?

** high odds: and the game will get there as it nears its outcome; even SF can specifically detect certain classes of non-material imbalance near mate (endgames only?). Actually I might be wrong, but I think the limited input feature set used for NNue is precisely such a king-centered safety function of board signals, possibly related to similar classical eval components (in their model inception, before parameter optimization).
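For what it is worth, the classic NNue input encoding (HalfKP) is indeed king-relative: each active feature is an (own king square, piece, piece square) tuple, one feature set per side. A schematic index, just to show the shape (this is not Stockfish's exact layout):

```python
NUM_SQ = 64
NUM_PIECE_KINDS = 10  # 5 non-king piece types x 2 colours; kings anchor the features

def halfkp_index(king_sq: int, piece_kind: int, piece_sq: int) -> int:
    """Flatten a (king square, piece kind, piece square) feature into one index.
    Schematic ordering only; the real HalfKP layout differs in details."""
    return (king_sq * NUM_PIECE_KINDS + piece_kind) * NUM_SQ + piece_sq
```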

While I am on the lichess pedagogical suggestion, and on extracting human-valuable information from SF: if lichess ran limited-depth SF passes for each error it calls out, it could create another teaching moment, giving the blunder's depth to visible consequences. Find the input SF depth at which the human blunder would be converted materially (since material is likely still the major leaf signal, or it could be made so by running without NNue for pedagogical and analytical purposes).
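A minimal sketch of that depth-to-conversion probe, assuming python-chess and a local Stockfish binary (the scan and its threshold are my illustration of the idea, not an existing lichess feature):

```python
import chess
import chess.engine

def depth_to_conversion(fen: str, blunder_uci: str,
                        max_depth: int = 20, drop_cp: int = 150):
    """Scan shallow to deep; return the first depth at which the engine already
    sees the blunder's damage (eval below -drop_cp for the blunderer), or None.
    drop_cp = 150 is an arbitrary illustrative threshold."""
    board = chess.Board(fen)
    blunderer = board.turn
    board.push_uci(blunder_uci)
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        for depth in range(1, max_depth + 1):
            info = engine.analyse(board, chess.engine.Limit(depth=depth))
            cp = info["score"].pov(blunderer).score(mate_score=10000)
            if cp < -drop_cp:
                return depth  # shallowest depth where the consequence is visible
    return None
```

A tactical blunder would show up at depth 2 or 3; a positional one only much deeper, which is exactly the dimension of the error I would like surfaced to the learner.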

So a learner with a thirst to learn more about the long-term consequences of upstream errors than about short-term tactical blunders, or at least to become aware of that dimension of the nature of errors (and its complement: the chess-beyond-calculation things to learn), could choose to focus on the more positional concepts of their play. Learning from errors rests on chess being logical and zero-sum: the complement of an error might be better play... something like that.

So easy. And in line with the blog's general intent of going beyond a flat, unknown-depth, dumb calling-out of errors toward learnable position information: what is a score, I insist? There is score and there is score; it is about time we look at the entrails. And if lichess could relax its blind cloud pursuit of depth and look back at the human-worthy information in shallow SF depths, that would be great. Remember, we are not trying to make SF win any game; we are trying to make it useful to the user. So it is more important to characterize the human's chosen error and its depth to conversion than to have SF find the best "42" answer.
It seems I have a lot of points to bring to bear in my conversations; bear with me. I have done my internal ramblings from many angles, and I still have some points of view to express that can't be packaged. This is research to me, not engineering, yet; it sometimes needs all the angles to seek some common increased understanding, or else I see fog in many places...
I am still not sure about the shark-analogy advice from Carlsen. I might have my understanding of shark behaviour wrong too. That, and cats: never taught to kill by their mother, they appear to toy with their prey, which is really their play-behaviour development program, practiced autonomously with siblings. By the way, spreading some knowledge, kind of, while rambling does not hurt, does it?

So, sharks... Is it like the French saying "Un tiens vaut mieux que deux tu l'auras" (roughly: a bird in the hand is worth two in the bush)? How does one translate such things? (I actually do it all the time, subconsciously; my excuse for non-native-English to non-native-English miscommunication, in English.) Besides my mind's turbulence, of course. I can't blame the languages all the time, but can't I?

So... was Carlsen saying to win in a hurry? I guess time controls might make this a common-sense assumption...

I get not wanting to wait for stalling egos in a tournament... but maybe the pendulum has swung crazily toward the speed-of-light-is-the-limit direction? This was an interlude from the land of dreamland chess, a.k.a. correspondence chess, where we are pretty sure to sleep on some position from ongoing games every time we sleep... Long live correspondence games!