9 Comments
Zvi Mowshowitz:

What would Benter mean in practice? What's the bet? I notice I am confused there.

I haven't checked Brier scores, but keep in mind that you don't need a model superior to the market's to win. If you have one, you are a true monster.

Results this good are never sustainable at this level, period. And lack of line movement is a very bad sign. But there ARE ways to win without it, and over large enough samples money talks and bullshit walks. The question is how big a sample that takes.

krixusthegaul:

Benter's idea is that if you have two models (yours and the market's) that derive their probabilities from different data, and they both agree, then the actual probability of the event is higher than either model says individually. Meaning, in our case, that if both your model and the market say Under 8, then the true chance of Under 8 is more than 50%. I think the full description is in his paper from 1992 or somewhere around then.
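For intuition, here's a toy sketch (my numbers, not Benter's): if you assume the two estimates are conditionally independent given the outcome, combining them amounts to adding their log odds, so two mild "under" leans stack into a stronger one.

```python
# Toy sketch of the "agreement strengthens the estimate" intuition.
# Assumes the two signals are conditionally independent given the
# outcome (a strong assumption); the probabilities are made up.
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

model_p = 0.55   # my model: Under 8 at 55%
market_p = 0.57  # market-implied: Under 8 at 57%

combined = inv_logit(logit(model_p) + logit(market_p))
print(f"combined: {combined:.3f}")  # ~0.618, above both inputs
```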

I can check the Brier scores and post them. Not sure exactly when but I'll try to do it either for this post or the next results post.

Zvi Mowshowitz:

If you have a link to the paper, that sounds fascinating. I think I get where he's coming from, but I also think it requires more than different data; it requires that the different data be causally independent?

I don't think his thesis applies here even if it applies elsewhere.

If nothing else, it implies impossible things. For example, take your Under 8. Let's say that I say 57% and the market says 58%. But then what about Over 7? Both our models once again say more than 50%! But that implies that our estimates for P(exactly 7) and P(exactly 8) are too low, which seems like a crazy thing to conclude, especially since you'd conclude it regardless of what our estimates were.

Different data is presumably key here. I'm getting to the result via a different method than the market, but it's not like I'm looking at a truly distinct set of information in the sense that would make Benter's idea apply.

On Brier score, the market *will* beat us, but I'm curious to see how badly, and to compare it to other models out there that don't look at today's prices. We get to play offense against the market, so if we beat the market on Brier score, we'd completely crush it on money.

krixusthegaul:

I'm surprised you haven't seen it before. It's Benter (1994), not 92: https://www.gwern.net/docs/statistics/decision/1994-benter.pdf

I'm not sure I understand your Under 8/Over 7 example. Why would Over 7 necessarily be more than 50%? Maybe I'm misunderstanding a couple of things, so I apologize for the probably-too-detailed explanation/questions/assumptions below:

- if market says "under 8", then there's ~50% chance (minus take) of both under 8 and over 7 (since we must include 8)

- if both your model and the market say "under 8", and your model derived its probability from other data (which, yes, I do think has to be causally independent, but I think that holds for any market where "regular people" can bet), then Benter's findings indicate that the odds of under 8 are actually greater than the ~50% given - maybe 55%, say

- that means over 7 is 45% probability

So I'm not sure how you could have the models both say over 7 > 50% and under 8 > 50% unless there's something funky going on with the models and they don't provide accurate probability estimates.

Do you know of any other models that post predictions publicly that I could also run the Brier score for?

It looks like your results table doesn't include the win% that Aikido notes for the game, only the market's win% at the time of the bet. That column has a Brier of .246 (.25 is guessing 50% every time; .33 is guessing randomly, iirc). It also seems a little overconfident (based, again, on the market-at-bet column): https://imgur.com/a/IPJYzwB Last year's (.269 Brier) looks like it was underconfident, but in general quite off-calibration. I'm not sure any conclusions can be drawn from this without the Aikido odds themselves, though.
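(For reference, this is roughly how I computed it; a minimal sketch with placeholder numbers, assuming a column of market-at-bet probabilities and 0/1 bet outcomes.)

```python
# Brier score plus a simple calibration table. The predictions and
# outcomes below are placeholders, not the real results data.
import numpy as np

preds = np.array([0.57, 0.61, 0.48, 0.55, 0.52])   # hypothetical probs
outcomes = np.array([1, 0, 0, 1, 1])               # 1 = bet won

brier = np.mean((preds - outcomes) ** 2)
print(f"Brier: {brier:.3f}")  # 0.25 = always saying 50%; lower is better

# Calibration: bucket predictions, compare mean prediction to hit rate.
bins = np.linspace(0, 1, 11)
idx = np.digitize(preds, bins) - 1
for b in range(10):
    mask = idx == b
    if mask.any():
        print(f"bin {bins[b]:.1f}-{bins[b+1]:.1f}: "
              f"predicted {preds[mask].mean():.2f}, "
              f"actual {outcomes[mask].mean():.2f}, n={mask.sum()}")
```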

Zvi Mowshowitz:

OK, so roughly speaking, the chance of a game landing exactly 8 is ~8%, and the chance of a game landing exactly 7 is ~14%. So if Under 8 is 57% to win (excluding pushes), that's roughly 53% to win, 39% to lose, 8% to push. But 14 points of those wins are games where it lands exactly 7. So Over 7 would be 47% to win (39+8), 39% to lose, and 14% to push. A clear big favorite.
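Spelled out as a quick check of the same numbers (the 57% is quoted excluding pushes):

```python
# Push accounting for Under 8 vs Over 7, using the landing
# frequencies stated above.
P_LAND_7 = 0.14  # chance the total lands exactly 7
P_LAND_8 = 0.08  # chance the total lands exactly 8

# Under 8 is 57% to win excluding pushes (it pushes on exactly 8):
under8_win = 0.57 * (1 - P_LAND_8)         # ~0.53
under8_lose = (1 - 0.57) * (1 - P_LAND_8)  # ~0.39
under8_push = P_LAND_8                     # 0.08

# Over 7 wins whenever Under 8 loses (total 9+) or the game lands
# exactly 8; it pushes on exactly 7 and loses on 6 or fewer:
over7_win = under8_lose + P_LAND_8   # ~0.47
over7_push = P_LAND_7                # 0.14
over7_lose = under8_win - P_LAND_7   # ~0.39

# Excluding pushes, Over 7 is ~47/86, about 55% -- also a favorite.
print(over7_win / (over7_win + over7_lose))
```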

I think the pushes are messing you up, so let's think about 7.5 and 6.5 instead. It's easy to see that many times, Under 7.5 is >50% to happen, and Over 6.5 is >50% to happen, but we shouldn't adjust both odds in opposite directions when multiple models agree on that. Basically, I'm saying that what you call 'the line' where we set odds determines what the favorite is, and this seems like a problem for Benter (I haven't read it yet).

I unfortunately don't know of anyone else who is bold enough to post their own odds on games like this. There are people who post *picks* but that's of course very different. One thing you could check is the opening lines of the games, as compared to closing ones.

Also note that if you do totals, you'll need to figure out how to deal with conversions (e.g. I say Over 9.5, you say Over 8.5; if we both have it 50/50 and you don't convert them, we'll both end up with the same Brier score despite predicting different things).
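A sketch of what I mean by converting, with a made-up landing probability:

```python
# Putting two totals picks on a common event requires an estimate of
# the probability of landing on the number in between. The 12% for
# P(total == 9) here is made up for illustration.
p_over_8_5 = 0.50   # your pick: Over 8.5 at 50%
p_land_9 = 0.12     # assumed P(total lands exactly 9)

# Over 9.5 = Over 8.5 minus the games that land exactly 9:
p_over_9_5 = p_over_8_5 - p_land_9
print(p_over_9_5)  # 0.38 -- now comparable with my Over 9.5 estimate
```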

krixusthegaul:

Both models will only output one of "under 7.5" or "over 6.5", right? All Benter is saying is that if both of them say "under 7.5" with 50% probability, then (with conditions) the actual probability is greater than 50%. That's his "combined model" (pages 5-7 of the PDF), which does adjust the odds to be more spread out than the originals.
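If I'm reading pages 5-7 right, the combined model is roughly of this form (it's multi-horse in the paper; I've adapted it to a binary bet here, and the exponents are placeholders that the paper fits by maximum likelihood):

```python
# Rough form of the combined model from Benter (1994), adapted from
# the multi-outcome race setting to a single binary bet. alpha and
# beta are placeholders; the paper fits them by maximum likelihood.
def combined(p_model, p_market, alpha=1.0, beta=1.0):
    """c = f^alpha * pi^beta, renormalized over the two outcomes."""
    num = (p_model ** alpha) * (p_market ** beta)
    den = num + ((1 - p_model) ** alpha) * ((1 - p_market) ** beta)
    return num / den

# Two mild "under" leans combine into a stronger one, i.e. the
# combined odds are more spread out than either input:
print(combined(0.55, 0.57))  # ~0.62
```

With alpha = beta = 1 this reduces to simply adding the two log odds; fitted exponents let the data decide how much independent information each source actually carries.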

I'll try to go back and fill in the Aikido odds for your picks at some point (next 2-3 weeks) to see whether the effect is present. (Feel free to call me out if I don't post anything by the 15th.) I also think I'm still very confused about the odds in general, so maybe that's why I'm not understanding your example. :(

Karl:

Sorry this is late... Gmail started burying these for some reason, even though I've told it not to!

Back when I was working on handicapping all the time instead of just picking off bad lines, I was firmly in the camp that closing line value was basically all that mattered. As such, I'm a little nervous about long-term viability. I think a breakdown of CLV on dogs vs favorites might be interesting.

krixusthegaul:

To be clear, the line movement being random means that the units won so far will not be sustainable over the long run, correct?

Did you run some Brier score metric over the current results? With ~10 bins there should be enough data to judge, at least somewhat, whether the model is calibrated. There are also formulas for confidence intervals that identify bins where the sample size is too small to tell either way.
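(Something like a Wilson score interval per bin is what I had in mind; a rough sketch, with a made-up bin:)

```python
# Per-bin confidence intervals for a calibration check, using the
# Wilson score interval for each bin's hit rate. The bin below
# (12 wins in 20 bets) is made up for illustration.
import math

def wilson_interval(wins, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

# With only 20 bets in the bin, the interval is wide enough that a
# mean prediction of ~0.55 can't be called miscalibrated:
lo, hi = wilson_interval(12, 20)
print(f"hit rate 0.60, 95% CI ({lo:.2f}, {hi:.2f})")  # ~(0.39, 0.78)
```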

Are you considering betting Benter-style at all? (That is, if your model and the market agree, you bet toward what both models are saying.)

krixusthegaul:

(Sorry for the repeat comments; it seems like my internet connection was not great...)
