This Substack will be a place for me to talk about sports, sports gambling and sports modeling.
I don’t intend to talk in detail about my past in the industry. What I will say is that I have extensive experience with sports betting across a variety of sports. At various times I’ve done profitable modeling and gambling on baseball, basketball, football, soccer and tennis.
The primary initial motivation is to have a place to share my baseball projections. I have a program that takes past information and uses it to predict the odds on future games. Usually the odds on games are close to the odds the program predicts.
When the two differ, the market odds tend to move towards the odds the program predicts.
This means that if you bet early lines on the sides that the program likes, you’ll tend to get a good price. If instead you think the program is mistaken and want to bet on the other side, you are better off waiting until later.
I try to measure success of a model largely by the movement of the gambling odds. If the odds move in my favor, I know I likely made a good bet. If the odds move against me, I know I likely made a bad bet.
That does not mean that I think the closing line is always right, or that the program’s odds offer no information value at that point. It does mean it’s a lot harder to tell. The program still at least provides valuable perspective on what the odds represent.
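To make the odds-movement yardstick concrete, here is a minimal sketch of how one might score a single bet against the closing line. It is not the program’s actual code. The function names and the example prices are hypothetical, and it compares raw implied probabilities without removing the bookmaker’s margin, which is a simplification.

```python
def american_to_prob(odds: int) -> float:
    """Implied probability (bookmaker margin included) of an American moneyline."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def closing_line_value(bet_odds: int, closing_odds: int) -> float:
    """Closing implied probability minus implied probability at bet time.

    Positive means the line moved toward the side that was bet;
    negative means it moved against it.
    """
    return american_to_prob(closing_odds) - american_to_prob(bet_odds)


# Hypothetical example: a side bet at -110 that closes at -125.
clv = closing_line_value(-110, -125)
print(f"{clv:+.3f}")  # +0.032
```

Averaged over many bets, a positive number here is the signal that the early bets were likely good ones. Any single game tells you very little.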
The program uses past gambling lines as inputs, but not the same lines it is predicting (because that’s cheating, and also means you can’t go first).
What The Program Doesn’t Currently Know
The program does not know the starting lineups. Once those come out, the program will be at a key information disadvantage.
The program does not know bullpen availability. To the extent that this matters, you’ll need to adjust. It is worth noting that while a rested bullpen feels important watching as a fan, mostly it is not so important, because the replacements are not that different from the replaced. Teams with particularly exceptional closers can be somewhat of an exception.
The program doesn’t know about weather, which can have big impacts on totals.
The program’s biggest weakness is that, barring manual adjustments, it takes a while to update when things change.
That applies across the board. If Mike Trout got hurt, the program’s opinion of the Angels offense would not fully reflect that change for several weeks. If he was somehow traded for minor leaguers, the same would apply to both ends. I’d have to go in and make manual adjustments. If he lost his swing and struck out fifteen times in a row, the program would be far slower than the market to conclude that this predicted future performance.
In a year full of injuries and illnesses, this is a big problem.
The good news is that you know the program doesn’t know.
If the program tried to adjust for Trout’s injury, but did a randomly bad job of it, you wouldn’t know what to do to fix it. Did it overshoot, or not adjust enough?
If you know the program doesn’t know, then you can calculate the change to figure out the new odds. Use your favorite projection system to figure out Trout’s WAR per game, compare that to his replacement, and move the odds accordingly. It won’t be perfect. Effects are not fully linear and there will be secondary warping issues this won’t solve. But it will be pretty good.
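As a rough illustration of that adjustment, here is a sketch that treats the WAR gap as a straight linear shift in win probability. That linear mapping is my simplifying assumption, and it is exactly where the non-linear effects get ignored. The function names are hypothetical, the WAR figures are placeholders rather than real projections, and WAR is assumed to be a full-season number spread over 162 games.

```python
GAMES_PER_SEASON = 162  # assumption: WAR inputs are full-season projections


def war_per_game(season_war: float, games: int = GAMES_PER_SEASON) -> float:
    """Spread a season-long WAR projection evenly across games."""
    return season_war / games


def adjust_win_prob(win_prob: float, lost_war: float, replacement_war: float) -> float:
    """Lower a team's win probability by the per-game WAR gap (linear approximation)."""
    delta = war_per_game(lost_war) - war_per_game(replacement_war)
    return win_prob - delta


def prob_to_american(p: float) -> int:
    """Convert a win probability to a no-margin American moneyline."""
    if not 0 < p < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if p >= 0.5:
        return round(-100 * p / (1 - p))
    return round(100 * (1 - p) / p)


# Placeholder numbers: the program says 55%, the injured star projects to
# 8 WAR over a season, and his replacement projects to 1 WAR.
before = 0.55
after = adjust_win_prob(before, lost_war=8.0, replacement_war=1.0)
print(prob_to_american(before), "->", prob_to_american(after))
```

With these placeholder inputs the gap is 7/162, a bit over four percentage points of win probability, which is enough to turn a clear favorite into roughly a coin flip.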
Similarly, if a pitcher hasn’t started in the last month, the program treats the team’s entire pitching staff as a generic ‘pitching staff with a minor league pitcher making his first start’. It’s your job to adjust from there.
Part of this is the nature of the program. Part of this is that the program is a side project, which means I have to keep it somewhat simple.
Part of this is by design. I intentionally exclude things the program’s inputs can’t fully understand.
Finally, the program does not know what lines were actually posted. By its nature, the program implicitly has some amount of respect for the posted odds even at the beginning, and gains more respect as the line is hammered into shape over time.
What the program does know is anything that is incorporated into past gambling lines for teams and pitchers. It also has at least reasonable park factors, and takes into account left/right splits for starting pitchers and teams, but is not as sophisticated on such fronts as it could be. There are a number of areas that could be improved.