Similarity Scores, Part 1
Kobe Bryant is the next Jordan. Dwight Howard is the next Alonzo Mourning. Mardy Collins is the next Jason Kidd. Comparing two players allows us to communicate a lot of information in a few words. If someone says that LeBron James is like Oscar Robertson, you would imagine LeBron being strong, versatile, agile, great, etc. Or perhaps that’s how you might picture the Big O, depending on how old you are.
Comparing two players is also useful when you’re evaluating players. Find a historical player similar to a youngster, and you have a good idea of how he might develop. However, identifying similar players can be difficult and subjective. Is LeBron the next Jordan, Magic, or Robertson? To take some of the guesswork out of the equation, I’ve created a similarity score using statistics. Since per-game and accumulated stats depend on playing time and don’t adequately reflect a player’s skill level, I’ve decided to go with standardized per-minute stats (z-scores). Originally I used just about every stat the NBA officially keeps track of, with every stat counting equally, but the results didn’t pass the smell test. It didn’t make sense for personal fouls to be worth the same as points. Therefore, I decided to weight the stats, and broke them into three categories.
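To make the "standardized per-minute stats" idea concrete, here is a minimal sketch in Python. The player names and totals are made up for illustration, and the choice of the population standard deviation is an assumption; the text only says the stats are converted to per-minute rates and then to z-scores.

```python
from statistics import mean, pstdev

def per_36(total, minutes):
    """Scale a season total to a per-36-minute rate."""
    return total * 36.0 / minutes

def z_scores(values):
    """Standardize values as (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # every player identical on this stat
    return [(v - mu) / sigma for v in values]

# Hypothetical (points, minutes) season totals, not real data
players = {
    "Player A": (1200, 2400),
    "Player B": (600, 900),
    "Player C": (300, 1800),
}

# Step 1: remove the effect of playing time
pts36 = {name: per_36(pts, mins) for name, (pts, mins) in players.items()}

# Step 2: express each rate relative to the comparison pool
z = dict(zip(pts36, z_scores(list(pts36.values()))))
```

Player B scores fewer total points than Player A, but ends up with the higher PTS/36 and therefore the higher z-score, which is the distortion from playing time that the per-minute step is meant to remove.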
The first and most important category is scoring. No other historically recorded statistic is more integral to a player’s worth. Some players are expected to run the offense and have a high number of assists, while others are on the floor primarily to rebound, but few do both. However, just about everyone on the court is expected to score at some point or another. Even players who score infrequently or inefficiently should be more similar to those of the same ilk. Hence, I made scoring worth approximately half of a player’s comparison score.
Originally I had added many aspects of scoring, but I found that they tended to take away from the main focus: efficiency and volume. Oddly, I also saw better results when I limited scoring to just three stats: TS%, eFG%, and PTS/36. Since the first two are composites of different aspects of scoring, I feel justified leaving out things like free throw percentage or three-pointers attempted. The results also seemed to get better when I gave more priority to the percentages and less to points, because efficiency varies more widely than volume. Lots of players can average 20 PTS/36, but few can do it with a 60% TS%. Currently TS% and eFG% are each worth twice as much as PTS/36.
I split the rest of the stats into two sections, which I call (for lack of better terms) “Small Man” and “Big Man”. “Small Man” is worth about a third and consists of three parts: AST/36, STL/36, and TO/36. I found that assists tend to separate contrasting players better, so I weighted them equal to the other two combined. “Big Man” is worth about a fifth and consists of OREB/36, DREB/36, BLK/36, and PF/36. Rebounding combined (but not offensive or defensive rebounding individually) is worth more than blocks, and fouls carry a minuscule weight, but are present.
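The weighting scheme can be sketched roughly as follows. This is only an illustration: the text gives proportions ("about half", "about a third", "about a fifth") rather than exact weights, so the numbers below are assumptions chosen to respect those proportions. Combining the stats as a weighted average of absolute z-score differences is also an assumption, since the exact formula isn't spelled out here.

```python
# Illustrative weights only; the proportions come from the text, the exact values do not.
WEIGHTS = {
    # Scoring, about half: TS% and eFG% each count twice as much as PTS/36
    "TS%": 0.20, "eFG%": 0.20, "PTS/36": 0.10,
    # "Small Man", about a third: assists equal steals and turnovers combined
    "AST/36": 1 / 6, "STL/36": 1 / 12, "TO/36": 1 / 12,
    # "Big Man", about a fifth: rebounds together outweigh blocks, fouls barely count
    "OREB/36": 0.06, "DREB/36": 0.06, "BLK/36": 0.07, "PF/36": 0.01,
}

def similarity(z_a, z_b, weights=WEIGHTS):
    """Weighted average absolute gap between two z-score profiles.

    Lower means more similar; 0.0 means identical on every weighted stat.
    """
    total_weight = sum(weights.values())
    weighted_gap = sum(w * abs(z_a[stat] - z_b[stat]) for stat, w in weights.items())
    return weighted_gap / total_weight

def rank_comparables(target, pool, weights=WEIGHTS):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, similarity(target, profile, weights)) for name, profile in pool.items()]
    return sorted(scores, key=lambda pair: pair[1])

# Hypothetical z-score profiles: an average player and two variations on him
average = {stat: 0.0 for stat in WEIGHTS}
better_shooter = dict(average, **{"TS%": 1.0})   # one standard deviation better TS%
fouls_more = dict(average, **{"PF/36": 1.0})     # one standard deviation more fouls

ranking = rank_comparables(average, {"shooter": better_shooter, "fouler": fouls_more})
```

With these weights, a one-standard-deviation gap in fouls moves the score far less than the same gap in TS%, so the "fouler" ranks as the closer comparable. That is the behavior the weighting is meant to produce.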
In the end, I’ve come up with a system that, although it has subjective elements, should be applied objectively across the board. The similarity scores use the same equation for every player, so there isn’t any bias in that respect. In other words, I could try to make Jamal Crawford more similar to Michael Jordan, but that would likely bring other players who are already closer to Jordan even closer. In the future I may tweak the weights, but the process will essentially stay the same.
Since I plan on adding these to the report cards, let’s start with the guy I missed: Chris Duhon. Here is his 2009 season compared to other players’ seasons at the age of 26.
|0.044||Vinny Del Negro||G||1993||SAS||73||13.9||.563||.514||12.8||3.8||6.9||1.0||2.2|
The first thing to notice is the z-sum column, which is the similarity score. The lower this number is, the more similar the players are. Duhon is most similar to Del Negro and Davis, with a drop-off to Henson and the others. So what does something like this tell us about Duhon? Looking over the list, we see lots of mediocre players and no All-Stars. So the chance that Duhon will develop into something superior to his current form is slim. As for the comparables, Del Negro would have his most productive seasons in two of the next three years. And much like Duhon, Davis languished as a reserve before catching on in his 26th year. He would become the starter for the Mavericks, and ride out a few bad seasons until the team turned things around in the mid-80s.
Stay tuned for Part 2…