Note: The original version of this article incorrectly stated that there were 39 shows in the Autumn list, not 38. This has been corrected.
There’s a lot of gray area surrounding the performance of shows early in a season, before hard stalker numbers roll in and their impressive model makes prior predictions fairly irrelevant. I’m interested in what some of these early, less-rigorous numbers can mean, both for disc sales and for lower-cost indicators, and I’m slowly putting together a list of them for Fall 2013 to test against how the first disc volumes actually sold, along with how big a sales boost related print media got.* This post deals with very little of that data; it just outlines three dummy models against which all other indicators will be judged when I get around to it.
The first criterion for a bit of data having any real meaning is its ability to do better than guessing that is either random or very dumb, outperforming what is traditionally known as the null hypothesis. Let’s say you knew roughly what the anime market looked like in 2013 and were trying to build a model to guess how the 38 shows on the list linked above would do. Say you’re just interested in whether a show will sell more or less than 4000 copies, and you judge the success of your model by how well it pegs which shows go over and which go under. For reference, 13 of the 38 had their first volume sell over 4000, so the “actual” over odds are about 34%.**
Method 1: Flip a Coin (Randomly assign probability based on a 50% chance)
In this case, you’re most likely to end up with 19 shows in each of the over/under categories. Since the odds of an over bet being right are about 1/3 and the odds of an under bet being right are about 2/3, the expected number of correct guesses here is 19*.66+19*.34=19. That’s about a 50% accuracy rate.
Method 2: Play the Naive Odds (Assign weighted probability based on knowing, from historical precedent, that there’s about a 2/3 chance any given show will sell under 4000)
In this case, you end up with 13 in the over category, and 25 in the under category. Probabilities are the same as before, so you’ll wind up with an expected 25*.66+13*.34=20.9 correct guesses. That’s a ~55% accuracy rate.
Method 3: Play the Naive Odds Full Hog (Guess all of them will end up under 4000)
In this case, all 38 are in the under category, and we’ll expect 38*.66=25.1 of these 38 guesses to be correct, for a ~66% accuracy rate.
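The arithmetic behind all three baselines can be sketched in a few lines (using the post's numbers: 38 shows, 13 of which sold over 4000 first-volume copies):

```python
# Expected accuracy of the three dummy baselines for Fall 2013:
# 38 shows, 13 of which had a first volume sell over 4000 copies.
n_shows = 38
n_over = 13
p_over = n_over / n_shows   # ~0.34 chance an "over" bet is right
p_under = 1 - p_over        # ~0.66 chance an "under" bet is right

# Method 1: coin flip -- bet "over" on half the shows, "under" on the rest.
coin_correct = (n_shows / 2) * p_over + (n_shows / 2) * p_under   # = 19

# Method 2: weighted guessing -- bet "over" on 13 shows, "under" on 25.
weighted_correct = n_over * p_over + (n_shows - n_over) * p_under  # ~20.9

# Method 3: full hog -- bet "under" on every single show.
all_under_correct = n_shows * p_under   # = 25

for name, correct in [("coin flip", coin_correct),
                      ("weighted odds", weighted_correct),
                      ("all under", all_under_correct)]:
    print(f"{name}: {correct:.1f} correct, {correct / n_shows:.0%} accuracy")
```

This prints accuracy rates of roughly 50%, 55%, and 66%, matching the three methods above.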
This is one of the basic checks I’ll be making against the various sets of data I’m currently collecting: can it do better than a smart blind man putting all his chips on black? It’s not the only way to measure usefulness; for example, if a dataset pegs the 5 top-selling shows of a season dead to rights, in order, that’s a definite point in its favor, even if the rest of its top 10 are also-rans.*** However, if an indicator can’t beat these dumb-but-savvy baselines, that’s a definite mark against it.
*As soon as there’s enough “after” data to really call that one – Tokyo Ravens’ first proper post-anime volume didn’t come out until this week.
***I’ll be looking to see what percentage of the top 5 series in normalized Google traffic became the top 5 v1 sellers, what percentage of the top 10 became top 10 sellers, etc. Other metrics I know I’m including are Torne DVR rankings [collected], Nico stream rankings (total viewers, percentage of 1s given, and total 1s given) [collecting, slowly], and MyAnimeList popularity and ranking [will collect]. Those are the ones I know I can collect for enough of the Fall 2013 sample to make them meaningful.
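The top-k comparison described above reduces to a simple set overlap. Here’s a minimal sketch; the show names and rankings are hypothetical placeholders, not real Fall 2013 data:

```python
# What fraction of an indicator's top-k shows actually landed in the
# top-k first-volume sellers? (Rankings are ordered best-first.)

def top_k_overlap(indicator_ranking, sales_ranking, k):
    """Fraction of the indicator's top k that appear in the sales top k."""
    predicted = set(indicator_ranking[:k])
    actual = set(sales_ranking[:k])
    return len(predicted & actual) / k

# Hypothetical example rankings:
google_traffic = ["Show A", "Show B", "Show C", "Show D", "Show E", "Show F"]
v1_sales       = ["Show B", "Show A", "Show F", "Show C", "Show G", "Show D"]

print(top_k_overlap(google_traffic, v1_sales, 5))  # 3 of 5 match -> 0.6
```

The same function works for the top-10 check by passing `k=10`, and for any of the other indicators (Torne, Nico, MyAnimeList) once their rankings are collected.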