Fun With Numbers: MyAnimeList Stats Predicting the Fall 2013 Disk Over/Unders

Edit: Link to source document fixed.

Like those of any other free-access site, or any one set of numbers, statistics from MyAnimeList should be utilized with caution. But that doesn’t mean they shouldn’t or can’t be used. In this article, I’m looking at the current rank and popularity of 38 shows from the Fall 2013 season on MyAnimeList to determine how they stack up against other methods of doing the same. As before, note that any accuracy below the best null hypothesis of .66 is fairly insignificant.

By the by, the MyAnimeList stats used in this piece were taken in late April 2014. Odds are they’ve changed a bit since the outset of the season, though shifts within a season tend not to be particularly cataclysmic. Most can probably be viewed as in a similar boat to the December Torne/episodes 9-12 Nico ranks, in that they’re snapshots taken when the audience has more information about the show (i.e. has seen more episodes). I used both rank/score and popularity/total number of users to order the shows, then checked how many shows a model built on the top 5/10/15 entries of those lists correctly pegs as selling over or under 4000 v1 disks. Data can be found on this doc, and the results are below (indicators greater than the null hypothesis in green):
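The scoring scheme above can be sketched in a few lines. This is a minimal illustration with made-up show names and sales figures, not the real Fall 2013 data:

```python
# Sketch of the top-N over/under scoring used in this post. The ranked
# list and sales numbers below are invented placeholders for illustration.

def top_n_accuracy(ranked_shows, actual_sales, n, threshold=4000):
    """Predict that the top n shows sell over `threshold` v1 disks and
    everything else sells under, then return the fraction of correct calls."""
    correct = 0
    for i, show in enumerate(ranked_shows):
        predicted_over = i < n          # only the top n are predicted "over"
        actual_over = actual_sales[show] > threshold
        if predicted_over == actual_over:
            correct += 1
    return correct / len(ranked_shows)

# Hypothetical popularity ordering (most popular first) and v1 sales.
ranked = ["A", "B", "C", "D", "E", "F"]
sales = {"A": 9000, "B": 3500, "C": 5200, "D": 1200, "E": 800, "F": 600}

print(top_n_accuracy(ranked, sales, n=2))  # 4 of 6 calls correct
```

The same function is reused for each list size (5, 10, 15) against the same ordering, which is all the comparison in the table amounts to.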

[Chart: mal-comp]

Turns out, both indicators do about the same, both beating out the null hypothesis at all 3 list sizes. Out of the metrics I’ve examined so far, that’s the best overall performance. Of course, keep in mind that this is a fairly small sample, and there are other ways (notably source material sales) in which these indicators may be telling. It’s still fairly early in the project to say much, beyond the fact that MAL stats are at least a little more indicative than random chance.

Fun With Numbers: Nico Stream Rankings Predicting Fall 2013 Disk Over/Unders

Niconico stream rankings are one of the most frequently talked-about stats when it comes to early-season indicators of how a show might be performing. It’s been somewhat frustrating to me that they seem to get brought up at the beginning of every season, but how accurate their predictions were has never really been revisited. By comparing their basic effectiveness with the eventual disk sales results, hopefully we can get a little more context for how those numbers should be interpreted.

Nico ranks come with a couple of numbers attached, so I’m looking at several of them to see which ones work best. I use 3 metrics: the number of viewers at the end of the episode stream, the percentage of viewers who gave the episode a 1 (the max score), and the product of those first two (i.e. the number of users who gave the episode a max score). All 3 of these metrics were averaged over episodes 1-4 (October), episodes 5-8 (November), and episodes 9-13 (December). I then determined how accurate guesses based solely on these methods would be if the top 5, 10, and 15 shows on each list were assumed to have their first volume sell over 4k disks.
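The three metrics and the monthly averaging can be sketched as follows. The episode numbers and viewer figures are invented for illustration, not actual Nico data:

```python
# Sketch of the three Nico metrics, averaged over a monthly episode window.
# All figures below are made up; real data would come from the stream pages.

def monthly_metrics(episodes, window):
    """episodes: {ep_number: (viewers, max_rating_pct)}.
    Returns (avg viewers, avg max-rating %, avg max-rating count)
    over the episodes in `window` that have data."""
    rows = [episodes[ep] for ep in window if ep in episodes]
    viewers = [v for v, _ in rows]
    pcts = [p for _, p in rows]
    counts = [v * p for v, p in rows]  # users who gave the episode a 1
    n = len(rows)
    return (sum(viewers) / n, sum(pcts) / n, sum(counts) / n)

# Hypothetical per-episode data: (end-of-stream viewers, fraction rating 1).
eps = {1: (50000, 0.60), 2: (42000, 0.55), 3: (40000, 0.58), 4: (39000, 0.62)}
october = monthly_metrics(eps, window=range(1, 5))
```

Each show then gets ranked three ways per month (by each element of the tuple), and the top 5/10/15 of each ranking are the shows guessed to clear 4000 disks.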

It should be noted that only 25 of the 38 Fall shows I’m using v1 data for have available Nico rank data, 9 of which had v1s sell over 4000 disks. In the interest of not polluting the sample, I’m just checking the accuracy of the rankings within the sample, excluding the other 13. This does leave the sample with a different over/under ratio than the full seasonal sample (36% instead of 34%).

Anyway, here’s how the various methods performed on the Fall data (detailed data here, breakdown doc here). Numbers for the ones that beat the .66 accuracy afforded by the best null hypothesis are shown in green:

[Chart: nicoran-comp]

Total number of viewers arguably performed the best, besting the null hypothesis outright in October and December. Max rating % did very well when taking only the top 5, but fairly poorly otherwise (the top 15 models barely beat out a coin flip). Max rating number was fairly spotty, except as a broad-strokes indicator in October and a top-tier indicator in December.

This is data taken from a fairly small sample, but these numbers do suggest that the top-tier of the rankings and the total # of stream viewers are at least better than random chance at picking which shows will succeed, something that couldn’t be said for most Torne DVR lists. How impressive that is depends on how well other indicators do and how well it holds up to an extension of the sample size.

Via Newtype USA: [inside] Madhouse

Along with an understanding of the broader context of the subject, the most vital ingredient to good anime coverage is a reliable source. So when US journalists actually interview people on the production side in Japan, it’s generally worth noting unless the interview consists entirely of fluff. This is the latest of what will hopefully be a couple more posts archiving articles from Newtype USA’s [inside] series of articles written by Amos Wong. This studio Madhouse feature includes Masayoshi Kawajiri and Kazuo Koike talking about their mentor-student relationship (and the differences between movies and TV), producer Yuichiro Saito talking about the merits of doing all types of projects, and Hiromichi Masuda talking about production-side differences between eastern and western animation.

Note: Pictures are scans of the article made on my crappy scanner, which cover the article text but not the entire page. They’re also in greyscale, because I’m interested in archiving interview text and color scans make the process more of a headache than it needs to be. Apologies for that. Scans after the jump, along with comments on the contents of the article.

Continue reading

Via Newtype USA: Hiroshi Nishikiori (August 2003)

Hiroshi Nishikiori, one of only 17 directors to helm multiple non-sequel 10k hits, offers some commentary on what was then his latest work, Gad Guard. One sidebar reveals that Gad Guard is actually inspired by Easy Rider. Which makes perfect sense.* (spoilery comment on said point after the article pics)

Continue reading

Via Newtype USA: Carl Macek (August 2003)

Carl Macek, creator of Robotech, talks about his production philosophy and what he views as compatibility issues between American and Japanese animation, as well as the dubbing of Aura Battler Dunbine and the nuances of ‘old’ and ‘new’ when it comes to anime.

Continue reading

Via Newtype USA: NISIOISIN (July 2003)

I’m going back through the issues I already scanned some articles from, and it turns out that just about every issue has an interview with a novelist, some of them now extremely prestigious. This issue includes one with NISIOISIN, a good two years before the Monogatari series, where he mentions his personal writing speed record: 170 pages in a day.

Continue reading

Fun With Numbers: Torne DVR Rankings Predicting Fall 2013 Disk Over/Unders

Note: The original version of this article incorrectly stated that there were 39 shows in the Autumn list, not 38. This has been corrected.

This is the first proper article in what will hopefully be a few individual examinations of the power of various indicators to predict disk sales in a rough manner. This segment deals with Torne rankings, a list ranking shows based on how often they were recorded on PS3 DVR. The original data can be found here, and is collected on this doc. I’m comparing these ranks with the rank of v1 disk sales to see how predictive the former were of the latter, based on an over/under 4000 accuracy model.

See this post for details, but basically, one can get 66% accuracy just by predicting that every v1 will sell under 4000 disks (25 of 38 shows in the season were under). Hopefully, good models will do better than that. In order to test how well each metric performs, I take the top 5, 10, and 15 seasonal shows from the metric and guess they will be over 4000 while everything else will flop, then see how many of those “guesses” end up being right. Clerical note: I’m excluding shows carrying over from earlier seasons (e.g. Monogatari, HxH) that end up in the Torne rankings; my top 15 is the top 15 shows from Fall 2013 only.
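The null-hypothesis baseline above is worth making concrete, since every table in this series is judged against it. A quick sketch, using the seasonal totals from the post (the bar a model has to clear, not any real model's results):

```python
# The baseline: with 25 of 38 Fall shows under 4000, predicting "under"
# across the board is right 25/38 of the time. Any model worth using
# should beat that. Totals are from the post; everything else is generic.

total_shows = 38
unders = 25

null_accuracy = unders / total_shows
print(round(null_accuracy, 2))  # 0.66

# A top-N model only earns its keep if its hit rate beats the baseline:
def beats_null(model_correct, total=total_shows, null=null_accuracy):
    return model_correct / total > null

print(beats_null(28))  # 28/38 is about 0.74, so True
```

This is why the tables highlight only cells above .66: a model that merely matches the all-under guess has added no information.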

Since Torne rankings are available monthly, I also compared the accuracy of different months; one might expect later months to be closer to the final sales total, as people have had more time and information (read: episodes of the show) with which to make an informed decision. Assuming the metrics are indicative at all, of course.

Anyway, here’s how accurate the top 5/10/15 guess models were for October/November/December Torne Rankings (numbers outperforming the null hypothesis in green):

[Chart: torne-corrected]

The top 15 model doesn’t seem to be particularly precise in general, but accuracy improved each month, and the December rankings were actually notably accurate. We’ll see how these indicators stack up against others (and how well they compare with source boosts) later.