Fun With Numbers: Myanimelist’s Sequel Problem Should and Can Be Fixed

If you’ve looked at the myanimelist Top Anime list recently, you’ll notice that it contains anything but an even distribution of franchises across time. As of this writing, 14 of the top 30 come from the past 5 years, and 9 of the top 60 were pieces of animation which aired last year. The 2010-2014 period saw a lot of anime being produced, but as impressive as that amount was, it was hardly 45% of the historical total. Obviously, these rankings have a decently strong recency bias, skewing somewhat heavily towards newer anime. That in itself is perfectly fine. These rankings aren’t meant to be a paragon of good taste, but just to accurately represent how the site’s large userbase as a whole feels about them. This recency bias is a product of several factors: the site’s young userbase, the increasing number of anime being produced (and becoming available via simulcast) in recent years, and (possibly) a change in the quality of the on-screen product. In other words, it’s a product of the value accurately representing what it’s built to measure – overall popular consensus.

The same cannot be said for the same rankings’ strong preference for sequels. Not counting the two which are full-on retellings (FMA: Brotherhood and HxH 2011), 14 of the top 30 are either a continuation or a spinoff of a pre-existing franchise. This sequel bias, a product of non-fans tending to drop a show while fans almost always continue, actively interferes with the pure score’s ability to represent what the average viewer likes, as there is not a single sequel that gets watched by an average sample of the anime-viewing population.

Granted, no series ends up getting seen by a truly random sampling of people. Time is limited, so the average viewer relies on gauging PVs, promo art, and word of mouth against their own tastes to determine what they want to watch. Thus, 99% of a given show’s audience are people at least a little interested in what it has to offer. However, there’s a significant difference between the interest level of someone who’s willing to try a series for one episode and one who’s already been watching it for upwards of 4 hours. That’s a difference in interest floors any season 2 or 3 or 4 gets to reap the benefits of, as the sixes, sevens and eights fall away much faster than the nines and tens do.

As a literal indicator of what a series was scored out of 10 by its average viewer, the current MAL scoring system is fine. As a way of sussing out which series own the popular consensus, and which newcomers checking it for ideas should try, sequel bias creates a notably deficient system. Taking one example, it’s particularly unfair in scoring things that run all at once; the same show built for a 24 episode run will have less of a chance of getting into a toplist with a given cutoff than one consisting of 2 separate 12-episode cour. It’s certainly unfair if two shows of theoretically-equal quality and runtime have unequal odds of making the top whatever simply because one didn’t air continuously. It defies common sense to allow that sort of arbitrary trait to be the margin that separates two otherwise identical-quality shows.

The most obvious solution to the problem of sequels getting the fans-only treatment, merging all franchises into a single entry, is bad for a number of reasons. Myanimelist is, first and foremost, a database that allows people to list the anime they’ve watched, and continually appending seasonal entries is a headache for those tracking the episode count, to say nothing of how franchise movies would get mixed into that brew. Too, if you want to take to task the 50% of sequels which are below average, you still have to respect the other fifty. Occasionally it’s the second season that wins the people’s hearts, standing far above the rest, Clannad: After Story being perhaps the most oft-cited case. And there’s also a meaningful difference between a second season that at least keeps the balls in the air and one that drops them into a sea of fire; it’s not like fans rate sequels on fundamentally different criteria than they rate the first season, so the differences in one person’s given scores remain a meaningful value. I personally can name two franchises where my rating jumped 5 and dropped 6 points between seasons, for good reason, and appreciate being able to separate those two.

The good news is that there is a fairly simple way to accommodate great sequels while noting the ones putting for pars or bogeys, and the necessary stats for it are already more or less built in to myanimelist’s system as it stands. In addition to scoring all shows they view, the site allows users to designate 5 series in particular as “favorites”, effectively allowing them to differentiate between their 10/10 scores and their 10+/10 scores. This allows us to get a better look at how sequels perform among fans; in theory a sequel surpassing the original should get proportionally more favorites, whereas a weaker one should get proportionally fewer.

To check whether or not this was true, I measured the number of total members, number of favorites, and score for shows at positions 1-30, 301-330, and 601-630. I then split each group of 30 up into sequels and non-sequels and calculated the average percentage of total members which had favorited the show for each group. The results, shown in more detail here, strongly support the assumption that favorites are far more frequently given to a show’s first season, even when accounting for a sequel’s overall decrease in popularity:

mal-fav
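The comparison described above can be sketched in a few lines of Python. This is only an illustration of the method, not the script behind the chart: the rows are made-up placeholders rather than real myanimelist figures, and the function name is my own.

```python
from statistics import mean

# (title, is_sequel, members, favorites)
# Placeholder rows for illustration only, NOT real myanimelist data.
sample = [
    ("Show A", False, 200_000, 9_000),
    ("Show A 2nd Season", True, 80_000, 900),
    ("Show B", False, 50_000, 1_500),
    ("Show B Movie", True, 20_000, 150),
]


def average_favorite_rate(rows, sequel):
    """Mean of favorites/members over the rows whose sequel flag matches."""
    rates = [fav / members
             for _, is_seq, members, fav in rows
             if is_seq == sequel]
    return mean(rates)


non_sequel_rate = average_favorite_rate(sample, sequel=False)
sequel_rate = average_favorite_rate(sample, sequel=True)
print(f"non-sequels: {non_sequel_rate:.2%}, sequels: {sequel_rate:.2%}")
```

Averaging the per-show percentages (rather than pooling all members together) keeps one hugely popular title from dominating its group, which matches the "average percentage" wording above.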

Controlling for score and popularity, an average sequel series on myanimelist is 2 to 5 times less likely to be favorited than a non-sequel counterpart. This correlation doesn’t hold for every case; Clannad: After Story and K-On!! both see strong favorite percentages relative to the rest of their group. That’s expected, since some sequels will just be better. The difference is robust enough that it can be applied towards a new, better ranking system.

There are a couple of mathematical ways to apply this idea, none fundamentally better than another. I settled on averaging in the favorites F for a series with average score S and total members M as 11/10 scores, like so:

X = (S*M + 11*F) / (M + F)
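For anyone who wants to try the adjustment on their own numbers, the formula translates directly into Python. Treat this as a minimal sketch: the function name is mine, and the two sample shows are hypothetical figures chosen to share a raw score, not values pulled from the site.

```python
def adjusted_score(score: float, members: int, favorites: int) -> float:
    """Blend favorites into the mean score as if each were an 11/10 vote.

    Implements X = (S*M + 11*F) / (M + F).
    """
    if members < 0 or favorites < 0:
        raise ValueError("members and favorites must be non-negative")
    if members + favorites == 0:
        raise ValueError("need at least one member or favorite")
    return (score * members + 11 * favorites) / (members + favorites)


# Hypothetical pair: same raw score, very different favorite rates.
original = adjusted_score(score=8.50, members=100_000, favorites=5_000)
sequel = adjusted_score(score=8.50, members=40_000, favorites=500)
print(f"original: {original:.3f}, sequel: {sequel:.3f}")
```

With no favorites at all the function just returns the raw score, so the adjustment can only move a show upward, and by more the larger its share of favorites.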

Let’s take a look at how this would work on the current toplist. This is what the top 30 looked like when I pulled it a few days ago:

t30-old

You’ll notice there’s a good number of sequels to both Gintama and Mushishi ranked slightly above the original. Not that both series aren’t great, but it’s dubious as to whether they’d fare any better than the originals against a neutral panel of judges. Here’s how the same 30 would look after being re-sorted using the favorites as a corrective factor as described above:

t30-new

Here, the different installments of the Mushishi and Gintama franchises end up positioned closer to each other (as one would expect of series that maintain consistent quality from season to season), and two new series (Tengen Toppa Gurren Lagann and Death Note) jump up into the top 20.

This method works just as well outside of the top 30. Here’s what the ordering of series in the original 301-330 group looks like before (note that Stardust Crusaders s2 is an oddity because of its newly-aired status at the time of the sampling):

t330-old

And after:

t330-new

Utena, formerly stuck in the middle of the pack, rises all the way to the top, and the top 5 goes from only having one original to being 100% original. That seems to me like a better way to build a list of shows people should be watching.

There are two legitimate concerns about changing the scoring system that I can think of. The first is that it would require additional processing power, adding in a step of calculation every time the site recomputes scores for the 10,000 or so titles it has listed. The site always seems to have problems of one sort or another, and they recently made it more difficult for users to search their entire database primarily for bandwidth reasons.* Though the math still wouldn’t be a huge drain on resources, it is at least a valid point of discussion.

The second is that it could change the way people use their limited favorites from a passive, personal activity to a combative one that takes into account what other users are doing. While not trivial, there’s reason to argue the worst of that worry won’t come to pass. First, the majority of people vote their own feelings, not actively trying to give a certain show a certain rank. Second, favorites are limited commodities (only 5 allowed per user) which would limit the effectiveness of their usage – one user could focus on a franchise and try to boost all of it, but they would face a cost, hurting their other favorites. Such actions would be unlikely to move the needle much unless people undertook them by the thousands.

Rules changes usually produce long-term, unforeseen consequences, and this particular change is certainly something that should be considered carefully before having any chance of being implemented. I argue that it is at least worth some consideration, as we should be very meticulous in preserving the accuracy of statistics people use to size up anime at a glance.

*A full database search can still be run, provided one enters in a pair of blank spaces into the search bar.


6 thoughts on “Fun With Numbers: Myanimelist’s Sequel Problem Should and Can Be Fixed”

  1. There are two further issues with this, both of which I feel to be more significant than either of the ones you mention.

    Firstly, and more notably, the fact that everyone has the same number of favourites, no matter how many anime they have watched. This means that people who have watched fewer shows can favourite more of the series that they have seen than those who have seen a lot. This wouldn’t be an issue were it not for the fact that the people who have watched only a few anime tend to have watched things from the same, relatively small, pool of series. Thus these series tend to have not just a higher number of favourites but a higher favourite rate than the less mainstream stuff. This is even more pronounced with manga, where the majority of people have only read a handful of series, and favourited all of them, whereas those series that are read by few people are usually read by those who have read a large number of series. I would feel confident attributing the entirety of the difference between Shingeki no Kyojin (12.5% favourites, 8.64 score) and Hi no Tori (4.5% favourites, 8.66 score) or between Tsubasa: Reservoir Chronicles (11.5% favourites, 8.40 score) and Nukoduke! (3.2% favourites, 8.38 score) to this effect. Naruto (8.15 score) is favourited by 19.6% of its readers – yes it has a large number of fans, but a lot (probably even the majority) of these people don’t actually read enough to fill up their favourites list in the first place.

    Secondly, this system would merely flip the problem – so that those series with multiple parts end up worse off. Some people (myself included) will only put one thing from any given franchise in their favourite list or may find themself having to pick one part of the franchise to put at the bottom of their favourites when any of the series would have made it in there. There are also a handful of well-loved franchises that have more than 5 parts – thus even if someone used up all their favourites on just one thing they wouldn’t be able to favourite all parts of it, which would inevitably lead to the parts of that series being worse off than a similarly popular series which only took up one favourites spot.

    • With regards to the first concern, that doesn’t really play out in practice. People with fewer overall series may be drawing from a smaller overall pool, but that means that series less-often watched by said people have lower total member counts. This number also figures into the weight formula. Thus, the number of favorites required to effect meaningful change on those series’s scores is also lower. Ginga Eiyuu Densetsu, for example, has the sixth-highest favorite percentage on the 1-30 list despite having the fifth-lowest member total (the only sub-50k total among non-sequels in the group), and ends up benefiting quite a bit from that. If anything, I see entry-level popular material like Mononoke Hime and Spirited Away end up with lower percentages than most of the other original series.

      What you’re talking about doesn’t immediately show up for a lot of shows, at least at the top level, but I can see where you’re coming from. I will take a look at that sometime over a broader sample size to get a better idea of how strong the effect is in general. Even if it were a major influence (something that is not clear at this time), it would be possible to correct for by allowing more favorites based on a user’s total number of completed series (say, +1 for every 100 series finished or so). So long as there was still a cap on it, it would still assign more value to a favorite than a typical 10/10.

      I disagree strongly with your second point, for three reasons. The first is that 5+ part franchises are relatively rare compared to 2 and 1-part shows, and thus a change would, at worst, result in fewer series being short-changed. Second, the data we have now shows that favorites end up spread out across longer franchises as different people like different parts, with the differences in favorite percentages between parts corresponding to which ones were standouts (i.e. the fifth Kara no Kyoukai movie, only one in the franchise with over 1000 favorites). Last and most importantly, these 5+ part series get even more sequel-drop boosting than the average 2-part show does (it’s a straight line up for Mushishi and Natsume Yuujinchou between full seasons), more than enough to compensate for the lower number of favorites from increasing numbers of drops by less-interested parties. Changing things doesn’t penalize them, but levels a playing field that was massively unfair in their favor to begin with, which is the whole point of the proposal.

      If you think of anything else, I’d be happy to hear it. I doubt this would ever actually happen, but I’ve given this a lot of thought and would love to improve the idea wherever possible.

      • “People with fewer overall series may be drawing from a smaller overall pool, but that means that series less-often watched by said people have lower total member counts. This number also figures into the weight formula. Thus, the number of favorites required to effect meaningful change on those series’s scores is also lower.”

        I was already factoring this in. Take a hypothetical example: a manga that has 10000 readers with 3 things on their lists, another 5000 readers with 10 things on their list and another 1000 readers with 100 things on their list, and another manga that just has the 1000 readers with 100 things on their list. Let’s imagine also (for the sake of simplicity) that every series has exactly the same chance of getting onto the favourites list of a reader and that every reader fully uses their favourites list. The first manga thus gets 12550 favourites from 16000 members, whereas the second one gets 50 favourites from 1000 members. You see where I am coming from?
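The arithmetic in this hypothetical can be checked with a short Python sketch. It bakes in the same simplifying assumptions stated in the comment (every title on a list is equally likely to be favourited, and every reader fills all 5 slots); the function and variable names are illustrative only.

```python
FAVORITE_SLOTS = 5  # per-user cap mentioned in the post


def expected_favorites(reader_groups):
    """Expected favourites for one title.

    reader_groups: iterable of (number_of_readers, list_length) pairs.
    Assumes each reader fills every slot and picks uniformly from their list,
    so a title's chance of being picked is slots / list_length, capped at 1.
    """
    total = 0.0
    for readers, list_length in reader_groups:
        chance = min(1.0, FAVORITE_SLOTS / list_length)
        total += readers * chance
    return total


mainstream = [(10_000, 3), (5_000, 10), (1_000, 100)]
niche = [(1_000, 100)]

for name, groups in (("mainstream", mainstream), ("niche", niche)):
    members = sum(readers for readers, _ in groups)
    favs = expected_favorites(groups)
    print(f"{name}: {favs:.0f} favourites from {members} members "
          f"({favs / members:.1%})")
```

Under these assumptions the gap in favourite rate comes entirely from list length, since the 1000 heavy readers treat both titles identically.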

        Obviously this is an unrealistic example – largely thanks to the fact that a lot of people don’t use their favourites. However, the proportions with numbers of readers is, for some of the more read series, if anything weighted too heavily towards the higher end.

        This situation is far less significant for anime than it is for manga, as the average person has watched far more anime than manga – MALgraph’s figures (factoring in % unrated) suggest 6 times more. And there are considerably more manga on the database than anime as well. The 100th most watched anime has 39% of the watches of the 5th most watched, whereas the equivalent figure for manga is less than 22%. Thus the issue I am bringing up will be far less significant for anime – yes, the SnK anime is on 10 times more lists than LoGH, but the SnK manga is on 27 times as many lists as Hi no Tori.

        Even looking at the LoGH anime figures, though – yes it is only one rank lower out of those 30 based on what you would expect from its score, but it is very nearly beaten by code geass and death note, both of which are well above their rank, and both of which are in the 3 top anime on MAL by popularity. The other, SAO, incidentally, is 355th, but comfortably beats any of the 30 in the second list in terms of favourite % (with the score formula it would beat all but 3 of these), and almost 3 times that of the rank 7 non-sequel by score (ookami kodomo no ame to yuki).

        Here’s a graph of the change in score from this formula’s application for the top 20 manga (I was too lazy to do more) against the member count for each manga: http://oi62.tinypic.com/9bm9lk.jpg

        Adjusting the number of favourites to be based on the number of things you’ve seen would go some way to address this issue, but a logarithmic scale would probably work better than a flat one in doing so as you need significant differentiation among different numbers of read/watched within the single or double digits – as it is these people that the distortion comes from.
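To make the flat-versus-logarithmic contrast concrete, here is a rough Python sketch. Neither rule is specified precisely anywhere in this thread, so every constant is an assumption: the flat rule uses the "+1 for every 100 series finished" idea floated earlier with a hypothetical cap of 10, and the logarithmic rule (one slot per doubling of list size) is just one possible shape.

```python
def flat_slots(completed: int, base: int = 5, per: int = 100, cap: int = 10) -> int:
    """Assumed flat rule: base slots, +1 per `per` completed titles, capped."""
    return min(cap, base + completed // per)


def log_slots(completed: int, cap: int = 10) -> int:
    """Assumed logarithmic rule: floor(log2(completed)) + 1 slots, capped.

    int.bit_length() gives floor(log2(n)) + 1 exactly for n >= 1,
    which avoids floating-point rounding.
    """
    if completed <= 0:
        return 0
    return min(cap, completed.bit_length())


for completed in (3, 10, 100, 1000):
    print(completed, flat_slots(completed), log_slots(completed))
```

The flat rule gives identical allowances to everyone under 100 titles, while the logarithmic one separates a 3-title list from a 10-title list, which is where the comment says the distortion comes from.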

        • I want to make sure I’m being clear. There is effectively zero need to apply this formula to manga in the first place, as the percentage of sequels you deal with on that list drops off rather sharply compared to anime. I am not arguing for any changes in the manga scoring system, just the anime scoring one. They’re qualitatively different, and only one is notably broken. And, with anime, there’s a large enough userbase already in place that people with lists only 10 or 20 series in length are never numerous enough to break the system.

          In your example, you presuppose that the two series have equally likely chances of being favorited by the (presumably more selective/critical/experienced) group. This almost never plays out in practical circumstances – people with a lot of manga experience are less likely to gravitate towards blockbuster titles as their top 5. That’s not an indictment of it, because you are correct that manga gets higher favorite percentages in general, but just something I wanted to mention.

          “Even looking at the LoGH anime figures, though – yes it is only one rank lower out of those 30 based on what you would expect from its score, but it is very nearly beaten by code geass and death note, both of which are well above their rank, and both of which are in the 3 top anime on MAL by popularity.”

          What does “only one rank lower out of those 30 based on what you would expect from its score” mean? I may have misunderstood you, but I said it had the #6 favorite percentage of the group in spite of the #26 (i.e. fifth-worst of the list, worst non-sequel on the list) popularity count. That’s a good 20 slots higher than one would expect if it was simply a function of popularity.

          SAO, which you use as an example, is within a 301-360 interval that is separated by a grand total of 0.06 rating points (from 8.20 to 8.14). Its moving up as many places as it does has a lot to do with a) ~45% of the series it’s passing are sequels dropping due to leveling of the playing field and b) the margins are really narrow to begin with. It’s not quite the huge adjustment you make it out to be.

          The difference between #300 and #400 is actually a fairly small gap to the point of being statistically insignificant. It’s something that can change on a month-to-month basis, even after a show has finished airing. The average change in score for Fall 2013 shows in January versus July of 2014 was highly variable, with a standard deviation of roughly 0.18: https://animetics.net/2014/07/26/fun-with-numbers-short-term-versus-long-term-mal-rankings/

          Granted, this value includes shows that hadn’t finished airing at that point, but also 1-cour shows that had finished airing and saw their scores change by more than 0.06 points in the period long after the last episode aired (Kyousogiga, Coppelion, Galilei Donna, and others).

          • So you would have a different rating system for anime to that for manga?

            And the one rank lower thing was meaning it is (only) one rank lower than the score rank, not one rank lower than the popularity rank.

            Finally in the SAO example – who knows how many other series it would overtake? Its score increases by about 0.16. Looking at those figures the average seems to be about 0.04 increase (just from a glance, no calculations), which would mean it ends at about 240th perhaps?

            • I would have a different ratings system for anime and manga. They’re different mediums with different percentages of accessible/translated material, different fanbases (that do overlap somewhat), and the sizes of those fanbases are quantitatively different enough that they’re qualitatively different systems anyway, same rules or no.

              The point of this system is that myanimelist’s scores are generally fairly accurate and only need to be corrected to adjust for sequel drop effects. LoGH has the high score in the first place, and it hardly needs the level of favorites it has to gain a top 10 spot. Favorite adjustments count at a tenth of the level that the original score did, which is enough to sort out series with huge differences in them but hardly enough to do more than reshuffle the positions of series that have less than a factor-of-3 difference in favorite percentage. A difference that large is itself a meaningful one.

              The average boost for all items on the 301-330 list is deceptively small because half of them are sequels, whose boosts are small by design. For a series with a score of 8.18 and a favorite percentage of 2.5% (the average for non-sequels in that range), the new score would be (8.18*1+0.025*11)/(1.025)=8.25, or greater by a difference of about 0.07 points. SAO gets a bigger boost than its peers, and its position will go up because of it, but it’s not going to suddenly start passing series currently in the top 200 – there’s a hard wall at 200th place where the average score is 8.30. For a non-sequel with an average favorite percentage within that score range, a show currently at 8.23 (currently #265) represents the highest score it could beat with its current 6% favorites and 8.14 score. If it beats anything beyond that, it only does so because of that series’ below-average favorite percentage.
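The figures in this reply can be reproduced with the per-member form of the formula, obtained by dividing the top and bottom of (S*M + 11*F) / (M + F) by M. The sketch below is only a check of that arithmetic; the function and variable names are illustrative, and the 8.23 and 8.24 rivals are assumed to carry the 2.5% average favorite rate quoted above.

```python
def adjusted_score(score: float, favorite_rate: float) -> float:
    """Per-member form of X = (S*M + 11*F) / (M + F), with favorite_rate = F / M."""
    return (score + 11 * favorite_rate) / (1 + favorite_rate)


# Typical non-sequel in the 301-330 range: 8.18 score, 2.5% favorites.
typical = adjusted_score(8.18, 0.025)

# SAO as described: 8.14 score, 6% favorites.
sao = adjusted_score(8.14, 0.06)

# Non-sequel rivals with the average 2.5% favorite rate.
rival_823 = adjusted_score(8.23, 0.025)
rival_824 = adjusted_score(8.24, 0.025)

print(f"typical boost: {typical - 8.18:+.2f}")
print(f"SAO passes an 8.23 rival: {sao > rival_823}")
print(f"SAO passes an 8.24 rival: {sao > rival_824}")
```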
