sorv

sample of randomized voters

The pile-on lottery (aka the Salganik effect)

I post a lot of silly jokes and fun facts on Reddit. Most of them get between 0 and 5 upvotes, but once in a while one of them blows up to about 50,000. When I show people the list of jokes, most of them can't guess, based on "quality", which ones blew up.

One would be tempted to think this is a quirk of the Reddit algorithm. But in 2004, Matthew Salganik and some colleagues at Princeton carried out an experiment that showed how such a result can happen "organically".

Salganik approached a music-sharing website and got them to divide their user base randomly into eight "artificial worlds". Users in all eight worlds would all have access to the same songs. Users could rate songs, and they could see the average rating that each song had - but the average would only be the average of ratings given by other users in the same "world". And they could recommend songs to other users, but only to other users in the same world. For comparison, they also released each song to a random sample of users and asked them to give it a rating, without telling them what other users thought -- this was considered the "merit" rating of the song. They then looked at the ratings and downloads that each song received in each of the eight different worlds.

Your intuition might be that in each of the worlds, downloads and ratings would be distributed roughly evenly among the songs, or at least among the songs that were "good enough" according to their merit score, with higher-merit songs getting a bump. But that intuition turns out to be wildly wrong. In the authors' words: "[T]he 'best' songs never do very badly, and the 'worst' songs never do extremely well, but almost any other result is possible." Some songs that spiked to superstardom in one world would fizzle completely in all of the others. What appeared to happen is that if some critical mass of people happened to like a song in one world, they would recommend it to other users in the same world, who would see that the song was already popular and would be more inclined to check it out and in turn recommend it to more users, creating a snowball effect that we could call the "pile-on lottery", or the "Salganik effect".
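The snowball effect can be reproduced in a toy simulation. The sketch below is not the model used in the study; it assumes a made-up choice rule in which the chance of a listener picking a song is proportional to `merit * (1 + social_weight * downloads)`, and every name and number in it (`simulate_world`, `run_worlds`, `social_weight`, the merit range) is hypothetical.

```python
import random


def simulate_world(merits, n_listeners, social_weight, rng):
    """Simulate one isolated "world" in which each listener downloads one song.

    Assumed rule (not from the study): the weight of a song is its merit
    multiplied by (1 + social_weight * downloads so far in this world).
    """
    downloads = [0] * len(merits)
    for _ in range(n_listeners):
        weights = [m * (1 + social_weight * d) for m, d in zip(merits, downloads)]
        choice = rng.choices(range(len(merits)), weights=weights, k=1)[0]
        downloads[choice] += 1
    return downloads


def run_worlds(n_worlds=8, n_songs=20, n_listeners=2000, social_weight=1.0, seed=0):
    """Run several worlds that share the same songs but evolve separately."""
    rng = random.Random(seed)
    # Hypothetical merit scores: every song is "good enough" (within 2x of the rest).
    merits = [rng.uniform(1.0, 2.0) for _ in range(n_songs)]
    worlds = [
        simulate_world(merits, n_listeners, social_weight, rng)
        for _ in range(n_worlds)
    ]
    return merits, worlds


if __name__ == "__main__":
    merits, worlds = run_worlds()
    for i, downloads in enumerate(worlds):
        winner = max(range(len(downloads)), key=downloads.__getitem__)
        print(f"world {i}: top song = {winner}, downloads = {downloads[winner]}")
```

Setting `social_weight=0.0` removes the social signal, so downloads depend on merit alone; comparing that run with the default gives a rough feel for how much of the inequality between songs comes from the pile-on rather than from merit.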

The eerie implication, of course, is that we're living in one of those "artificial worlds" -- where the songs, beliefs, individuals, and products that have spiked to "superstardom" did so as a result of a random process separate from any independent measure of their "merit". The things that broke through to superstardom all had to be "good enough" to be eligible for the pile-on lottery, but not the "best".

The Salganik study was a smoking gun, but there was always circumstantial evidence toward this conclusion as well:

  • There is the fact that "investors" (even those with just a few thousand dollars of their own money to spend) are generally unwilling to front up the money for talented people to become "influencers" or social media personalities, in the hopes of recouping the investment from future profits. (Even if you cynically thought that a particular influencer became rich mainly because "she's pretty", that would still lead to investors seeking out pretty people, paying for them to become influencers, and recouping the profits later.) The most straightforward explanation is that there is an element of luck involved, which makes it too risky for an investor to front up the costs for someone to become an influencer.

  • Common sense should suggest that in the real world, if two songs are available online and both are reasonably good quality, but one of them has 100 million plays and the other has only 100, that's not really a reflection of the percentage of people who would prefer the first song over the other.

Conversely, there are a number of fallacies that might lead a person to underestimate the role that the "pile-on lottery" plays in social media (and other areas of life):

  • Survivor bias -- you've heard of the person who nurtured their talents and spent 12-hour days creating content, and attained a large following. But if someone else developed the same talents and produced content that would have scored just as well in a double-blind rating, but they didn't happen to attain the same large following, by definition you are less likely to have heard of them.
  • Fundamental attribution error -- the fallacy that an outcome attained by a "thing" (whether a person, an idea, a social media post) is the result of attributes of that "thing" (as opposed to external circumstances, or randomness).
  • Hindsight bias -- the tendency to believe, if a "thing" is successful (or not successful), that you "saw it coming" in advance, based on the attributes that you think led to the success or failure.

Pile-on lottery vs. network effects (the "rational pile-on effect")

Economists have long understood the "network effect", where users are incentivized to use the same product that everyone else is using, because the more other people are using it, the more useful it is. Obvious examples would be PayPal and Venmo (you can only send money to other people who are using the same platform), and social media sites like Facebook and Instagram. This could be considered a "rational pile-on effect" because it makes sense to use the same product that everyone else is using.

By contrast, the "pile-on lottery" that took place in the Salganik experiments is less intuitive. If you are sitting in a room by yourself choosing songs to listen to, there is no particular benefit to listening to the same song as lots of other people -- and yet the Salganik experiment shows that's what happens anyway (even though there's a lot of randomness in which songs become popular, so that which "popular song" you are listening to depends on which of Salganik's artificial worlds you happen to be in).

Reconciling with Econ theory

The pile-on lottery is also counterintuitive because it appears to contradict the lessons of Economics 101 -- if a seller has a product that a buyer wants to buy at a mutually agreed price (where the "seller" is a social media creator and the "buyer" is a user paying with their attention), then in Econ 101 theory the "sale" will always take place, unless either the buyer or seller finds a better deal elsewhere. If post X and post Y both exist on Instagram, and in a double-blind test, 60% of users prefer post X and 40% prefer post Y, then in Econ 101 it makes no sense that post X would have 100,000 views and post Y would only have 10.

The difference is that in Econ 101 theory, the cost of sorting through your options is considered negligible compared to the cost of taking action (making a purchase):

[cost of sorting options] << [cost of taking action]

For example, in a standard textbook Econ 101 problem, if a bag of apples costs $10 at one store and $12 at another store, the cost to the buyer of sorting through options (driving between stores to compare prices, comparing the quality of the apples to make sure they are really comparable, etc.) is considered to be $0, and the buyer will always buy the $10 bag instead of the $12 bag. Obviously the cost of sorting options is never truly $0, but as long as this assumption holds approximately true, the conclusions of Econ 101 will be a useful approximation of reality. One such conclusion is the "Law of One Price", which states that in a free market, the price of similar goods will be about equal (because if people can sort through options with $0 effort, nobody will ever pay the higher price for an identical good).

In reality, a laptop in one store might cost $1,000 while a very similar laptop in a store in the same city might cost $1,100, because a buyer might not consider it worth $100 worth of effort to drive between two stores and compare all the options. But it would be rare to see the same laptop priced at $3,000, since a $2,000 difference would easily justify the effort of comparing.

However, social media flips that logic on its head because generally the cost of "taking action" -- viewing a post/video, or "liking" or "sharing" it -- is small compared to the cost of sorting through options. If something is shown to a user by default by the Reddit or YouTube or Twitter algorithm, the user could spend hours searching through far more obscure posts looking for something that the user would enjoy even more in a "true double-blind test". But that would be vastly more effort than viewing the default selection, so that's what most people do. The pile-on lottery is only possible when:

[cost of sorting options] >> [cost of taking action]
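
The two inequalities can be read as one simple decision rule: a person sorts through the options only when the expected gain from a better choice exceeds the cost of sorting. The sketch below illustrates this with hypothetical numbers; the function name `will_search` and the "minutes of attention" figures are assumptions made for illustration only.

```python
def will_search(search_cost: float, expected_gain: float) -> bool:
    """Return True if sorting through the options is worth the effort."""
    return expected_gain > search_cost


# Laptop example: assume comparing stores costs about $100 of effort.
laptop_search_cost = 100.0
print(will_search(laptop_search_cost, expected_gain=100.0))   # $1,100 vs. $1,000 -> False
print(will_search(laptop_search_cost, expected_gain=2000.0))  # $3,000 vs. $1,000 -> True

# Social media, measured in minutes of attention (hypothetical figures):
# an hour of digging through obscure posts to find one that is slightly better.
print(will_search(search_cost=60.0, expected_gain=1.0))       # False
```

Under this rule a price gap can persist only while it stays below the search cost, which is why the $100 gap survives and the $2,000 gap does not. On social media the search cost dwarfs the gain from any single better post, so the default selection wins almost every time.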

The solution

The pile-on lottery is highly random, highly unequal, and highly non-meritocratic. But sorv solves all of this with its three-step process:

  1. Release the content to a random sample of the target audience;
  2. Measure the average response of that random sample;
  3. Promote the content to the rest of the target audience in proportion to the response that the content got from the initial random sample.

(This refers to the version of sorv used for identifying and promoting "good" content; similar steps are used for identifying valid abuse reports, or valid fact-check rebuttals.) The larger the random sample of the target audience, the smaller the role that luck plays in determining the outcome.
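
The three steps, and the effect of sample size on luck, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation of sorv: the function names, the 0-to-1 rating scale, the linear promotion rule in `promotion_count`, and the simulated audience are all hypothetical.

```python
import random
import statistics


def sorv_score(audience, rate, sample_size, rng):
    """Steps 1 and 2: show the content to a random sample and average the response."""
    sample = rng.sample(audience, sample_size)
    return statistics.mean(rate(user) for user in sample)


def promotion_count(score, audience_size, max_score=1.0):
    """Step 3 (assumed rule): promote to a share of the audience proportional to the score."""
    return round(audience_size * score / max_score)


def score_spread(audience, rate, sample_size, trials, rng):
    """Spread of the score across repeated random samples; a rough measure of luck."""
    scores = [sorv_score(audience, rate, sample_size, rng) for _ in range(trials)]
    return statistics.pstdev(scores)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical audience: each user independently likes the content with probability 0.6.
    likes = {uid: rng.random() < 0.6 for uid in range(10_000)}
    audience = list(likes)

    def rate(uid):
        return 1.0 if likes[uid] else 0.0

    score = sorv_score(audience, rate, sample_size=500, rng=rng)
    print("score:", score, "-> promote to", promotion_count(score, len(audience)), "users")
    for n in (10, 100, 1000):
        print(f"sample size {n}: spread of score = {score_spread(audience, rate, n, 200, rng):.3f}")
```

In this sketch, the same content can score very differently from one 10-person sample to the next, while 1,000-person samples give nearly the same score every time, which is the sense in which a larger sample leaves less room for luck.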

The point of sorv is to create a system where a creator can create content without relying on the pile-on lottery. A person can create a song performance, a recipe video, or a political essay, and be assured that if the target audience likes it -- as measured by the response from the initial random sample -- they'll get content views, new connections/subscribers, and (depending on the system) even revenue in proportion to the quality, without having to rely on luck, or favors from other high-profile users, or algorithmic manipulation. (Where "algorithmic manipulation" includes pushing out content frequently for months or years because the algorithm favors high-volume creators independent of quality -- something that would no longer be a factor with sorv.) It also means that if the content isn't successful, the creator can be assured that it was because of the random sample's average opinion (which, depending on the implementation, may also come with constructive feedback), and not because the creator got screwed by the pile-on lottery or "the algorithm".