
Beyond fact-checking - using sorv to handle other types of misleading posts

Suppose we succeed beyond our wildest dreams in getting X/Twitter (for example) to implement sorv for their Community Notes system (as described here), so that for the first time ever, any account with a medium-or-bigger following that publishes a provable falsehood will probably see a Community Note attached to it, possibly within a few minutes of publication.

One consequence is that posters might then try to evade Community Note labels by avoiding easily provable lies while deceiving users in other ways (for example, publishing statements without a source, so that the burden is on the Community Notes volunteer to disprove them).

To mitigate this, X/Twitter could expand Community Notes criteria to include other types of posts that lower the quality of the discourse:

  • Statements where a source should obviously exist, made without providing one. Under the current rules, a statement can only have a Community Note attached if you can show that it's false. This leads to asymmetric warfare: as long as the burden of proof is on the person submitting the Community Note to show that the tweet is false, a deliberate liar can create false tweets much faster than volunteers can disprove them.

    For example, on September 2, 2024, Twitter user @akafacehots sent a tweet to their 370,000 followers:

    BREAKING: CNN just released their fact check of Kamala Harris' speech at the DNC Convention. It turns out Kamala Harris lied 113 times within her 37 minute speech. Retweet so all Americans see this vital fact check on Kamala Harris.

    Several people (including me) posted replies saying this was false, but none of the replies came close to the 4.6 million views for the original false tweet. And it still took me a few minutes of research just to post a reply saying it was false -- and the only reason it was even that easy was that the author of the original tweet decided to use the correct duration (37 minutes) for Harris's speech, which made it easy to Google CNN's article about it. Even if that only took a few minutes of work, that's still much longer than the 20 seconds or so that it would have taken the author to write the original hoax tweet. (That tweet did eventually get Community Noted, but not until after millions of people had already seen it.)

    Compare that to a rule where Community Notes says that if you make a statement that should obviously have a source (e.g. a claim about a CNN article), you have to link to the source. Then as soon as a fact-checker sees the hoax tweet, they just click a button to report it -- makes a claim without a source, boom. (A rough sketch of what that one-click report could look like appears after this list.)

  • Statements that are not literally false but are at least partly negated by additional information. One thing that I think Twitter got exactly right is that its Community Notes labels are applied with the words "Readers added context:". Sometimes the "context" is literally calling out the tweet as false, but other times the tweet is not literally false, yet the additional context would cause most people to change their perception of the statement.

    On September 25 and October 3, 2024, content creator Benny Johnson tweeted out videos of Trump-supporting crowds on college campuses, with slightly differently worded captions both ending with, "Trump is winning the youth vote." The videos are almost certainly "real". On the other hand, an October poll of college students showed Harris beating Trump among college voters, 57% to 19%.

    Now perhaps most people wouldn't be silly enough to think that the conclusion "Trump is winning the youth vote" follows logically from a video of a crowd. But misinformation like this, even if it's easily debunkable, can be part of an information bubble that isolates the user ever further from the truth, and a simple fact-check label can mitigate that.

  • Plausibly deniable non-endorsement endorsements of a viewpoint. Radio host Larry Elder once took a call from a vaccine skeptic who said, among other things, that Bill Gates knew the COVID-19 vaccine was harmful but was pushing it on minorities in the name of "population control". Elder posted it to his YouTube channel with the title "You'll want to hear this physician’s take on the vaccines." (Elder is widely known to have conservative positions, but is not usually associated with views as fringe as this.)

    But the wording "You'll want to hear..." is weaselly enough that a literalist could claim he never actually said he agreed, he was just saying "You will want to hear this, and then make up your own mind!" To prevent people from evading a fact-checking penalty with these kinds of word games, the rules can simply say that if a reasonable person would interpret a phrase as an endorsement, then it counts as an endorsement.

  • Accounts promoting only stories about crimes committed by a specific demographic. This applies to accounts like @MrAndyNgo and @LibsOfTikTok, which overwhelmingly promote news stories about crimes committed by immigrants and/or gay and trans people. Some of their claims are blatantly false (see this post from @LibsOfTikTok owner @ChayaRaichik10), but a lot of them are true stories that, collectively, still present an overwhelmingly false narrative by disproportionately blaming crime on the targeted groups. (In real life, most studies have found that undocumented immigrants commit less violent crime than people born in the U.S.) In their own way, these highly skewed selections of stories are just as deceptive as outright lies.

    This could be mitigated by simply amending Community Notes policy to allow Notes in those situations -- a note which says, "This is a true story, but this account exists to publicize crimes or incidents involving certain demographics, which has the effect of misleading the audience." This also means the Notes can stop being applied if the account ever starts focusing on other types of content. (Theoretically, the Notes could also stop being applied if the account starts publishing stories about crimes committed by all demographics, but I've never seen that happen, probably because a feed of stories about regular crimes would be boring to almost everyone without the racism or transphobia angle.)

    Alternatively, the system could be modified to apply a Community-Note-type warning to the account itself (which would then be displayed on all tweets from that account) -- essentially a "credibility banishment", at least in the eyes of people who give less credibility to tweets with Community Notes on them. And it could be revoked if the account ever changes its practices. But this would require not just a policy change in Community Notes but also a code change in Twitter, to allow a fact-check warning to be applied to an entire account. (A sketch of how such an account-level check might work also appears after this list.)
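
To make the first proposal above concrete, here is a minimal sketch of what an expanded report taxonomy might look like. Everything in it is hypothetical -- NoteReason, Post, and quick_report_reason are invented names for illustration, not part of any real X/Twitter or Community Notes API -- but it shows how an "unsourced claim" report could be filed in one click, with no research burden on the volunteer:

```python
# Hypothetical sketch only: NoteReason, Post, and quick_report_reason are
# invented names for illustration, not part of any real X/Twitter API.
from dataclasses import dataclass
from enum import Enum, auto


class NoteReason(Enum):
    FALSE_CLAIM = auto()          # current rule: provably false statement
    UNSOURCED_CLAIM = auto()      # claim that should obviously cite a source
    MISSING_CONTEXT = auto()      # true, but misleading without added context
    IMPLIED_ENDORSEMENT = auto()  # weasel-worded amplification of a fringe view
    SKEWED_SELECTION = auto()     # true stories curated into a false narrative


@dataclass
class Post:
    text: str
    claims_external_fact: bool  # e.g. "CNN just released their fact check..."
    cites_source: bool          # does it link to the article/study it invokes?


def quick_report_reason(post: Post) -> NoteReason | None:
    """One-click report that shifts the burden of proof back to the poster.

    The volunteer doesn't have to research anything: a post invoking an
    external fact (an article, a study, a poll) without linking to it is
    reportable on its face.
    """
    if post.claims_external_fact and not post.cites_source:
        return NoteReason.UNSOURCED_CLAIM
    return None


hoax = Post(
    text="BREAKING: CNN just released their fact check of Kamala Harris'...",
    claims_external_fact=True,
    cites_source=False,
)
assert quick_report_reason(hoax) is NoteReason.UNSOURCED_CLAIM
```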
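
The account-level warning from the last item can be sketched the same way. Again, the names (AccountPost, account_warning), the tagging step, and both thresholds are illustrative assumptions rather than a real system; the point is that the warning applies and lapses automatically based on the account's recent mix of content, which makes the "credibility banishment" revocable by design:

```python
# Hypothetical sketch only: AccountPost, account_warning, the tagging step,
# and both thresholds are illustrative assumptions, not a real system.
from dataclasses import dataclass

WINDOW = 200           # how many recent posts to examine (illustrative)
SKEW_THRESHOLD = 0.8   # >=80% of recent posts on one theme (illustrative)


@dataclass
class AccountPost:
    is_crime_story: bool
    targets_specific_demographic: bool  # as judged by sorv-style voters


def account_warning(recent_posts: list[AccountPost]) -> str | None:
    """Return the warning to show on every tweet from the account, or None.

    Because the check is recomputed over a sliding window, the warning
    lapses automatically if the account broadens its content -- matching
    the revocable policy described above.
    """
    window = recent_posts[-WINDOW:]
    if not window:
        return None
    skewed = sum(
        1 for p in window
        if p.is_crime_story and p.targets_specific_demographic
    )
    if skewed / len(window) >= SKEW_THRESHOLD:
        return ("These are often true stories, but this account exists to "
                "publicize crimes or incidents involving certain demographics, "
                "which has the effect of misleading the audience.")
    return None
```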

The goal is not to catch every type of deceptive post that a user might try to make after the Community Notes rules have been tightened. The goal is to create an environment where crafting a dishonest statement takes enough effort that it's actually less trouble to just go ahead and write a good-faith argument in the first place.