Automated Skeptics

Rethinking machine learning for misinformation at scale

Feb 07, 2023

Fact checking is a cornerstone of a healthy information ecosystem.

Before the internet, that is, before the spread of information became more or less decentralized, media organizations did their own fact checking. Print media started employing full-time fact checkers in the early 20th century as a way to avoid embarrassment, keep public trust, and prevent lawsuits. The process was slow and difficult, but still doable since newspapers and magazines did it all themselves. Print, and later radio and television, were the means by which people got information, and for the most part they at least attempted to get the basic facts right.

The new media landscape is a completely different beast. In response to the rapid-fire stream of information that spreads online much like the rumors you used to hear in middle school, independent fact checking organizations have become the norm. In many ways this is a great step forward – independence means a greater incentive towards accuracy instead of engagement. But the process is still slow, and as the independent fact checking organization PolitiFact notes, due to the sheer volume of information on the web they “can’t feasibly check all claims”, so they focus only on the most relevant and newsworthy ones.

The go-to prescription nowadays for increasing the speed of cumbersome human processes is machine learning. Appropriately, the ML community has spent a lot of time looking into automating the fact checking process. The four main components of automatic fact checking – claim detection, evidence retrieval, verification, and ruling justification – have the potential to greatly speed up the fact checking process. In current practice, some tasks have more utility than others, particularly finding claims to fact check, finding evidence, and detecting previously fact checked claims. These tasks are great for helping human fact checkers perform their jobs more quickly and easily. Where these systems fall short is in their ability to be deployed en masse, more or less fully autonomously, to debunk fake news in real time.

The reason lies in the problem of veracity – categorizing content into different shades of truthiness. Truth is a dubious concept even in the political domain, where context, framing, and subjectivity have an impact on what “truth” even means for an individual [1, 2]. Because of this, it has even been argued that multiple human fact checkers should be used on each fact check to ensure bias doesn’t seep in [2]. The idea of truth becomes even less concrete in domains like science, where the “facts” are essentially collections of evidence which are malleable in the presence of new evidence, and it may be better to ask “how consistent is this claim with the current evidence?” as opposed to “is this claim true?” One can even go so far as to question whether or not objective facts exist in science.

Because of this, I think the full-fledged fact checking process is best left to humans (or, at the very least, carried out with humans in the loop) – especially for highly sensitive types of information, such as what politicians have said or health-related information, as recent elections and the pandemic have made abundantly clear. I don’t mean to say that there isn’t a place for automated fact checking as it currently stands – it clearly helps make fact checkers’ jobs easier. But even a decentralized fact-checking process aided by machine learning is concentrated around fact-checking organizations, which will by necessity only check what they deem important. And though they can potentially check more claims with the help of ML, their fact checks still need to somehow be disseminated to the public.

Is there a different way to approach the misinformation problem, one that can be made available to any and every user in real time?

Another way to pose this question is: what can make a person less vulnerable to misinformation? One way to think about this is in terms of news media literacy. As described in [3]:

The purpose of media literacy is to help people to protect themselves from the potentially negative effects [of mass media]. The purpose of becoming more media literate is to gain greater control over influences in one’s life, particularly the constant influence from the mass media. This is not to say that all media literacy scholars believe the media exert a powerful effect on individuals. However, there seems to be a consensus that even weak and subtle media influences are important to consider, given the pervasive nature of media influence throughout our culture along with the high rates of exposure of all people to various forms of media habitually over the course of one’s life.

Increased news media literacy has many benefits: those with high news media literacy are less likely to accept low-quality information such as conspiracy theories, will acquire news from more trusted sources, and are less likely to post content online or engage with content on social media [4]. This has implications for the ability of misinformation to spread – if people are more news literate, they are less prone to accepting, sharing, and engaging with bad information. So if more people become more news literate, the rate of spread of misinfo should in theory decrease.

It’s been demonstrated that skepticism is negatively associated with acceptance of media content, as well as positively associated with news media literacy [5]. The reverse has also been shown: lack of news literacy is associated with a lack of skepticism [4]. It makes sense that with an attitude of questioning what one sees, there’s more friction between reading a sketchy article and hitting the share button.

I believe this is an avenue where machine learning can help. By building systems which are “automated skeptics”, we can potentially deploy them at scale to serve as something like information co-pilots. We don’t need systems which flat-out say when information is incorrect (though this could be one component, e.g. by detecting previously fact checked claims), but ones which nudge the user to think about what they are seeing as opposed to unquestioningly absorbing it. Similar to having a highly news literate and skeptical friend at our side while we navigate the internet, we could have systems which are really good at saying “hmmm this article you’re reading is pretty suspicious for reason 1), 2), 3)....”. Such a system could then help users to be just a bit more skeptical about what they see and foster greater news literacy.

Such a system can be built directly into a browser or deployed by social media companies to make their content safer. This could be similar to how browsers warn users when they attempt to navigate to unprotected websites or how Twitter will nudge users to read an article before sharing it. These types of soft interventions have been shown to have great potential in preventing the spread of misinfo [7].

I think these systems can be a lot safer as well. It is one thing to build a system which predicts if something is categorically false, which carries a high cost in terms of trust when it is incorrect; it is another to build a system which predicts how trustworthy a piece of content is. The latter isn’t sending an authoritative message, which inherently needs to be correct to maintain trust and safety, but one of varying levels of skepticism with reasons for that skepticism, which can prompt a user to think about and question what they consume.
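To make the distinction concrete, here is a minimal sketch of what a graded nudge with reasons might look like as a data structure, as opposed to a true/false verdict. Everything in it is an assumption made for illustration: the `Nudge` type, the signal names, and the weights are hypothetical and not drawn from any existing system, and a real system would learn such signals rather than hard-code them.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Nudge:
    """A graded, explained signal rather than a true/false verdict."""
    skepticism: float                 # 0.0 = no concern, 1.0 = strong concern
    reasons: List[str] = field(default_factory=list)

# Hypothetical signals with illustrative weights and user-facing reasons.
SIGNALS = {
    "author_unknown": (0.2, "No author is listed for this article."),
    "no_sources_cited": (0.3, "The article does not cite any sources."),
    "previously_disputed": (0.5, "A similar claim was disputed in an earlier fact check."),
}

def build_nudge(detected: Set[str]) -> Nudge:
    """Combine detected signals into a capped score plus the reasons behind it."""
    score = 0.0
    reasons: List[str] = []
    for name in sorted(detected):     # sorted for a stable order of reasons
        if name in SIGNALS:
            weight, reason = SIGNALS[name]
            score += weight
            reasons.append(reason)
    return Nudge(skepticism=min(score, 1.0), reasons=reasons)

if __name__ == "__main__":
    nudge = build_nudge({"author_unknown", "no_sources_cited"})
    print(f"Skepticism level: {nudge.skepticism:.1f}")
    for reason in nudge.reasons:
        print(" -", reason)
```

The point of the design is that the output is never “this is false”: a wrong score costs far less trust than a wrong verdict, and the listed reasons give the user something to evaluate for themselves.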

This “automated skepticism” would need to be fine-tuned to be appropriately skeptical and contextualized. From [5]:

...the possibility that news media skepticism may be ill-informed – one can be skeptical of news based on faulty assumptions or lack of knowledge about news gathering practices, for example – means that media skepticism alone would be a poor indicator of news media literacy.

A well-behaved system wouldn’t be wantonly skeptical about every piece of information, and would include contextual information, such as information about the content authors and evidence used, to help improve the user’s media literacy.

One component of this is establishing whom to trust and on what topics. This is particularly important in science, where experts are needed to accurately disseminate and contextualize scientific findings [6]. This is largely due to the inherent uncertainty of new scientific developments, the caveats and nuances of which are best understood by experts. So an automated skeptic should have some knowledge of an author’s expertise on whatever topic they are writing about, and deliver its intervention accordingly, though of course even the most well-established expert acting in good faith can get it wrong sometimes. Essentially: is the author of a piece of content actually an authority on the subject matter of that content?
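As a rough illustration of that question, the sketch below compares an author’s known areas of expertise with the topic of a piece of content and returns a short contextual note. The `Author` type, the topic labels, and the wording of the notes are all hypothetical; how expertise and topics would actually be determined is left open, and is the hard part.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Author:
    name: str
    expertise: FrozenSet[str]   # hypothetical topic labels, e.g. "epidemiology"

def expertise_note(author: Author, topic: str) -> str:
    """Return a contextual note on whether the author has a listed background in the topic."""
    known = {t.lower() for t in author.expertise}
    if topic.lower() in known:
        return f"{author.name} has a listed background in {topic}."
    return (f"{author.name} has no listed background in {topic}. "
            "It may be worth checking what experts in that field say.")

if __name__ == "__main__":
    author = Author(name="A. Example", expertise=frozenset({"epidemiology"}))
    print(expertise_note(author, "epidemiology"))
    print(expertise_note(author, "climate science"))
```

Note that the second message informs rather than discredits: a missing match is a reason to look further, not evidence that the author is wrong.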

In practice, an automated skeptic could help by acting as a kind of educator and assistant, suggesting to the user what and whom to trust. This is similar to what has been called for in the context of science education to help people become more savvy about science news (from [6]):

Education should, therefore, aim to make us ‘competent outsiders’ to professional science. In such a context, then, the question for the competent outsider is, can these claims to know [sic] be trusted? In short, is this information, and those who assert it, credible?

I’m a big fan of the idea of enabling people to become “competent outsiders.” Machine learning has the potential to do this at scale by providing small interventions indicating who and what to trust given an author and a piece of content.

A possible meta-problem arises here: how does one establish trust in an automated skeptic? At the very least, such a system absolutely should provide explanations for any predictions (which is of course the legal precedent in EU countries for high-risk systems). I would also argue that such a system would need to be developed and rolled out conservatively – there’s a potentially high cost if such a system were used to falsely discredit reputable people or justify disreputable ones.

Without being able to question what one sees, it’s easy to be influenced by misinformation. I see an opportunity here to use ML to build systems which give sensible nudges to help us navigate the internet safely.

References

[1] Coleman, Stephen. "The elusiveness of political truth: From the conceit of objectivity to intersubjective judgement." European Journal of Communication 33.2 (2018): 157-171.

[2] Ceci, Stephen J. "The Psychology of Fact-Checking." Scientific American (October 25, 2020).

[3] Potter, W. James. "The state of media literacy." Journal of Broadcasting & Electronic Media 54.4 (2010): 675-696.

[4] Vraga, Emily K., and Melissa Tully. "News literacy, social media behaviors, and skepticism toward information on social media." Information, Communication & Society 24.2 (2021): 150-166.

[5] Maksl, Adam, Seth Ashley, and Stephanie Craft. "Measuring news media literacy." Journal of Media Literacy Education 6.3 (2015): 29-45.

[6] Osborne, Jonathan, et al. "Science education in an age of misinformation." (2022).

[7] Kaiser, Ben, et al. "Adapting Security Warnings to Counter Online Disinformation." USENIX Security Symposium. 2021.
