There’s misinformation on Facebook. Here’s how you deal with it.

I’ll keep this short and to the point. There’s been a sudden backlash against Facebook for hosting misinformation [1] and polarized politics [2] after the recent elections in the USA. Is this new? NO.

Let me take you back in time, to March 2014. The deeply tragic incident of Malaysia Airlines Flight MH370 wiped out an entire aircraft and everyone on board [3]. A sea of prayers and solidarity followed on all social networks, including Facebook. What also followed was a series of fake, misinformative posts, links, and videos claiming to show footage of the aircraft crashing [4], and rumors claiming that the plane had been found in the Bermuda Triangle (see the image of one such post below). Such footage never existed.

http://www.hoax-slayer.com/images/malaysia-airlines-MH370-scam-1.jpg

Following this incident, there has been a series of events where miscreants have exploited the context of a popular event to spread hoaxes, misinformation, rumors, fake news, and more. From the rumored death of comic actor Rowan Atkinson (a.k.a. Mr. Bean) to the hoax “suicide video” of the late, legendary actor Robin Williams, misinformation has plagued Facebook for years, and continues to do so. While Facebook has only recently acknowledged misinformation as a serious problem, we at Precog started working on it when we first came across such instances. So how do you really deal with misinformation, rumors, hoaxes, and fake news on Facebook?

There have been a few attempts to solve this problem. Facebook has published a series of blog posts vowing to improve its algorithms to reduce misinformation, hoaxes, rumors, clickbait, and the like [8, 9, 10, 11, 12]. A recent hackathon at Princeton University also saw a group of four students attempting to fix this problem [13]. Well, as it turns out, we took a crack at this problem over two years ago, and came up with a robust solution of our own. In August 2015, we publicly launched Facebook Inspector, a free, easy-to-use browser extension that identifies malicious content (including the types discussed above) in real time. At this moment, Facebook Inspector has over 200 daily active users and has just crossed 5,000,000 hits (that’s 5 million; it’s just fun to write it with so many zeros xD). We leveraged multiple crowdsourcing mechanisms to gather a pool of misinformative and otherwise malicious posts, and used them to train a model that automatically identifies misinformative posts, hoaxes, rumors, scams, and more.
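
For the curious, the overall recipe — a crowdsourced pool of labeled posts feeding a supervised classifier — can be sketched in a few lines. This is only an illustrative toy: the actual features and model behind Facebook Inspector are described in the technical report, and the example posts, labels, and scikit-learn pipeline below are assumptions for demonstration, not the deployed system.

```python
# Toy sketch of the idea behind Facebook Inspector: train a supervised
# classifier on crowdsourced labeled posts, then score new posts in
# (near) real time. Everything below is illustrative, not the real system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical crowdsourced training data: post text with a label.
posts = [
    "SHOCKING video shows flight MH370 crashing into the ocean, watch now!",
    "Our thoughts and prayers are with the families of the passengers.",
    "LEAKED footage: missing plane found in the Bermuda Triangle",
    "Malaysia Airlines has released an updated statement for families.",
]
labels = ["malicious", "benign", "malicious", "benign"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(posts, labels)

# Score a newly seen post, roughly as the browser extension might after
# reading a post from the page.
print(model.predict(["Click here to watch the real crash video of MH370"]))
```

The real extension, of course, works on live Facebook posts and a much larger labeled pool; the point here is only the shape of the approach — crowdsourced labels in, a trained classifier out.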

Give it a try. Download the Chrome version at https://chrome.google.com/webstore/detail/facebook-inspector/jlhjfkmldnokgkhbhgbnmiejokohmlfc

Firefox users, download at https://addons.mozilla.org/en-US/firefox/addon/fbi-facebook-inspector/

To read the entire story behind the inception of the idea and the making of Facebook Inspector, read the detailed technical report here.

So we spotted a problem a couple of years ago, took a crack at solving it (and I’d like to believe we succeeded), and apparently the entire world is after Facebook for the same problem today. But misinformation, hoaxes, and rumors aren’t the only big problems surrounding Facebook. Let’s talk some more about the US elections. Facebook’s algorithms have been accused of reinforcing “political polarization” by Professor Filippo Menczer in a popular news article [2]. Apparently, Facebook is home to a large number of political groups that post polarized content to influence users towards or against certain political beliefs. Whether such content should be allowed on social networking websites is debatable. After all, free speech is a thing! But the question that demands attention here is: did these politically polarized entities suddenly appear on Facebook around election time? I mean, if they had been around for long, Facebook would have known, right? And the effects of social network content on elections are well known and well studied [5, 6, 7]. So Facebook would surely have done something to at least nudge users when they were being exposed to polarized political content. But polarized political content was never a point of concern for Facebook, so it probably didn’t exist until right before the elections. Right? Wrong!

Well, this is a literal “I told you so” moment. Last year, we conducted a large-scale study of malicious Facebook pages, and one of our main findings was the dominant presence of politically polarized entities among malicious pages on Facebook. We analyzed the content posted by these politically polarized pages, and found that negative sentiment, anger, and religion dominated such content. We reported our findings in a technical report: https://arxiv.org/abs/1510.05828v1

It is good to know that what you work on as part of research connects closely to relevant, present-day, real-world problems, but it isn’t a good feeling to watch something you already knew could happen, happen anyway. We at Precog always push towards making a difference and making the online world better and safer. We try our best, but we can only do so much.

To conclude, not bragging here (well, it’s not bragging if it’s true!), but we saw not one, but two real problems coming, more than a year before Facebook did.

You see, we’re called “Precog” for a reason. *mic drop*

References

[1] https://techcrunch.com/2016/11/10/facebook-admits-it-must-do-more-to-stop-the-spread-of-misinformation-on-its-platform/

[2] https://www.theguardian.com/technology/2016/nov/10/facebook-fake-news-election-conspiracy-theories

[3] https://en.wikipedia.org/wiki/Malaysia_Airlines_Flight_370

[4] https://www.scamwatch.gov.au/news/scammers-using-videos-of-malaysian-airlines-flight-mh370-to-spread-malware

[5] Williams, Christine B., and Girish J. Gulati. “Social networks in political campaigns: Facebook and the 2006 midterm elections.” annual meeting of the American Political Science Association. Vol. 1. No. 11. 2007.

[6] Williams, Christine B., and J. Girish. “Social networks in political campaigns: Facebook and the congressional elections of 2006 and 2008.” New Media & Society (2012): 1461444812457332.

[7] Douglas, Sara, et al. “Politics and young adults: the effects of Facebook on candidate evaluation.” Proceedings of the 15th Annual International Conference on Digital Government Research. ACM, 2014.

[8] https://newsroom.fb.com/news/2015/01/news-feed-fyi-showing-fewer-hoaxes/

[9] http://newsroom.fb.com/news/2016/08/news-feed-fyi-further-reducing-clickbait-in-feed/

[10] http://newsroom.fb.com/news/2014/11/news-feed-fyi-reducing-overly-promotional-page-posts-in-news-feed/

[11] http://newsroom.fb.com/news/2014/08/news-feed-fyi-click-baiting/

[12] http://newsroom.fb.com/news/2014/04/news-feed-fyi-cleaning-up-news-feed-spam/

[13] http://www.businessinsider.in/It-only-took-36-hours-for-these-students-to-solve-Facebooks-fake-news-problem/articleshow/55426656.cms

The complete picture: Visual Themes and Sentiment on Social Media for First Responders.



Researchers and academics all over the world have conducted numerous studies establishing that social media plays a vital role during crisis events. From citizens helping the police capture the suspected Boston Marathon terrorists [5], to vigilant users spreading situational awareness [6], OSNs have proved their mettle as a powerful platform for information dissemination during crises.

Most of the aforementioned work has relied on textual content posted on OSNs to extract knowledge and make inferences. However, online media is rapidly moving from text to visual content. With the prevalence of 3G and 4G technologies and high-bandwidth connectivity in most Internet-enabled countries, images and videos are gaining much more traction than text. This is natural, since the human brain is hardwired to recognize and make sense of visual information more efficiently [1]. Using text alone to draw inferences from social media data is no longer enough. As we discussed in our previous blog post, a significant percentage of social media posts do not contain any text at all, and a large percentage contain both text and images. The point to keep in mind is that the image and the text may contradict each other, even when they are part of the same post. While the text in Figure 1 inspires support and positive sentiment, the image (or more precisely, the text in the image) is quite negative. This is what current research methodology is missing.


Figure 1. Example of a Facebook post with contradicting text and image sentiment.

Continuing our work on images and online social media, we decided to dig deeper into images posted on social networks and see whether they could help first responders get a more complete picture of the situation during a crisis event. We collected Facebook posts published during the Paris attacks of November 2015, and performed large-scale mining on the image content we captured. Monitoring the popular topics and the sentiment among citizens can help first responders, and timely identification of misinformation, sensitive topics, negative sentiment, etc. online can be really helpful in predicting and averting potential real-world implications.

We were able to gather over 57,000 images using the #ParisAttacks and #PrayForParis hashtags combined, of which 15,123 images were unique. Analyzing such a large number of images manually is time-consuming and does not scale, so we utilized state-of-the-art techniques from the computer vision domain to analyze images automatically at a large scale. These techniques include Optical Character Recognition (OCR) [2], image classification, and image sentiment identification using Convolutional Neural Networks (CNNs). Figure 2 shows how a typical CNN model processes and classifies images [4].

Figure 2. Typical CNN model for object identification in images. Image taken from http://cs231n.github.io/convolutional-networks/
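
As an aside, the numbers above imply a deduplication step: 57,000+ downloaded images versus 15,123 unique ones. The report does not spell out how duplicates were detected; a minimal, assumed approach is to hash the raw image bytes and keep one file per digest, as sketched below (the directory name is hypothetical).

```python
# Assumed exact-duplicate removal by hashing raw image bytes.
# Near-duplicates (resized or re-encoded copies) would need perceptual
# hashing instead; this sketch only drops byte-identical files.
import hashlib
import os

def unique_images(image_dir):
    seen, unique = set(), []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique

print(len(unique_images("paris_attacks_images")))  # hypothetical directory
```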

With all these “weapons”, we set out to mine the sea of images and see if we could discover something useful. And we struck gold right away. We used Google’s Inception-v3 model [3] to generate tags for images automatically, and looked at a few of the most popular tags. Interestingly, we found numerous instances of misinformative images, images containing potentially sensitive themes, and images promoting conspiracy theories among the popular images. By the time we identified them, these images had gathered millions of likes and hundreds of thousands of comments and shares. Some examples are shown below (Figures 3–6).

Figure 3. Post claiming the Eiffel Tower turned off its lights for the first time in 63 years. This information was incorrect; the Eiffel Tower’s lights are turned off every night between 1 am and 6 am, following a ruling by the French Government.

Figure 4. Image incorrectly stating the cause of death of Diesel, a police dog that assisted the police during the attacks. The French police later clarified that the actual cause of death was gunshot wounds from the police force itself, and not the suicide bomber.

Figure 5. An insensitive tweet by Donald Trump that circulated widely just after the Paris attacks. As the tweet’s timestamp suggests, it had actually been posted months earlier, and resurfaced right after the attacks to defame the politician.

Figure 6. Picture claiming that a Muslim security guard named Zouheir stopped a suicide bomber from entering the Stade de France football stadium and saved thousands of lives. The security guard himself later clarified that no such incident took place; Zouheir was stationed at a different spot.
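
To give a concrete feel for the tagging step described above, here is a minimal sketch of Inception-v3-based image tagging. The original pipeline followed the TensorFlow image-recognition tutorial [3]; the pretrained Keras variant below is an assumption chosen for brevity, and the file name is hypothetical.

```python
# Minimal sketch of automatic image tagging with a pretrained Inception-v3
# model; an illustration, not the exact pipeline used in the study.
import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = InceptionV3(weights="imagenet")

def tag_image(path, top=5):
    # Inception-v3 expects 299x299 RGB input.
    img = image.load_img(path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    preds = model.predict(x)
    # Returns (class_id, label, score) tuples for the top ImageNet classes.
    return decode_predictions(preds, top=top)[0]

print(tag_image("paris_post_image.jpg"))  # hypothetical file
```

Aggregating these tags over the whole collection is what surfaces the most popular visual themes.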

Applying OCR to the images in our dataset, we were able to extract text from about 55% of them (31,869 out of 57,748 images). We wondered whether this text embedded in images would differ from the text that users post in the conventional way. Upon analyzing and comparing the sentiment of image text and post text, we found that image text (extracted through OCR) was much more negative than post text (the conventional text). In fact, not only was image text more negative, it also differed from post text in the topics being talked about. Table 1 shows a mutually exclusive subset of the most common words appearing in image text and post text. While post text was full of generic text offering prayers, support, and solidarity, image text mentioned sensitive issues like “refugees”, “syria”, etc.

Rank   Word (post text)   Norm. frequency   Word (image text)   Norm. frequency
1      retweeted          0.005572571       house               0.00452941
2      time               0.005208351       safety              0.004481122
3      prayers            0.005001407       washington          0.004297628
4      news               0.004713342       sisters             0.003940297
5      prayfortheworld    0.004431899       learned             0.003863036
6      life               0.004393821       mouth               0.003853378
7      let                0.004249789       stacy               0.003751974
8      support            0.004249789       passport            0.003708515
9      god                0.00401139        americans           0.003694028
10     war                0.003986557       refugee             0.00352502
11     thoughts           0.003882258       japan               0.002887619
12     need               0.003878946       texas               0.002781386
13     last               0.003797825       born                0.002689639
14     lives              0.003734914       dear                0.002689639
15     said               0.003468371       syrians             0.002607549
16     place              0.003468371       similar             0.002573748
17     country            0.003319372       deadly              0.002568919
18     city               0.003291227       services            0.002554433
19     everyone           0.003281294       accept              0.002554433
20     live               0.003274672       necessary           0.002549604

Table 1. Mutually exclusive set of the 20 most frequently occurring relevant keywords in post text and image text, with their normalized frequencies. We identified some potentially sensitive topics in image text which were not present in post text. Word frequencies are normalized independently by the total sum of frequencies of the top 500 words in each class.
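
For readers who want to reproduce the flavor of this analysis, here is a minimal sketch of the OCR-plus-sentiment comparison, assuming Tesseract [2] (via pytesseract) for text extraction and NLTK’s VADER analyzer for sentiment. The sentiment model we actually used is described in the technical report, so treat this as an illustration rather than the exact pipeline; the file name and caption below are hypothetical.

```python
# Illustrative comparison of sentiment in image-embedded text (via OCR)
# versus conventional post text. The tools here (pytesseract, VADER) are
# assumptions for the sketch, not necessarily what the study used.
import pytesseract
from PIL import Image
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()  # needs nltk.download("vader_lexicon")

def image_text_sentiment(image_path):
    # Extract any text embedded in the image, then score it.
    text = pytesseract.image_to_string(Image.open(image_path))
    return text, analyzer.polarity_scores(text)["compound"]

def post_text_sentiment(post_text):
    return analyzer.polarity_scores(post_text)["compound"]

# Hypothetical single post with both an image and a caption.
ocr_text, img_score = image_text_sentiment("post_image.jpg")
txt_score = post_text_sentiment("Pray for Paris. Our hearts are with you.")
print(f"image text: {img_score:+.2f}  post text: {txt_score:+.2f}")
```

Running this over every image/post pair in the collection and averaging the compound scores per class is one simple way to see the gap between the two kinds of text.
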
We also uncovered a popular conspiracy theory surrounding the Syrian “passports” that were found by French police near the bodies of the terrorists who carried out the attacks, and were allegedly used to establish the attackers’ identities as Syrian citizens. Text embedded in images depicting this theme questioned how the passports could have survived the heat of the blasts and fire. This conspiracy theory was then used by miscreants to label the attacks a false flag operation, influencing citizens to question the policies and motives of their own government. The popularity of such memes on OSN platforms can have undesirable real-world outcomes, like protests and mass unrest. It is therefore vital for first responders to be able to identify such content and counter or control its flow to avoid real-world repercussions.

Figure 7. Example of a picture containing text relating to a conspiracy theory questioning how the Syrian passports survived the blasts. We found hundreds of images talking about this topic in our dataset.

Images posted on OSNs are a critical source of information that can help law enforcement and first-response organizations understand popular topics and public sentiment, especially during crisis events. Through our approach, we propose a semi-automated methodology for mining knowledge from visual content and identifying popular themes and the citizens’ pulse during such events. Although this methodology has its limitations, it can be very effective for producing high-level summaries and reducing the search space of content that may need an organization’s attention. We also described how our methodology can be used to automatically identify (potentially sensitive) misinformation spread through images during crisis events, which can have major real-world implications.

Here is a link to the complete technical report on this work. Big credit to Varun Bharadhwaj, Aditi Mithal, and Anshuman Suri for all their efforts. Below is an infographic of this work.

References:

[1] https://www.eyeqinsights.com/power-visual-content-images-vs-text/

[2] https://github.com/tesseract-ocr/

[3] https://www.tensorflow.org/versions/r0.11/tutorials/image_recognition/index.html

[4] http://cs231n.github.io/convolutional-networks/

[5] Gupta, Aditi, Hemank Lamba, and Ponnurangam Kumaraguru. “$1.00 per RT #BostonMarathon #PrayForBoston: Analyzing fake content on Twitter.” In eCrime Researchers Summit (eCRS), 2013, pp. 1-12. IEEE, 2013.

[6] Vieweg, Sarah, Amanda L. Hughes, Kate Starbird, and Leysia Palen. “Microblogging during two natural hazards events: what twitter may contribute to situational awareness.” In Proceedings of the SIGCHI conference on human factors in computing systems, pp. 1079-1088. ACM, 2010.