Killed it with a #Killfie: Journey from an Idea to a Global Media Phenomenon

31,000+ likes, 34,000+ shares, 1,000+ Tweets!

Most research goes through some natural phases: formulating the problem statement, collecting and analyzing data, submitting a research paper to a conference, writing a technical report, and then hoping the paper will get accepted at the conference and the work will be appreciated and acknowledged by the community (happily ever after!). I had never imagined that one such research topic, which went through these initial natural phases, would take such an interesting turn at some point and receive such an overwhelming amount of attention!

A lot has been said and written about our recent work (you can infer that from the title, and see ‘Who is talking about this research’) both in the technical community and in the press. I want to share my behind-the-scenes experience of going through this amazing phase of research – when it gets hard to count the number of mentions of your work returned by a quick Google search! A news article about someone dying just after taking a selfie was posted on the Precog mailing list on June 2, 2016. It was definitely not a conventional cause of death, and this disturbing news prompted some members of the group to dig into the what, how and why of selfie deaths around the world. It was just a small idea that we started working on; discussions trickled in, and some compelling observations followed. It all culminated in a well-written paper submitted to a conference, and the technical report going online on arXiv on 7th November.

The report was first picked up by The Sun (UK) and some Twitter handles like VickiTurk on 9th November, and what followed was a whirlwind of news articles and technology blogs across the globe, and across all media. It had become a sensation! It seemed to have touched all time zones from California (GMT-8) to New Zealand (GMT+13). The news buzz peaked on 18 November when three of us, Hemank Lamba, Megha Arora and I, went on a spree of giving interviews. We had news reporters wanting answers over email, phone and Skype, following up with us through the day. Here is what 18th November entailed for all three of us:

  • 0700 hrs IST: Call with CBC Canada, me sitting in IIT Kharagpur guest house and Hemank and Megha taking the call from Pittsburgh
  • 1000 hrs IST: Call with BBC UK Radio, which I took alone from the CSE department at IIT Kharagpur
  • 1800 hrs EST: BBC World TV News, Hemank took this alone from Pittsburgh
  • 2130 hrs IST: NBC US, I took the call during my transit from Delhi Airport to IIITD
  • 2330 hrs IST: CMU, Hemank and Megha sitting in Gates building in CMU and I was at home in Delhi

This does not include the 25+ unique emails that we probably sent out answering questions or fixing timeslots for more interviews. While the three of us were engrossed in this craziness, Mayank Vachher, Varun Bharadhwaj, and Divyansh Agarwal had their hands full providing backend support, collating the hits that we were getting in news and social media, and getting more specific insights from the data which the reporters were interested in. The three of them played an integral role in ensuring that we had a smooth run. In the meantime, we also had our group meetings to discuss the feedback that we were getting from people around the world.

The news about this research has been spreading across many newspapers and online social networks like Facebook, Twitter, and YouTube. As of this moment, the following numbers summarize the traction garnered by this research:

  • Total Articles written (unique ones): 160
  • Total Facebook posts: 100+
  • Total Facebook likes: 32,108
  • Total Facebook shares (shares of the articles + posts): 33,937
  • Total Facebook comments: 2,795
  • Total Twitter tweets: 1000+
  • Total Twitter RTs (of all the above tweets): 1075
  • Total videos created on the project: 15
  • Radio interviews: 11
  • TV interviews: 2
  • Total number of requests for the dataset: 6

Below is a tag cloud capturing all the major news agencies that featured our work. The work was featured in 17 different languages.

Lessons learned through this media frenzy:

  • For research to get popular, the topic has to be relevant to ‘people’
  • Reporters ask interesting ‘research’ questions, be prepared
  • Sociological/psychological studies around ‘who’ and ‘why’ of the research are important
  • Feedback from people is helpful in identifying potential issues in the research
  • Having captivating titles for the paper helps

Below is an infographic capturing the research work.

For those interested in knowing more about this research, here are some useful links:

There’s misinformation on Facebook. Here’s how you deal with it.

I’ll keep this short and to the point. There’s a sudden backlash on Facebook for hosting misinformation [1], and polar politics [2] after the recent elections in the USA. Is this new? NO.

Let me take you back in time, to March 2014. The deeply tragic disappearance of Malaysia Airlines Flight MH370 claimed an entire aircraft and all on board [3]. A sea of prayers and solidarity followed on all social networks including Facebook. What also followed was a series of fake, misinformative posts, links, and videos claiming to show you the footage of the aircraft crashing [4], and rumors claiming that the plane had been found in the Bermuda Triangle (see image of one such post below). Such footage never existed.

http://www.hoax-slayer.com/images/malaysia-airlines-MH370-scam-1.jpg

Following this incident, there have been a series of events where miscreants have exploited the context of a popular event to spread hoaxes, misinformation, rumors, fake news, etc. From the rumor of the death of comic actor Rowan Atkinson (a.k.a. Mr. Bean) to the suicide video by late legendary actor Robin Williams, misinformation has plagued Facebook for years, and is continuing to do so. While Facebook has recently acknowledged misinformation to be a serious problem, we at Precog had already started working on it when we first came across instances of misinformation. So how do you really deal with misinformation and rumors and hoaxes and fake news on Facebook?

There have been a few attempts to solve this problem. Facebook posted a series of blogs vowing to improve their algorithms to reduce misinformation, hoaxes, rumors, clickbaiting, etc. [8, 9, 10, 11, 12]. A hackathon recently held at Princeton University also saw a group of 4 students attempting to fix this problem [13]. Well, as it turns out, we took a crack at this problem over 2 years ago, and came up with a robust solution of our own. In August 2015, we publicly launched Facebook Inspector, a free, easy-to-use browser extension that identifies malicious content (including the type we just discussed above) in real time. At this moment, Facebook Inspector has over 200 daily active users, and has just crossed 5,000,000 hits (it’s 5 million; but it’s just fun to write it with so many zeros xD). We leveraged multiple crowdsourcing mechanisms to gather a pool of misinformative and other types of malicious posts, and harnessed them to generate a model to automatically identify misinformative posts, hoaxes, rumors, scams, etc.

Give it a try. Download the Chrome version at https://chrome.google.com/webstore/detail/facebook-inspector/jlhjfkmldnokgkhbhgbnmiejokohmlfc

Firefox users, download at https://addons.mozilla.org/en-US/firefox/addon/fbi-facebook-inspector/

To read the entire story behind the inception of the idea, and incarnation of Facebook Inspector, read the detailed technical report here.

So we spotted a problem a couple of years ago, took a crack at solving it (and I’d like to believe we succeeded), and apparently, the entire world is after Facebook for the same problem today. But misinformation, hoaxes, and rumors aren’t the only big problems that Facebook is surrounded by. Let’s talk some more about the US elections. Facebook’s algorithms have been accused of reinforcing “political polarization” by Professor Filippo Menczer in a popular news article [2]. Apparently, Facebook is home to a big bunch of political groups which post polarized content to influence users towards / against certain political beliefs. Whether such content should be allowed on social networking websites is debatable. After all, free speech is a thing! But the question that demands attention here is, did these politically polarized entities suddenly appear on Facebook around the election time? I mean, if they had been around for long, Facebook would’ve known, right? And the effects of social network content on elections are well known and studied [5, 6, 7]. So Facebook would’ve definitely done something to at least nudge users when getting exposed to polarized political content. But polarized political content was never a point of concern for Facebook. So it probably didn’t exist until right before the elections. Right? Wrong!

Well, this is a literal “I told you so” moment. Last year, we conducted a large scale study of malicious Facebook pages, and one of our main findings was the dominant presence of politically polarized entities on Facebook among malicious pages. We analyzed the content posted by these politically polarized pages, and found that negative sentiment, anger, and religion dominated within such content. We reported our findings in the form of a technical report: https://arxiv.org/abs/1510.05828v1

It is good to know that what you work on, as part of research, connects closely to relevant, present day, real world problems, but it isn’t really a good feeling to realize that something you already knew could happen, happens anyway. We at Precog always push towards trying to make a difference and making the online world better and safer. We try our best, but we can only do so much.

To conclude, not bragging here (well, it’s not bragging if it’s true!), but we saw not one, but two real problems coming, more than a year before Facebook did.

You see, we’re called “Precog” for a reason. *mic drop*

References

[1] https://techcrunch.com/2016/11/10/facebook-admits-it-must-do-more-to-stop-the-spread-of-misinformation-on-its-platform/

[2] https://www.theguardian.com/technology/2016/nov/10/facebook-fake-news-election-conspiracy-theories

[3] https://en.wikipedia.org/wiki/Malaysia_Airlines_Flight_370

[4] https://www.scamwatch.gov.au/news/scammers-using-videos-of-malaysian-airlines-flight-mh370-to-spread-malware

[5] Williams, Christine B., and Girish J. Gulati. “Social networks in political campaigns: Facebook and the 2006 midterm elections.” annual meeting of the American Political Science Association. Vol. 1. No. 11. 2007.

[6] Williams, Christine B., and Girish J. Gulati. “Social networks in political campaigns: Facebook and the congressional elections of 2006 and 2008.” New Media & Society (2012): 1461444812457332.

[7] Douglas, Sara, et al. “Politics and young adults: the effects of Facebook on candidate evaluation.” Proceedings of the 15th Annual International Conference on Digital Government Research. ACM, 2014.

[8] https://newsroom.fb.com/news/2015/01/news-feed-fyi-showing-fewer-hoaxes/

[9] http://newsroom.fb.com/news/2016/08/news-feed-fyi-further-reducing-clickbait-in-feed/

[10] http://newsroom.fb.com/news/2014/11/news-feed-fyi-reducing-overly-promotional-page-posts-in-news-feed/

[11] http://newsroom.fb.com/news/2014/08/news-feed-fyi-click-baiting/

[12] http://newsroom.fb.com/news/2014/04/news-feed-fyi-cleaning-up-news-feed-spam/

[13] http://www.businessinsider.in/It-only-took-36-hours-for-these-students-to-solve-Facebooks-fake-news-problem/articleshow/55426656.cms

Teaching #PSOSMonNPTEL in a country of a billion: Experiences and take aways

I recently finished teaching my first course on NPTEL (National Program on Technology Enhanced Learning). NPTEL is like the Coursera of India. It is a joint initiative of the Indian Institute of Science (IISc) and the Indian Institutes of Technology (IITs) and is managed by faculty from IIT Madras.

I taught my signature course, Privacy and Security in Online Social Media (PSOSM). The course was assigned the number noc16-cs07. I have taught this course previously at IIITD (CSE648, 4 times) and at the Federal University of Minas Gerais (UFMG), Brazil (2 times). Below is the flier and here is the teaser video we created and used for the promotion of the course. Registration started on May 1 and ran till July 15. By that deadline, I had about 2200 registrations, but the number more than doubled when the registration date was extended by a couple of days. All the effort put into promoting the course paid off: I had 5250+ students signed up for the course.

I had four amazing TAs assisting me on this course, all of them my own Ph.D. students: Anupama Aggarwal, Prateek Dewan, Srishti Gupta and Niharika Sachdeva. They not only helped with tutorials, quizzes and tests but also functioned as tech support throughout the course. Special thanks to Prateek, who took care of editing the videos and responding to the mailing list (there were even emails to Prateek referring to him as faculty of the course!), and to Niharika for managing the entire NPTEL portal.

I had mentally prepared myself to spend more time preparing for this course, but it took way more time than I had foreseen. It was my first time using Camtasia for recording lectures. Previously, I had my lectures recorded while I taught physically in a class. It feels very natural teaching a class full of curious students, interacting with them, asking / answering questions, but it is quite a different feeling teaching a class consisting of only one laptop, and that too in your own office!

After some initial teething problems with recording and uploading, the course went on smoothly. As of writing this blog, I have 23,000 views on all the lecture videos that we uploaded as part of the course. Apart from videos, I also had one AMA (Ask Me Anything) session and one physical meeting at IIITD, where students could ask questions, clarify doubts or share their concerns directly with me and the TAs.

What the students felt, I will share later in this blog, but personally it was a very satisfying experience. Many students all over India got to know me. Students from many smaller towns took this course. I received emails from college principals in tier II and tier III cities saying they had made this course a part of their curriculum and had their best students taking it. In my opinion, this is the biggest advantage of such online courses: they break geographical barriers and make quality education and knowledge accessible to a larger audience. Out of the 5250 students, 152 registered for and appeared in the final exam; students have to pay a nominal fee to take this exam. I was super excited to have so many students pay for the course and take the exam.

NPTEL maintains a mailing list of all students registered for the course, and that acts as a good medium for all of us (faculty, TAs and students) to interact on a regular basis. This is where I was told that I have an American accent when I speak L, or that in some videos my voice was very feeble. Also, as a practice, NPTEL requests students to fill in a feedback form and shares the feedback with the faculty teaching the course; students also sent some feedback through the mailing list. It feels very heartening to see some of the comments from the students, and I take this opportunity to thank and congratulate them for their time and effort in finishing the course and giving constructive feedback on it. Some comments:

  • “Thanks for giving me a sense of satisfaction of doing a course.”
  • “Thanks a million to the whole Team. One of the best online course I ever had. There were days when I started posting queries at 10PM in the forum and TA’s helped me till I get what I wanted, some of the discussions went on till 1AM too. This shows how dedicated the team is!.”
  • “Feedback for an awesome course like this is really worth. Thank you PK sir for opening up such a treasure of knowledge. The best part of the course and it actually made the course different was the meet up at IIITD and also the hangouts session. The tutorials are really nicely presented and challenging for us.”
  • “I have gone through couple of other NPTEL certifications in recent years but this one was the best I would say…. Special thanks to Dr. PK. He was very interactive and an enthusiast.”
  • “firstly i am happy for taking this course, i did well in exam and very very thanks to all.. teaching faculty.. all teaching faculty did beyond the expectations..now i realise what are the skills  i have..  and thank you PK sir..and  lastly i say thank u NPTEL team.”
  • “5/5… Thank you IIIT-D, PK sir and the awesome TAs.”

Below is the certificate that my TAs got for helping with the course.

Lessons learned / suggestions for doing a good job with teaching on NPTEL:

  • Prepare the lectures and record them beforehand (well before the date of uploading)
  • Have wonderful TAs, they are the secret for success!
  • Try to have Ask Me Anything or physical meeting sessions at least a couple of times
  • Keep the mailing list very active
  • If you are teaching a course that you otherwise teach on campus, please be aware that the students taking the online course may not be as well equipped as the students in your class on campus.

I would definitely love to teach a course on NPTEL again! Until then goodbye to the NPTEL community!

Me, Myself and My Killfie: Characterizing and Preventing Selfie Deaths

Authors: Hemank Lamba, Varun Bharadhwaj, Mayank Vachher, Divyansh Agarwal, Megha Arora, Ponnurangam Kumaraguru

Our world is becoming smaller with time, bringing us closer and bestowing upon us a number of avenues to easily showcase ourselves in any manner we want. Perhaps the biggest facilitating agent in this regard is Online Social Media (OSM). In a way, OSM replicates our world, with friends, interactions and constant information exchange. The world of OSM seems to have developed an interesting currency of its own too – LIKES and COMMENTS, the dollars and cents of the virtual realm; something which everyone aspires to have in abundance.

We are also familiar with the popular “selfie” phenomenon. Recognized as the “word of the year” by Oxford Dictionaries in 2013, the “selfie” is defined as a “photograph taken of oneself, and uploaded to a social media website.” In recent years, there has been a sharp increase in the number of selfies posted on OSM. However, one particularly disturbing trend that has emerged lately is that of taking dangerous selfies, which has proved so disastrous that during 2015 alone, there were more deaths caused by selfies than by shark attacks all over the world [1]. Figure 1 shows examples of such selfies taken moments before the fatal incident. A selfie-related death can be defined as a death of an individual or group of people that could have been avoided had the individual(s) not been taking a selfie.

The level of threat that adventurous selfie-taking behaviour exposes people to is slowly being acknowledged by governments as well. Russian authorities came up with a public awareness campaign to enlighten citizens about the hazardous implications of taking selfies [2]. Similarly, Mumbai police recently classified 16 zones across the city as No-Selfie zones, after a rise in the number of selfie casualties [3].

The reason for this outrageous trend of dangerous selfies becomes clear when we combine the thoughts above. Since the advent of online social networks, people have developed an insatiable urge to be the most “popular” in their community. In medical terms, this has long been compared to forms of narcissism and, in relation to selfies, termed Selfitis [4,5,6]. This becomes the prime reason why people resort to performing risky feats while taking a selfie: to garner more appreciation in the form of likes and comments from their friends online.

We at Precog@IIITD chose to analyse the issue from a technical perspective and to dive deeper into what characterizes a selfie casualty/death, what kind of information we can extract from selfie images, and how selfie casualties can be prevented.

Over the past two years, we found that a total of 127 deaths have been reported to be caused by selfies, of which a whopping 76 occurred in India alone! [7] Table 1 shows the country-wise distribution of selfie casualties across the world. The reasons for these selfie casualties were found to broadly belong to the following categories (Figure 2):

  • Height Related – Selfie casualties caused by people falling from an elevated location. [8]

  • Water Related – Selfie casualties caused by drowning. [9]

  • Height and Water Related – Selfie casualties involving falling from elevated locations into a water body. [10]

  • Vehicle/Road Related – Selfie casualties caused by vehicle accidents. [11]

  • Train Related – Selfie casualties caused by being hit by a train. [12]

  • Weapons Related – Selfie casualties caused by accidental firing of a weapon. [13]

  • Animal Related – Selfie casualties caused by an attack by an animal while taking the selfie with or near the animal. [14]

  • Electricity Related – Selfie casualties caused by electrocution from live wires. [15]

Figure 2: (a) Number of Deaths and (b) Number of Incidents due to various reasons

Using a collective dataset of 138,496 tweets collected between August and September 2016, we implemented a three-fold architecture based on Image features, Location features, and Text features to quantify the danger level of selfies in our dataset.  Our machine learning model takes into account a variety of features to identify dangerous selfies along with their potential risks, and analyses common characteristics in these images. These features are supplied to four different classifiers with similar parameters to avoid bias in the results. Table 2 shows the sets of features we used for each feature type.

Table 2: Location-Based, Image-Based and Text-Based features used for classification of selfies

After thorough analysis, we found that the image-based features are the best indicators that accurately capture the dangerous nature of a selfie, in comparison to other feature-types. This seems logical as image features attempt to infer meaning directly out of the image, in a sense replicating our visual senses. Our model resulted in an accuracy of 73.6% for the task of identifying a dangerous selfie.
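
To make the comparison of feature types above more concrete, here is a minimal sketch of how one might score each feature group separately and then all groups together. The post does not name the four classifiers or list actual feature values, so everything below is an assumption made for illustration: a simple nearest-centroid classifier stands in for the real models, and the numbers in `TRAIN` and `TEST` are made-up toy values, not data from the study.

```python
from statistics import mean

# Toy, made-up samples (NOT from the study). Each sample carries three
# hypothetical feature groups and a label: 1 = dangerous, 0 = not dangerous.
TRAIN = [
    {"image": [0.9, 0.8], "location": [0.6], "text": [0.5], "label": 1},
    {"image": [0.8, 0.9], "location": [0.4], "text": [0.4], "label": 1},
    {"image": [0.1, 0.2], "location": [0.5], "text": [0.5], "label": 0},
    {"image": [0.2, 0.1], "location": [0.3], "text": [0.6], "label": 0},
]
TEST = [
    {"image": [0.85, 0.9], "location": [0.5], "text": [0.55], "label": 1},
    {"image": [0.15, 0.1], "location": [0.4], "text": [0.45], "label": 0},
]


def select(sample, groups):
    """Concatenate the chosen feature groups into one flat vector."""
    return [value for group in groups for value in sample[group]]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(samples, groups):
    """Nearest-centroid model: one mean feature vector per class label."""
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample["label"], []).append(select(sample, groups))
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict(model, sample, groups):
    """Return the label whose centroid is closest to the sample."""
    vector = select(sample, groups)
    return min(model, key=lambda label: sq_dist(model[label], vector))


def accuracy(model, samples, groups):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(model, s, groups) == s["label"] for s in samples)
    return hits / len(samples)


# Score each feature group on its own, then all three combined.
for groups in (["image"], ["location"], ["text"], ["image", "location", "text"]):
    model = train(TRAIN, groups)
    print(groups, accuracy(model, TEST, groups))
```

Keeping the classifier fixed while swapping the feature groups is what lets the accuracies be compared against each other; the toy values here say nothing about which group wins on real selfie data.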

To further capture the risk type of a dangerous selfie, we used specific features that were relevant only to a particular risk type and supplied the data to our classifier. In particular, we concentrated on singling out dangerous selfies that belonged to height, water and vehicle related risks. We found that the set of features performing the best for this task was a combination of all 3 feature types – Image, Location and Text based features, and the best accuracy was obtained on the Water-related features. With remarkable accuracy, we have been able to establish a method to identify and capture the “danger level” of a selfie along with its risk type.

With the growing trend of dangerous selfies, it becomes important to spread awareness of the inherent hazards associated with people risking their lives simply for the sake of recognition on a virtual forum. As Shakespeare put it, this type of “Bubble Reputation” induced by a dangerous selfie posted on OSM has claimed multiple lives lately. This work is a small contribution towards making the world safer, by making people aware.

Our full report / paper on this work. You can access the portal and our dataset here.

References:

[1] http://www.telegraph.co.uk/technology/11881900/More-people-have-died-by-taking-selfies-this-year-than-by-shark-attacks.html

[2] https://www.theguardian.com/world/2015/jul/07/a-selfie-with-a-weapon-kills-russia-launches-safe-selfie-campaign

[3] http://metro.co.uk/2016/02/25/mumbai-orders-selfie-ban-after-19-people-die-5716731/

[4] S. Bhogesha, J. R. John, and S. Tripathy. Death in a flash: selfie and the lack of self-awareness. Journal of Travel Medicine, 23(4):taw033, 2016

[5] B. Subrahmanyam, K. S. Rao, R. Sivakumar, and G. C. Sekhar. Selfie related deaths perils of newer technologies. Narayana Medical Journal, 5(1):52–56, 2016.

[6] A. Lakshmi. The selfie culture: Narcissism or counter hegemony? Journal of Communication and Media Studies (JCMS), 5:2278–4942, 2015

[7] http://labs.precog.iiitd.edu.in/killfie/analysis

[8] http://www.telegraph.co.uk/news/2016/07/01/german-tourist-plunges-to-his-death-while-posing-for-picture-at/

[9] http://www.thenewsminute.com/article/selfie-deaths-two-men-drown-karnataka-couple-washed-away-tn-46735

[10] http://www.ndtv.com/cities/teenager-drowns-while-clicking-selfie-friend-dies-trying-to-save-him-1277217

[11] http://www.independent.co.uk/news/world/americas/selfie-crash-death-woman-dies-in-head-on-collision-seconds-after-uploading-pictures-of-herself-and-9293694.html

[12] http://timesofindia.indiatimes.com/city/varanasi/2-killed-while-taking-selfie-on-railway-tracks/articleshow/51850194.cms

[13] http://www.aljazeera.com/news/2015/07/russia-launches-safe-selfie-guide-light-deaths-150707132204704.html

[14] http://www.radar.ng/2016/04/elephant-tramples-boy-to-death-while.html?utm_source=nnd&utm_medium=twitter&utm_campaign=nnd

[15] http://www.thelocal.es/20140318/young-man-dies-in-train-selfie-fail

The complete picture: Visual Themes and Sentiment on Social Media for First Responders.



Researchers and academicians all over the world have conducted numerous studies and established that social media plays a vital role during crisis events. From citizens helping police to capture suspected terrorists after the Boston Marathon bombings [5], to vigilant users spreading situational awareness [6], OSNs have proved their mettle as a powerful platform for information dissemination during crises.

Most of the aforementioned work has relied on textual content posted on OSNs to extract knowledge and make inferences. The thing is, online media is rapidly moving from text to visual media. With the prevalence of 3G and 4G technologies and high-bandwidth connectivity in most Internet-enabled countries, images and videos are gaining much more traction than text. This is also natural, since the human brain is hardwired to recognize and make sense of visual information more efficiently [1]. Just using text to draw inferences from social media data is no longer enough. As we discussed in our previous blog, there is a significant percentage of social media posts which do not contain any text. Moreover, there’s also a large percentage of posts which contain both text and images. The point to keep in mind here is that images and text may contradict each other, even if they’re part of the same post. While the text in Figure 1 inspires support and positive sentiment, the image (or more precisely, the text in the image) is pretty negative. This is what current research methodology is missing out on.

Figure 1. Example of a Facebook post with contradicting text and image sentiment.

Continuing our work on images and online social media, we decided to dig further into images posted on social networks, and see if images could help first responders get a more complete picture of the situation during a crisis event. We collected Facebook posts published during the attacks in Paris in November 2015, and performed large scale mining on the image content we captured. Typically, monitoring the popular topics and sentiment among the citizens can be of help to first responders. Timely identification of misinformation, sensitive topics, negative sentiment, etc. online can be really helpful in predicting and averting any potential implications in the real world.

We were able to gather over 57,000 images using the #ParisAttacks and #PrayForParis hashtags put together, of which 15,123 images were unique. Analyzing such a big number of images manually is time consuming, and not scalable. So we utilized state-of-the-art techniques from the computer vision domain to automatically analyze images on a large scale. These techniques include Optical Character Recognition (OCR) [2], image classification, and image sentiment identification using Convolutional Neural Networks (CNNs). Figure 2 shows how a typical CNN model processes and classifies images [4].

Figure 2. Typical CNN model for object identification in images. Image taken from http://cs231n.github.io/convolutional-networks/
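
The post does not say how the 15,123 unique images were picked out of the 57,000+ collected. One simple possibility, shown here purely as an assumption, is exact-duplicate removal by hashing the raw bytes of each file. The function name `unique_images` and the tiny byte strings in the usage lines are hypothetical stand-ins for real image files.

```python
import hashlib


def unique_images(images):
    """Return names of images whose byte content has not been seen before.

    `images` is an iterable of (name, raw_bytes) pairs. Only the first
    occurrence of each distinct content is kept, in input order.
    """
    seen_digests = set()
    unique_names = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen_digests:
            seen_digests.add(digest)
            unique_names.append(name)
    return unique_names


# Hypothetical stand-ins for downloaded image files.
collected = [
    ("post1.jpg", b"eiffel-tower-bytes"),
    ("post2.jpg", b"eiffel-tower-bytes"),  # exact re-share of post1.jpg
    ("post3.jpg", b"peace-symbol-bytes"),
]
print(unique_images(collected))
```

Exact hashing only catches byte-identical copies. Images that were resized, cropped or re-encoded before being re-shared would count as distinct here, and catching those would need some form of perceptual or near-duplicate matching instead.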

With all these “weapons”, we set out to mine the sea of images and see if we could discover something useful. And we struck gold right away. We used Google’s Inception-v3 model [3] for generating tags for images automatically, and looked at a few of the most popular tags. Interestingly, we found numerous instances of misinformative images, images containing potentially sensitive themes, and images promoting conspiracy theories among popular images. By the time we identified them, these images had gathered millions of likes, and hundreds of thousands of comments and shares. Some of these examples are shown below (Figures 3–6).

Figure 3. Eiffel Tower turns off its lights for the first time in 63 years. This information was incorrect. Eiffel Tower’s lights are turned off every night between 1 am and 6 am following a ruling by the French Government.

Figure 4. Image incorrectly quoting the cause of death of Diesel, a police dog that helped the police during the attacks. The French Police later clarified that the actual cause of death was gunshot wounds from the French Police fleet itself, and not the suicide bomber.

Figure 5. Donald Trump’s insensitive tweet just after the Paris attacks. As the time stamp of the tweet suggests, this tweet was posted months ago, but resurfaced just after the attacks to defame the politician.

Figure 6. Picture claiming that a Muslim guard named Zouheir stopped a suicide bomber from entering the Stade de France football stadium and saved thousands of lives. As later clarified by the security guard himself, such an incident never took place. Zouheir, the security guard, was stationed at a different spot.

Applying OCR on the images in our dataset, we were able to extract text from about 55% of the images (31,869 out of 57,748 images). We wondered if this text embedded in images would be any different than the text that users post otherwise, in the orthodox manner. Upon analyzing and comparing the sentiment of image text and post text, we found that image text (extracted through OCR) was much more negative than post text (the orthodox text). In fact, not only was image text more negative, it was also different from post text in terms of topics being talked about. Table 1 shows a mutually exclusive subset of the most common words appearing in image text and post text. While post text was full of generic text offering prayers, support and solidarity, image text was found to mention some sensitive issues like “refugees”, “syria”, etc.

| S. No. | Top words in posts: Word | Normalized frequency | Top words in images: Word | Normalized frequency |
|---|---|---|---|---|
| 1. | retweeted | 0.005572571 | house | 0.00452941 |
| 2. | time | 0.005208351 | safety | 0.004481122 |
| 3. | prayers | 0.005001407 | washington | 0.004297628 |
| 4. | news | 0.004713342 | sisters | 0.003940297 |
| 5. | prayfortheworld | 0.004431899 | learned | 0.003863036 |
| 6. | life | 0.004393821 | mouth | 0.003853378 |
| 7. | let | 0.004249789 | stacy | 0.003751974 |
| 8. | support | 0.004249789 | passport | 0.003708515 |
| 9. | god | 0.00401139 | americans | 0.003694028 |
| 10. | war | 0.003986557 | refugee | 0.00352502 |
| 11. | thoughts | 0.003882258 | japan | 0.002887619 |
| 12. | need | 0.003878946 | texas | 0.002781386 |
| 13. | last | 0.003797825 | born | 0.002689639 |
| 14. | lives | 0.003734914 | dear | 0.002689639 |
| 15. | said | 0.003468371 | syrians | 0.002607549 |
| 16. | place | 0.003468371 | similar | 0.002573748 |
| 17. | country | 0.003319372 | deadly | 0.002568919 |
| 18. | city | 0.003291227 | services | 0.002554433 |
| 19. | everyone | 0.003281294 | accept | 0.002554433 |
| 20. | live | 0.003274672 | necessary | 0.002549604 |

Table 1. Mutually exclusive set of the 20 most frequently occurring relevant keywords in post and image text, with their normalized frequency. We identified some potentially sensitive topics among image text, which were not present in post text. Word frequencies are normalized independently by the total sum of frequencies of the top 500 words in each class.

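
The normalization described in the caption of Table 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, "mutually exclusive" is read here as "words from one class's top list that are absent from the other class's top list", the step that filtered for "relevant" keywords is not reproduced, and the token lists in the usage lines are made up.

```python
from collections import Counter


def normalized_top_words(tokens, top_n=500):
    """Map each of the top_n most frequent words to count / sum of top_n counts."""
    top = Counter(tokens).most_common(top_n)
    total = sum(count for _, count in top)
    return {word: count / total for word, count in top}


def mutually_exclusive(freq_a, freq_b, k=20):
    """Top-k words of each class that do not occur in the other class's list.

    Both inputs are dicts ordered from most to least frequent, as returned
    by normalized_top_words.
    """
    only_a = [(w, f) for w, f in freq_a.items() if w not in freq_b][:k]
    only_b = [(w, f) for w, f in freq_b.items() if w not in freq_a][:k]
    return only_a, only_b


# Made-up token lists standing in for post text and OCR-extracted image text.
post_tokens = ["prayers", "prayers", "support", "paris"]
image_tokens = ["refugee", "refugee", "passport", "paris"]

post_freq = normalized_top_words(post_tokens)
image_freq = normalized_top_words(image_tokens)
print(mutually_exclusive(post_freq, image_freq))
```

Normalizing each class by its own total keeps the two columns comparable even when one class has far more text than the other, which is why the frequencies in the two halves of the table are computed independently.
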
We also uncovered a popular conspiracy theory surrounding the Syrian “passports” that were found by French police near the bodies of terrorists who carried out the attacks, and were allegedly used to establish the identity of the attackers as Syrian citizens. Text embedded in images depicting this theme questioned how the passports could have survived the heat of the blasts and fire. This conspiracy theory was then used by miscreants to label the attacks as a false flag operation, influencing citizens to question the policies and motives of their own government. The popularity of such memes on OSN platforms can have undesirable outcomes in the real world, like protests and mass unrest. It is therefore vital for first responders to be able to identify such content and counter / control its flow to avoid repercussions in the real world.

Figure 7. Example of a picture containing text relating to a conspiracy theory questioning how the Syrian passports survived the blasts. We found hundreds of images talking about this topic in our dataset.

Images posted on OSNs are a critical source of information that can be useful for law and order organizations to understand popular topics and public sentiment, especially during crisis events. Through our approach, we propose a semi-automated methodology for mining knowledge from visual content and identifying popular themes and citizens’ pulse during crisis events. Although this methodology has its limitations, it can be very effective for producing high level summaries and reducing the search space for organizations with respect to content that may need attention. We also described how our methodology can be used for automatically identifying (potentially sensitive) misinformation spread through images during crisis events, which may lead to major implications in the real world.

Here is a link to the complete technical report on this work. Big credits to Varun Bharadhwaj, Aditi Mithal, and Anshuman Suri for all their efforts. Below is an infographic of the work.

References:

[1] https://www.eyeqinsights.com/power-visual-content-images-vs-text/

[2] https://github.com/tesseract-ocr/

[3] https://www.tensorflow.org/versions/r0.11/tutorials/image_recognition/index.html

[4] http://cs231n.github.io/convolutional-networks/

[5] Gupta, Aditi, Hemank Lamba, and Ponnurangam Kumaraguru. “$1.00 per rt# bostonmarathon# prayforboston: Analyzing fake content on twitter.” In eCrime Researchers Summit (eCRS), 2013, pp. 1-12. IEEE, 2013.

[6] Vieweg, Sarah, Amanda L. Hughes, Kate Starbird, and Leysia Palen. “Microblogging during two natural hazards events: what twitter may contribute to situational awareness.” In Proceedings of the SIGCHI conference on human factors in computing systems, pp. 1079-1088. ACM, 2010.