I have been Precog-ed (for life): Part 4

Hola! It’s the first day of 2017. All of us have just finished looking back at the past year, trying to fathom how time flies and life metamorphoses. My life has taken a leap too, and this is my last blog in the ‘I have been Precog-ed’ series. Earlier, I wrote about my first stint at research (Part 1), a wonderful summer at the Information Sciences Institute in Marina del Rey, Los Angeles (Part 2), my first paper presentation at ICWSM 2016 in Germany (Part 3), and my time at Precog. This post is about the last 6 months of my journey and an attempt to express what being a Precog-er is all about (for more on this, please read the first three parts too). Having been a Precog-er for more than 3 years, I have more thoughts than I can ever pen down; from being an undergrad who joined Precog as a noob to being a grad student at Carnegie Mellon University, my path has always been illuminated by the light of learning and hope.

April 2016 – I was struggling with end-sem preparations, document processing and visa applications for my trip to ICWSM and my master’s in the States, and the humdrum of undergrad life when an unexpected email got an unexpected reaction from me –

“Dear Megha,

We are pleased to inform you that you have been selected as one of the 40 CERN Openlab Summer Students 2016 (out of 1461 applicants)! For nine weeks, CERN will be your host for what we hope is going to be an interesting, fun and active summer…”

I have been an amateur astronomer for 9 years, and getting to work at the ‘Mecca of Particle Physics’ would be a dream come true, but I was sure I wouldn’t be able to make it. I was applying for my Schengen visa for Germany (which would take another 2 weeks), and then I had to start my application for the US visa. I needed another Schengen visa for Switzerland within a week. On top of that, the only dates I could select for the internship overlapped with my initial orientation schedule at CMU. I almost disrupted a meeting in PK’s office to break the news to him. I was sad. The Pillars (Ph.D. students at Precog) and PK were convinced that I should try, and if it didn’t work out, so be it. That’s a Precog trait – not giving up until you have given it your best shot! After cutting short my summer at CERN, convincing CMU to let me skip the orientations (assuring them that I would manage when I wasn’t sure myself that I would), and getting my Schengen visa for Switzerland in a day (thanks to CERN’s administrative staff, who made a special request to the embassy on my behalf), I was ready for a summer at CERN.

I worked for 2 months at CERN’s data center on a storage system of ~125 PB (one of the largest in the world). The CERN openlab program includes a lecture series to help CS students understand the Physics needed for some of the projects, trips to ETH Zürich and EPFL Lausanne, hackathons, and several other means to help students gain insights into the revolutionary projects spanning 100 hectares in Switzerland and more than 450 hectares in France! It was a humbling experience in which I learned something new every day. Europeans have nailed the work-life balance too: along with finishing my project on time, I managed to check Geneva, Lausanne, Lyon, Zürich, Paris, Montreux, Bern, Engelberg, Chamonix and many more off my list!

After 2 days in Delhi, I headed to Pittsburgh, my home for the next 16 months. I am an MSCS student at CMU now. Though I was the last to arrive and one of the youngest of the lot, thanks to PK I had ample background knowledge about student life here and the city of Pittsburgh. The experience I have gained at Precog comes in handy when I have to identify research gaps and solve hard problems. I feel more equipped and confident to take up the challenges that come with grad life at a school like CMU.

Throughout these 6 months (Jul – Dec 2016), I have been working with a few Precog-ers on what we now call the Killfie project. It has turned out to be one of the most exciting projects I have worked on as part of the group. It is the chance to work on interesting problems with some brilliant people that motivates me to find time for this amongst courses and projects at CMU.

I cannot finish this blog without revisiting these lines from my first blog – “…PK, the heart and brain of Precog. He is the coolest adviser I have ever met and his skills and dexterity at work are almost mind-boggling. I came to know him as my Probability and Statistics professor, the role changed to being my adviser working at Precog and now I see him as a mentor for life.” I owe a lot of what I have been able to achieve in the last 3 years to PK’s unconditional support. Thank you, PK, for always illuminating my path and for proving what good mentorship can accomplish!
My time at Precog has taught me how to help people, make friends, eliminate distractions and focus, improve daily, think big, fail often, and give nothing short of your very best effort! I have had last-minute, unscheduled video calls in the middle of the night from the other end of the world with Precog-ers when I needed help. Pillars, interns, RAs – thank you, each one of you, for this experience. Even though I live in a different time zone now and my attendance at the 4th floor Ph.D. lab has been at an all-time low, I know my association with the group will last forever. As has been rightly put – ‘Once a Precog-er, always a Precog-er!’

PS – Some pictures…

Just another day at Precog…
“It’s all about the people!”
The room where Tim Berners-Lee developed the World Wide Web at CERN!
This one doesn’t need a caption… 🙂
The Aiguille du Midi Skywalk, “Step into the Void” at Chamonix (altitude: 3,842 m)
CERN Openlab Summer Students 2016

Me, Myself and My Killfie: Characterizing and Preventing Selfie Deaths

Authors: Hemank Lamba, Varun Bharadhwaj, Mayank Vachher, Divyansh Agarwal, Megha Arora, Ponnurangam Kumaraguru

Our world is becoming smaller with time, bringing us closer and bestowing upon us a number of avenues to easily showcase ourselves in any manner we want. Perhaps the biggest facilitating agent in this regard is Online Social Media (OSM). In a way, OSM replicates our world, with friends, interactions and constant information exchange. The world of OSM seems to have developed an interesting currency of its own too – LIKES and COMMENTS, the dollars and cents of the virtual realm; something everyone aspires to have in abundance.

We are also familiar with the popular “selfie” phenomenon. Named “word of the year” by Oxford Dictionaries in 2013, the “selfie” is defined as a “photograph taken of oneself, and uploaded to a social media website.” In recent years, there has been a sharp increase in the number of selfies posted on OSM. However, one particularly disturbing trend that has emerged lately is that of taking dangerous selfies, which has proved so disastrous that in 2015 alone, more people died while taking selfies than in shark attacks all over the world [1]. Figure 1 shows examples of such selfies taken moments before the fatal incident. A selfie-related death can be defined as the death of an individual or group of people that could have been avoided had the individual(s) not been taking a selfie.

Governments, too, are slowly acknowledging the level of threat that adventurous selfie-taking behaviour exposes people to. Russian authorities launched a public awareness campaign to warn citizens of the hazards of taking selfies [2]. Similarly, Mumbai police recently classified 16 zones across the city as No-Selfie zones after a rise in the number of selfie casualties [3].

The reason for this outrageous trend of dangerous selfies becomes clear when we combine the thoughts above. Since the advent of online social networks, people have developed an insatiable urge to be the most “popular” in their community. In medical terms, this has long been compared to forms of narcissism and, in relation to selfies, termed Selfitis [4,5,6]. This urge is a prime reason why people resort to performing risky feats while taking a selfie: to garner more appreciation from their friends online in the form of likes and comments.

We at Precog@IIITD chose to analyse the issue from a technical perspective, diving deeper into what characterizes a selfie casualty, what kind of information we can extract from selfie images, and how selfie casualties can be prevented.

We found that over the past two years, a total of 127 deaths have been reported as caused by selfies, of which a whopping 76 occurred in India alone [7]! Table 1 shows the country-wise distribution of selfie casualties across the world. The reasons for these selfie casualties broadly fall into the following categories (Figure 2):

  • Height Related – Selfie casualties caused by people falling from an elevated location. [8]

  • Water Related – Selfie casualties caused by drowning. [9]

  • Height and Water Related – Selfie casualties involving falling from elevated locations into a water body. [10]

  • Vehicle/Road Related – Selfie casualties caused by vehicle accidents. [11]

  • Train Related – Selfie casualties caused by being hit by a train. [12]

  • Weapons Related – Selfie casualties caused by the accidental firing of a weapon. [13]

  • Animal Related – Selfie casualties caused by an animal attack while taking a selfie with or near the animal. [14]

  • Electricity Related – Selfie casualties caused by electrocution from live wires. [15]

Figure 2: (a) Number of Deaths and (b) Number of Incidents due to various reasons

Using a dataset of 138,496 tweets collected between August and September 2016, we implemented a three-part architecture based on Image features, Location features, and Text features to quantify the danger level of selfies in our dataset. Our machine learning model takes into account a variety of features to identify dangerous selfies along with their potential risks, and analyses common characteristics in these images. These features are supplied to four different classifiers with similar parameters to avoid bias in the results. Table 2 shows the sets of features we used for each feature type.
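To make the three-part design concrete, here is a minimal sketch of how one selfie’s Image, Location and Text features could be flattened into a single vector before being handed to a classifier. Table 2 is not reproduced here, so the feature names below (`person_count`, `water_score`, `elevation_m`, `danger_word_count`, and so on) are hypothetical placeholders, and this is not the code used in the study.

```python
from typing import Dict, List, Sequence

# Hypothetical feature groups; the real features are the ones listed in Table 2.
FEATURE_GROUPS: Dict[str, List[str]] = {
    "image": ["person_count", "water_score", "height_score"],
    "location": ["near_water_body", "elevation_m"],
    "text": ["danger_word_count", "hashtag_count"],
}

def build_vector(sample: Dict[str, Dict[str, float]],
                 groups: Sequence[str] = ("image", "location", "text")) -> List[float]:
    """Concatenate the selected feature groups into one flat vector.

    Missing features default to 0.0, so every vector for a given set of
    groups has the same length and ordering.
    """
    vector: List[float] = []
    for group in groups:
        values = sample.get(group, {})
        for name in FEATURE_GROUPS[group]:
            vector.append(float(values.get(name, 0.0)))
    return vector

selfie = {
    "image": {"person_count": 1, "water_score": 0.9, "height_score": 0.2},
    "location": {"near_water_body": 1, "elevation_m": 12},
    "text": {"danger_word_count": 2},  # hashtag_count is missing on purpose
}
print(build_vector(selfie))                      # all three groups
print(build_vector(selfie, groups=("image",)))   # image features only
```

Keeping a fixed name order per group means the same vector layout can be fed to every classifier being compared, and individual groups can be switched on or off for comparisons between feature types.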

Table 2: Location-Based, Image-Based and Text-Based features used for classification of selfies

After thorough analysis, we found that image-based features capture the dangerous nature of a selfie better than the other feature types. This seems logical, as image features attempt to infer meaning directly from the image, in a sense replicating our visual senses. Our model achieved an accuracy of 73.6% on the task of identifying a dangerous selfie.
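One common way to compare feature types is an ablation: train the same classifier on one feature group at a time and compare accuracy (correctly classified selfies divided by all test selfies, so 73.6% means roughly 736 of every 1,000 test selfies labelled correctly). The sketch below assumes a simple nearest-centroid classifier as a stand-in; it is not one of the four classifiers from the study, and the toy numbers are made up purely to show the procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A sample is (feature groups, label), with label 1 = dangerous, 0 = not dangerous.
Sample = Tuple[Dict[str, List[float]], int]

def select(features: Dict[str, List[float]], groups: Sequence[str]) -> List[float]:
    """Concatenate only the requested feature groups."""
    return [value for group in groups for value in features[group]]

def centroid(rows: List[List[float]]) -> List[float]:
    """Column-wise mean of equally long vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]

def nearest_centroid_accuracy(train: List[Sample], test: List[Sample],
                              groups: Sequence[str]) -> float:
    """Fit one centroid per class on `train`; return accuracy (correct / total) on `test`."""
    rows_by_label: Dict[int, List[List[float]]] = {}
    for features, label in train:
        rows_by_label.setdefault(label, []).append(select(features, groups))
    centroids = {label: centroid(rows) for label, rows in rows_by_label.items()}
    correct = 0
    for features, label in test:
        x = select(features, groups)
        predicted = min(centroids, key=lambda c: math.dist(x, centroids[c]))
        correct += int(predicted == label)
    return correct / len(test)

# Made-up toy values with one feature per group (not the Killfie dataset).
train: List[Sample] = [
    ({"image": [0.9], "location": [0.5], "text": [0.2]}, 1),
    ({"image": [0.8], "location": [0.4], "text": [0.7]}, 1),
    ({"image": [0.1], "location": [0.6], "text": [0.3]}, 0),
    ({"image": [0.2], "location": [0.5], "text": [0.7]}, 0),
]
test: List[Sample] = [
    ({"image": [0.85], "location": [0.45], "text": [0.6]}, 1),
    ({"image": [0.15], "location": [0.45], "text": [0.35]}, 0),
]

for groups in [("image",), ("location",), ("text",), ("image", "location", "text")]:
    print(groups, nearest_centroid_accuracy(train, test, groups))
```

Because the classifier and the train/test split stay fixed, any difference in accuracy can be attributed to the feature group alone. `math.dist` requires Python 3.8 or later.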

To further capture the risk type of a dangerous selfie, we used features relevant only to a particular risk type and supplied the data to our classifier. In particular, we concentrated on singling out dangerous selfies involving height, water and vehicle related risks. We found that the best-performing feature set for this task was a combination of all 3 feature types (Image, Location and Text based features), and the best accuracy was obtained for the water-related risk type. With remarkable accuracy, we have been able to establish a method to identify and capture the “danger level” of a selfie along with its risk type.
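The post does not say exactly how the risk-type task was framed. A minimal sketch, assuming a one-vs-rest setup (one binary “is this a water/height/vehicle risk?” task per risk type), could restrict each task to its relevant features and then report accuracy per risk type. The `RELEVANT_FEATURES` mapping and all feature names are hypothetical, and the predictions are toy values rather than results from the report.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping of each risk type to the features deemed relevant to it.
RELEVANT_FEATURES: Dict[str, List[str]] = {
    "height": ["elevation_m", "height_score", "height_words"],
    "water": ["near_water_body", "water_score", "water_words"],
    "vehicle": ["near_road", "vehicle_score", "vehicle_words"],
}

def one_vs_rest_labels(risk_labels: Iterable[str], risk: str) -> List[int]:
    """Binary target for one risk type: 1 if the selfie carries that risk, else 0."""
    return [int(label == risk) for label in risk_labels]

def restrict(features: Dict[str, float], risk: str) -> List[float]:
    """Keep only the features relevant to `risk`, in a fixed order (missing -> 0.0)."""
    return [features.get(name, 0.0) for name in RELEVANT_FEATURES[risk]]

def per_risk_accuracy(results: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Accuracy per risk type from (risk type, true label, predicted label) triples."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for risk, truth, predicted in results:
        totals[risk] += 1
        hits[risk] += int(truth == predicted)
    return {risk: hits[risk] / totals[risk] for risk in totals}

print(one_vs_rest_labels(["water", "height", "water", "none"], "water"))
print(restrict({"water_score": 0.8, "near_water_body": 1.0}, "water"))

# Toy predictions, not results from the report.
scores = per_risk_accuracy([
    ("water", 1, 1), ("water", 0, 0),
    ("height", 1, 0), ("height", 0, 0),
    ("vehicle", 1, 1), ("vehicle", 0, 1),
])
print(scores, max(scores, key=scores.get))
```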

With the growing trend of dangerous selfies, it becomes important to spread awareness of the inherent hazards of people risking their lives simply for the sake of recognition on a virtual forum. Chasing what Shakespeare called the “bubble reputation” through dangerous selfies posted on OSM has claimed multiple lives lately. This work is a small contribution towards making the world safer by making people aware.

Here is our full report/paper on this work. You can access the portal and our dataset here.

References:

[1] http://www.telegraph.co.uk/technology/11881900/More-people-have-died-by-taking-selfies-this-year-than-by-shark-attacks.html

[2] https://www.theguardian.com/world/2015/jul/07/a-selfie-with-a-weapon-kills-russia-launches-safe-selfie-campaign

[3] http://metro.co.uk/2016/02/25/mumbai-orders-selfie-ban-after-19-people-die-5716731/

[4] S. Bhogesha, J. R. John, and S. Tripathy. Death in a flash: selfie and the lack of self-awareness. Journal of Travel Medicine, 23(4):taw033, 2016.

[5] B. Subrahmanyam, K. S. Rao, R. Sivakumar, and G. C. Sekhar. Selfie related deaths perils of newer technologies. Narayana Medical Journal, 5(1):52–56, 2016.

[6] A. Lakshmi. The selfie culture: Narcissism or counter hegemony? Journal of Communication and Media Studies (JCMS), 5:2278–4942, 2015.

[7] http://labs.precog.iiitd.edu.in/killfie/analysis

[8] http://www.telegraph.co.uk/news/2016/07/01/german-tourist-plunges-to-his-death-while-posing-for-picture-at/

[9] http://www.thenewsminute.com/article/selfie-deaths-two-men-drown-karnataka-couple-washed-away-tn-46735

[10] http://www.ndtv.com/cities/teenager-drowns-while-clicking-selfie-friend-dies-trying-to-save-him-1277217

[11] http://www.independent.co.uk/news/world/americas/selfie-crash-death-woman-dies-in-head-on-collision-seconds-after-uploading-pictures-of-herself-and-9293694.html

[12] http://timesofindia.indiatimes.com/city/varanasi/2-killed-while-taking-selfie-on-railway-tracks/articleshow/51850194.cms

[13] http://www.aljazeera.com/news/2015/07/russia-launches-safe-selfie-guide-light-deaths-150707132204704.html

[14] http://www.radar.ng/2016/04/elephant-tramples-boy-to-death-while.html?utm_source=nnd&utm_medium=twitter&utm_campaign=nnd

[15] http://www.thelocal.es/20140318/young-man-dies-in-train-selfie-fail

The complete picture: Visual Themes and Sentiment on Social Media for First Responders



Researchers and academicians all over the world have conducted numerous studies establishing that social media plays a vital role during crisis events. From citizens helping police capture the suspected terrorists behind the Boston Marathon bombing [5], to vigilant users spreading situational awareness [6], online social networks (OSNs) have proved their mettle as a powerful platform for information dissemination during crises.

Most of the aforementioned work has relied on textual content posted on OSNs to extract knowledge and make inferences. However, online media is rapidly moving from text to visual content. With the prevalence of 3G and 4G technologies and high-bandwidth connectivity in most Internet-enabled countries, images and videos are gaining much more traction than text. This is also natural, since the human brain is hardwired to recognize and make sense of visual information more efficiently [1]. Using text alone to draw inferences from social media data is no longer enough. As we discussed in our previous blog, a significant percentage of social media posts do not contain any text, and a large percentage contain both text and images. The point to keep in mind is that images and text may contradict each other, even when they are part of the same post. While the text in Figure 1 expresses support and positive sentiment, the image (or more precisely, the text in the image) is quite negative. This is what current research methodology is missing out on.
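A minimal sketch of how such contradictions could be flagged: score the post text and the text extracted from its image separately, and flag the post when the two scores have opposite signs. The tiny word lexicon here is a hypothetical stand-in for a real sentiment model (the work described below used CNN-based image sentiment), so treat this only as an illustration of the comparison.

```python
import re
from typing import Dict

# Tiny hypothetical lexicon; a real system would use a proper sentiment model.
LEXICON: Dict[str, int] = {
    "pray": 1, "prayers": 1, "support": 1, "solidarity": 1, "love": 1, "peace": 1,
    "hate": -1, "kill": -1, "terror": -1, "deadly": -1, "war": -1, "fear": -1,
}

def sentiment(text: str) -> int:
    """Sum of lexicon scores over lower-cased word tokens (0 = neutral)."""
    return sum(LEXICON.get(token, 0) for token in re.findall(r"[a-z']+", text.lower()))

def contradicts(post_text: str, image_text: str) -> bool:
    """True when post text and image text have strictly opposite sentiment signs."""
    return sentiment(post_text) * sentiment(image_text) < 0

post = "Prayers and support for Paris. Love from everyone."
image_text = "Deadly terror, war is here and we live in fear"
print(sentiment(post), sentiment(image_text), contradicts(post, image_text))
```

Posts where either side scores 0 are treated as non-contradicting, which keeps the flag conservative.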

Figure 1. Example of a Facebook post with contradicting text and image sentiment.

Continuing our work on images and online social media, we decided to dig further into images posted on social networks, and see if images could help first responders get a more complete picture of the situation during a crisis event. We collected Facebook posts published during the attacks in Paris in November 2015, and performed large-scale mining on the image content we captured. Monitoring popular topics and sentiment among citizens can help first responders: timely identification of misinformation, sensitive topics, negative sentiment, etc. online can help predict and avert potential real-world consequences.

We were able to gather over 57,000 images using the #ParisAttacks and #PrayForParis hashtags combined, of which 15,123 images were unique. Analyzing such a large number of images manually is time-consuming and does not scale, so we used state-of-the-art techniques from the computer vision domain to analyze images automatically on a large scale. These techniques include Optical Character Recognition (OCR) [2], image classification, and image sentiment identification using Convolutional Neural Networks (CNNs). Figure 2 shows how a typical CNN model processes and classifies images [4].
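The post does not say how unique images were identified among the 57,000+ collected. A minimal sketch, assuming exact-duplicate detection, groups posts by a SHA-256 digest of the raw image bytes; the post IDs and byte strings below are made up. Note that resized or re-compressed copies of the same picture would not be merged this way (a perceptual hash would be needed for that).

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

def unique_images(images: Iterable[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    """Group post IDs by the SHA-256 digest of their raw image bytes.

    Each key is one unique image; its list holds every post ID that shared it.
    Only byte-identical files are merged.
    """
    groups: Dict[str, List[str]] = {}
    for post_id, data in images:
        digest = hashlib.sha256(data).hexdigest()
        groups.setdefault(digest, []).append(post_id)
    return groups

# Toy stand-ins for downloaded image files.
posts = [("p1", b"eiffel"), ("p2", b"eiffel"), ("p3", b"passport"), ("p4", b"eiffel")]
groups = unique_images(posts)
print(len(groups), max(groups.values(), key=len))
```

The size of each group also doubles as a rough measure of how widely an image was reshared.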

Figure 2. Typical CNN model for object identification in images. Image taken from http://cs231n.github.io/convolutional-networks/
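The core stages in Figure 2 (convolution, a ReLU non-linearity, and pooling) can be illustrated in a few lines of plain Python. This is a toy sketch with a hand-made vertical-edge kernel, not how Inception-v3 is implemented; real CNNs stack many such layers with kernels learned from data.

```python
from typing import List

Matrix = List[List[float]]

def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation with stride 1 (what CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map: Matrix) -> Matrix:
    """Zero out negative activations."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; leftover rows/columns that don't fill a window are dropped."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# A 5x5 image that is dark on the left and bright on the right.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges
print(conv2d(image, kernel)[0])
print(max_pool(relu(conv2d(image, kernel))))
```

The edge shows up as a strong activation in one column of the feature map, and pooling keeps that signal while halving the spatial size.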

With all these “weapons”, we set out to mine the sea of images and see if we could discover something useful. And we struck gold right away. We used Google’s Inception-v3 model [3] to generate tags for images automatically, and looked at a few of the most popular tags. Interestingly, among the popular images we found numerous instances of misinformative images, images containing potentially sensitive themes, and images promoting conspiracy theories. By the time we identified them, these images had gathered millions of likes, and hundreds of thousands of comments and shares. Some of these examples are shown below (Figures 3–6).
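Finding the “most popular tags” amounts to counting, over all images, how often each predicted label appears. The sketch below assumes the classifier’s output has already been collected as `(tag, score)` pairs per image; the image IDs, tags, scores and the `min_score` threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def popular_tags(predictions: Dict[str, List[Tuple[str, float]]],
                 min_score: float = 0.3, top_n: int = 5) -> List[Tuple[str, int]]:
    """Count in how many images each tag was predicted with score >= min_score."""
    counts = Counter()
    for tags in predictions.values():
        # A set, so a tag is counted at most once per image.
        counts.update({tag for tag, score in tags if score >= min_score})
    return counts.most_common(top_n)

# Hypothetical classifier output: image ID -> [(tag, confidence), ...]
predictions = {
    "img1": [("tower", 0.82), ("street sign", 0.10)],
    "img2": [("web site", 0.65), ("tower", 0.40)],
    "img3": [("web site", 0.71)],
    "img4": [("web site", 0.55), ("comic book", 0.35)],
}
print(popular_tags(predictions, top_n=2))
```

Images under a popular tag can then be sorted by likes and shares and inspected by hand, which is where misinformation like the examples below would surface.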

Figure 3. Image claiming that the Eiffel Tower turned off its lights for the first time in 63 years. This information was incorrect: the Eiffel Tower’s lights are turned off every night between 1 am and 6 am, following a ruling by the French Government.

Figure 4. Image incorrectly stating the cause of death of Diesel, a police dog that helped the police during the attacks. The French Police later clarified that the actual cause of death was gunshot wounds from the French Police force itself, and not the suicide bomber.

Figure 5. A tweet by Donald Trump, circulated as an insensitive reaction to the Paris attacks. As the tweet’s timestamp shows, it was posted months earlier, but resurfaced just after the attacks to defame the politician.

Figure 6. Picture claiming that a Muslim security guard named Zouheir stopped a suicide bomber from entering the Stade de France football stadium and saved thousands of lives. As the guard himself later clarified, no such incident took place; Zouheir was stationed at a different spot.

Applying OCR to the images in our dataset, we were able to extract text from about 55% of them (31,869 out of 57,748 images). We wondered whether this text embedded in images differed from the text that users post in the usual way. Upon analyzing and comparing the sentiment of image text (extracted through OCR) and post text (the text typed into the post itself), we found that image text was much more negative than post text. In fact, image text was not only more negative, it also differed from post text in the topics being talked about. Table 1 shows a mutually exclusive subset of the most common words appearing in image text and post text. While post text was full of generic text offering prayers, support and solidarity, image text was found to mention some sensitive issues like “refugee”, “syrians”, etc.
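The 55% figure is simply 31,869 / 57,748 ≈ 0.552. A minimal sketch of the two measurements in this paragraph, assuming OCR output is stored per image (`None` when OCR fails) and that some sentiment model has already scored each text on a [-1, 1] scale; the scores below are made up.

```python
from statistics import mean
from typing import Dict, List, Optional

def ocr_coverage(ocr_texts: List[Optional[str]]) -> float:
    """Fraction of images for which OCR produced non-blank text (None = OCR failed)."""
    with_text = sum(1 for text in ocr_texts if text and text.strip())
    return with_text / len(ocr_texts)

def compare_sentiment(post_scores: List[float], image_scores: List[float]) -> Dict[str, float]:
    """Mean sentiment of each text source; scores are assumed to lie in [-1, 1]."""
    return {"post_text": mean(post_scores), "image_text": mean(image_scores)}

# The coverage figure quoted above: 31,869 of 57,748 images.
print(f"{31869 / 57748:.1%}")
print(ocr_coverage(["PRAY FOR PARIS", "", None, "  Passport?  "]))
print(compare_sentiment([0.5, 0.25], [-0.5, -0.25]))
```

Comparing means is the simplest summary; a distribution plot or a significance test would be a natural next step when the two sources differ.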

S. No.  Word (posts)     Norm. freq.     Word (images)  Norm. freq.
1.      retweeted        0.005572571     house          0.00452941
2.      time             0.005208351     safety         0.004481122
3.      prayers          0.005001407     washington     0.004297628
4.      news             0.004713342     sisters        0.003940297
5.      prayfortheworld  0.004431899     learned        0.003863036
6.      life             0.004393821     mouth          0.003853378
7.      let              0.004249789     stacy          0.003751974
8.      support          0.004249789     passport       0.003708515
9.      god              0.00401139      americans      0.003694028
10.     war              0.003986557     refugee        0.00352502
11.     thoughts         0.003882258     japan          0.002887619
12.     need             0.003878946     texas          0.002781386
13.     last             0.003797825     born           0.002689639
14.     lives            0.003734914     dear           0.002689639
15.     said             0.003468371     syrians        0.002607549
16.     place            0.003468371     similar        0.002573748
17.     country          0.003319372     deadly         0.002568919
18.     city             0.003291227     services       0.002554433
19.     everyone         0.003281294     accept         0.002554433
20.     live             0.003274672     necessary      0.002549604
Table 1. Mutually exclusive set of the 20 most frequently occurring relevant keywords in post and image text, with their normalized frequency. We identified some potentially sensitive topics among image text, which were not present in post text. Word frequencies are normalized independently by the total sum of frequencies of the top 500 words in each class.

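The normalization in the Table 1 caption can be written out directly: take the top 500 words of each class, divide each count by the summed count of those 500 words, then keep only the words absent from the other class’s list. A minimal sketch, assuming that “mutually exclusive” means absence from the other class’s top-500 list and leaving out the manual filtering for “relevant” keywords; the token lists are toy inputs.

```python
from collections import Counter
from typing import Dict, List, Tuple

def normalized_top_words(tokens: List[str], top_k: int = 500) -> Dict[str, float]:
    """Top-k words by count, each divided by the summed count of those top-k words."""
    top = Counter(tokens).most_common(top_k)
    total = sum(count for _, count in top)
    return {word: count / total for word, count in top}

def exclusive_top(freqs: Dict[str, float], other: Dict[str, float],
                  n: int = 20) -> List[Tuple[str, float]]:
    """The n highest-frequency words in `freqs` that do not appear in `other` at all."""
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    return [(word, freq) for word, freq in ranked if word not in other][:n]

post_freqs = normalized_top_words("prayers support prayers paris support prayers".split())
image_freqs = normalized_top_words("refugee paris passport refugee".split())
print(exclusive_top(post_freqs, image_freqs))
print(exclusive_top(image_freqs, post_freqs))
```

Since the frequencies in each class sum to 1 over its top 500 words, values around 0.002 to 0.006 (as in Table 1) are what one would expect for the leading words.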
We also uncovered a popular conspiracy theory surrounding the Syrian “passports” that were found by French police near the bodies of the terrorists who carried out the attacks, and were allegedly used to establish the identity of the attackers as Syrian citizens. Text embedded in images depicting this theme questioned how the passports could have survived the heat of the blasts and fire. This conspiracy theory was then used by miscreants to label the attacks as a false flag operation, influencing citizens to question the policies and motives of their own government. The popularity of such memes on OSN platforms can have undesirable outcomes in the real world, like protests and mass unrest. It is therefore vital for first responders to be able to identify such content and counter or control its flow to avoid repercussions in the real world.

Figure 7. Example of a picture containing text relating to a conspiracy theory questioning how the Syrian passports survived the blasts. We found hundreds of images talking about this topic in our dataset.

Images posted on OSNs are a critical source of information that can be useful for law and order organizations to understand popular topics and public sentiment, especially during crisis events. Through our approach, we propose a semi-automated methodology for mining knowledge from visual content and identifying popular themes and citizens’ pulse during crisis events. Although this methodology has its limitations, it can be very effective for producing high level summaries and reducing the search space for organizations with respect to content that may need attention. We also described how our methodology can be used for automatically identifying (potentially sensitive) misinformation spread through images during crisis events, which may lead to major implications in the real world.
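As one illustration of how a semi-automated pipeline can shrink the search space, the sketch below matches OCR text against a watch-list of themes and ranks the matching images by likes, leaving the final judgement to a human analyst. The `THEMES` watch-list, image IDs and like counts are hypothetical; this is not the methodology from the technical report.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical watch-list of themes an analyst might track.
THEMES: Dict[str, List[str]] = {
    "passport_theory": ["passport", "passports", "false flag"],
    "refugees": ["refugee", "refugees", "syrians"],
}

def shortlist(ocr_texts: Dict[str, str],
              likes: Dict[str, int]) -> Dict[str, List[Tuple[str, int]]]:
    """For each theme, list matching image IDs sorted by likes, most-liked first."""
    result: Dict[str, List[Tuple[str, int]]] = {theme: [] for theme in THEMES}
    for image_id, text in ocr_texts.items():
        lowered = text.lower()
        for theme, keywords in THEMES.items():
            # Whole-word/phrase match, so "passport" does not match inside "passports".
            if any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords):
                result[theme].append((image_id, likes.get(image_id, 0)))
    for matches in result.values():
        matches.sort(key=lambda item: item[1], reverse=True)
    return result

ocr = {
    "img1": "How did the PASSPORT survive the blast?",
    "img2": "Pray for Paris",
    "img3": "Passports found intact? A false flag!",
    "img4": "Refugees welcome",
}
likes = {"img1": 1200, "img3": 56000, "img4": 300}
print(shortlist(ocr, likes))
```

Keyword matching is crude and misses paraphrases, which is why its output is framed as a shortlist for review rather than a final classification.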

Here is a link to the complete technical report on this work. Big credits to Varun Bharadhwaj, Aditi Mithal, and Anshuman Suri for all their efforts. Below is an infographic of the work.

References:

[1] https://www.eyeqinsights.com/power-visual-content-images-vs-text/

[2] https://github.com/tesseract-ocr/

[3] https://www.tensorflow.org/versions/r0.11/tutorials/image_recognition/index.html

[4] http://cs231n.github.io/convolutional-networks/

[5] Gupta, Aditi, Hemank Lamba, and Ponnurangam Kumaraguru. “$1.00 per RT #BostonMarathon #PrayForBoston: Analyzing fake content on Twitter.” In eCrime Researchers Summit (eCRS), 2013, pp. 1-12. IEEE, 2013.

[6] Vieweg, Sarah, Amanda L. Hughes, Kate Starbird, and Leysia Palen. “Microblogging during two natural hazards events: what twitter may contribute to situational awareness.” In Proceedings of the SIGCHI conference on human factors in computing systems, pp. 1079-1088. ACM, 2010.