The Great Precog Expedition

It all began with my search for summer work opportunities in 2016. I had heard endlessly about the work culture at IIIT Delhi, its research groups and its out-of-this-world faculty. While browsing the institute’s site, I stumbled upon Professor PK’s profile. The more I read about him, the more awestruck I was, and I joined the many who want to work with PK.

My first meeting with him lasted for roughly 15 minutes but I went back home with a bag full of riveting information about what it takes to be a Precog-er. This was also the first time I got to know about Randy Pausch. At home, I watched ‘The Last Lecture’ and understood why the walls of the Precog area are adorned with his quotes.

Soon after, I took part in the OSMpalooza Hackathon and witnessed firsthand how quickly students here make progress. My team came up with the best solution we could think of for the given problem statements. Sadly, we didn’t win a position, but I saw some amazing solutions from other teams and, most importantly, I found myself serious and engrossed in a Social Media Analysis project. This was when I became even more sure that I wanted to work at Precog, since the majority of its work involves analysing social media content. This incident would be incomplete without quoting the following:

“Experience is what you get when you didn’t get what you wanted. And experience is often the most valuable thing you have to offer.” –Randy Pausch

Very soon, I applied for the internship. After an intricate interview process, I received my offer letter. My first day at Precog was a Brainstorming session (which is another bonus point of this internship). Before the internship, my way of going through research papers was basic skimming. In the very first session, I witnessed the dissection of a paper: not only was the entire methodology derived, but elaborate ideas for extending the paper and implementing those extensions were discussed as well. This is just one example of how working at Precog means legit serious work.

I was lucky to have Prateek Dewan as my mentor during the internship period. I started working closely with Prateek and soon learnt a series of things that I apply to this day. Before the internship, the only language I had worked in was Java; by the end of it, I had added another language, Python, to my skill set. He cleared every little doubt regarding my project and promptly replied to any query I had, at any odd hour. I was a little apprehensive in the beginning since progress at Precog is super quick, but I learned it all in my own time.

The most incredible characteristics of this group are the sincerity and passion each Precog-er brings to work. Apart from the projects carried out by each group, the regular Brainstorming sessions covered the latest research topics extensively. Several new ideas and news from the tech world were discussed on the mailing list, and very soon I got the hang of it. In one particular email, PK discussed his latest choice of book, “Eat That Frog!” by Brian Tracy. Being an avid reader, I bought it the very next day, and the book has had a phenomenal influence on my life (amazing book suggestions: another bonus of the internship!). Striking a balance between working and having fun is another takeaway. PK is the binding force of Precog, and the smart-working researchers, known as Precog-ers, make this group what it is.

I chose such a heavy-sounding title for this post because Precog can’t be described by anything less. It is indeed a great expedition and I am fortunate to have experienced it.

I would like to end by quoting my favourite Randy Pausch saying that has now adorned my room’s wall as well:

“The brick walls are there for a reason. The brick walls are not there to keep us out. The brick walls are there to give us a chance to show how badly we want something. Because the brick walls are there to stop the people who don’t want it badly enough. They’re there to stop the other people.” 

Below is a picture from one of the group photo sessions! (Missing in the picture: PK)

Preventing KillFie: A crowdsourced approach

Selfies have become a prominent medium for self-portrayal on social media. Certain social media users go to extreme lengths to click dangerous selfies, putting their lives at risk. Between March 2014 and December 2016, 127 people died while trying to click selfies; 76 of those deaths were in India alone.

This disturbing trend can be traced to users taking selfies in “dangerous” locations, which in turn can be linked to the concept of self-representation on online social media. A user will be perceived as bolder and more adventurous if he posts from a cliff top or in front of a moving train. The urge to portray oneself as a daredevil leads people to go to life-threatening lengths to get the perfect selfie. The engrossing hunt for the ultimate display picture momentarily distracts selfie takers from their surroundings, which can result in tragedy.

Our goal is to build a dataset of locations around the globe that are popular selfie spots but where people clicking selfies inadvertently put their lives at risk. We collect this data through an Android app and a Facebook chatbot that we built. Users can report a location along with the type of risk (e.g. height related, water related, vehicle related) associated with taking a selfie there. Using this data, our app nudges the user (through a notification) whenever the user goes near such a location.
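For the curious, here is a minimal sketch of how such a proximity nudge could work. It is not the app’s actual code: the function names, the 200-metre radius and the sample reports are all hypothetical, and the great-circle (haversine) distance is just one reasonable way to decide that a user is “near” a reported location.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_risks(user_lat, user_lon, reported_spots, radius_m=200):
    """Return the reported spots within radius_m of the user (radius is an assumed value)."""
    return [
        spot for spot in reported_spots
        if haversine_m(user_lat, user_lon, spot["lat"], spot["lon"]) <= radius_m
    ]


# Hypothetical crowdsourced reports; the real data comes from the app and the chatbot.
reported_spots = [
    {"name": "cliff edge", "lat": 19.0000, "lon": 72.8000, "risk": "height related"},
    {"name": "river bank", "lat": 19.1000, "lon": 72.9000, "risk": "water related"},
]

for spot in nearby_risks(19.0005, 72.8005, reported_spots):
    print(f"Warning: {spot['name']} nearby ({spot['risk']})")
```

A fixed radius keeps the sketch simple; a real app would probably tune it per risk type and rely on the phone’s geofencing support to save battery.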

As we become more engrossed in our mobile phones and the digital world, we may lose our sense of our physical surroundings. For instance, there were several incidents of people getting injured while using Pokemon Go (an immersive augmented reality application). A database of tagged dangerous locations might help such apps show relevant warnings.

If you know of a selfie lover or are one yourself, please download this app. Even a single location which you report may prevent many unfortunate deaths. And since what goes around comes around, one day the app might save you from unknowingly putting yourself in danger.

To interact with the chatbot, go to the Facebook page and click on Send Message. Send a location by clicking on the location icon (this can be done only through the Messenger app) and just follow the instructions given by the bot.

Click here to download the app.

Click here to interact with the chatbot.


I have been Precog-ed (for life): Part 4

Hola! It’s the first day of 2017. All of us just got done with looking back at the past year, trying to fathom how time flies and life metamorphoses. My life has taken a leap too, and this is my last blog as a part of the ‘I have been Precog-ed’ series. Earlier, I wrote about my first stint at research (Part 1), a wonderful summer at the Information Sciences Institute in Marina del Rey, Los Angeles (Part 2), my first paper presentation at ICWSM 2016 in Germany (Part 3), and my time at Precog. This post is about the last 6 months of my journey and an attempt to express what being a Precog-er is all about (for more on this, please read the first three parts too). Having been a Precog-er for more than 3 years, I have more thoughts than I can ever pen down; from being an undergrad who joined Precog as a noob to a grad student at Carnegie Mellon University, my path has always been illuminated by the light of learning and hope.

April 2016 – I was struggling with end-sem preparations, document processing and Visa applications for my trip to ICWSM and my masters in the States, and the humdrum undergrad life when an unexpected email got an unexpected reaction from me –

“Dear Megha,

We are pleased to inform you that you have been selected as an one of the 40 CERN Openlab Summer Students 2016 (out of 1461 applicants)! For nine weeks, CERN will be your host for what we hope is going to be an interesting, fun and active summer…”

I have been an amateur astronomer for 9 years, and getting to work at the ‘Mecca of Particle Physics’ would be a dream come true. But I was sure I wouldn’t be able to make it. I was applying for my Schengen Visa for Germany (which would take another 2 weeks), and then I had to start my application for the US visa. I would need another Schengen Visa, for Switzerland, within a span of one week. On top of that, the only dates I could select for the internship overlapped with my initial orientation schedule at CMU. I almost disrupted a meeting in PK’s office to break the news to him. I was sad. The Pillars (Ph.D. students at Precog) and PK were convinced that I should try, and if it didn’t work out, so be it. That’s a Precog trait – not giving up until you have given it your best shot! After cutting short the duration of my summer at CERN, pushing CMU to allow me to skip the orientations (convincing them that I would manage when I wasn’t sure of it myself), and getting my Schengen Visa for Switzerland in a day (thanks to CERN’s administrative staff, who made a special request to the embassy for me), I was ready for a summer at CERN.

I worked for 2 months at CERN’s data center on a storage system of ~125PB (one of the largest in the world). The CERN openlab program includes a lecture series to help CS students understand the Physics needed for some of the projects, trips to ETH Zürich and EPFL Lausanne, hackathons, and several other ways to help students gain insights into the revolutionary projects spanning 100 hectares in Switzerland and more than 450 hectares in France! It was a humbling experience, which entailed learning something new every day. Europeans have nailed the work-life balance too. Along with finishing my project on time, I managed to check Geneva, Lausanne, Lyon, Zürich, Paris, Montreux, Bern, Engelberg, Chamonix and many more off my list!

After 2 days in Delhi, Pittsburgh was my next destination, and my home for the next 16 months. I am an MSCS student at CMU now. I was the last to arrive and one of the youngest of the lot but, thanks to PK, I had ample background knowledge about life as a student here and about the city of Pittsburgh. The experience I have gained at Precog comes in handy when I have to identify research gaps and solve hard problems. I feel more equipped and confident to take up the challenges that come along with grad life at a school like CMU.

Throughout these 6 months (Jul – Dec 2016), I have been working with a few Precog-ers on what we now call the Killfie project. It has turned out to be one of the most exciting projects I have worked on as a part of the group. It is the inclination to work on interesting problems with some brilliant people, which gives me the motivation to find time for this amongst courses and projects at CMU.

I cannot finish this blog without revisiting these lines from my first blog – “…PK, the heart and brain of Precog. He is the coolest adviser I have ever met and his skills and dexterity at work are almost mind-boggling. I came to know him as my Probability and Statistics professor, the role changed to being my adviser working at Precog and now I see him as a mentor for life..”. A lot of what I have been able to achieve in the last 3 years, I owe to PK’s unconditional support. Thank you, PK, for always illuminating my path and for proving what good mentorship can accomplish!

My time at Precog has taught me how to help people, make friends, eliminate distractions and focus, improve daily, think big, fail often and give nothing short of my very best effort! When I needed help, I have had last-minute, unscheduled video calls with Precog-ers in the middle of the night from the other end of the world. Pillars, interns, RAs – thank you, each one of you, for this experience. Even though I live in a different time zone now and my attendance at the 4th floor Ph.D. lab has been at an all-time low, I know my association with the group will last forever. As has been rightly put – ‘Once a Precog-er, always a Precog-er!’

PS – Some pictures…

Just another day at Precog…
“It’s all about the people!”
The room where Tim Berners-Lee developed the World Wide Web at CERN!
This one doesn’t need a caption… 🙂
The Aiguille du Midi Skywalk, “Step into the Void” at Chamonix (altitude – 3842m)
CERN Openlab Summer Students 2016


There’s misinformation on Facebook. Here’s how you deal with it.

I’ll keep this short and to the point. There’s a sudden backlash against Facebook for hosting misinformation [1] and polarized politics [2] after the recent elections in the USA. Is this new? NO.

Let me take you back in time, to March 2014. The deeply tragic incident of Malaysia Airlines Flight MH370 wiped out an entire aircraft and all on board [3]. A sea of prayers and solidarity followed on all social networks, including Facebook. What also followed was a series of fake, misinformative posts, links, and videos claiming to show footage of the aircraft crashing [4], and rumors claiming that the plane had been found in the Bermuda Triangle (see image of one such post below). Such footage never existed.

http://www.hoax-slayer.com/images/malaysia-airlines-MH370-scam-1.jpg

Following this incident, there have been a series of events where miscreants have exploited the context of a popular event to spread hoaxes, misinformation, rumors, fake news, etc. From the rumor of the death of comic actor Rowan Atkinson (a.k.a. Mr. Bean) to the suicide video by late legendary actor Robin Williams, misinformation has plagued Facebook for years, and is continuing to do so. While Facebook has recently acknowledged misinformation to be a serious problem, we at Precog had already started working on it when we first came across instances of misinformation. So how do you really deal with misinformation and rumors and hoaxes and fake news on Facebook?

There have been a few attempts to solve this problem. Facebook posted a series of blogs vowing to improve their algorithms to reduce misinformation, hoaxes, rumors, clickbaiting, etc. [8, 9, 10, 11, 12]. A recent hackathon at Princeton University also saw a group of 4 students attempting to fix this problem [13]. Well, as it turns out, we took a crack at this problem over 2 years ago, and came up with a robust solution of our own. In August 2015, we publicly launched Facebook Inspector, a free, easy-to-use browser extension that identifies malicious content (including the type we just discussed above) in real time. At this moment, Facebook Inspector has over 200 daily active users, and has just crossed 5,000,000 hits (it’s 5 million; but it’s just fun to write it with so many zeros xD). We leveraged multiple crowdsourcing mechanisms to gather a pool of misinformative and other types of malicious posts, and used them to build a model that automatically identifies misinformative posts, hoaxes, rumors, scams, etc.

Give it a try. Download the Chrome version at https://chrome.google.com/webstore/detail/facebook-inspector/jlhjfkmldnokgkhbhgbnmiejokohmlfc

Firefox users, download at https://addons.mozilla.org/en-US/firefox/addon/fbi-facebook-inspector/

To read the entire story behind the inception of the idea, and incarnation of Facebook Inspector, read the detailed technical report here.

So we spotted a problem a couple of years ago, took a crack at solving it (and I’d like to believe we succeeded), and apparently, the entire world is after Facebook for the same problem today. But misinformation, hoaxes, and rumors aren’t the only big problems that Facebook is surrounded by. Let’s talk some more about the US elections. Facebook’s algorithms have been accused of reinforcing “political polarization” by Professor Filippo Menczer in a popular news article [2]. Apparently, Facebook is home to a big bunch of political groups which post polarized content to influence users towards / against certain political beliefs. Whether such content should be allowed on social networking websites is debatable. After all, free speech is a thing! But the question that demands attention here is, did these politically polarized entities suddenly appear on Facebook around election time? I mean, if they had been around for long, Facebook would’ve known, right? And the effects of social network content on elections are well known and studied [5, 6, 7]. So Facebook would’ve definitely done something to at least nudge users when they got exposed to polarized political content. But polarized political content was never a point of concern for Facebook. So it probably didn’t exist until right before the elections. Right? Wrong!

Well, this is a literal “I told you so” moment. Last year, we conducted a large scale study of malicious Facebook pages, and one of our main findings was the dominant presence of politically polarized entities among malicious pages on Facebook. We analyzed the content posted by these politically polarized pages, and found that negative sentiment, anger, and religion dominated such content. We reported our findings in the form of a technical report: https://arxiv.org/abs/1510.05828v1

It is good to know that what you work on, as part of research, connects closely to relevant, present day, real world problems, but it isn’t really a good feeling to realize that something you already knew could happen, happens anyway. We at Precog always push towards trying to make a difference and making the online world better and safer. We try our best, but we can only do so much.

To conclude, not bragging here (well, it’s not bragging if it’s true!), but we saw not one, but two real problems coming, more than a year before Facebook did.

You see, we’re called “Precog” for a reason. *mic drop*

References

[1] https://techcrunch.com/2016/11/10/facebook-admits-it-must-do-more-to-stop-the-spread-of-misinformation-on-its-platform/

[2] https://www.theguardian.com/technology/2016/nov/10/facebook-fake-news-election-conspiracy-theories

[3] https://en.wikipedia.org/wiki/Malaysia_Airlines_Flight_370

[4] https://www.scamwatch.gov.au/news/scammers-using-videos-of-malaysian-airlines-flight-mh370-to-spread-malware

[5] Williams, Christine B., and Girish J. Gulati. “Social networks in political campaigns: Facebook and the 2006 midterm elections.” annual meeting of the American Political Science Association. Vol. 1. No. 11. 2007.

[6] Williams, Christine B., and Girish J. Gulati. “Social networks in political campaigns: Facebook and the congressional elections of 2006 and 2008.” New Media & Society (2012): 1461444812457332.

[7] Douglas, Sara, et al. “Politics and young adults: the effects of Facebook on candidate evaluation.” Proceedings of the 15th Annual International Conference on Digital Government Research. ACM, 2014.

[8] https://newsroom.fb.com/news/2015/01/news-feed-fyi-showing-fewer-hoaxes/

[9] http://newsroom.fb.com/news/2016/08/news-feed-fyi-further-reducing-clickbait-in-feed/

[10] http://newsroom.fb.com/news/2014/11/news-feed-fyi-reducing-overly-promotional-page-posts-in-news-feed/

[11] http://newsroom.fb.com/news/2014/08/news-feed-fyi-click-baiting/

[12] http://newsroom.fb.com/news/2014/04/news-feed-fyi-cleaning-up-news-feed-spam/

[13] http://www.businessinsider.in/It-only-took-36-hours-for-these-students-to-solve-Facebooks-fake-news-problem/articleshow/55426656.cms

Me, Myself and My Killfie: Characterizing and Preventing Selfie Deaths

Authors: Hemank Lamba, Varun Bharadhwaj, Mayank Vachher, Divyansh Agarwal, Megha Arora, Ponnurangam Kumaraguru

Our world is becoming smaller with time, bringing us closer and bestowing upon us a number of avenues to easily showcase ourselves in any manner we want. Perhaps the biggest facilitating agent in this regard, is Online Social Media (OSM). In a way, OSM replicates our world, with friends, interactions and constant information exchange. The world of OSM seems to have developed an interesting currency of its own too – LIKES and COMMENTS, the dollars and cents of the virtual realm; something which everyone aspires to have in abundance.

We are also familiar with the popular “selfie” phenomenon. Recognized as the “word of the year” by Oxford Dictionaries in 2013, the “selfie” is defined as a “photograph taken of oneself, and uploaded to a social media website.” In recent years, there has been a sharp increase in the number of selfies posted on OSM. However, one particularly disturbing trend that has emerged lately is that of clicking dangerous selfies. It has proved so disastrous that in 2015 alone, selfies caused more deaths than shark attacks all over the world [1]. Figure 1 shows examples of such selfies taken moments before the fatal incident. A selfie-related death can be defined as the death of an individual or group of people that could have been avoided had the individual(s) not been taking a selfie.

The threat that adventurous selfie-taking behaviour exposes people to is slowly being acknowledged by governments as well. Russian authorities launched a public awareness campaign to alert citizens to the hazards of taking selfies [2]. Similarly, Mumbai police recently classified 16 zones across the city as No-Selfie zones, after a rise in the number of selfie casualties [3].

The reason for this outrageous trend of dangerous selfies becomes clear when we combine the thoughts above. Since the advent of online social networks, people have developed an insatiable urge to be the most “popular” in their community. In medical terms, this has long been compared to forms of narcissism and, in relation to selfies, termed Selfitis [4,5,6]. This becomes the prime reason why people resort to performing risky feats while taking a selfie: to garner more appreciation, in the form of likes and comments, from their friends online.

We at Precog@IIITD chose to analyse the issue from a technical perspective and to dive deeper into what characterizes a selfie casualty/death, what kind of information we can extract from selfie images, and how selfie casualties can be prevented.

Over the past two years, we found that a total of 127 deaths were reported to have been caused by selfies, of which a whopping 76 occurred in India alone! [7] Table 1 shows the country-wise distribution of selfie casualties across the world. The reasons for these selfie casualties broadly fall into the following categories (Figure 2):

  • Height Related – Selfie casualties caused due to people falling from an elevated location. [8]

  • Water Related – Selfie casualties caused due to drowning. [9]

  • Height and Water Related – Selfie casualties involving falling from elevated locations into a water body. [10]

  • Vehicle/Road Related – Selfie casualties caused due to vehicle accidents. [11]

  • Train Related – Selfie casualties caused due to being hit by a train. [12]

  • Weapons Related – Selfie casualties caused due to accidental firing of a weapon. [13]

  • Animal Related – Selfie casualties caused due to attack by an animal while taking the selfie with or near the animal. [14]

  • Electricity Related – Selfie casualties caused due to electrocution from live wires. [15]

Figure 2: (a) Number of Deaths and (b) Number of Incidents due to various reasons

Using a dataset of 138,496 tweets collected between August and September 2016, we implemented a three-fold architecture based on Image features, Location features, and Text features to quantify the danger level of the selfies in our dataset. Our machine learning model takes into account a variety of features to identify dangerous selfies along with their potential risks, and analyses common characteristics in these images. These features are supplied to four different classifiers with similar parameters to avoid bias in the results. Table 2 shows the sets of features we used for each feature type.

Table 2: Location-Based, Image-Based and Text-Based features used for classification of selfies

After thorough analysis, we found that the image-based features are the best indicators that accurately capture the dangerous nature of a selfie, in comparison to other feature-types. This seems logical as image features attempt to infer meaning directly out of the image, in a sense replicating our visual senses. Our model resulted in an accuracy of 73.6% for the task of identifying a dangerous selfie.

To further capture the risk type of a dangerous selfie, we used specific features that were relevant only to a particular risk type and supplied the data to our classifier. In particular, we concentrated on singling out dangerous selfies that belonged to height, water and vehicle related risks. We found that the set of features performing the best for this task was a combination of all 3 feature types – Image, Location and Text based features, and the best accuracy was obtained on the Water-related features. With remarkable accuracy, we have been able to establish a method to identify and capture the “danger level” of a selfie along with its risk type.

With the growing trend of dangerous selfies, it becomes important to spread awareness of the inherent hazards of people risking their lives simply for the sake of recognition on a virtual forum. As Shakespeare put it, this type of “Bubble Reputation” induced by a dangerous selfie posted on OSM has claimed multiple lives lately. This work is a small contribution towards making the world safer by making people aware.

Our full report / paper on this work. You can access the portal and our dataset here.

References:

[1] http://www.telegraph.co.uk/technology/11881900/More-people-have-died-by-taking-selfies-this-year-than-by-shark-attacks.html

[2] https://www.theguardian.com/world/2015/jul/07/a-selfie-with-a-weapon-kills-russia-launches-safe-selfie-campaign

[3] http://metro.co.uk/2016/02/25/mumbai-orders-selfie-ban-after-19-people-die-5716731/

[4] S. Bhogesha, J. R. John, and S. Tripathy. Death in a flash: selfie and the lack of self-awareness. Journal of Travel Medicine, 23(4):taw033, 2016

[5] B. Subrahmanyam, K. S. Rao, R. Sivakumar, and G. C. Sekhar. Selfie related deaths perils of newer technologies. Narayana Medical Journal, 5(1):52–56, 2016.

[6] A. Lakshmi. The selfie culture: Narcissism or counter hegemony? Journal of Communication and Media Studies (JCMS), 5:2278–4942, 2015.

[7] http://labs.precog.iiitd.edu.in/killfie/analysis

[8] http://www.telegraph.co.uk/news/2016/07/01/german-tourist-plunges-to-his-death-while-posing-for-picture-at/

[9] http://www.thenewsminute.com/article/selfie-deaths-two-men-drown-karnataka-couple-washed-away-tn-46735

[10] http://www.ndtv.com/cities/teenager-drowns-while-clicking-selfie-friend-dies-trying-to-save-him-1277217

[11] http://www.independent.co.uk/news/world/americas/selfie-crash-death-woman-dies-in-head-on-collision-seconds-after-uploading-pictures-of-herself-and-9293694.html

[12] http://timesofindia.indiatimes.com/city/varanasi/2-killed-while-taking-selfie-on-railway-tracks/articleshow/51850194.cms

[13] http://www.aljazeera.com/news/2015/07/russia-launches-safe-selfie-guide-light-deaths-150707132204704.html

[14] http://www.radar.ng/2016/04/elephant-tramples-boy-to-death-while.html?utm_source=nnd&utm_medium=twitter&utm_campaign=nnd

[15] http://www.thelocal.es/20140318/young-man-dies-in-train-selfie-fail

The complete picture: Visual Themes and Sentiment on Social Media for First Responders.

Researchers and academicians all over the world have conducted numerous studies and established that social media plays a vital role during crisis events. From citizens helping police capture the suspected terrorists after the Boston Marathon bombings [5], to vigilant users spreading situational awareness [6], OSNs have proved their mettle as a powerful platform for information dissemination during crises.

Most of the aforementioned work has relied on textual content posted on OSNs to extract knowledge and make inferences. The thing is, online media is rapidly moving from text to visual media. With the prevalence of 3G and 4G technologies and high-bandwidth connectivity in most Internet-enabled countries, images and videos are gaining much more traction than text. This is also natural, since the human brain is hardwired to recognize and make sense of visual information more efficiently [1]. Just using text to draw inferences from social media data is no longer enough. As we discussed in our previous blog, a significant percentage of social media posts do not contain any text. Moreover, a large percentage of posts contain both text and images. The point to keep in mind here is that the image and the text may contradict each other, even if they’re part of the same post. While the text in Figure 1 inspires support and positive sentiment, the image (or more precisely, the text in the image) is pretty negative. This is what current research methodology is missing out on.


Figure 1. Example of a Facebook post with contradicting text and image sentiment.

Continuing our work on images and online social media, we decided to dig further into images posted on social networks, and see if images could help first responders get a more complete picture of the situation during a crisis event. We collected Facebook posts published during the attacks in Paris in November 2015, and performed large scale mining on the image content we captured. Typically, monitoring the popular topics and sentiment among citizens can be of help to first responders. Timely identification of misinformation, sensitive topics, negative sentiment, etc. online can be really helpful in predicting and averting potential implications in the real world.

We gathered over 57,000 images using the #ParisAttacks and #PrayForParis hashtags put together, of which 15,123 were unique. Analyzing so many images manually is time-consuming and not scalable. So we used state-of-the-art techniques from the computer vision domain to automatically analyze images on a large scale. These techniques include Optical Character Recognition (OCR) [2], image classification, and image sentiment identification using Convolutional Neural Networks (CNNs). Figure 2 shows how a typical CNN model processes and classifies images [4].
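As an aside, here is a rough sketch of how duplicate images can be filtered out of such a collection. The post does not say how uniqueness was actually determined, so treat this as an assumption: it hashes the raw bytes of each image, which only catches byte-identical copies (resized or re-encoded copies would need something like perceptual hashing). The function name and sample data are hypothetical.

```python
import hashlib


def unique_images(images):
    """Keep the first occurrence of each byte-identical image.

    `images` is an iterable of (identifier, raw_bytes) pairs.
    """
    seen = set()
    unique = []
    for image_id, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(image_id)
    return unique


# Hypothetical stand-ins for downloaded image files.
sample = [
    ("a.jpg", b"\x89img-1"),
    ("b.jpg", b"\x89img-2"),
    ("c.jpg", b"\x89img-1"),  # same bytes as a.jpg
]
print(unique_images(sample))
```

Hashing keeps memory use small, since only one fixed-size digest per distinct image is stored instead of the image itself.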

Figure 2. Typical CNN model for object identification in images. Image taken from http://cs231n.github.io/convolutional-networks/

With all these “weapons”, we set out to mine the sea of images and see if we could discover something useful. And we struck gold right away. We used Google’s Inception-v3 model [3] to generate tags for images automatically, and looked at a few of the most popular tags. Interestingly, among the popular images we found numerous instances of misinformative images, images containing potentially sensitive themes, and images promoting conspiracy theories. By the time we identified them, these images had gathered millions of likes, and hundreds of thousands of comments and shares. Some of these examples are shown below (Figures 3 to 6).

Figure 3. Eiffel Tower turns off its lights for the first time in 63 years. This information was incorrect. Eiffel Tower’s lights are turned off every night between 1 am and 6 am following a ruling by the French Government.

Figure 4. Image incorrectly quoting the cause of death of Diesel, a police dog that helped the police during the attacks. The French Police later clarified that the actual cause of death was gunshot wounds from the French Police fleet itself, and not the suicide bomber.

Figure 5. Donald Trump’s insensitive tweet just after the Paris attacks. As the time stamp of the tweet suggests, this tweet was posted months ago, but resurfaced just after the attacks to defame the politician.

Figure 6. Picture claiming that a Muslim guard named Zouheir stopped a suicide bomber from entering the Stade de France football stadium and saved thousands of lives. As later clarified by the security guard himself, such an incident never took place. Zouheir, the security guard, was stationed at a different spot.

Applying OCR on the images in our dataset, we were able to extract text from about 55% of the images (31,869 out of 57,748 images). We wondered if this text embedded in images would be any different than the text that users post otherwise, in the orthodox manner. Upon analyzing and comparing the sentiment of image text and post text, we found that image text (extracted through OCR) was much more negative than post text (the orthodox text). In fact, not only was image text more negative, it was also different from post text in terms of topics being talked about. Table 1 shows a mutually exclusive subset of the most common words appearing in image text and post text. While post text was full of generic text offering prayers, support and solidarity, image text was found to mention some sensitive issues like “refugees”, “syria”, etc.

S. No.   Top words in posts   Normalized frequency   Top words in images   Normalized frequency
1.       retweeted            0.005572571            house                 0.00452941
2.       time                 0.005208351            safety                0.004481122
3.       prayers              0.005001407            washington            0.004297628
4.       news                 0.004713342            sisters               0.003940297
5.       prayfortheworld      0.004431899            learned               0.003863036
6.       life                 0.004393821            mouth                 0.003853378
7.       let                  0.004249789            stacy                 0.003751974
8.       support              0.004249789            passport              0.003708515
9.       god                  0.00401139             americans             0.003694028
10.      war                  0.003986557            refugee               0.00352502
11.      thoughts             0.003882258            japan                 0.002887619
12.      need                 0.003878946            texas                 0.002781386
13.      last                 0.003797825            born                  0.002689639
14.      lives                0.003734914            dear                  0.002689639
15.      said                 0.003468371            syrians               0.002607549
16.      place                0.003468371            similar               0.002573748
17.      country              0.003319372            deadly                0.002568919
18.      city                 0.003291227            services              0.002554433
19.      everyone             0.003281294            accept                0.002554433
20.      live                 0.003274672            necessary             0.002549604

Table 1. Mutually exclusive set of 20 most frequently occurring relevant keywords in post and image text, with their normalized frequency. We identified some potentially sensitive topics among image text, which were not present in post text. Word frequencies are normalized independently by the total sum of frequencies of the top 500 words in each class.
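
The normalization described in the caption of Table 1 can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the function names, the toy token lists, and the reading of "mutually exclusive" as "words that do not appear among the other class's top words" are all assumptions.

```python
from collections import Counter


def normalized_top_words(tokens, top_n=500):
    """Map each of the top_n words to count / (sum of counts of the top_n words)."""
    top = Counter(tokens).most_common(top_n)
    total = sum(count for _, count in top)
    return {word: count / total for word, count in top}


def mutually_exclusive(freqs_a, freqs_b, k=20):
    """Return the first k words of each class that are absent from the other class."""
    only_a = [(w, f) for w, f in freqs_a.items() if w not in freqs_b][:k]
    only_b = [(w, f) for w, f in freqs_b.items() if w not in freqs_a][:k]
    return only_a, only_b


# Toy tokens standing in for the real post text and the OCR-extracted image text.
post_tokens = "prayers support prayers paris news paris".split()
image_tokens = "refugee passport paris refugee syrians".split()

post_freqs = normalized_top_words(post_tokens)
image_freqs = normalized_top_words(image_tokens)
post_only, image_only = mutually_exclusive(post_freqs, image_freqs, k=2)
```

Because each class is normalized by its own total, the two frequency columns are comparable in shape but not in absolute volume.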
We also uncovered a popular conspiracy theory surrounding the Syrian “passports” that were found by French police near the bodies of terrorists who carried out the attacks, and were allegedly used to establish the identity of the attackers as Syrian citizens. Text embedded in images depicting this theme questioned how the passports could have survived the heat of the blasts and fire. This conspiracy theory was then used by miscreants to label the attacks as a false flag operation, influencing citizens to question the policies and motives of their own government. The popularity of such memes on OSN platforms can have undesirable outcomes in the real world, like protests and mass unrest. It is therefore vital for first responders to be able to identify such content and counter / control its flow to avoid repercussions in the real world.

Figure 7. Example of a picture containing text relating to a conspiracy theory questioning how the Syrian passports survived the blasts. We found hundreds of images talking about this topic in our dataset.

Images posted on OSNs are a critical source of information that can be useful for law and order organizations to understand popular topics and public sentiment, especially during crisis events. Through our approach, we propose a semi-automated methodology for mining knowledge from visual content and identifying popular themes and citizens’ pulse during crisis events. Although this methodology has its limitations, it can be very effective for producing high level summaries and reducing the search space for organizations with respect to content that may need attention. We also described how our methodology can be used for automatically identifying (potentially sensitive) misinformation spread through images during crisis events, which may lead to major implications in the real world.

Here is a link to the complete technical report on this work. Big credits to Varun Bharadhwaj, Aditi Mithal, and Anshuman Suri for all their efforts. Below is an infographic of the work.

References:

[1] https://www.eyeqinsights.com/power-visual-content-images-vs-text/

[2] https://github.com/tesseract-ocr/

[3] https://www.tensorflow.org/versions/r0.11/tutorials/image_recognition/index.html

[4] http://cs231n.github.io/convolutional-networks/

[5] Gupta, Aditi, Hemank Lamba, and Ponnurangam Kumaraguru. “$1.00 per RT #BostonMarathon #PrayForBoston: Analyzing fake content on Twitter.” In eCrime Researchers Summit (eCRS), 2013, pp. 1-12. IEEE, 2013.

[6] Vieweg, Sarah, Amanda L. Hughes, Kate Starbird, and Leysia Palen. “Microblogging during two natural hazards events: what Twitter may contribute to situational awareness.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1079-1088. ACM, 2010.

#TPBT: The Pin-Bang Theory

In the monsoon semester of 2012, I took a course on Privacy and Security in Online Social Media. We had to do a project on a popular online social media service. Pinterest caught my eye. It was new, it was among TIME Magazine’s top 50 websites of 2011, and it had close to 20 million users at the time. Its growth was amazing; in a matter of 2 years it was well integrated with popular e-commerce sites like eBay, Etsy, Amazon, etc. The big white-on-red “P” next to the blue bird and the white-on-blue “f” motivated me to work on Pinterest.

Share Buttons on Amazon.

Without digging much into the OSN, and with the project proposal submission deadline about 30 minutes away, I proudly declared that my project would entail user analysis, locating spam / malware, and also touch upon copyright issues on Pinterest.
The next time I opened my project, I got my “shock of the semester”. Pinterest had no API. The third-party Python wrappers were all useless. I would have to scrape the whole network. Though I was able to complete only a part of my project proposal in the semester, PK sir asked me to continue working. I was joined by Neha on the project, and Prateek started shepherding us.
We created a crawler to push data from Pinterest to our databases, starting from 5 extremely popular seed users.
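
A seed-based crawl of this kind can be sketched as below. The post does not say how the crawler traversed the network, so the breadth-first order, the `crawl` function, the `fetch_neighbours` callback, and the toy graph are all assumptions made for illustration; no real Pinterest endpoint is used.

```python
from collections import deque


def crawl(seed_users, fetch_neighbours, max_users=None):
    """Discover user handles breadth-first, starting from the seed users."""
    seen = set(seed_users)
    queue = deque(seed_users)
    visited = []
    while queue and (max_users is None or len(visited) < max_users):
        user = queue.popleft()
        visited.append(user)  # a real crawler would store the scraped profile here
        for other in fetch_neighbours(user):
            if other not in seen:  # never enqueue the same handle twice
                seen.add(other)
                queue.append(other)
    return visited


# Hypothetical follower graph standing in for scraped pages.
toy_graph = {
    "seed1": ["a", "b"],
    "seed2": ["b", "c"],
    "a": ["seed1", "d"],
    "b": [],
    "c": ["a"],
    "d": [],
}
discovered = crawl(["seed1", "seed2"], lambda user: toy_graph.get(user, []))
```

Keeping a `seen` set separate from the queue is what lets the number of known handles grow much faster than the number of fully fetched profiles.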

The darker blocks had the primary data from Pinterest; lighter blocks had associated data collected from many different sources.

We collected a massive data set of 17.9 million user handles, 3.3 million user profiles and about 58 million “Pins” from 26th December 2012 to 1st February 2013.
We then began our analysis. Some of our key findings were:

  • We found that the most common topics across users and pins were design, fashion, photography, food and travel.
  • User, pin, and board characterization: We analyzed various user profile attributes, their geographical distribution, top pin sources and board categories.
  • Exploring Pinterest as a possible venue for copyright infringement: We found copyrighted images being shared publicly on Pinterest and almost half of these images did not give due credit to the copyright owners.
  • Analysis of personal information and malicious content present on Pinterest: Users were voluntarily giving away a significant amount of Personally Identifiable Information (PII). We found numerous instances where users shared phone numbers, BBM pins, email IDs, marital status, and other personal information. We also found (and analyzed) traces of malware in the form of pin sources by using blacklists.
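
As a rough illustration of how PII such as email IDs and phone numbers might be spotted in profile text, here is a minimal regex-based sketch. The patterns, the `find_pii` helper, and the sample string are hypothetical and deliberately simple; they are not the method used in the study and would miss many real-world formats.

```python
import re

# Deliberately loose, illustrative patterns (assumptions, not the study's rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def find_pii(text):
    """Return the email addresses and phone-like numbers found in a piece of text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [match.strip() for match in PHONE_RE.findall(text)],
    }


sample = "Love DIY! Mail me at jane.doe@example.com or call +91 98765 43210."
found = find_pii(sample)
```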
Heat-map for user locations.

The final step was finding the title. So we called upon the highly imaginative and vocal members of Precog, who in a couple of 15-minute sessions took us from nowhere to “Pinacolada”, “Pingoo” and finally to agreeing on “The Pin-Bang Theory”. For more details, have a look at our technical report here.

Here is the picture of the discussion (a memorable moment indeed):

All said and done, working on Pinterest was indeed an amazing experience for all of us ☺

Cheers!
Sudip, Neha, Prateek

Go home Google Groups, you’re drunk!!!

Well, as they say, no one’s perfect. Not even Google! Evidence: a recent “praise the iPad” bug in Google’s Text-To-Speech [0], which has reportedly now been rectified, went unnoticed for months!

All the geeks out there must be familiar with the concept of bugs. Be it the =rand(200,99) bug in MS Word, the famous “Why can’t I create a folder named ‘con’ in Windows” bug, or the Y2K mega-bug, geeks love bugs. Their impact can vary from funny to disastrous.

Coming to the point, we (PK and I) recently discovered a bug in Google Groups, which made me feel rather “unpleasant.” We at Precog run a mailing list, where all members of the group post about topics of common interest related to security, privacy, social media, etc. Google Groups provides a nice summary of the total number of topics and posts circulated on the list for each month. Last month, that is May 2013, we hit our all-time high (#PrecogRocks) in terms of topics and posts. PK and I went to the About page to check it out, and were rather shocked to see 183 posts for the month of June already! Terrible statistics, Google! Less than 2 hours into the month of June (IST), it does not seem humanly possible to make 183 posts, right? Given that our previous best was just over 300 for the previous month, this was definitely... a “bug”!

The Google Groups Bug: 183 posts in under 2 hours? Incorrect!

Reverting to the “old” Google Groups revealed something totally different. The older interface reflected that we did not have a single post for June yet! That would be inaccurate, since both PK and I had posted on the mailing list just a few minutes ago. A possible explanation could be the difference in time zones. If Google works in some western time zone, then our posts were indeed in May (2am, June 1, 2013, IST would still be May 31, 2013 at many places on the planet). Well, if that’s the reason, how does one justify the 183 posts in June, 2013?
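
The time-zone explanation is easy to check with a few lines of Python. This is only a sketch of the arithmetic: we do not know which time zone Google Groups actually uses, so the choice of Pacific Daylight Time (UTC-7) and the `month_bucket` helper are assumptions.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30), "IST")
PDT = timezone(timedelta(hours=-7), "PDT")  # assumed example of a "western" time zone

# A post made at 2 am on June 1, 2013, Indian Standard Time.
posted_ist = datetime(2013, 6, 1, 2, 0, tzinfo=IST)
posted_pdt = posted_ist.astimezone(PDT)


def month_bucket(timestamp):
    """Return the year-month a timestamp falls into, in its own time zone."""
    return timestamp.strftime("%Y-%m")


print(month_bucket(posted_ist))  # 2013-06
print(month_bucket(posted_pdt))  # 2013-05
```

The same instant lands in June when bucketed in IST and in May when bucketed in PDT, which would explain an empty June in the old interface but not the 183 posts in the new one.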

The “old” Google Groups. June doesn’t yet have posts? Incorrect again!

Feel free to write to us, if you have encountered a similar bug in the past. If you haven’t, we’d be glad if you can give it a shot! Stay glued to Google Groups, just past midnight on a month end, and check what’s going on! 🙂

Stay tuned for more “bug reports” from Precog@IIITD!

PK and Prateek

[0] http://onefoottsunami.com/2013/01/04/android-issue-38538/

See it, while it’s hot! MultiOSN: Monitoring real-world events on online social media

Today, the world is a place where “chats” refer to Facebook chats, when people “hang out”, they are referring to Google+, and “following” someone is a Twitter thing! The penetration of social media into the common Internet user’s life has been so intense, that people literally “tweet” about an earthquake before running to safety!

Online social media has become one of the fastest and most widely used means of information transfer today. Especially when it comes to news, a large proportion of people look for breaking news on Facebook and Twitter! This paradigm shift has come about for multiple reasons: the reach of the Internet and online social media, the crowd-sourcing aspect, and the immediacy factor. By and large, online social media has become the best place to look for the latest activity and keep up to date. Acknowledging this fact, and the role of online social media in the modern world, we at Precog@IIITD have come up with MultiOSN, a tool which monitors multiple online social media during real-world events, and presents analytics based on real-time activity. MultiOSN is our first baby step towards building real-time event monitoring systems that extract knowledge, make interesting analyses and inferences from the data, and visualize the data in a usable form that can provide actionable information. Currently, MultiOSN tracks five social media services, viz. Facebook, Twitter, YouTube, Google+, and Flickr.

MultiOSN provides basic but crucial information about real-world events that is floating all over the web of online social media. The number of posts per hour in the past 24 hours, the geographical locations from where these posts have been made, and sentiment analysis are among the analytics that are presented. Events like the Boston Marathon blasts are a perfect example of the kind that organizations / individuals can track using MultiOSN, using the analytics to potentially detect and prevent further damage. We believe these types of analytics during events like the Mumbai Blasts and the North Eastern Crisis can be of great help to various departments of national governments. For common users, MultiOSN can be used to visualize events like the IPL (Indian Premier League) to see which team is being talked about, which players have been making an impact, what the sentiment of social media users towards the IPL is, etc. What makes MultiOSN effective is the fact that all analysis is updated and shown in real time, while the event is in progress in the real world. Such monitoring can be immensely effective in disaster management during emergencies; in the past we have analyzed various emergency events in India (past work). For example, news of earthquakes, riots, etc. has been seen to break faster on social media than by any other means. This kind of critical information about earthquake locations and magnitude, or riot locations, if monitored in real time, can help minimize damage in areas which are expected to be affected next by such events. This is one of the major endeavors of MultiOSN.
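
To make the "posts per hour in the past 24 hours" analytic concrete, here is a minimal sketch of how such hourly buckets could be computed from post timestamps. It is not MultiOSN's implementation; the `posts_per_hour` function and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def posts_per_hour(timestamps, now, hours=24):
    """Count posts in each of the last `hours` hourly buckets, oldest bucket first."""
    current_hour = now.replace(minute=0, second=0, microsecond=0)
    start = current_hour - timedelta(hours=hours - 1)
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0)
        for ts in timestamps
        if start <= ts <= now
    )
    buckets = [start + timedelta(hours=i) for i in range(hours)]
    return [(bucket, counts.get(bucket, 0)) for bucket in buckets]


# Hypothetical post times collected for an event.
utc = timezone.utc
now = datetime(2013, 4, 16, 12, 30, tzinfo=utc)
posts = [
    datetime(2013, 4, 16, 12, 5, tzinfo=utc),
    datetime(2013, 4, 16, 12, 20, tzinfo=utc),
    datetime(2013, 4, 16, 11, 59, tzinfo=utc),
    datetime(2013, 4, 15, 10, 0, tzinfo=utc),  # older than 24 hours, ignored
]
hourly = posts_per_hour(posts, now)
```

Empty hours are kept in the result with a count of zero, so a chart built from it shows gaps in activity rather than skipping them.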

The system is now live at http://precog.iiitd.edu.in/tools/beta/multiosnportal/. Feel free to explore more, and email us your valuable feedback at pk [at] iiitd [dot] ac [dot] in. For more details and insights into MultiOSN, please read the technical report here.

Image credits: http://redcrosschat.org/wp-content/uploads/2012/10/205547170462558700_Ks134xFV_c.jpg

The Republic of Ireland

Football and booze. If those are not the first things that come to your mind when you think of Ireland (or the entire EU for that matter), you’re probably not in the right zone. I didn’t exactly know what to expect when I was about to board my first international flight to Dublin. 19 hours later, I had the answer. Perhaps, it wasn’t about how much the place could offer, it was about how much I was ready to accept!

As it turned out, I had landed on a Friday, and there was a long weekend to follow. Day 0 (the day I landed) was damn cold by Indian standards, and I was very tired after the long flight. But the mind refused to shut down and was super-keen on looking around, exploring the new place! The breath-taking greens, the tidy streets, the light traffic and the fresh air were amongst the very first things which caught my attention. Thanks to Sandipan, PK’s friend, who showed me around! I met my mentor, Dr. Maura Conway, and Dr. Lisa McInerney, shifted to my apartment with lots of help from Sandipan, bought some stuff to eat, and then I was pretty much all on my own. I had to wait until morning to make sure I wasn’t dreaming! The next morning was a different experience altogether. I could not comprehend what I was supposed to do! Perhaps just breathe and take some time to let this new heavenly place sink in! A visit to the seaside on a sunny Saturday marked the perfect beginning of the trip… Although the wind was chilling to death, the exotic view of the seaside was inexplicably awesome!

Then came the big day. Tuesday, May 8, my first day at work at the Dublin City University! The feeling was a mixture of nervousness, anxiety, pride and excitement, all at the same time. I went to Dr. Maura’s office in the morning, and she got me started, running around with me to get me my ID card, my desk, access to the lab, and other stuff. She is one great person I must say! She took care of everything so well, and it was a smooth beginning. She even took us out for dinner the same evening!

During the first couple of weeks, I did not get to speak to a lot of people. The students in the lab would work all day, and there would be absolute silence around! I was amazed to see people walking out of the lab if they had something to talk about… Coming from a place where the noisiest place is the lab, I was taken by surprise! It wasn’t long before I started making a few friends. Students here are really nice. A couple of girls came up to me and we introduced ourselves. Soon, I found an Indian, in fact, someone who lived just a stone’s throw away from my house back in New Delhi! That was shocking!

3 weeks into DCU, it was my 23rd birthday. I was expecting this one to be a quiet day. No one around knew. Well, that’s what I thought! But thanks to my advisor, PK, who (I learnt later) told Dr. Maura about it! Maura invited me to go out to a friend’s place for dinner. I instantly agreed, and thought I’d tell them it was my birthday after the dinner. But to my surprise, it was actually my own birthday dinner I had been invited to! That was the sweetest gesture I’ve ever come across in my academic life! It was a majestic experience… Birthday dinner, the Irish way. Candle lights, small cupcakes with candles, Irish food, and wine. I’m sure it would have been anyone’s dream evening! Especially when it came as a surprise! I even got a DCU pullover as my birthday gift, again, thanks to one of the sweetest people I’ve ever come across, Dr. Maura. 🙂

The birthday party pretty much marked the beginning of the wild time I had here! I came to know more people, started going out with friends, started enjoying the night life here, basically, it turned out to put me into “party” mode! I was lucky to find a wonderful group of friends, which included people from all over the world! I met people from Spain, Poland, Romania, Greece, Costa Rica, Germany, France, Italy, Japan, Taiwan, Ireland (of course), and more… To add to the buzz, the Euro Cup football began, with Ireland qualifying for the tournament after 10 years! Streets and pubs started to fill with enthusiastic supporters cheering for Ireland and singing “holy chants” for the “boys in green”. The atmosphere was electric! I watched all the three matches that Ireland played, with friends at different pubs. That wasn’t all. The partying went to another level when we hung out at nights and boozed and danced till the pubs shut down and kicked us out early in the mornings! It was one of these nights that I tried my first Tequila, and got a bad headache next morning…

But while in Ireland, I also got to learn a lot. The European work culture is different from the Indian one in multiple ways! 8:00 am to 6:00 pm is a strictly followed working period and is often productive. At the same time, evenings and weekends are mostly spent work-free, unless there is a real need to work! I also got the chance to be a part of the School of Law and Government here. Dr. Maura comes from the School of Law and Government, so it was my first time working with a non-computer science mentor! The experience was quite amazing (and sometimes, even amusing) since there were significant differences in the way we approached the research problem we were working on. It was also nice to know how non-computer science students pursued their research, and how they were keen and excited to learn about problems in security and privacy in computer science!

Overall, the visit was amazing both personally and professionally, and the memories and learnings will remain with me for a long time! Below is a picture of me at the Dublin Zoo. Yes, that is a real giraffe, and you may notice the ostrich in the background too! 🙂

Come back soon for more experiences and fun-reads from me and PreCog!