The journey of an unlikely doctor

My journey at IIIT-Delhi started back in July 2010. To put things in perspective, that was the year when Steve Jobs introduced the iPhone 4 and the first ever iPad. Game of Thrones was an unheard-of TV series. Sachin Tendulkar scored the first ever double hundred in a one day cricket match. Barack Obama had just completed 1 year as the president of the United States. And Tinder and Instagram did not exist.

The past seven years have taught me more about myself than I could probably expect to learn during the rest of my life. I learned what I want to do in life. More importantly, I learned what I do NOT want to do in life. The credit for that goes collectively to the institute (IIIT-Delhi), the group (Precog), the advisor (PK), and the process (PhD). This may seem like a long blog post, so I’ve tried to break it down into sections.

Prologue
The review read, “He is a smart student, but definitely has not picked up how to do research. He needs to really pick up some concrete questions and contribute with research work by end of the semester, otherwise, it is going to be tough to continue.” The illusion of success had been busted wide open.

The Beginning (July 2010 – Early 2012)
My association with IIIT-Delhi started in July 2010 when I joined as a Masters student. Too many people have talked about the cultural “shock” (in the positive sense) that they witness when they first come to IIITD. My case was no different, so I’ll skip that part. Moving on to the academics, life seemed pretty simple in the beginning. Putting in effort was translating to output and good results. One semester went by, and then another. A glowing grade card, a fellowship and an internship with IBM, and an amazing bunch of friends made life seem pretty much perfect. It was around this time [August of 2011] that I (and a few others from my batch) got an email from Prof. Pankaj Jalote, offering direct admission into the PhD program. I knew this was big, but at that point in time, I didn’t realise the magnitude of the impact that this decision [of enrolling into a PhD program] could have on my life and career. So I let my instinctive, opportunistic self drive this decision and without too much thought, I signed up within 3 days of getting the offer.

Note: This is probably not the best way to make such a big decision. I consider myself unbelievably lucky to have been able to pull this off with such an approach.

One important factor that pushed me in favour of taking up PhD was the individual I would be working with, the advisor. In retrospect, I think it was the excitement of working with PK more than anything else that tipped me into making the decision.

The Local Maxima: Illusion of success (Early 2012 to Mid 2014)
Having accepted the PhD offer, the plan was to wrap up all requirements for the Masters degree and officially join as a PhD candidate only during the second half of 2012. But something else was in store. Right around the time the PhD offer came in, PK had connected me with Dr. Maura Conway from Dublin City University regarding a research project which was part of the Seventh Framework Programme (a.k.a. FP7) for research in Europe. Things started moving along at a decent pace and we made some progress in the next 4 months or so. Now the thing with projects like FP7 is that they come along with their perks. January of 2012. PK sat me down and said that I’d need to travel to Ireland for around 3 months as part of the project. This is where the “honeymoon period” started. My starting date as a PhD candidate was moved up and I officially started my PhD on February 4, 2012 (it made more sense to travel as a PhD student instead of a Masters student). I was already starting to realise the differences between being a Masters student and a PhD student. I had “levelled up” in life. I had my own desk for the first time in my life. Friends and family had started looking at me differently. There was a certain sense of respect I could feel. Cheap thrills! Anyhow, a trip to Dublin was waiting. The semester ended and I was off. May 3, 2012. My first international travel. Fully funded. And I had a ball. Read all about it in another blog I wrote. I’d like to think that the 11 weeks I spent at DCU were fairly fruitful. We got a paper submitted at a conference, I completed my Masters requirements, the exposure of an international collaboration helped me understand how the outside world functions, and the amount of fun I had was absolutely amazing.

By the time I came back [July 2012], I was already part of another collaboration with Tiago and Prof. Virgilio from UFMG, Brazil. Another international collaboration with some big names; some ground work was already in place, and that made life easier. Tiago and I started working on uTrack, extended the work, and submitted it to WWW, which is possibly one of the biggest conferences in the Internet domain. Amazingly enough, it went through. Now getting this kind of success [paper at the main track of the world’s biggest conference] at such an early stage in PhD makes you feel like the James Bond of research. You tend to get the feeling that this journey is going to be a piece of cake and you’ll ace it every step of the way. And that is NOT good news.

uTrack was followed by MultiOSN, my first independent project, which was featured in national level news and media during mid 2013. I was enjoying my work. The lab was practically a second home. The year ended with an opportunity to work with Symantec Research Labs for 6 months, which I gleefully accepted.

To summarise this phase of life, I had two successful international collaborations, a project that had made national news, and an association with one of the world’s biggest and most popular brands in computer and web security. This was stuff that would make a glittering CV. All these events put me on my high horse and created the illusion of grand success at the PhD level. And this was building on top of the glorified GPA I had during my Masters degree. It was almost as if I couldn’t put a foot wrong. Little did I know that the real journey of PhD hadn’t even begun.

The Global Minima (Mid 2014 to Mid 2015)
Before I went to Pune for my 6-month long internship with Symantec in December 2013, PK clearly underlined that I “needed” to have a concrete problem defined for my PhD thesis before / by the time I came back. I agreed. Everything I had done up until this point wasn’t really building up to a thesis. They were small independent projects, which might have come out well individually, but weren’t really building up towards solving a clear-cut PhD problem.

PK always talked about an input vs. output graph and said that most of us fall in the high-input-medium-output zone or probably even the high-input-low-output zone. Up until now, I didn’t really agree with him. The first part of my PhD life seemed to lie more in the medium-input-high-output zone. But this was the phase where I realised what PK really meant. December 2013 to May 2014 was high input time for me. In addition to the project I was working on with Symantec, I was also working towards building a concrete problem for my thesis. I used to start my days at around 10 am, go to office, work, come back at 7-8 in the evening, sit down again at around midnight, and work until 3-4 am. Pretty much high input time.

June 2014. I came back, wrapped up the Symantec work, submitted it to a conference, and it went through. However, I still did not have a concrete PhD problem. Most of what I did during the past 6 months went to dust. It was ok for some initial ground work, for scratching the surface, but nowhere close to a well-defined problem to solve. High-input-low-output had come true. And this hit me pretty hard. Research did not seem to be easy any more. I didn’t know where I was heading. I didn’t even know where I had to go. I had spent 2 and a half years in the program and I still did not have a problem statement. The comprehensive exam (somewhat equivalent of what is referred to as a qualifying exam in other universities) was overdue. I got an “Average” rating for the yearly review. The review read, “He is a smart student, but definitely has not picked up how to do research. He needs to really pick up some concrete questions and contribute with research work by end of the semester, otherwise, it is going to be tough to continue.” The illusion of success had been busted wide open.

The first few conversations I had with PK during the second half of 2014 highlighted the problems with my PhD journey more and more. It was during one of these conversations that I was given the ultimatum. Getting my act together or leaving [literally] were the only options I was given. I stepped out of the meeting and I was in tears [which is pretty rare by the way]. I came back to the lab (which was somehow empty at that time of the day) and started evaluating my options while not being able to control those salty droplets dripping out of my eyes. I was seriously contemplating the option of dropping out. “I’ll figure out something, I still have a Masters degree”, I said to myself. It took an hour of consoling from Niharika to get me back to my senses.

I was shattered. This was probably the lowest point in my life. Starting 2006, I had been on top of my game. Aced undergrad, aced Masters… But as they say, “the higher you fly, the harder you fall.” Over the next few days, I told myself that I couldn’t just drop out. Overcoming this failure would probably be one of the biggest achievements of my life (if only I could pull it off). Failures are supposed to be part of life. Success is meaningless without failures. Basically, all those classic motivational one-liners were suddenly relevant now.

The next few months were all about trying to regroup my thoughts, learning all about research [I had realised I knew jack about research], and nailing down a concrete problem. I got it down to 1) poor quality content, and 2) Facebook. I told PK that this was the intersection I would be working in. I read some literature and put together a literature review document for my comprehensive exam which took place in November 2014. Over the next few months, I worked on refining the problem and chalking out a potential thesis outline.

Note: This phase of my life featured a 10-day trip to the US for presenting a paper, and a 6-week internship at DCU, Ireland [yes, again] too. With the kind of excitement I usually carry for international travels, these should’ve been the highlights of this part of the story, but as was evident, something bigger was going on during this time. Adventures from the travel, maybe some other time.

The Pursuit of Happiness (Mid 2015 to Early 2017)
Three years into the program, I had virtually nothing that would be part of my thesis. But the good part was that the worst was over. The next 16 months or so were all about constructing the castle of my thesis brick by brick.

I’ve always been inclined towards “building systems” more than studying / reading about things, i.e. I prefer the “practical” over the “theoretical”. However, PhD [in most cases] is incomplete without theory. I couldn’t keep myself away from thinking about a practical solution approach when I saw a problem, and at the same time, I knew that PhD would keep itself away from me if I didn’t do well at theoretically backing up a proposed solution. I had to cater to both, and I was able to do it in the form of Facebook Inspector. I identified a problem [of poor quality content running riot on Facebook], gathered some data, used the almighty, theory-heavy machine learning and natural language processing to solve the problem, and put out the solution in the form of a system for everyone to use. That’s pretty much what Facebook Inspector was [easier said than done, trust me]. It satisfied my appetite for “building” systems and at the same time, had the potential to bring theory to the scene.

I did one project and wrote a paper, then another, and another, and went on and on. Each paper (or project I took up) was completely defined by me, unlike the kind of projects I had done during the initial phase of my journey. I learnt from every project and used the learning to improve the quality of the next project. I slowly gained my confidence back. I was getting back on my feet, but somewhere inside, I knew I wasn’t the best at what I was doing. Whatever I was doing “needed to be done” more than me “wanting to do it”. Whether I wanted to do this for a career was questionable. I had come to terms with what PhD research was, but it had taken me slightly far away from what I “really” wanted to do. Would this last forever? Would things change after some time? Would I be able to get back to doing what I want and still be good enough at research? I didn’t have the answers. Moreover, I didn’t have the time to look for answers. Building a strong thesis was top priority. It was the only priority.

Meanwhile, the work I was doing [identifying poor quality content on Facebook] was getting hotter and hotter in the community. Facebook started frequenting the news; media houses held the social network responsible for the spread of one rumour or hoax after another. The relevance of my work shot up, giving me confidence to pitch my thesis as timely and important.

The summer of 2016 helped me discover a new side of me. I was one of the “senior” people in the lab by now. Our group [Precog] was getting recognition. Students from other institutes in the country (and even abroad) wanted to come to us to work. This was an opportunity to “lead” a project instead of doing one. We got lots of very smart students from different parts of the country (and the world) to work with us during the summer. This is what is now famously known as the #PrecogSummer. As for me, three smart students, 3 months of summer, and a completely new area [computer vision] produced some really exciting output. I had not only gotten into a new area of work, but also into a new role; a “project leader” of sorts. And fortunately, it all worked out well. All three students were happy and satisfied with what they had done during the summer, and so was I. More importantly from a personal perspective, getting back to doing something that was appreciated and taken well by the advisor [PK] really helped me get back my confidence and my competence.

It was at this point that I started getting back to being my confident self again, without worrying too much about what would happen to my PhD. PK had shown faith in me, helped me get back on my feet, encouraged me and made me believe that I still belonged to this league. Looking back now, I realise how big a risk he took with me. An unsuccessful PhD candidate doesn’t look pretty on any professor’s profile. He took the risk anyway (and I’d like to believe he would feel it was worth it :P). The advisor showing faith and trust in you is a game-changing phenomenon during PhD life. I’ve been lucky to have experienced this phenomenon.

Mind you, everything I did during this phase had still not converted to acceptable output. I barely got enough to keep me afloat, but I needed more papers accepted at better places to have a strong enough / defendable thesis. Rejected drafts kept piling up. I kept on resubmitting to different venues. By January 2017, I had done enough work to make up [what I thought would be] an acceptable thesis. PK and I had already agreed on this. It was time to start wrapping up. More “new” work wasn’t needed, but existing work needed fixes and acceptance in the community. However, this wasn’t reason enough to stay a PhD student for long. It was time to move on, to start looking for a job, for a life after PhD.

The Mini Panic Attack (Feb 2017 to April 2017)
By February 2017, all of my work was either concluded or under review somewhere. There was always scope for starting new projects, but it was time to step out into the real world. Browsing LinkedIn for jobs had become the most common thing to do these days. At the same time, I was facing an internal conflict. All my student life, I wanted to go outside India to work after finishing studies. But finding a job in Europe or the US is pretty goddamn hard for the simple reason that you’re not a local. Why would any organisation go through the pain of bringing a foreign national onboard? Not only is it complicated, but it is expensive and time consuming as well. A good alternative was postdoctoral research (famously a.k.a. postdoc). This route was comparatively much easier. However, postdoc would have meant continuing to do academic research, something which I already confessed I wasn’t the best at, and something I wasn’t sure I really “wanted” to continue doing. The only option left was industry in India, but then I wanted to go outside!

While I was going through this internal conflict, I got news that Niharika was leaving. Leaving for good. She had wrapped up her work and I knew she was looking for job opportunities. She got a pretty amazing job offer and she had to join soon. Of course I was happy for her, but this still hit me like an absolute shocker. Why? I’d like to use up the rest of this paragraph to take a bit of a detour here. Niharika and I started our journeys together way back in July 2010 as Masters students at IIITD. We did almost all our courses, assignments, and projects together during M. Tech. We joined PhD together, with the same professor, in the same lab, around the same time. We also shared our journey back home together for the first couple of years. We shared big chunks, and all the ups and downs of our PhD lives with each other. Essentially, she was one constant in my life at IIITD who was there every single day, every step of the way for the past 7 years. And when I realised that this constant wasn’t going to be there any more, it gave me a mini panic attack at multiple levels.

After that, somehow, the lab didn’t feel like home any more. Every day I stepped into the lab since, something told me that I HAD to leave soon. Not only had one of my closest associates left, this was also a reminder that it was high time for me to move on, too. I was lagging behind. Evidently, I hadn’t been taking job hunting seriously enough. But now I knew I had to step this up. I started preparing for interviews and started setting up interviews with any company that would show interest.

The Flurry Towards The End (April 2017 to June 2017)
While I was coping with the tough job market as a fresher, my inputs over the last year or so started showing output. One paper got accepted at a journal. Then another got through at a decent conference, and another as a book chapter. I also got an opportunity to spend 3 months as an intern at IBM Research. All this happened within a span of a couple of months or so.

The IBM internship was important at two levels. One, this was perfect timing for me to “impress” IBMers to get a shot at a full time position, and two, this was my opportunity to leave the lab for good and move on in life. However hard it may sound, I knew it had to happen. It was about time. So I thought the sooner the better. April 30, 2017. My last day in the lab. A mixture of emotions amidst a goosebumpy level farewell from fellow lab mates. I was excited to have “potentially” levelled up in life, but at the same time, it isn’t easy to go away from a place where you’ve spent 7 good years of your life. Above all, this opportunity wasn’t technically moving on [that’s why I said, “potentially”]. If I couldn’t convert this internship into a full time position, or couldn’t find another job in the next 3 months, I would have nowhere to go. I had already told myself that coming back to the lab was not an option. This feeling really, really stressed me out.

I moved to Bangalore the very next day and joined IBM as an intern on May 2, 2017. The next three months were pretty eventful. A completely new area of work, regular outings with fellow interns, and job hunting kept me pretty occupied. The month of May was particularly stressful. Consistent rejections from job interviews were eating me up from the inside. Stress levels were at an all-time high. Normally, no matter what happens during the day, I sleep well at night. This was probably the first time in life that I started encountering sleepless nights. Was I being too harsh on myself? I don’t really know. After all, I really did not have a plan in life after July 2017! How many rejections would it take to get one acceptance? Would that “one” acceptance be the only acceptance I’d ever get? What if I don’t like that “one” and don’t get another chance either? Finding the first job on your own is one hard nut to crack! No matter what your CV or qualifications say.

The Last Ball Sixer
Among the many applications that I had put in was an application for a Data Scientist position at Apple [that company which makes fancy, expensive phones]. A few days after I put in the application on the Apple website, I received an email from someone at Apple asking for my availability for an interview. This was kind of unbelievable. I was struggling to get through tiny startups! Moreover, the response rate that I was experiencing was no more than 2 in 10 applications [in the best case]. I looked at the email header to verify if the email was genuine. It seemed ok. I confirmed the interview which was supposed to happen later that week.

Four rounds of interviews and one month later, HR called me for a face-to-face at Apple’s Bangalore office and told me that I would be getting an offer. I had made it through. THIS was a moment for which words like “flabbergasted” are made. I did my best to control the excitement but kept this strictly to myself. I had to see the offer with my own eyes before sharing this with anyone. Another 3 weeks later, I got the offer, which I accepted. Done deal. My biggest cause of stress had vanished. I finally knew what I would do, where I would go, and where I would be [hopefully for the foreseeable future] after July 2017. I shared the news with family and friends. All my near and dear ones were ecstatic.

Here I was, struggling with my PhD until less than a year ago, knowing I wasn’t the best at what I was doing, barely making progress in life, and all of a sudden, I had a contract with one of the world’s best. What’s more, the profile I was offered wasn’t academic research. It needed “practical” applications of whatever I had learnt, to address real world problems. The moment I was told what my job profile at Apple would be like, I knew this was what I wanted. Like Steve Jobs once said, “A lot of times, people don’t know what they want until you show it to them.” I can’t think of another moment in my life where this could be any more relevant.

Meanwhile, one of the papers that I had gotten accepted a few months ago had me visit Sydney, Australia for a week for the presentation. This trip had come right in between the IBM internship and joining Apple. Exactly what I needed. This was the cherry on top of the metaphorical cake. Life was sorted. Everything was in place. I couldn’t see how anything could’ve been better than how it was in the current state. PhD journey had come to an end. The real world was waiting. All’s well that ends well. Happy ending.

The People, The Lab, The Culture
One consistent factor throughout my journey (and probably everyone’s journey) at Precog was the people, the lab, and the culture. We were always a happy bunch of kids who worked together and partied together. There was always someone for everything. From people who’d leave behind their work without a second thought and brainstorm with you to solve your problem, to people who’d be ready in a jiffy if you ever felt like going out for a drink, we were a great mix of everything. We never let anything get to us. No matter how big the problem, we knew we’d eventually figure it out. All of us, as a group. It was towards the end that I truly realised the value of having such a strong support system. Surviving 5 years of PhD [especially for someone like me] without such an environment at work would have easily been at least a hundred times harder. Of course there’s the commander-in-chief [PK], but it is also the people that make Precog magical.

The Most Important Learning
People often say that PhD is not a destination, it’s a journey. What they don’t say is that it is a journey of self-discovery that humbles you to the core. This is a phase in life where you get the freedom to do what you like, fail, pick yourself up, repeat, and figure out what’s best for you. What I learnt about myself was what I did NOT want to do in life; academic research. As ironic as it may sound, this was my biggest and most important learning from PhD. I was doing academic research but it didn’t really come to me naturally. I trained to work hard, be patient, and persevere. I wasn’t a natural at any of them. Mind you, these skills are priceless to possess and I’ll always appreciate learning them no matter how hard it was. I could train myself and get better at academic research. In fact, I did to an extent. But I knew if I had to do it for long, I wouldn’t be my happiest self.

If I took up academic research as a career, I would always think that there’s something else out there that I might be happier doing. So I made my choice. Academic research wasn’t for me. I had to push myself away from it, and move into something more and more hands-on, something where I could spend more time experimenting, failing, learning, improving, improvising, and innovating but without the infinite time-consuming cycles of cramming my ideas into eight (sometimes ten) page documents packed with jargon, and defending them to an unknown bunch of people through a peer review process.

“Gyan” that worked for me
1. Louis Pasteur once said [PK also said this a few times], “Fortune favours the prepared mind.” What kept me from breaking during my journey was the fact that I was always prepared. Prepared for the best and for the worst. If you’re prepared, nothing can surprise you. Eliminating the element of surprise from a situation deflates half its impact already, making it easier to deal with.
2. Harvey Specter says, “Anyone can do my job, but no one can be me.” You’ll always be replaceable. There’s nothing you can do that someone else can’t. It’s only a matter of finding the correct replacement. So be more than a machine. Bring more than your technical skills to the table. What you do has a price. Who you are, doesn’t. There’s a difference between what your value is, and what your worth is. Make yourself worthy in addition to being valuable. Because there will always be someone else who can do your job [maybe even for a better value], but you’d be preferred only if you’re worth it.
3. “The hardest choice may not always be the best choice.” When someone says something will be really hard for you to do, your natural instinct is to prove you can do it, and work as hard as it takes to eventually do it. It is a great confidence booster if you are able to do it. But before getting into it, pause for a moment, and think. Just because it is hard, is it the best thing for you to do? I knew great academic research was a hard nut for me to crack. But more importantly, I realised that just because it was hard, didn’t mean it was the best thing for me. So I didn’t try to take it head-on. Instead, I focused on developing a skill set that would help me get a job. This connects to the philosophy of “picking your battles wisely” and working “smart” over working “hard.”

Acknowledgements
I’ve been fortunate to have some extremely smart, fun, and hard working people around me during my PhD. There are too many people to thank here and I would prefer not to specifically add names simply because of the fear that I might miss out someone, which I would hate to do. So, to everyone who has been associated with me during this journey (you know if you are one of them), I’d like to say, thank you.

MT10014 and PhD1111 signing off.


Cheers!
Dr. Prateek Dewan
B. Tech., M. Tech., Ph. D. (OMG)

There’s misinformation on Facebook. Here’s how you deal with it.

I’ll keep this short and to the point. There’s a sudden backlash on Facebook for hosting misinformation [1], and polar politics [2] after the recent elections in the USA. Is this new? NO.

Let me take you back in time, to March 2014. The deeply tragic disappearance of Malaysia Airlines Flight MH370 claimed an entire aircraft and all on board [3]. A sea of prayers and solidarity followed on all social networks including Facebook. What also followed was a series of fake, misinformative posts, links, and videos claiming to show you the footage of the aircraft crashing [4], and rumors claiming that the plane had been found in the Bermuda triangle (see image of one such post below). Such footage never existed.

http://www.hoax-slayer.com/images/malaysia-airlines-MH370-scam-1.jpg

Following this incident, there have been a series of events where miscreants have exploited the context of a popular event to spread hoaxes, misinformation, rumors, fake news, etc. From the rumor of the death of comic actor Rowan Atkinson (a.k.a. Mr. Bean) to the suicide video by late legendary actor Robin Williams, misinformation has plagued Facebook for years, and is continuing to do so. While Facebook has recently acknowledged misinformation to be a serious problem, we at Precog had already started working on it when we first came across instances of misinformation. So how do you really deal with misinformation and rumors and hoaxes and fake news on Facebook?

There have been a few attempts to solve this problem. Facebook posted a series of blogs vowing to improve their algorithms to reduce misinformation, hoaxes, rumors, clickbaiting, etc. [8, 9, 10, 11, 12]. A recently conducted hackathon by Princeton University also witnessed a group of 4 students attempting to fix this problem [13]. Well, as it turns out, we took a stab at this problem over 2 years ago, and came up with a robust solution of our own. In August 2015, we publicly launched Facebook Inspector, a free, easy-to-use browser extension that identifies malicious content (including the type we just discussed above) in real time. At this moment, Facebook Inspector has over 200 daily active users, and has just crossed 5,000,000 hits (it’s 5 million; but it’s just fun to write it with so many zeros xD). We leveraged multiple crowdsourcing mechanisms to gather a pool of misinformative and other types of malicious posts, and harnessed them to generate a model to automatically identify misinformative posts, hoaxes, rumors, scams, etc.
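For the curious, here is roughly what "gather labelled posts, then train a model on them" looks like in its most stripped-down form. To be clear, this is only a toy sketch and NOT the actual Facebook Inspector model: the example posts, the labels ("malicious" / "benign"), the class name, and the choice of a bag-of-words Naive Bayes classifier are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesPostClassifier:
    """Toy multinomial Naive Bayes over bag-of-words (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training posts
        self.vocab = set()           # every token seen during training

    def fit(self, posts, labels):
        for text, label in zip(posts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written stand-ins for crowdsourced labelled posts.
training_posts = [
    "Shocking video footage of the plane crash, click here to watch",
    "Breaking: missing plane found in Bermuda triangle, watch the video now",
    "Celebrity death video leaked, click to watch before it is removed",
    "Our prayers are with the families of everyone on board",
    "Had a great dinner with friends tonight",
    "Official statement from the airline about the ongoing search",
]
training_labels = ["malicious"] * 3 + ["benign"] * 3

clf = NaiveBayesPostClassifier().fit(training_posts, training_labels)
print(clf.predict("Click here to watch leaked footage of the crash"))
print(clf.predict("Prayers with the families tonight"))
```

A real system would need far more data and far richer signals than the words in a post, but the overall shape (labelled examples in, a predictive model out) stays the same.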

Give it a try. Download the Chrome version at https://chrome.google.com/webstore/detail/facebook-inspector/jlhjfkmldnokgkhbhgbnmiejokohmlfc

Firefox users, download at https://addons.mozilla.org/en-US/firefox/addon/fbi-facebook-inspector/

To read the entire story behind the inception of the idea, and incarnation of Facebook Inspector, read the detailed technical report here.

So we spotted a problem a couple of years ago, took a stab at solving it (and I’d like to believe we succeeded), and apparently, the entire world is after Facebook for the same problem today. But misinformation, hoaxes, and rumors aren’t the only big problems that Facebook is surrounded by. Let’s talk some more about the US elections. Facebook’s algorithms have been accused of reinforcing “political polarization” by Professor Filippo Menczer in a popular news article [2]. Apparently, Facebook is home to a big bunch of political groups which post polarized content to influence users towards / against certain political beliefs. Whether such content should be allowed on social networking websites is debatable. After all, free speech is a thing! But the question that demands attention here is, did these politically polarized entities suddenly appear on Facebook around the election time? I mean, if they had been around for long, Facebook would’ve known, right? And the effects of social network content on elections are well known and studied [5, 6, 7]. So Facebook would’ve definitely done something to at least nudge users when getting exposed to polarized political content. But polarized political content was never a point of concern for Facebook. So it probably didn’t exist until right before the elections. Right? Wrong!

Well, this is a literal “I told you so” moment. Last year, we conducted a large scale study of malicious Facebook pages, and one of our main findings was the dominant presence of politically polarized entities among malicious pages on Facebook. We analyzed the content posted by these politically polarized pages, and found that negative sentiment, anger, and religion dominated such content. We reported our findings in the form of a technical report: https://arxiv.org/abs/1510.05828v1

It is good to know that what you work on, as part of research, connects closely to relevant, present day, real world problems, but it isn’t really a good feeling to realize that something you already knew could happen, happens anyway. We at Precog always push towards trying to make a difference and making the online world better and safer. We try our best, but we can only do so much.

To conclude, not bragging here (well, it’s not bragging if it’s true!), but we saw not one, but two real problems coming, more than a year before Facebook did.

You see, we’re called “Precog” for a reason. *mic drop*

References

[1] https://techcrunch.com/2016/11/10/facebook-admits-it-must-do-more-to-stop-the-spread-of-misinformation-on-its-platform/

[2] https://www.theguardian.com/technology/2016/nov/10/facebook-fake-news-election-conspiracy-theories

[3] https://en.wikipedia.org/wiki/Malaysia_Airlines_Flight_370

[4] https://www.scamwatch.gov.au/news/scammers-using-videos-of-malaysian-airlines-flight-mh370-to-spread-malware

[5] Williams, Christine B., and Girish J. Gulati. “Social networks in political campaigns: Facebook and the 2006 midterm elections.” annual meeting of the American Political Science Association. Vol. 1. No. 11. 2007.

[6] Williams, Christine B., and Girish J. Gulati. “Social networks in political campaigns: Facebook and the congressional elections of 2006 and 2008.” New Media & Society (2012): 1461444812457332.

[7] Douglas, Sara, et al. “Politics and young adults: the effects of Facebook on candidate evaluation.” Proceedings of the 15th Annual International Conference on Digital Government Research. ACM, 2014.

[8] https://newsroom.fb.com/news/2015/01/news-feed-fyi-showing-fewer-hoaxes/

[9] http://newsroom.fb.com/news/2016/08/news-feed-fyi-further-reducing-clickbait-in-feed/

[10] http://newsroom.fb.com/news/2014/11/news-feed-fyi-reducing-overly-promotional-page-posts-in-news-feed/

[11] http://newsroom.fb.com/news/2014/08/news-feed-fyi-click-baiting/

[12] http://newsroom.fb.com/news/2014/04/news-feed-fyi-cleaning-up-news-feed-spam/

[13] http://www.businessinsider.in/It-only-took-36-hours-for-these-students-to-solve-Facebooks-fake-news-problem/articleshow/55426656.cms

The complete picture: Visual Themes and Sentiment on Social Media for First Responders.

Researchers and academicians all over the world have conducted numerous studies and established that social media plays a vital role during crisis events. From citizens helping police capture suspected terrorists after the Boston Marathon bombings [5], to vigilant users spreading situational awareness [6], OSNs have proved their mettle as a powerful platform for information dissemination during crises.

Most of the aforementioned work has relied on textual content posted on OSNs to extract knowledge and make inferences. The thing is, online media is rapidly moving from text to visual media. With the prevalence of 3G and 4G technologies and high-bandwidth connectivity in most Internet-enabled countries, images and videos are gaining much more traction than text. This is also natural, since the human brain is hardwired to recognize and make sense of visual information more efficiently [1]. Just using text to draw inferences from social media data is no longer enough. As we discussed in our previous blog, there is a significant percentage of social media posts which do not contain any text. Moreover, there’s also a large percentage of posts which contain both text and images. The point to keep in mind here is that images and text may contradict each other, even if they’re part of the same post. While the text in Figure 1 inspires support and positive sentiment, the image (or more precisely, the text in the image) is pretty negative. This is what current research methodology is missing out on.

Figure 1. Example of a Facebook post with contradicting text and image sentiment.

Continuing our work on images and online social media, we decided to dig further into images posted on social networks, and see if images could help first responders get a more complete picture of the situation during a crisis event. We collected Facebook posts published during the attacks in Paris in November 2015, and performed large scale mining on the image content we captured. Typically, monitoring the popular topics and sentiment among the citizens can be of help to first responders. Timely identification of misinformation, sensitive topics, negative sentiment, etc. online can be really helpful in predicting and averting any potential implications in the real world.

We were able to gather over 57,000 images using the #ParisAttacks and #PrayForParis hashtags put together, out of which 15,123 images were unique. Analyzing such a big number of images manually is time consuming, and not scalable. So we utilized state-of-the-art techniques from the computer vision domain to automatically analyze images on a large scale. These techniques include Optical Character Recognition (OCR) [2], image classification, and image sentiment identification using Convolutional Neural Networks (CNNs). Figure 2 shows how a typical CNN model processes and classifies images [4].

Figure 2. Typical CNN model for object identification in images. Image taken from http://cs231n.github.io/convolutional-networks/

With all these “weapons”, we set out to mine the sea of images and see if we could discover something useful. And we struck gold right away. We used Google’s Inception-v3 model [3] for generating tags for images automatically, and looked at a few of the most popular tags. Interestingly, we found numerous instances of misinformative images, images containing potentially sensitive themes, and images promoting conspiracy theories among popular images. By the time we identified them, these images had gathered millions of likes, and hundreds of thousands of comments and shares. Some of these examples are listed below (Figures 3 to 6).

Figure 3. Eiffel Tower turns off its lights for the first time in 63 years. This information was incorrect. Eiffel Tower’s lights are turned off every night between 1 am and 6 am following a ruling by the French Government.

Figure 4. Image incorrectly quoting the cause of death of Diesel, a police dog that helped the police during the attacks. The French Police later clarified that Diesel died of gunshot wounds inflicted by the French police themselves, not by the suicide bomber.

Figure 5. An insensitive tweet by Donald Trump that circulated just after the Paris attacks. As the timestamp of the tweet shows, it was posted months earlier, but resurfaced just after the attacks to defame the politician.

Figure 6. Picture claiming that a Muslim guard named Zouheir stopped a suicide bomber from entering the Stade de France football stadium and saved thousands of lives. As later clarified by the security guard himself, no such incident took place. Zouheir, the security guard, was stationed at a different spot.

Applying OCR on the images in our dataset, we were able to extract text from about 55% of the images (31,869 out of 57,748 images). We wondered if this text embedded in images would be any different than the text that users post otherwise, in the orthodox manner. Upon analyzing and comparing the sentiment of image text and post text, we found that image text (extracted through OCR) was much more negative than post text (the orthodox text). In fact, not only was image text more negative, it was also different from post text in terms of topics being talked about. Table 1 shows a mutually exclusive subset of the most common words appearing in image text and post text. While post text was full of generic text offering prayers, support and solidarity, image text was found to mention some sensitive issues like “refugees”, “syria”, etc.

         Top words in posts                         Top words in images
S. No.   Word              Normalized frequency     Word         Normalized frequency
1.       retweeted         0.005572571              house        0.00452941
2.       time              0.005208351              safety       0.004481122
3.       prayers           0.005001407              washington   0.004297628
4.       news              0.004713342              sisters      0.003940297
5.       prayfortheworld   0.004431899              learned      0.003863036
6.       life              0.004393821              mouth        0.003853378
7.       let               0.004249789              stacy        0.003751974
8.       support           0.004249789              passport     0.003708515
9.       god               0.00401139               americans    0.003694028
10.      war               0.003986557              refugee      0.00352502
11.      thoughts          0.003882258              japan        0.002887619
12.      need              0.003878946              texas        0.002781386
13.      last              0.003797825              born         0.002689639
14.      lives             0.003734914              dear         0.002689639
15.      said              0.003468371              syrians      0.002607549
16.      place             0.003468371              similar      0.002573748
17.      country           0.003319372              deadly       0.002568919
18.      city              0.003291227              services     0.002554433
19.      everyone          0.003281294              accept       0.002554433
20.      live              0.003274672              necessary    0.002549604

Table 1. Mutually exclusive set of 20 most frequently occurring relevant keywords in post and image text, with their normalized frequency. We identified some potentially sensitive topics among image text, which were not present in post text. Word frequencies are normalized independently by the total sum of frequencies of the top 500 words in each class.
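
For readers who want to try this kind of comparison themselves, here is a minimal sketch of the normalization described in the Table 1 caption. It is not our actual pipeline: the function names, the toy token lists, and the reading of "mutually exclusive" as "not among the other class's top words" are all assumptions made for illustration.

```python
from collections import Counter


def normalized_top_words(tokens, top_n=500):
    """Map each of the top_n words to count / (sum of counts of the top_n words)."""
    top = Counter(tokens).most_common(top_n)
    total = sum(count for _, count in top)
    return {word: count / total for word, count in top}


def mutually_exclusive(own, other, k=20):
    """Top-k words of `own` that are absent from `other` (assumed interpretation)."""
    ranked = sorted(own.items(), key=lambda item: item[1], reverse=True)
    return [(word, freq) for word, freq in ranked if word not in other][:k]


# Toy token lists standing in for post text and OCR-extracted image text.
post_tokens = "prayers support paris prayers god paris support prayers".split()
image_tokens = "refugee passport paris refugee syrians paris".split()

post_freq = normalized_top_words(post_tokens)
image_freq = normalized_top_words(image_tokens)

posts_only = mutually_exclusive(post_freq, image_freq)
images_only = mutually_exclusive(image_freq, post_freq)

print("posts only: ", posts_only)
print("images only:", images_only)
```

Because each class is normalized by its own total, the frequencies are comparable across classes even when one class has far more text than the other.
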
We also uncovered a popular conspiracy theory surrounding the Syrian “passports” that were found by French police near the bodies of terrorists who carried out the attacks, and were allegedly used to establish the identity of the attackers as Syrian citizens. Text embedded in images depicting this theme questioned how the passports could have survived the heat of the blasts and fire. This conspiracy theory was then used by miscreants to label the attacks as a false flag operation, influencing citizens to question the policies and motives of their own government. The popularity of such memes on OSN platforms can have undesirable outcomes in the real world, like protests and mass unrest. It is therefore vital for first responders to be able to identify such content and counter / control its flow to avoid repercussions in the real world.

Figure 7. Example of a picture containing text relating to a conspiracy theory questioning how the Syrian passports survived the blasts. We found hundreds of images talking about this topic in our dataset.

Images posted on OSNs are a critical source of information that can be useful for law and order organizations to understand popular topics and public sentiment, especially during crisis events. Through our approach, we propose a semi-automated methodology for mining knowledge from visual content and identifying popular themes and citizens’ pulse during crisis events. Although this methodology has its limitations, it can be very effective for producing high level summaries and reducing the search space for organizations with respect to content that may need attention. We also described how our methodology can be used for automatically identifying (potentially sensitive) misinformation spread through images during crisis events, which may lead to major implications in the real world.

Here is a link to the complete technical report on this work. Big credits to Varun Bharadhwaj, Aditi Mithal, and Anshuman Suri for all their efforts. Below is an infographic of the work.

References:

[1] https://www.eyeqinsights.com/power-visual-content-images-vs-text/

[2] https://github.com/tesseract-ocr/

[3] https://www.tensorflow.org/versions/r0.11/tutorials/image_recognition/index.html

[4] http://cs231n.github.io/convolutional-networks/

[5] Gupta, Aditi, Hemank Lamba, and Ponnurangam Kumaraguru. “$1.00 per RT #BostonMarathon #PrayForBoston: Analyzing fake content on Twitter.” In eCrime Researchers Summit (eCRS), 2013, pp. 1-12. IEEE, 2013.

[6] Vieweg, Sarah, Amanda L. Hughes, Kate Starbird, and Leysia Palen. “Microblogging during two natural hazards events: what twitter may contribute to situational awareness.” In Proceedings of the SIGCHI conference on human factors in computing systems, pp. 1079-1088. ACM, 2010.

#TPBT: The Pin-Bang Theory

In the monsoon semester of 2012, I took a course on Privacy and Security in Online Social Media. We had to do a project on a popular online social media platform. Pinterest caught my eye. It was new, it was among TIME Magazine’s top 50 websites of 2011, and it had close to 20 million users at the time. Its growth was amazing; in a matter of 2 years it was well integrated with popular e-commerce sites like eBay, Etsy, Amazon, etc. The big white-on-red “P” next to the blue bird and the white-on-blue “f” motivated me to work on Pinterest.

Share Buttons on Amazon.

Without digging much into the OSN, and with the project proposal submission deadline just 30 minutes away, I proudly declared that my project would entail user analysis, locating spam / malware, and also touch upon copyright issues on Pinterest.
The next time I opened my project, I got my “shock of the semester”. Pinterest had no API. Third-party Python wrappers were all useless. I would have to scrape the whole network. Though I was able to complete only a part of my project proposal in the semester, PK sir asked me to continue working. I was joined by Neha on the project and Prateek started shepherding us.
We created a crawler to push data from Pinterest to our databases, starting from 5 extremely popular seed users.

The darker blocks had the primary data from Pinterest; lighter blocks had associated data collected from many different sources.

We collected a massive data set of 17.9 million user handles, 3.3 million user profiles and about 58 million “Pins” from 26th December 2012 to 1st February 2013.
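
For the curious, a seed-based crawl of this kind can be sketched as a breadth-first traversal. This is only an illustration and not our actual crawler: `fetch_neighbors` is a hypothetical callback standing in for whatever scraping code returns the handles connected to a user, and the toy graph below replaces the real network.

```python
from collections import deque


def crawl(seeds, fetch_neighbors, max_users=1000):
    """Breadth-first crawl from seed users.

    Returns (visited, discovered): users whose profiles were fetched, in
    order, and the set of all handles seen so far.
    """
    discovered = set(seeds)
    queue = deque(seeds)
    visited = []
    while queue and len(visited) < max_users:
        user = queue.popleft()
        visited.append(user)  # a real crawler would store the profile here
        for neighbor in fetch_neighbors(user):
            if neighbor not in discovered:
                discovered.add(neighbor)
                queue.append(neighbor)
    return visited, discovered


# Toy follower graph (hypothetical data) in place of live scraping.
toy_graph = {"a": ["b", "c"], "b": ["c", "d"], "c": [], "d": ["a"]}

visited, discovered = crawl(["a"], lambda user: toy_graph.get(user, []), max_users=2)
print(visited, sorted(discovered))
```

Note how the crawl discovers more handles than it fetches profiles for, which is one plausible reason a data set can hold many more user handles than full user profiles.
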
We then began our analysis. Some of our key findings were:

  • We found that the most common topics across users and pins were design, fashion, photography, food and travel.
  • User, pin, and board characterization: We analyzed various user profile attributes, their geographical distribution, top pin sources and board categories.
  • Exploring Pinterest as a possible venue for copyright infringement: We found copyrighted images being shared publicly on Pinterest and almost half of these images did not give due credit to the copyright owners.
  • Analysis of personal information and malicious content present on Pinterest: Users were voluntarily sharing a significant amount of Personally Identifiable Information (PII). We found numerous instances where users shared phone numbers, BBM pins, email IDs, marital status, and other personal information. We also found (and analyzed) traces of malware in the form of pin sources by using blacklists.
Heat-map for user locations.

The final step was finding the title. So we called upon the highly imaginative and vocal members of Precog, who in a couple of 15-minute sessions took us from nowhere to “Pinacolada”, “Pingoo”, and finally to agreeing on “The Pin-Bang Theory”. For more details, have a look at our technical report here.

Here is the picture of the discussion (a memorable moment indeed):

All said and done, working on Pinterest was indeed an amazing experience for all of us ☺

Cheers!
Sudip, Neha, Prateek

Go home Google Groups, you’re drunk!!!

Well, as they say, no one’s perfect. Not even Google! Evidence: A recent “praise the iPad” bug in Google’s Text-To-Speech [0], which has reportedly now been rectified, went unnoticed for months!

All the geeks out there must be familiar with the concept of bugs. Be it the =rand(200,99) bug in MS Word, the famous “Why can’t I create a folder named ‘con’ in Windows” bug, or the Y2K mega-bug; geeks love bugs. Their impact can vary from funny to disastrous.

Coming to the point, we (PK and I) recently discovered a bug in Google Groups, which made me feel rather “unpleasant.” We at Precog run a mailing list, where all members of the group post about topics of common interest related to security, privacy, social media, etc. Google Groups provides a nice summary of the total number of topics and posts circulated on the list for each month. Last month, that is May 2013, we hit our all-time-high (#PrecogRocks) in terms of topics and posts. PK and I went to the About page to check it out, and were rather shocked to see 183 posts for the month of June already! Terrible statistics, Google! Less than 2 hours into the month of June (IST), it does not seem humanly possible to make 183 posts, right? Given that our previous best was just over 300 for the previous month, this was definitely… a “bug”!

The Google Groups Bug: 183 posts in under 2 hours? Incorrect!

Reverting to the “old” Google Groups revealed something totally different. The older interface reflected that we did not have a single post for June yet! That would be inaccurate, since both PK and I had posted on the mailing list just a few minutes ago. A possible explanation could be the difference in time zones. If Google works in some western time zone, then our posts were indeed in May (2am, June 1, 2013, IST would still be May 31, 2013 at many places on the planet). Well, if that’s the reason, how does one justify the 183 posts in June, 2013?
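
The time zone explanation is easy to check. Below is a small sketch that converts our posting time to a western time zone; the choice of US Pacific Daylight Time (UTC-7) is an assumption for illustration, since we do not know which zone Google Groups actually uses.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the example self-contained (no time zone database needed).
IST = timezone(timedelta(hours=5, minutes=30), "IST")
PDT = timezone(timedelta(hours=-7), "PDT")  # assumed "western" zone

posted_ist = datetime(2013, 6, 1, 2, 0, tzinfo=IST)  # 2 am, June 1, 2013, IST
posted_pdt = posted_ist.astimezone(PDT)

print(posted_ist.strftime("%Y-%m-%d %H:%M %Z"))
print(posted_pdt.strftime("%Y-%m-%d %H:%M %Z"))
```

In that zone the post still falls on May 31, which would explain the empty June in the old interface, but not the 183 June posts in the new one.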

The “old” Google Groups. June doesn’t yet have posts? Incorrect again!

Feel free to write to us, if you have encountered a similar bug in the past. If you haven’t, we’d be glad if you can give it a shot! Stay glued to Google Groups, just past midnight on a month end, and check what’s going on! 🙂

Stay tuned for more “bug reports” from Precog@IIITD!

PK and Prateek

[0] http://onefoottsunami.com/2013/01/04/android-issue-38538/

See it, while it’s hot! MultiOSN: Monitoring real-world events on online social media

Today, the world is a place where “chats” refer to Facebook chats, when people “hang out”, they are referring to Google+, and “following” someone is a Twitter thing! The penetration of social media into the common Internet user’s life has been so intense, that people literally “tweet” about an earthquake before running to safety!

Online social media has become one of the fastest, and most widely used means of information transfer today. Especially when it comes to news, a big proportion of people look for breaking news on Facebook and Twitter! This paradigm shift has come about for multiple reasons: the reach of the Internet and online social media, the crowdsourcing aspect, and the immediacy factor. By and large, online social media has become the best place to look for the latest activity, and keep up-to-date. Acknowledging this fact, and the role of online social media in the modern world, we at Precog@IIITD have come up with MultiOSN, a tool which monitors multiple online social media during real-world events, and presents analytics based on real-time activity. MultiOSN is our first baby step towards building real-time event monitoring systems that extract knowledge, draw interesting analyses and inferences from the data, and visualize the data in a usable form that gives people actionable information. Currently, MultiOSN tracks five social media services, viz. Facebook, Twitter, YouTube, Google+, and Flickr.

MultiOSN provides basic, but crucial information about real-world events floating all over the web of online social media. The number of posts per hour in the past 24 hours, the geographical locations from which these posts have been made, and sentiment analysis are among the analytics that are presented. Events like the Boston Marathon blasts are a perfect example of the kind that organizations / individuals can track using MultiOSN, utilizing the analytics to potentially detect and prevent further damage. We believe these types of analytics during events like the Mumbai Blasts and the North Eastern Crisis can be of great help to various departments of national governments. For common users, MultiOSN can be used to visualize events like the IPL (Indian Premier League) to see which team is being talked about, which players have been making an impact, what the sentiment of social media users towards the IPL is, etc. What makes MultiOSN effective is the fact that all analysis is updated and shown in real-time, while the event is in progress in the real world. Such monitoring can be immensely effective in disaster management during emergencies; in the past we have analyzed various emergency events in India (past work). For example, the news of earthquakes, riots, etc. has been witnessed to break faster on social media than by any other means. This kind of critical information about earthquake locations and magnitude, or riot locations, if monitored in real-time, can help minimize damage in areas which are expected to be affected next by such events. This is one of the major endeavors of MultiOSN.
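
To give a flavour of the simplest of these analytics, here is a minimal sketch of counting posts per hour over the past 24 hours. It is not MultiOSN's actual code: the function name, the bucket layout (index 0 is the most recent hour), and the sample timestamps are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def posts_per_hour(timestamps, now, hours=24):
    """Count posts in hourly buckets; index 0 is the most recent hour."""
    counts = [0] * hours
    for ts in timestamps:
        age = now - ts
        # Ignore posts from the future or older than the window.
        if timedelta(0) <= age < timedelta(hours=hours):
            counts[age // timedelta(hours=1)] += 1
    return counts


# Hypothetical post timestamps around an event.
now = datetime(2013, 4, 16, 12, 0)
timestamps = [
    now - timedelta(minutes=10),
    now - timedelta(minutes=50),
    now - timedelta(minutes=90),
    now - timedelta(hours=25),   # outside the 24-hour window
    now + timedelta(minutes=5),  # clock skew: in the future
]

hourly = posts_per_hour(timestamps, now)
print(hourly[:3], sum(hourly))
```

A real-time dashboard would simply recompute (or incrementally update) these buckets as new posts arrive from each social network.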

The system is now live at http://precog.iiitd.edu.in/tools/beta/multiosnportal/. Feel free to explore more, and email us your valuable feedback at pk [at] iiitd [dot] ac [dot] in. For more details and insights into MultiOSN, please read the technical report here.

Image credits: http://redcrosschat.org/wp-content/uploads/2012/10/205547170462558700_Ks134xFV_c.jpg

The Republic of Ireland

Football and booze. If those are not the first things that come to your mind when you think of Ireland (or the entire EU for that matter), you’re probably not in the right zone. I didn’t exactly know what to expect when I was about to board my first international flight to Dublin. 19 hours later, I had the answer. Perhaps, it wasn’t about how much the place could offer, it was about how much I was ready to accept!

Apparently, I had landed on a Friday, and there was a long weekend to follow. Day 0 (the day I landed) was damn cold by Indian standards, and I was very tired after the long flight. But the mind refused to shut down and was super-keen on looking around, exploring the new place! The breath-taking greens, the tidy streets, the little traffic and the fresh air were amongst the very first things which caught my attention. Thanks to Sandipan, PK’s friend, who showed me around! I met my mentor, Dr. Maura Conway and Dr. Lisa McInerney, shifted to my apartment with lots of help from Sandipan, bought some stuff to eat and then I was pretty much, all on my own. I had to wake up to a morning to make sure I wasn’t dreaming! The next morning was a different experience altogether. I could not comprehend what I was supposed to do! Perhaps, just breathe and take some time to sink in to this new heavenly place! A visit to the sea side on a sunny Saturday marked the perfect beginning of the trip… Although the wind was chilling to death, the exotic view of the sea-side was inexplicably awesome!

Then came the big day. Tuesday, May 8, my first day at work at Dublin City University! The feeling was a mixture of nervousness, anxiety, pride and excitement, all at the same time. I went to Dr. Maura’s office in the morning, and she got me started, running around with me to get me my ID card, my desk, access to the lab, and other stuff. She is one great person, I must say! She took care of everything so well, and it was a smooth beginning. She even took us out for dinner the same evening!

During the first couple of weeks, I did not get to speak to a lot of people. The students in the lab would work all day, and there would be absolute silence around! I was amazed to see people walking out of the lab if they had something to talk about… Coming from a place where the noisiest place is the lab, I was taken by surprise! It wasn’t long before I started finding a few friends. Students here are really nice. A couple of girls came up to me and we introduced ourselves. Soon, I found an Indian, in fact, someone who lived just a stone’s throw away from my house back in New Delhi! That was shocking!

3 weeks into DCU, it was my 23rd birthday. I was expecting this one to be a silent day. No one around knew. Well, that’s what I thought! But thanks to my advisor, PK, who (I learnt later) told Dr. Maura about it! Maura invited me to a friend’s place for dinner. I instantly agreed, and thought I’d tell them it was my birthday after the dinner. But to my surprise, it was actually my own birthday dinner I had been invited to! That was the sweetest gesture I’ve ever come across in my academic life! It was a majestic experience… Birthday dinner, the Irish way. Candle lights, small cup cakes with candles, Irish food, and wine. I’m sure it would have been anyone’s dream evening! Especially when it came as a surprise! I even got a DCU pullover as my birthday gift, again, thanks to one of the sweetest people I’ve ever come across, Dr. Maura. 🙂

The birthday party pretty much marked the beginning of the wild time I had here! I came to know more people, started going out with friends, started enjoying the night life here, basically, it turned out to put me into “party” mode! I was lucky to find a wonderful group of friends, which included people from all over the world! I met people from Spain, Poland, Romania, Greece, Costa Rica, Germany, France, Italy, Japan, Taiwan, Ireland (of course), and more… To add to the buzz, the Euro Cup football began, with Ireland qualifying for the tournament after 10 years! Streets and pubs started to fill with enthusiastic supporters cheering for Ireland and singing “holy chants” for the “boys in green”. The atmosphere was electric! I watched all the three matches that Ireland played, with friends at different pubs. That wasn’t all. The partying went to another level when we hung out at nights and boozed and danced till the pubs shut down and kicked us out early in the mornings! It was one of these nights that I tried my first Tequila, and got a bad headache next morning…

But while in Ireland, I also got to learn a lot. The European work culture is different from the Indian one in multiple ways! 8:00 am to 6:00 pm is a strictly followed working period and is often productive. At the same time, evenings and weekends are mostly spent work-free, unless there is a real need to work! I also got the chance to be a part of the School of Law and Government here. Dr. Maura comes from the School of Law and Government, so it was my first time working with a non-computer science mentor! The experience was quite amazing (and sometimes, even amusing) since there were significant differences in the way we approached the research problem we were working on. It was also nice to know how non-computer science students pursued their research, and how they were keen and excited to learn about problems in security and privacy in computer science!

Overall, the visit was amazing both personally and professionally, and the memories and learnings will remain with me for a long time! Below is a picture of me at the Dublin Zoo. Yes, that is a real giraffe, and you might also notice the ostrich in the background! 🙂

Come back soon for more experiences and fun-reads from me and PreCog!