Moments for life! My first international trip to the USA

Well, it’s been 2 months since I returned from the United States of America but it all still looks fresh as I sit down to put it in words! 🙂

It was about a year back that I learnt of my possible visit to the University at Buffalo (UB), NY, and the news left me all aflutter! As much as I was excited about my first international trip to this beautiful country, my parents were equally nervous, as I am known to be an over-pampered and not-so-independent girl (yea, the majority, including my parents, think so :(). The most important thing for my travel was to get a passport! (I didn’t have one until then ;)). All the formalities for the passport and visa were done in no time and I was all set to fly! The days passed by, the excitement grew by leaps and bounds, and it was just packing, packing and more packing. Finally, on June 7, 2013, I left home at 1 AM with all the excitement and thrill. As the huge Boeing 777 waited for me, I bid adieu to friends and family. In no time, I was sitting in the plane on the way to my humble abode for the next 2 months. As I lay back, enveloped by clouds, I suddenly realized that I would be away from home for 60 long days, which left me nervous and anxious! But the very thought of freedom and independence shooed that feeling away! 😉 After a smooth, continuous flight of 15 hours, little did I know what was in store for me next! As I sipped a cup of coffee at the large JFK airport, NY, it was announced that my connecting flight was cancelled due to bad weather. The next thing I knew, I was running back and forth to get another flight and collect my baggage. I felt really low and disappointed on getting another flight 6 hours later, and that too from a different airport. After all these glitches, I was finally standing in my apartment admiring my new room, and the feeling of being in a new country altogether finally sank in. Next was the big day, my first day at UB, which was an entirely different experience.

The large campus at UB, with its breath-taking greens and multi-cultural crowd, was difficult to capture in a frame. It took me nearly 2 hours to cover the whole campus. At UB, I got an opportunity to work in the Jacobs Management Centre (the word management sounds confusing, but we worked in information and security! :)) under the esteemed guidance of Prof. Hejamadi Raghav Rao. Adapting to the new international setting was less of an issue, thanks to my project partner Rajarshi Chakraborty (PhD@UB), who made things easy (we had already met by then since he had visited us at IIITD the previous month, lucky me!). Well, honestly, there was no time to get comfortable since we had a deadline 15 days later :D. The work culture there was a little different from what is followed in India. There were strict working hours from 9:00 AM to 6:00 PM, with weekends usually free. It was also different in the sense that in India we used to work together in a lab (a noisy one! :D), whereas there I was given a separate room to work in. Unfortunately, since it was summer, I missed meeting a lot of students in the university as most of them had gone home. My research work was exciting and challenging, and I enjoyed every bit of it in the apt motivational ambience at UB. My project advisors, Prof. H.R. Rao in the US and PK here in India, were a great support throughout. I learnt a lot of things at UB and had the opportunity to interact with some great minds. I also got a chance to attend the Western New York Cyber Security conference by ISecure, a conference for discussing concerns arising in Information Security. Since the conference was near Niagara, I managed to fit in a quick trip to the heavenly Niagara Falls. During the last week of my stay, Rajarshi invited me to dinner at a Thai restaurant. Being a total vegetarian, I had a tough time selecting a dish for myself and ended up having eggplant (brinjal, in simple terms, which I don’t eat in India :D).
Quite an experience!

Of course, this blog is incomplete without a mention of the ‘masti’ and ‘mazaa’ I had in the US. I was lucky enough to have cousins there who helped me explore cities like New York and Washington and visit some really beautiful places like the Statue of Liberty, Madame Tussauds, Niagara Falls and the Virginia beaches, accompanied by loads and loads of shopping! I got a chance to witness the mind-blowing fireworks show on Independence Day. I made new friends, and we did lots of chatting and dancing and had fun together. This trip also marked the beginning of my cooking skills (and the end too, for that matter :P), along with doing groceries and other regular stuff which I could never have imagined myself doing! 😀

Below is an image of me at Madame Tussauds with the wax Big B! 😀
Overall, the experience was quite satisfying, both personally and professionally. I would like to thank my advisor PK and Prof. H.R. Rao for giving me this wonderful opportunity. Looking forward to more such exposure in life. Stay tuned! 🙂


My work, my pride!

Privacy in Open Government Data

As they say, ideas can be life changing… and an idea changed my life too (in a positive way, of course! :)).

It is amazing and satisfying to see how ideas turn into reality! It was a year back that I, along with Mayank Gupta (B.Tech, DCE), started working on an idea which revolved around open government data and its potential malicious use. Information portals in the form of e-governance websites (e.g., voter-id, driving license, MTNL phone directory) run by the Delhi Government in India provide access to personally identifiable information (PII) of the residents of Delhi. Information like name, address, age, date of birth, voter-id, driver’s license number, and father’s name is openly and freely available. With the increase in online cyber security thefts and growing privacy awareness among Indian citizens, we thought it would be an interesting problem to work on. And voila! It actually turned out to be in consonance with our ideas :).

The project was planned in various phases. The first phase was identifying the open government sources and going through their privacy policies to check whether data collection was permissible. The next step was to write PHP scripts to start extracting the information. Within a month, we had approximately 8 million voter-id and 2.5 million driving license records in our local repository. We also collected data from 5 popular social networking sites, viz. Facebook, Twitter, Google+, Foursquare and LinkedIn. Public API calls were used to extract the data. The next step was to create awareness and spread it among the masses. To make this possible, we developed a system which could highlight the public availability and easy accessibility of such PII. Hence, OCEAN: Open Source Collation of eGovernment data and Networks was developed and deployed on January 21, 2013. The input to the system is the name of the individual to be searched, and the system returns a candidate set of individuals with the same name, along with the personal attributes associated with each of them. Interestingly, aggregation of data within the voter-id database helped in creating a family tree which connected people within a family. Below is an image which shows the family tree of Srishti Rawat (a random name, details blackened for privacy purposes).
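To make the search and family-tree steps a little more concrete, here is a minimal sketch in Python (the original scripts were in PHP, and this is not OCEAN's actual code). The records are made up, the field names are hypothetical, and the rule for linking family members (same father's name at the same address) is only an assumption for illustration.

```python
from collections import defaultdict

# Made-up records; the field names are assumptions, not OCEAN's real schema.
RECORDS = [
    {"name": "Asha Verma", "father_name": "Ram Verma", "address": "12 Example Road", "age": 24},
    {"name": "Rohit Verma", "father_name": "Ram Verma", "address": "12 Example Road", "age": 28},
    {"name": "Ram Verma", "father_name": "Hari Verma", "address": "12 Example Road", "age": 55},
    {"name": "Asha Verma", "father_name": "Mohan Verma", "address": "7 Sample Street", "age": 41},
]


def candidate_set(records, name):
    """Return every record whose name matches the query (case-insensitive)."""
    query = name.strip().lower()
    return [r for r in records if r["name"].lower() == query]


def family_links(records):
    """Group names by (father's name, address).

    Assumption: two records belong to the same family when they share both
    the father's name and the address.
    """
    children = defaultdict(list)
    for r in records:
        children[(r["father_name"], r["address"])].append(r["name"])
    return dict(children)


matches = candidate_set(RECORDS, "asha verma")
links = family_links(RECORDS)
print(len(matches), "candidates found")
print(links[("Ram Verma", "12 Example Road")])
```

Even this toy version shows why collation matters: a single name returns several candidates, and a couple of shared attributes are enough to connect siblings to a parent.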

The system is gaining popularity and has been talked about in the privacy research community since its deployment. Within a short span of time, 398 unique visitors have been recorded in the system (as on May 18, 2013). OCEAN brought a lot of accolades to me, Dr. PK and Precog.

  • Article published in national daily, Hindustan (April 16, 2013) [pic attached]
  • Best poster award at IITK Security and Privacy Symposium 2013
  • Accepted poster at IBM I-care 2012, IISc Bangalore

I would also like to thank Swetank Kumar Saha, Sudip Mittal and Daksha Yadav (B.Tech, IIITD) for doing the initial thinking and simple prototype for this work.

Hopefully this effort serves as an eye-opener to the general public and other stakeholders in the country.

A Precog Summer

The words “Bhelcome Raghav” caught my attention as I walked into the lab for the first time. Paridhi was sitting in her chair with a huge grin on her face, most amused by the arrival of a new intern. The two and a half months that I spent working with Precog went past like a breeze. When I look back at the summer of 2013, the first thing I am reminded of is all those amazing conversations I had with everyone I worked with. Right from the intricacies of decay analysis on social media to my supposed “American accent”, I think we covered everything. It took me a while to think of how I could best talk about my time at Precog and how I could make this sound different from other things that people do over summer break. I could say that I ended up doing some amazing research work with my friend Megha, or that we are now in the process of documenting it in the form of a paper. But then it gets better. I mean, what’s the big deal about having done research? Isn’t that what research groups are supposed to do? So what’s different about Precog? I would like to think it’s the people and the way they work together, right from PK to all the PhD students (Prateek, Paridhi, Niharika, Anupama, Aditi) to the graduate students and then, of course, my teammate, Megha. What’s most impressive is the effort they put in, not only for their own work but also for each other. We often hear phrases like ‘research across disciplines’ or ‘multi-department studies’ at CMU, but you really have to hear PK talk at one of his presentations before you understand what they mean when they say that. It was very interesting to see how simple activities like Whatsup and presentations could help us work better as a group, and then of course PK left no stone unturned. I remember skyping with him along with Megha when he was in Brazil, I in Daman and Megha in Delhi. Yeah, believe it or not, I was in Daman, skyping with these people rather than hanging out on the beach. That’s how Precog works.

So let’s try and list out everything IIITD brought to my life this summer.

There was awesome research, great friends, a pretty decent amount of money, metro rides, free food and the chance to learn some real cool stuff.

I never went to college in India but if today I was given the option to study back home I would go back to Okhla with a smile on my face.

Raghav, Sophomore @ CMU

#TPBT: The Pin-Bang Theory

In the monsoon semester 2012, I took a course on Privacy and Security in Online Social Media. We had to do a project on a popular online social media service. Pinterest caught my eye. It was new, it was among TIME Magazine’s top 50 websites of 2011, and it then had close to 20 million users. Its growth was amazing; in a matter of 2 years it was well integrated with popular e-commerce sites like eBay, Etsy, Amazon etc. The big white-on-red “P” next to the blue bird and white-on-blue “f” motivated me to work on Pinterest.

Share Buttons on Amazon.

Without digging much into the OSN, and given that the project proposal submission deadline was like 30 minutes away, I proudly declared that my project would entail user analysis, locating spam / malware, and also touch upon copyright issues on Pinterest.
The next time I opened my project, I got my “shock of the semester”. Pinterest had no API. Third-party Python wrappers were all useless. I would have to scrape the whole network. Though I was able to complete only a part of my project proposal in the semester, PK sir asked me to continue working. I was joined by Neha on the project and Prateek started shepherding us.
A crawler was created to push data from Pinterest to our databases, starting from 5 extremely popular seed users.
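For readers wondering what “starting from seed users” looks like in practice, here is a minimal sketch of a breadth-first crawl in Python. It is not our actual crawler: the network below is a tiny hypothetical dictionary, `fetch_connections` is a made-up placeholder for the page scraping, and the breadth-first order itself is an assumption.

```python
from collections import deque

# Hypothetical stand-in for the network: user handle -> handles linked from it.
FAKE_NETWORK = {
    "seed_a": ["u1", "u2"],
    "seed_b": ["u2", "u3"],
    "u1": ["u4"],
    "u2": [],
    "u3": ["seed_a"],
    "u4": [],
}


def fetch_connections(handle):
    """Placeholder for a page scrape; here it only reads the fake network."""
    return FAKE_NETWORK.get(handle, [])


def crawl(seeds, max_users):
    """Breadth-first crawl from the seed users, visiting each handle once."""
    seen = set(seeds)
    queue = deque(seeds)
    visited_order = []
    while queue and len(visited_order) < max_users:
        handle = queue.popleft()
        visited_order.append(handle)  # a real crawler would store the profile here
        for neighbour in fetch_connections(handle):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return visited_order


order = crawl(["seed_a", "seed_b"], max_users=10)
print(order)
```

The `seen` set is what keeps the crawl from looping forever when users point back at each other, and `max_users` is a simple way to cap a run.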

The darker blocks had the primary data from Pinterest; lighter blocks had associated data collected from many different sources.

We collected a massive data set of 17.9 million user handles, 3.3 million user profiles and about 58 million “Pins” from 26th December 2012 to 1st February 2013.
We then began our analysis. Some of our key findings were:

  • We found that the most common topics across users and pins were design, fashion, photography, food and travel.
  • User, pin, and board characterization: We analyzed various user profile attributes, their geographical distribution, top pin sources and board categories.
  • Exploring Pinterest as a possible venue for copyright infringement: We found copyrighted images being shared publicly on Pinterest and almost half of these images did not give due credit to the copyright owners.
  • Analysis of personal information and malicious content present on Pinterest: Users were voluntarily giving away a significant amount of Personally Identifiable Information (PII). We found numerous instances where users shared phone numbers, BBM pins, email IDs, marital status, and other personal information. We also found (and analyzed) traces of malware in the form of pin sources by using blacklists.
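As a rough illustration of how shared PII can be spotted in profile text, here is a small Python sketch using regular expressions. This is not the method from our report: the two patterns (email addresses and 10-digit phone numbers with an optional country code) are simplified assumptions and will miss many real-world formats.

```python
import re

# Simplified, assumed patterns; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?<!\d)(?:\+?\d{1,3}[\s-]?)?\d{10}(?!\d)")


def find_pii(text):
    """Return the email addresses and phone numbers found in a text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


bio = "Contact me at jane.doe@example.com or +91 9876543210 for orders"
print(find_pii(bio))
```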
Heat-map for user locations.

The final step was finding the title. So we called upon the highly imaginative and vocal members of Precog, who in a couple of 15-minute sessions took us from nowhere to “Pinacolada”, “Pingoo” and finally to agreeing on “The Pin-Bang Theory”. For more details, have a look at our technical report here.

Here is the picture of the discussion (a memorable moment indeed):

All said and done, working on Pinterest was indeed an amazing experience for all of us ☺

Sudip, Neha, Prateek

Obrigado Rio @ WWW 2013

At IGI Airport, about to board a flight at 4:15pm, I talked to all my family, friends and colleagues, and told them that ‘THE TRIP’ was finally taking place. Scared, excited, ready to learn and explore, I knew the trip had many things in store for me. I was flying to RIO DE JANEIRO, BRAZIL (The Trip), to attend the WWW conference and present our joint work with Prof. Joshi on “Identity Resolution” at WoLE. This was my second WWW, after 2011. Thrilled, I kept on polishing and practicing my presentation in the flight; people thought I was weird because I was talking too much IDENTITY (u see).

Reached Rio, settled down, roamed around a bit and then the academic excitement started. First day, first workshop, first presentation (May 13th, WoLE, 2pm), sitting with PK in the same room: my first international presentation had me shivering on the stage. The conference organizers had very nice infrastructure, though, and the presenter could see the slides on a screen placed right at eye level, which comforted me. After the successful completion of the presentation, multiple researchers approached me to discuss ideas and to know more about the work. To my surprise, the paper bagged an “Honorable mention for the best paper award” [1].

The rest of the WWW days kept us (PK and me) on our toes, with paper presentations in 24 rooms spread out across 5 floors, and 125 research papers + workshops + demos + posters. WWW had 22 social network papers out of 148 submitted, 15 security papers out of 82 submitted and 11 user interface papers out of 55 submitted.

After attending an amazing keynote by Luis Von Ahn on Captcha and Duolingo, we rushed to attend the sessions we had marked in the conference booklet. There were some very interesting sessions on how to smartly pick Mechanical Turk users so as to give them something they like to annotate [2], how to remove near-duplicate tweets from Twitter and why do it [3], how timestamps and content created by users can be used to correlate their accounts on multiple social networks [4], how click-through behavior on shortened URLs can help build a user’s profile and disclose her identity [5], characteristics of Q-A forums such as Quora [6], prediction of the evolution of user activity graphs for a social media app [7], why and how criminals hold on to valid domains for profit (cybersquatting and typo squatting) [8], etc. One interesting paper on predicting group stability in online social networks said that radioactive decay had been observed when measuring user engagement in a game / site / application; however, the authors claimed different observations for the DBLP network [9].

Apart from the technical learning and experience, we got to meet smart people during poster sessions, research tracks and coffee breaks. A few kind professors and senior PhD students also responded with meeting slots when I requested them. And a few good professors invited us to roam around the city and experience Rio specialties.

We returned with one best paper award (Aditi’s work on credibility [10]), one honorable mention award, a problem for next WWW, loads of memories and sad faces.

Brazil was an amazing, fun-loving, relaxing place. I got to see the beaches, which I had been thinking of since my first year of PhD. I got to meet my old friends in Rio and made new friends as well, and tried new cuisines, food, places, art, history and, above all, the Christ. Ahh, the feeling of ticking off another wonder from your list was just amazing.

Thanks to all Precog members, and special thanks to PK for supporting me in all ways (he was kind enough to give away his travel grant to add to mine to cover the trip expenses).

Attached is the moment, to say it all in one go:

[1]: Paridhi Jain, Ponnurangam Kumaraguru, and Anupam Joshi. 2013. @i seek ‘’: identifying users across multiple online social networks. WWW ’13 Companion.

[2]: Djellel Eddine Difallah, Gianluca Demartini, and Philippe Cudré-Mauroux. Pick-a-crowd: tell me what you like, and i’ll tell you what to do. WWW ’13

[3]: Ke Tao, Fabian Abel, Claudia Hauff, Geert-Jan Houben, and Ujwal Gadiraju. Groundhog day: near-duplicate detection on Twitter. WWW ’13

[4]: Oana Goga, Howard Lei, Sree Hari Krishnan Parthasarathi, Gerald Friedland, Robin Sommer, and Renata Teixeira. Exploiting innocuous activity for correlating users across sites. WWW ’13

[5]: Jonghyuk Song, Sangho Lee, and Jong Kim. I know the shortened URLs you clicked on Twitter: Inference attack using public click analytics and Twitter metadata. WWW ’13

[6]: Gang Wang, Konark Gill, Manish Mohanlal, Haitao Zheng, and Ben Y. Zhao. Wisdom in the social crowd: an analysis of quora. WWW ’13

[7]: Han Liu, Atif Nazir, Jinoo Joung, and Chen-Nee Chuah. Modeling/predicting the evolution trend of osn-based applications. WWW ’13

[8]: Nick Nikiforakis, Steven Van Acker, Wannes Meert, Lieven Desmet, Frank Piessens, and Wouter Joosen. Bitsquatting: exploiting bit-flips for fun, or profit? WWW ’13

[9]: Akshay Patil, Juan Liu, and Jie Gao. Predicting group stability in online social networks. WWW ’13

[10]: Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, and Anupam Joshi. Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy. WWW ’13 Companion.




Wizters, making you socially anonymous

“We are the generation of Social Media, Our biggest Revolution is a Tweet of 141 characters.”
― Sandra Chami Kassis

The social aspect of the web has been quite astonishing. No one before 2004 thought online social networking could become so big that almost all Internet companies would have to go “Social” to gain people’s attention. When Facebook started showing the potential of Online Social Networking (this is how everyone remembers it; Friendster and Myspace are dispensable), it caught everyone’s imagination and spawned an urge to create more social networks for special needs. And now that we are connected through multiple social networks, do we really share everything that we want to? Social networking brought privacy and identity exposure issues for users, and the solution that appears first in the list is Anonymity. There are things that people can’t share because they are afraid that everyone will know who shared them. These things are their voice, their thoughts, their confessions, things that they want everyone to know but cannot say. This is where Wizters comes into the picture.

Wizters is a social network that serves as a medium for online anonymity. It’s hard to define what “social” anonymity is because, according to most people, a person cannot be social while being anonymous. But we are trying to break this barrier between anonymity and social networking. Wizters is a social network first and foremost, because it helps in connecting you with the people you know in real life; the only thing that is different here is the way you connect to them.

Every person has a current / active social circle, like a college, school or office. Wizters divides people into their respective social circles, called Networks. Odds are very high that you know most of the people in your everyday social circle, and so you must also have a lot of things to say to them. Wizters puts you in your relevant network, and then whatever you share is anonymous to the people in it, but they get the message; they get what you want to say. Another cool thing that Wizters has brought to the table is sharing direct and private posts with your Facebook friends. This opens up avenues for the birth of a totally different online social practice, an anonymous ecosystem.

Let us put Wizters aside and focus on what happened a few months ago on Facebook. A flood of college confession and compliment pages came in, and people welcomed these pages; they welcomed the idea of anonymity. There were thousands of likes and hundreds of posts shared per day on a single page, and yet people were hungry for more. But these pages lost their importance because they raised moderation issues and, more than that, because anonymity cannot survive on conventional social networks. A piece of the web was meant to be cut out for this thing.

The idea of Wizters was born in the summer of 2011 and it has made a lot of progress since then. It is available for web, Windows Phone and Android (updated apps will be relaunched this month). It is now being developed as a part of PreCog@IIITD.

Wizters, given its motive, can be called a social anonymity start-up. It aims not only to provide this service on web and mobile but also to counter the issues generated by online anonymity, like moderation and handling misuse of such services. These issues are one of the reasons why Like-a-little, a promising and very well-funded anonymity start-up from California, got shut down. But where Wizters is trying to head, it can become a revolution in social media, rather than a tweet of 141 characters.


Go home Google Groups, you’re drunk!!!

Well, as they say, no one’s perfect. Not even Google! Evidence: A recent “praise the iPad” bug in Google’s Text-To-Speech [0], which has reportedly now been rectified, went unnoticed for months!

All the geeks out there must be familiar with the concept of bugs. Be it the =rand(200,99) bug in MS Word, the famous “Why can’t I create a folder named ‘con’ in Windows” bug, or the Y2K mega-bug, geeks love bugs. Their impact can vary from funny to disastrous.

Coming to the point, we (PK and I) recently discovered a bug in Google Groups, which made me feel rather “unpleasant.” We at Precog run a mailing list, where all members of the group post about topics of common interest related to security, privacy, social media, etc. Google Groups provides a nice summary of the total number of topics and posts circulated on the list for each month. Last month, that is May 2013, we hit our all-time high (#PrecogRocks) in terms of topics and posts. PK and I went to the About page to check it out, and were rather shocked to see 183 posts for the month of June already! Terrible statistics, Google! Less than 2 hours into the month of June (IST), it does not seem humanly possible to make 183 posts, right? Given that our previous best was just over 300 for the previous month, this was definitely… a “bug”!

The Google Groups Bug: 183 posts in under 2 hours? Incorrect!

Reverting to the “old” Google Groups revealed something totally different. The older interface reflected that we did not have a single post for June yet! That would be inaccurate, since both PK and I had posted on the mailing list just a few minutes ago. A possible explanation could be the difference in time zones. If Google works in some western time zone, then our posts were indeed in May (2am, June 1, 2013, IST would still be May 31, 2013 at many places on the planet). Well, if that’s the reason, how does one justify the 183 posts in June, 2013?
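The time zone explanation is easy to check with a few lines of Python. The sketch below converts 2am, June 1, 2013 (IST) into a western time zone; picking Pacific Daylight Time (UTC-7) is purely our assumption, since we do not know which zone Google Groups actually uses.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets; PDT is an assumed example of "some western time zone".
IST = timezone(timedelta(hours=5, minutes=30), "IST")
PDT = timezone(timedelta(hours=-7), "PDT")

post_time_ist = datetime(2013, 6, 1, 2, 0, tzinfo=IST)
post_time_pacific = post_time_ist.astimezone(PDT)

print(post_time_ist.strftime("%d %B %Y %H:%M %Z"))
print(post_time_pacific.strftime("%d %B %Y %H:%M %Z"))
```

Under that assumption the same post lands on May 31, which would explain an empty June in the old interface, but not the 183 posts in the new one.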

The “old” Google Groups. June doesn’t yet have posts? Incorrect again!

Feel free to write to us, if you have encountered a similar bug in the past. If you haven’t, we’d be glad if you can give it a shot! Stay glued to Google Groups, just past midnight on a month end, and check what’s going on! 🙂

Stay tuned for more “bug reports” from Precog@IIITD!

PK and Prateek


See it, while it’s hot! MultiOSN: Monitoring real-world events on online social media

Today, the world is a place where “chats” refer to Facebook chats, when people “hang out”, they are referring to Google+, and “following” someone is a Twitter thing! The penetration of social media into the common Internet user’s life has been so intense, that people literally “tweet” about an earthquake before running to safety!

Online social media has become one of the fastest and most widely used means of information transfer today. Especially when it comes to news, a big proportion of people look for breaking news on Facebook and Twitter! This paradigm shift has come about for multiple reasons: the reach of the Internet and online social media, the crowd-sourcing aspect, and the immediacy factor. By and large, online social media has become the best place to look for the latest activity and keep up-to-date. Acknowledging this fact, and the role of online social media in the modern world, we at Precog@IIITD have come up with MultiOSN, a tool which monitors multiple online social media during real-world events and presents analytics based on real-time activity. MultiOSN is our first baby step towards building real-time event monitoring systems that extract knowledge, make interesting analyses and inferences from the data, and visualize the data in a usable form, which can help somebody with actionable information. Currently, MultiOSN tracks five social media services, viz. Facebook, Twitter, YouTube, Google+, and Flickr.

MultiOSN provides basic but crucial information about real-world events that is floating all over the web of online social media. The number of posts per hour in the past 24 hours, the geographical locations from which these posts have been made, and sentiment analysis are among the analytics that are presented. Events like the Boston Marathon blasts are a perfect example of the kind that organizations / individuals can track using MultiOSN, utilizing the analytics to potentially detect and prevent further damage. We believe these types of analytics during events like the Mumbai Blasts or the North Eastern Crisis can be of great help to various departments of national governments. For common users, MultiOSN can be used to visualize events like the IPL (Indian Premier League) to see which team is being talked about, which players have been making an impact, what the sentiment of social media users towards the IPL is, etc. What makes MultiOSN effective is the fact that all analysis is updated and shown in real-time, while the event is in progress in the real world. Such monitoring can be immensely effective in disaster management during emergencies; in the past we have analyzed various emergency events in India (past work). For example, the news of earthquakes, riots, etc. has been witnessed to break faster on social media than by any other means. This kind of critical information about earthquake locations and magnitude or riot locations, if monitored in real-time, can help minimize damage in areas which are expected to be affected next by such events. This is one of the major endeavors of MultiOSN.
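To give a feel for the simplest of these analytics, here is a minimal Python sketch of counting posts per hour over the past 24 hours. It is not MultiOSN's code: the function name, the bucket convention (index 0 is the most recent hour) and the sample timestamps are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def posts_per_hour(timestamps, now):
    """Count posts in each of the 24 hourly buckets ending at `now`.

    Index 0 holds posts from the most recent hour, index 23 the oldest.
    """
    window_start = now - timedelta(hours=24)
    counts = Counter()
    for ts in timestamps:
        if window_start < ts <= now:
            hours_ago = int((now - ts).total_seconds() // 3600)
            counts[hours_ago] += 1
    return [counts.get(h, 0) for h in range(24)]


# Made-up post times relative to an arbitrary "now".
now = datetime(2013, 4, 15, 18, 0)
sample = [
    now - timedelta(minutes=10),
    now - timedelta(minutes=50),
    now - timedelta(minutes=90),
    now - timedelta(hours=25),  # outside the 24-hour window, ignored
]
hourly = posts_per_hour(sample, now)
print(hourly[:3])
```

A real-time system would recompute this as new posts arrive, but the bucketing idea stays the same.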

The system is now live. Feel free to explore more, and email us your valuable feedback at pk [at] iiitd [dot] ac [dot] in. For more details and insights into MultiOSN, please read the technical report here.


Exciting times! Indo-UK workshop on Cyber security

Well, it was two months back that I received an email from PK asking me to help him organize an Indo-UK workshop on Cyber security jointly with RCUK, India. In spite of not having the background details for it, just the word “UK” excited me to work on this. It took me not more than 5 minutes to say a big “Yes”. The next response was a set of tasks required for the same, and that too in less than 5 minutes :D. The workshop was planned for 4 days, March 24-27, 2013, to discuss Cyber security and online security issues in both India and the UK. It all started in no time… setting up the website… lots of e-mail exchanges (with some very big people!)… designing take-aways for the workshop… handling local administrative issues. After all this heavy exercise for about 3-4 weeks, the workshop day was finally approaching.

Day 1, The Oberoi Hotel, New Delhi: Yes! You read the venue right! The workshop started at the grand 5-star hotel with a beautiful scenic view from the rooftop. It was a large lobby where we all assembled. Within a few hours, the delegates, both British and Indian, started arriving. The delegation was a good combination of people from industry and research. It started with a general introduction, a few short sessions describing the landscape of Cyber security in the UK and India, and an interesting 60-minute networking session to end the day well. There were two big concentric circles, with the Indians in the inner circle and the UK people in the outer circle. We had 5-minute, one-to-one slots where everyone discussed their work. I too got a golden chance to present my work to all the big dignitaries. I will take pride in saying some were really impressed! 🙂 The day finally ended with a scrumptious dinner and a little planning for the next day.

Day 2, IIIT-Delhi: The delegation visited IIIT-Delhi and was fascinated by the huge campus and facilities. The day went in long hours of brainstorming sessions, with 4 groups each discussing separate problems related in one way or another to Cyber security. The board room looked beautiful with its walls covered in multi-coloured post-its containing the gist of each discussion. After enjoying the meal, the delegation went through another round of giving feedback to, and getting feedback from, the speaker of each group. To ease the mental fatigue, at the end of the day there was a tour of the campus, a group picture and finally a visit to Barbeque Nation in the evening, where everyone enjoyed the unlimited food and drinks. Another satisfying day came to an end.

Day 3, IIIT-Delhi: The morning session was yet another long session where people got shuffled within the groups and carried on the discussions! To end the workshop at IIIT-Delhi (not the end of the workshop!), mementos were given to the delegation, thanking them for their time and efforts. There was a surprise here: I got a gift from RCUK, India for helping them with the workshop. Contented! The afternoon went in packing bags for the industrial visit to Infosys, Hyderabad. I was travelling with my advisor: scary and exciting. Scary for obvious reasons, and exciting since it was my first trip with him alone! It was fun!

Day 4, Infosys, Hyderabad: It was Holi day! The day started with presentations on the history and work culture at Infosys, interesting demos of the research work being carried out, visits to the labs, and interaction with the researchers. I got a chance to meet two of my undergrad friends and played Holi with them. Rejuvenating! After all this, we came back to the board room and we had one more thing in our kitty: colours to play Holi! It was a pleasure playing with the UK people and my advisor himself! A day worth spending! Overall, the workshop was wonderful, both personally and professionally. It gave an overview of pertinent problems like threat reduction in online social media, risk analysis, privacy laws, human responses to attacks, security management, BYOD policies etc. Looking forward to many more to come!

Attached is the pic of an interesting chat with Ms. Elinor Buxton, The Royal Society, as part of the networking session at Oberoi!


(Re)-evaluate your communities!!

Though it is one of the most significant, or rather one of the most appealing, problems in the domain of Network Science, it still suffers from the most primitive flaw: subjectivity. Yes, I am talking about the problem of Community Detection, or what some would like to call Clustering. Why did I use the word “subjectivity”? Because a great many definitions exist for what a cluster should be. Adding to the problem, there exist various evaluation metrics, each pertaining to one or more features of this “definition”. The problem gets worse as there is usually limited or absolutely no ground truth for most social networks. So, to summarize, the problem of finding “optimal” clusters has the following road-blocks:

  1. No single definition of “optimal”.
  2. Different metrics exist pertaining to different definitions.
  3. Algorithms lack flexibility: their goal is to optimize just one metric.
  4. No ground truth. Subjectivity can exist even in different versions of ground truth.

So what is the solution? I guess answers to all of these questions were presented in the paper “Defining and Evaluating Network Communities based on Ground-truth” by J. Yang and J. Leskovec, published in ICDM 2012. I have recently been involved in quite a few discussions over the evaluation of community detection: how good are the clusterings obtained by an algorithm? And there was definitely no clear solution until I laid my eyes on this paper, which truly felt like nailing the jelly to the wall (and a reliable citation source :p)

The paper compares 13 metrics, which the authors arrive at after surveying the commonly used definitions, and for each metric examines how it performs with respect to ground truth.

What were the metrics ?

The 13 metrics come from 4 classes, namely:

  • Metrics based on internal connectivity
  • Metrics based on external connectivity
  • Metrics based on both internal and external connectivity
  • Metrics based on a network model

Metrics based on internal connectivity include internal density, number of edges inside the cluster, average degree, fraction over median degree and triangle participation ratio.

Those based on external connectivity include expansion and cut ratio.

Those combining internal and external connectivity are Conductance, Normalized Cut, Maximum ODF, Average ODF and Flake ODF, and the one based on a network model is Modularity.
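To make a few of these concrete, here is a minimal sketch in Python. The formulas are the commonly used definitions as I understand them (with n nodes in the graph, n_s nodes and m_s edges inside the community, and c_s edges crossing its boundary), so do check them against the paper before relying on them. The function name `community_metrics` and the toy graph are my own, purely for illustration, and the sketch assumes an undirected simple graph with each edge listed once and a community of at least 2 nodes that does not cover the whole graph.

```python
from itertools import combinations

def community_metrics(edges, community):
    """Score one node set S on an undirected graph given as (u, v) pairs."""
    S = set(community)
    nodes = {u for e in edges for u in e}
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    n, n_s = len(nodes), len(S)
    m_s = sum(1 for u, v in edges if u in S and v in S)     # edges inside S
    c_s = sum(1 for u, v in edges if (u in S) != (v in S))  # edges leaving S

    # A node participates in a triangle if two of its neighbours inside S
    # are themselves connected.
    in_triangle = sum(
        1 for u in S
        if any(b in adj[a] for a, b in combinations(adj[u] & S, 2))
    )
    return {
        "internal_density": m_s / (n_s * (n_s - 1) / 2),
        "edges_inside": m_s,
        "average_degree": 2 * m_s / n_s,
        "expansion": c_s / n_s,
        "cut_ratio": c_s / (n_s * (n - n_s)),
        "conductance": c_s / (2 * m_s + c_s),
        "triangle_participation_ratio": in_triangle / n_s,
    }

# Toy graph (an assumption for illustration): two 4-cliques joined by one edge.
clique_a = list(combinations(range(0, 4), 2))
clique_b = list(combinations(range(4, 8), 2))
edges = clique_a + clique_b + [(3, 4)]
for name, value in community_metrics(edges, {0, 1, 2, 3}).items():
    print(name, value)
```

Low conductance, expansion and cut ratio together with a high internal density and triangle participation ratio is what one would hope to see for a "good" community like the clique above.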

How were these metrics evaluated?

The metrics were mainly evaluated on how well they tracked the four goodness measures defined below, computed with respect to the ground truth community structure:

  • Separability – “A good community should be well separated from rest of the network”
  • Density – “Good communities are well-connected”
  • Cohesiveness – “Good community should be well connected internally”
  • Clustering Coefficient – “In a good community structure, nodes in a graph should cluster together to a high degree”

Besides this, the metrics were also evaluated on how they responded to changes in the ground truth community structure. Ideally they should not change much under small changes to the community structure, but they should change drastically under high-intensity changes. This test was conducted with the help of the Z-score and 4 models for adding noise to the community.

The results (as reported in the paper) indicated that, in general, conductance and triangle participation ratio performed better in both of the tests.

The paper seemed a good read and you can also download it from arXiv.
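To get a feel for this kind of robustness test, here is a rough sketch of my own, not the paper's procedure. As an assumed noise model it swaps one community member for one outsider (enumerating every such swap instead of sampling at random, so the result is reproducible), and it then reports a z-score-style value: how far the original community's score lies from the perturbed scores, in units of their standard deviation. The helper names `single_swaps` and `z_score` are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

def conductance(edges, S):
    """c_s / (2 * m_s + c_s) for node set S on an undirected edge list."""
    m_s = sum(1 for u, v in edges if u in S and v in S)
    c_s = sum(1 for u, v in edges if (u in S) != (v in S))
    return c_s / (2 * m_s + c_s)

def single_swaps(nodes, S):
    """Yield every node set obtained by swapping one member for one outsider.
    This is an assumed, deliberately simple noise model."""
    for inside in sorted(S):
        for outside in sorted(nodes - S):
            yield (S - {inside}) | {outside}

def z_score(edges, S, score=conductance):
    """(score of S - mean perturbed score) / std of perturbed scores.
    Assumes the perturbed scores are not all identical (std > 0)."""
    S = set(S)
    nodes = {u for e in edges for u in e}
    perturbed = [score(edges, P) for P in single_swaps(nodes, S)]
    return (score(edges, S) - mean(perturbed)) / pstdev(perturbed)

# Toy graph (an assumption for illustration): two 4-cliques joined by one edge.
edges = (list(combinations(range(0, 4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])
print(z_score(edges, {0, 1, 2, 3}))
```

For a metric like conductance, where lower is better, a strongly negative value means even a single swap makes the community look clearly worse; larger perturbations (swapping more nodes) would be the natural next step.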

Besides this, it will be interesting to find out which algorithms perform better in comparison with the ground truth. The measure of comparison could be the F-Score. But, as one of my colleagues says, it should be about comparing “oranges to oranges” and “apples to apples”… and suggests a metric proposed in the paper “Approximate Clustering without the Approximation” by M. F. Balcan, A. Blum and A. Gupta.

The metric presented is as follows:

Suppose there are two clusterings C and C’ (where one of them is the ground truth clustering). We should aim to minimize the distance between these 2 clusterings, denoted by dist(C,C’). dist(C,C’) is nothing but the fraction of points on which they disagree once the clusters of C are matched to the clusters of C’.

The question is then to find a permutation of the k communities in C’ which, when matched to the k communities in C, minimizes the distance between the 2 k-clusterings.
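The definition above can be sketched directly as a brute-force search over permutations. This is a minimal illustration, assuming both clusterings partition the same set of points into the same number k of disjoint clusters; the function name `clustering_distance` and the example data are my own.

```python
from itertools import permutations

def clustering_distance(C, C_prime):
    """Fraction of points on which two k-clusterings disagree,
    minimised over every way of matching clusters of C to clusters of C'."""
    k = len(C)
    assert k == len(C_prime), "sketch assumes both clusterings have k clusters"
    n = sum(len(c) for c in C)  # clusters are assumed disjoint
    # overlap[i][j] = number of points shared by C[i] and C_prime[j]
    overlap = [[len(set(a) & set(b)) for b in C_prime] for a in C]
    # Try all k! matchings and keep the one with the most agreeing points.
    best_agreement = max(
        sum(overlap[i][perm[i]] for i in range(k))
        for perm in permutations(range(k))
    )
    return 1 - best_agreement / n

ground_truth = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
detected = [{4, 5, 6, 7}, {1, 2}, {3, 8, 9}]
print(clustering_distance(ground_truth, detected))
```

In this example the best matching agrees on 7 of the 9 points (points 3 and 7 end up in the "wrong" cluster), so the distance is 2/9.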

However, at first look the metric seems computationally intensive for graphs with a high number of communities (say, 30,000 communities would mean computing the distance for 30000! permutations). The problem, though, looks like finding a min-weight perfect matching in a bipartite graph, which can be solved in polynomial time.
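Here is a hedged sketch of that matching view: treat the communities of C and C’ as the two sides of a bipartite graph, use minus the number of shared points as the edge weight, and solve the resulting assignment problem with the Hungarian algorithm in O(k^3) instead of trying k! permutations. The pure-Python routine below is my own compact version of the standard algorithm, and the names `min_cost_assignment` and `matched_distance` are hypothetical; in practice a library routine for the linear assignment problem would be the better choice. As before, it assumes both clusterings have the same number of disjoint clusters over the same points.

```python
def min_cost_assignment(cost):
    """Hungarian algorithm for a square cost matrix.
    Returns (total_cost, match) where match[i] is the column given to row i."""
    n = len(cost)
    INF = float("inf")
    u = [0] * (n + 1)    # row potentials (1-based)
    v = [0] * (n + 1)    # column potentials (1-based)
    p = [0] * (n + 1)    # p[j] = row currently matched to column j, 0 = none
    way = [0] * (n + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return sum(cost[i][match[i]] for i in range(n)), match

def matched_distance(C, C_prime):
    """dist(C, C') via min-weight perfect matching on negated overlaps."""
    n = sum(len(c) for c in C)
    cost = [[-len(set(a) & set(b)) for b in C_prime] for a in C]
    total, match = min_cost_assignment(cost)
    return 1 + total / n, match  # total is minus the number of agreeing points

ground_truth = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
detected = [{4, 5, 6, 7}, {1, 2}, {3, 8, 9}]
print(matched_distance(ground_truth, detected))
```

The returned `match` list says which community of C’ each community of C was paired with, which is handy when you want to inspect where the two clusterings disagree.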

Just a disclaimer: all views are my personal opinion (and do not represent the group’s view), and I am really sorry if they hurt anyone or are in disagreement with some of the more intellectual minds out there. All results, however, are the intellectual property of the authors referenced in the post. I do not seek to take credit for any of it.

Signing Off

Hemank Lamba

(A curious mind)