#TPBT: The Pin-Bang Theory

In the monsoon semester 2012, I took a course on Privacy and Security in Online Social Media. We had to do a project on a popular online social media. Pinterest, caught my eye. It was new, it was among the TIME Magazine’s top 50 websites of 2011 and then had close to 20 million users. Its growth was amazing; in a matter of 2 years it was well integrated with popular e-commerce sites like e-bay, etsy, Amazon etc. The big white-on-red “P” next to the blue bird and white-on-blue “f” motivated me to work on Pinterest.

Share Buttons on Amazon.

Without digging much into the OSN and the fact that project proposal submission deadline was like 30 minutes away, I proudly declared that my project will entail user analysis, locating spam / malware and also touch upon copyright issues on Pinterest.
The next time I opened my project, I got my “shock of the semester”. Pinterest had no API. Third-Party python-wrappers were all useless. I will have to scrape the whole network. Thought I was able to complete only a part of my project proposal in the semester, PK sir asked me to continue working. I was joined by Neha on the project and Prateek started shepherding us.
A crawler was created to push data from Pinterest to our databases. Starting from 5 extremely popular seed users.

The darker blocks had the primary data from Pinterest; lighter blocks had associated data collected from many different sources.

We collected a massive data set of 17.9 million user handles, 3.3 million user profiles and about 58 million “Pins” from 26th December 2012 to 1st February 2013.
We then began our analysis, some of our key findings were:

  • We found that the most common topics across users, and pins were design, fashion, photography, food and travel.
  • User, pin, and board characterization: We analyzed various user profile attributes, their geographical distribution, top pin sources and board categories.
  • Exploring Pinterest as a possible venue for copyright infringement: We found copyrighted images being shared publicly on Pinterest and almost half of these images did not give due credit to the copyright owners.
  • Analysis of personal information and malicious content present on Pinterest: Users were giving significant amount of Personally Identifiable Information (PII) voluntarily. We found numerous instances where users shared phone numbers, BBM pins, email IDs, marital status, and other personal information. We also found (and analyzed) traces of malwares in the form of pin sources by using blacklists.
Heat-map for user locations.

The final step was finding the title. So we called upon the highly imaginative and vocal members of Precog, who in a couple of 15-minuite sessions took us from nowhere to “Pinacolada”, “Pingoo” and finally agreeing on “The Pin-Bang Theory”. For more details have a look at our technical report here.

Here is the picture of the discussion (a memorable moment indeed):

All said and done working on Pinterest was indeed an amazing experience for all us ☺

Sudip, Neha, Prateek

Obrigado Rio @ WWW 2013

At IGI Airport, in a flight at 4:15pm, talked to all my family, friends, colleagues, and told them that `THE TRIP’ was finally taking place. Scared, excited, ready to learn and explore, I knew the trip bagged many things for me. I was flying to RIO DE JANERIO, BRAZIL (The Trip), to attend WWW conference to present joint our work with Prof. Joshi on “Identity Resolution” at WoLE. This was my second WWW, after 2011.  Thrilled, I kept on polishing and practicing my presentation in the flight, people thought I was weird because I was talking too much IDENTITY (u see).

Reached Rio, settled down, roamed around a bit and then started the academic excitement. First day, first workshop, first presentation (May 13th, WoLE, 2pm), sitting with PK in the same room, my first International presentation made me all shiver on the stage. Though conference people had very nice infrastructure that presenter could see slides on a screen placed at the right eye angle and that comforted me. On the successful completion of the presentation, multiple researchers approached to discuss ideas and to know more about the work.  To my surprise, the paper bagged “Honorable mention for the best paper award” [1].

Rest of the WWW days kept us (PK and me) on toes, with paper presentations in 24 rooms, spreaded out across 5 floors, 125 research papers + workshop + demos + posters. WWW had 22 social network papers, out of 148 papers submitted, 15 security papers out of 82 submitted and 11 user interface papers, out of 55 submitted.

After attending an amazing keynote by Luis Von Ahn on Captcha and Duolingo, we rushed to attend our marked sessions in the conference booklet. Some very interesting sessions on how to smartly pick mechanical turk users, to give them something they like to annotate [2], how to remove near-duplicate tweets from Twitter and why do it? [3], how timestamps and content created by users can be used to correlate their accounts on multiple social networks [4], how shortened URLs clickthrough behavior can help building the user profile and disclose her identity [5], characteristics of Q-A forums as Quora [6], prediction of evolution of user activity graphs for an social media app [7], why and how criminals hold on valid domains for profit (cybersquatting and typo squatting) [8], etc. One interesting paper on predicting a group stability on an online social networks, said that radioactive decay was observed while detecting user engagement in game / site / application, however they claimed different observations for DBLP network [9].

Apart from technical learning and experience, we got to meet smart people around during poster sessions, research tracks and coffee breaks. Few kind professors and senior PhD students also responded with meeting slots when I requested them. And few good professors invited to roam around the city and experience Rio specialties.

We returned with one best paper award (Aditi’s work on credibility [10]), one honorable mention award, a problem for next WWW, loads of memories and sad faces.

Brazil was an amazing fun loving relaxing city. I got to see beaches, which I had been thinking of, since my first year in PhD. I got to meet my old friends in Rio, and made new friends as well, tried new cuisines, food, places, art, history, and above all, the Christ. Ahh, the feeling of ticking off another wonder from your list, was just amazing.

Thanks to all Precog members, and special thanks to PK for supporting me in all ways (kind to give away his travel grant to add to my travel grant to cover the trip expenses).

Attached is the moment, to say it all in one go:

[1]: Paridhi Jain, Ponnurangam Kumaraguru, and Anupam Joshi. 2013. @i seek ‘fb.me’: identifying users across multiple online social networks. WWW ’13 Companion.

[2]: Djellel Eddine Difallah, Gianluca Demartini, and Philippe Cudré-Mauroux. Pick-a-crowd: tell me what you like, and i’ll tell you what to do. WWW ’13

[3]: Ke TaoFabian AbelClaudia Hauff, Geert-Jan Houben, Ujwal GadirajuGroundhog day: near-duplicate detection on Twitter. WWW ‘13

[4]: Oana Goga, Howard Lei, Sree Hari Krishnan Parthasarathi, Gerald Friedland, Robin Sommer, and Renata Teixeira. Exploiting innocuous activity for correlating users across sites. WWW ’13

[5]: Jonghyuk Song, Sangho Lee, and Jong Kim. I know the shortened URLs you clicked on Twitter: Inference attack using public click analytics and Twitter metadata. WWW ’13

[6]: Gang Wang, Konark Gill, Manish Mohanlal, Haitao Zheng, and Ben Y. Zhao. Wisdom in the social crowd: an analysis of quora. WWW ’13

[7]: Han Liu, Atif Nazir, Jinoo Joung, and Chen-Nee Chuah. Modeling/predicting the evolution trend of osn-based applications. WWW ’13

[8]:  Nick Nikiforakis, Steven Van Acker, Wannes Meert, Lieven Desmet, Frank Piessens, and Wouter Joose. Bitsquatting: exploiting bit-flips for fun, or profit? WWW ’13

[9]: Akshay Patil, Juan Liu, and Jie Gao. Predicting group stability in online social networks. WWW ’13

[10]: Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, and Anupam Joshi. Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy. WWW ’13 Companion.




Wizters, making you socially anonymous

“We are the generation of Social Media, Our biggest Revolution is a Tweet of 141 characters.”
― Sandra Chami Kassis

The social aspect of the web has been quite astonishing. No one before 2004 thought online social networking can be something this big that almost all the Internet companies will have to go “Social” to gain people’s attention. When Facebook started showing the potential of Online Social Networking (this is how everyone remembers, Friendster and Mysapce are dispensable ), it caught everyone’s imagination and spawned an urge to create more social networks for special needs. And now that we are connected through multiple social networks, do we really share everything that we want? Social networking brought privacy and identity exposure issues for users and the only solution that appears first in the list is, Anonymity. There are things that people can’t share because they are afraid that everyone will know who shared it. These things are their voice, their thoughts, their confessions, things that they want everyone to know, but they cannot. This is how Wizters comes in the picture.

Wizters, is a social network that serves as the medium of online anonymity. Its hard to define what “social” anonymity is because according most people a person cannot be social while being anonymous. But we are trying to break this barrier of anonymity and social networking. Wizters is a social network first and foremost, because it helps in connecting you with the people you know in real life but the only thing that is different here is the way you connect to them.

Every person has a current / active social circle like colleges, schools, offices etc. Wizters divides people in their respective social circle, called Networks. Odds are very high that you know most of the people in your social circle of everyday life and so you must also have a lot of things to say to the people in it. So, Wizters solves this problem by putting you in your relevant network and then whatever you share is anonymous to the people in it, but they get the message, they get what you want to say. Another cool thing that Wizters has brought to the table is sharing direct and private posts with your Facebook friends. This opens up avenues for the birth of a totally different online social practice or an anonymous ecosystem.

If we put Wizters aside and focus on what happened few months ago on Facebook. The flood of college confession and compliment pages came in and people welcomed these pages, they welcomed the idea of anonymity. Thousands of likes and hundreds of posts shared per day on a single page and yet people were hungry for more but these pages lost their importance because they raised moderation issues and more than that the fact that anonymity cannot survive on conventional social networks. A piece of web was meant to be cut out for this thing.

The idea of Wizters was born in the summers of 2011 and it has made a lots of progress since then. It is available for web, Windows Phone and Android (updated apps will be relaunched this month). It is now being developed as a part of PreCog@IIITD.

Wizters, for its motive can be called a social anonymity start-up which aims not only to provide this service on web and mobile but also works to counter the issues generated by online anonymity, like moderation and handling misuse of such services. It is one of the reason why Like-a-little, one of the promising and very well funded anonymity start-up from California got shut down. But where Wizters is trying to head, it can become a revolution in social media instead of 141 characters tweet.


Go home Google Groups, you’re drunk!!!

Well, as they say, no one’s perfect. Not even Google! Evidence: A recent “praise the iPad” bug in Google’s Text-To-Speech [0], which has reportedly, now been rectified, went unnoticed for months!

All the geeks out there must be familiar with the concept of bugs. May it be the =rand(200,99) bug in MS word, the famous “Why can’t I create a folder named ‘con’ in Windows” bug, or the Y2K mega-bug; geeks love bugs. Their impact can vary from funny to disastrous.

Coming to the point, we (PK and myself) recently discovered a bug in Google Groups, which made me feel rather “unpleasant.” We at Precog, run a mailing list, where all members of the group post about topics of common interest, related to security, privacy, and social media etc. Google Groups provides a nice summary of the total number of topics and posts circulated on the list for each month. Last month, that is May 2013, we hit our all-time-high (#PrecogRocks) in terms of topics and posts. PK and I went to the About page to check it out, and were rather shocked to see 183 posts for the month of June already! Terrible statistics, Google! Less than 2 hours into the month of June (IST), it does not seem humanly possible to make 183 posts, right? Given that our previous best was just over 300 for the previous month, this was definitely….. a “bug”!

The Google Groups Bug: 183 posts in under 2 hours? Incorrect!

Reverting to the “old” Google Groups revealed something totally different. The older interface reflected that we did not have a single post for June yet! That would be inaccurate, since both PK and I had posted on the mailing list just a few minutes ago. A possible explanation could be the difference in time zones. If Google works in some western time zone, then our posts were indeed in May (2am, June 1, 2013, IST would still be May 31, 2013 at many places on the planet). Well, if that’s the reason, how does one justify the 183 posts in June, 2013?

The “old” Google Groups. June doesn’t yet have posts? Incorrect again!

Feel free to write to us, if you have encountered a similar bug in the past. If you haven’t, we’d be glad if you can give it a shot! Stay glued to Google Groups, just past midnight on a month end, and check what’s going on! 🙂

Stay tuned for more “bug reports” from Precog@IIITD!

PK and Prateek

[0] http://onefoottsunami.com/2013/01/04/android-issue-38538/

See it, while it’s hot! MultiOSN: Monitoring real-world events on online social media

Today, the world is a place where “chats” refer to Facebook chats, when people “hang out”, they are referring to Google+, and “following” someone is a Twitter thing! The penetration of social media into the common Internet user’s life has been so intense, that people literally “tweet” about an earthquake before running to safety!

Online social media has become one of the fastest, and most widely used means of information transfer today. Especially, when it comes to news, a big proportion of people look for breaking news on Facebook and Twitter! This paradigm shift has resulted because of multiple reasons, the reach of the Internet and online social media, the crowd-sourcing aspect, and the immediacy factor. By and large, online social media has become the best place to look for the latest activity, and keep up-to-date. Acknowledging this fact, and the role of online social media in the modern world, we at Precog@IIITD, have come up with MultiOSN, a tool which monitors multiple online social media during real-world events, and presents analytics based on real-time activity. MultiOSN is our first baby step towards building real-time event monitoring systems to extract knowledge, make interesting analysis and inferences from the data, and visualize the data in usable form, which can help somebody with actionable information. Currently, MultiOSN tracks five social media services viz. Facebook, Twitter, YouTube, Google+, and Flickr.

MultiOSN provides basic, but crucial information floating all over the web of online social media, about real-world events. The number of posts per hour, in the past 24 hours, geographical locations from where these posts have been made, and sentiment analysis are among the few analytics that are presented. Events like Boston Marathon blasts are a perfect example of the kind that can be tracked by organizations / individuals using MultiOSN, and utilize the analytics to potentially detect and prevent further damage. We believe these types of analytics during events like Mumbai Blasts, North Eastern Crisis, can be of great help to various departments of National Governments. For the common users, MultiOSN can be used to visualize events like the IPL (Indian Premiere League) to see which team is being talked about, which players have been making an impact, what is the sentiment of social media users towards the IPL, etc. What makes MultiOSN effective is the fact that all analysis is updated and shown in real-time; while the event is in progress in the real world. Such monitoring can be immensely effective in disaster management during emergencies; in the past we have analyzed various events of emergencies in India (past work). For example, the news of earthquakes, riots, etc. has been witnessed to break faster on social media than by any other means. This kind of critical information about earthquake locations and magnitude, riot locations, if monitored in real-time, can help minimize damage in areas which are expected to be affected next by such events. This is one of the major endeavors of MultiOSN.

The system is now live at http://precog.iiitd.edu.in/tools/beta/multiosnportal/. Feel free to explore more, and email us your valuable feedback at pk [at] iiitd [dot] ac [dot] in. For more details and insights into MultiOSN, please read the technical report here.

Image credits: http://redcrosschat.org/wp-content/uploads/2012/10/205547170462558700_Ks134xFV_c.jpg

Exciting times! Indo-UK workshop on Cyber security

Well, it was two months back I received an email from PK asking me to help him organize an Indo-UK workshop on Cyber security jointly with RCUK, India. In spite of not having the background details for it, just the word “UK” excited me to work on this. It took me not more than 5 minutes to say a big “Yes”. The next response was a set of tasks required for the same and that too in not less than 5 minutes :D. The workshop was planned for 4 days, March 24-27, 2013 to discuss the Cyber security and online security issues, both in India and UK. It all started within no time..setting up the website…lot of e-mail exchanges (With some very big people!)..designing take-away for the wokshop..handling local administrative issues. With all this heavy exercise for about 3-4 weeks, finally the workshop day was approaching.

Day 1, The Oberoi Hotel, New Delhi: Yes! You read the venue right! The workshop started at the grand 5 star hotel with beautiful scenic view from the roof top. It was a large lobby where we all assembled. Within few hours, there was an onset of the delegates, both British and Indian. The delegation was a good combination of people from Industry and research area. It started with a general introduction, few short sessions describing the landscape of Cyber security in UK and India and an interesting 60 minutes networking session to end the day well. It was two big round concentric circles with Indians taking the inner circle and UK people, on outer circle. We had 5 minute, one-to-one slot where everyone was discussing their work. I too got a golden chance to present my work to all the big dignitaries. I will take pride in saying some were really impressed! 🙂 The day finally ended with a scrumptious dinner and little planning for the next day.

Day 2, IIIT-Delhi: The delegation visited IIIT-Delhi, fascinated with the huge campus and facilities. It went with long hours of brainstorming sessions where there were 4 groups each discussing separate problems relating on and off to Cyber security. The board room looked beautiful with walls covered with multi-coloured post-it containing the gist of each discussion. After enjoying the meal, delegation went through another round of getting and giving the feedbacks to the speaker from each group. To ease the mental fatigue, at the end of the day, there was a tour to the campus, group picture and finally a visit to Barbeque Nation in the evening where everyone enjoyed the unlimited food and drinks. Another satisfying day came to an end.

Day 3, IIIT-Delhi: The morning session was yet another long session where people got shuffled within the groups and carried out the discussions! To end the workshop at IIIT-Delhi (Not the end of the workshop!), momentos were given to the delegation thanking them for their time and efforts. There was a surprise here, I got a gift from RCUK, India for helping them with the workshop. Contended! Afternoon was packing bags for Industrial visit to Infosys, Hyderabad. I was travelling with my advisor, scary and exciting. The former was justified and latter since it was my first visit with him alone! It was fun!

Day 4, Infosys, Hyderabad: It was a Holi day! The day started with presentations on history and work culture at Infosys, interesting demos of the research work being carried out, visiting the labs, interacting with the researchers. Got a chance to meet two of my undergrad friends, played Holi with them. Rejuvenating! With all this, we came back to the board room and we had one more thing in our kitty, colours to play Holi! It was pleasure playing with the UK people and my advisor himself! A day worth spending! Overall, the workshop for wonderful, both personally and professionally. It gave an outlook to the pertaining problems like threat reduction in online social media, risk analysis, privacy laws, human responses to attacks, security management, BYOD policies etc. Looking forward to many more to come!

Attached is the pic of an interesting chat with Ms. Elinor Buxton,The Royal Society as part of the networking session at Oberoi!


(Re)-evaluate your communities!!

Though it is one of the most significant or rather say it, one of the most appealing problem in the domain of Network Science, but still it suffers from the most primitive flaw – subjectivity. Yes, I am talking about the problem of Community Detection or what some would like to call Clustering. Why did I use the word “subjectivity” ? Because a lot many definitions exist for how a cluster should be? Adding to the problem, there exist various evaluation metric pertaining to one or multiple features of this “definition”. The problem gets worse as usually there is limited or absolutely no ground truth for most of the social network. So, to summarize the problem of finding “optimal” clusters suffers has the following road-blocks:

  1. No single definition of “optimal”.
  2. Different metrics exist pertaining to different definitions.
  3. Algorithms lacking flexibility – their goal to optimize just one metric.
  4. No ground truth. Subjectivity can exist even in different versions of ground truth.

So what is the solution? I guess, answers to all of these questions was presented in the paper “Defining and Evaluating Network Communities based on Ground-truth” by J.Yang and J.Leskovec published in ICDM 2012. I have been recently involved in quite a few discussions over the evaluation of community detection, how good are the clusterings obtained by the algorithm? And, definitely there was no clear solution, until I got my eyes on this paper, which was truly like nailing the jelly to the wall (and a reliable citation source :p)

The paper talks about comparison of 13 metrics, which they come up with after seeing the commonly used definitions, and for each of the metric, compared how they performed with respect to ground truth.

What were the metrics ?

The 13 metrics comes from 4 classes namely:

  • Metric based on internal connectivity
  • Metric based on external connectivity
  • Metric based on both internal and external connectivity
  • Metric based on network model

Metrics based on internal connectivity include internal density, no. of edges inside the cluster, avg degree , fraction over median degree and triangle participation ratio.

Whereas those based on external connectivity include expansion and cut ratio.

Those combining internal and external connectivity are Conductance, Normalized Cut, Maximum ODF, Average ODF, Flake ODF. and the one that is based on network model is Modularity.

How were these metrics evaluated?

The metrics were mainly evaluated on the basis of how they kept up to the four goodness measures defined below with respect to the ground truth community structure:

  • Separability – “A good community should be well separated from rest of the network”
  • Density – “Good communities are well-connected”
  • Cohesiveness – “Good community should be well connected internally”
  • Clustering Coefficient – “In a good community structure, nodes in a graph should cluster together to a high degree”

Besides this, metrics were also evaluated on how they responded to the change in the ground truth community structure. Ideally they should not change much with little changes in the community structure, but however they should change drastically with high intensity changes. This test was conducted with the help of Z-Score and 4 models of noisening the community.

The results (as posted in the paper) indicated that generally conductance and triad participation ratio performed better for both of the tests.

The paper seemed a good read and you can also download it from Arxiv.

Besides this, it will be interesting to find out what algorithms perform better with comparison to ground truth. The measures of comparison could be F-Score. But as one of the colleague says it should be about measuring “oranges to oranges” and “apples to apples”… and comes up with a metric poised in the paper “Approximation Clustering without the Approximation” by M.F Balcan, A. Blum and A. Gupta.

The metric presented is as follows:

Suppose there are two clusterings C and C’ (where one of them is the ground truth clustering), we should aim to minimize the distance between these 2 clusterings denoted by dist(C,C’). dist(C,C’) is nothing but the fraction of points on which they disagree when matched from C to C’.

The question is to find a permutation of k-communities in C’ which when matched to k-communities in C minimize the distance between the 2 k-clusterings.

However, the metric on the first look seems like computationally intensive for graphs with high number of communities (say 30,000 communities would mean computing distance 30000! ).  The problem looks like finding min-weight perfect matching in a bipartite graph.

Just a disclaimer, All views are my personal opinion (does represent group’s view) and I am really sorry if it tend to hurt someone or are in disagreement with some of the more intellectual minds out there. All results however are the intellectual property of the authors referenced in the post. I do not seek to take credit for any of it. 

Signing Off

Hemank Lamba

(A curious mind)

BrainstorM and Datasets go LIVE!

BrainstorM: Somebody during my graduate school life, told me [paraphrased] “Strength and / or creativity of a researcher is limited to the literature that he or she is aware of.” After going through a pleasant and a well-needed grind at Carnegie Mellon University, I have started preaching this to others. To practice what I preach, I started doing Research Paper reading sessions with my students starting Fall 2010. We did this for a semester, it went well; some students enjoyed it and some did not (understandably!). In Spring 2011, with the help of some of my Ph.D. students, we named the paper reading sessions as “BrainstorM” referred as BM among the group members. After having done it for the last 3 semesters (Spring 2011, Fall 2011, and Spring 2012), some of the Ph.D. students thought, we should take BM to the next level, i.e. make the schedule, papers, lead student, etc. public, so that other research groups are aware of the type of papers we read and can give suggestions, if any. One Ph.D. student who is part of BM from the starting date, says “It [BM] keeps on motivating me to think “Out of the box” looking at what methodologies others are following. One can understand what a paper is all about, but with smart minds having different backgrounds and expertise (undergrads, postgrads and faculty), one understands multiple perceptions of the same idea. I find weekly brainstorm sessions as a way to cultivate healthy group structure and a coherent feeling among group members, which is hard to achieve otherwise. There is always a take away with every BM session either in terms of technical knowledge or motivation to produce good research.”

With this background, experience, and motivation in mind, we are making our BM details public from this semester; for details, please visit http://precog.iiitd.edu.in/brainstorm.html My hope is that, we will get inputs / suggestions on the kind of papers we should read or probably even feedback on the papers that we are reading now from the community. We hope this can help in more interaction between the members of the research group and the research community.

Datasets: I also strongly believe that wherever appropriate, researchers should share their data for others to use; this can ONLY help the community. At this year’s WWW, I asked a question about sharing dataset to one of the speaker who presented a full paper, and the speaker openly denied sharing the data (understandably!). The chairperson for the session reacted positively and  asserted the importance of making  datasets of the papers public. A NYTimes article was written around the question and a follow-up by Dr. Bernardo A. Huberman (Social Computing group at HP labs) http://www.nytimes.com/2012/05/22/science/big-data-troves-stay-forbidden-to-social-scientists.html. With the intent of making  research communities grow healthily, we plan to make our datasets public. In this regard, we are making two datasets public for now; our plan is to share most of the datasets that we have already used and that we will use in future. Please visit http://precog.iiitd.edu.in/resources.html for the datasets and if you have any specific requests on the datasets mentioned here or in any of  our Publications, http://precog.iiitd.edu.in/publications.html please write to me at pk[at]iiitd[dot]ac[dot]in. I will be happy to do needful.


Competitive Influence – A Review

Ever since I got into reading papers, it has always fascinated how much knowledge can just a 8 page document hold. How can the 8 page document be currently extending the boundaries of human knowledge? By my blog posts, I want to draw attention of readers towards some of the upcoming concepts in the field of network science. Today I would like to illuminate readers about the topic of competitive influence maximization.

The problem of influence maximization is quite a well known and has been studied extensively in the past few decades both by economists and computer scientists. However, to just be complete, lets define the problem of influence maximization in the viral marketing setting. Suppose a company wants to launch a new product and wants to market it to consumer by giving some free samples. But, its problem is that, it doesn’t want to give everyone a free sample, but it wants to give samples only to say few people as it really cannot afford to give a lot many free samples. Suppose each sample costs 1 unit and it has only 20 units to give. Now the company wants to select the people whom it wants to give the free samples is in the sense that they can persuade more people to buy the product. It all works on the assumption that a user uses the product and then they refer it to their friends, and they in turn refer to their friends. And those friends buy the product. This problem of which 20 seeds to select such that at the end of the process, the reach of the product is maximum is known as Influence Maximization problem. According to me, for CS students, the best paper to get into this area would definitely be work by Kempe, Kleinberg and Tardos titled “Maximizing the spread of influence through a social network” [link].

Researchers over the past have tried to make the model more realistic by introducing various concepts like, time, game theory, budgets, costs, passivity of users, incentive mechanisms and many more. Others have tried to create efficient heuristics so that the algorithm proposed in the above mentioned paper can scale easily for larger networks without compromising much on the accuracy. One more such factor that is very much applicable in the domain of viral marketing is competition or competitive influence. It is a very valid extension to this problem. Whenever the market is not a monopoly, it is quite safe to assume that there are going to be multiple companies launching a campaign simultaneously or there are going to be more than one marketing campaign in the network at a given time. The question however is, how can we model this competitive behavior?

The researchers have tried to model this competitive behaviour by extending some existing models, for example let us consider the Independent Cascade Model as defined in the above mentioned paper. What the independent cascade model says is that, once a node gets activated, it will try to activate all its currently unactivated nodes with a constant probability p. The model is quite simple. Suppose there are b companies which are competing in the market. Each of the company selects a set Si of nodes, consisting of atmost k nodes. It is assumed that a node selected by the company will buy that company product. However, in the case when a particular node is selected by more than 1 company, then that particular node will get allocated to a company randomly. The goal of each of the company is to maximize the expected number of people allocated to their company at the end of the cascade flow. We can definitely view this as a game. And definitely there does not exist any pure strategy Nash equilibrium, however mixed-strategy Nash equilibrium will exist.

It has been shown that a company’s payoff i.e. expected value of revenue from his strategy given that he knows others strategy is a monotone and a submodular function. Algorithm which can give 1-1/e approximation of the optimal follows a greedy approach. The greedy algorithm is basically iteratively adding a node with largest marginal contribution to the subset.

The above mentioned model just introduces us to the problem of competitive influence in social network. Now, we will try to extend this simplistic model by changing settings. For example consider the budget, the above mentioned model did not account for budget of a player but however that might not be the case. Lets look the above mentioned problem from a follower’s perspective. Suppose there are 2 companies which want to launch a product, and company say A has information about who company B’s influencers are going to be. Now it is upto company A to select nodes in such a way that the number of nodes that buy the product A is more than those buying product B. A has to achieve all of this under a fixed budget. Obviously the model assumes that the choice of a particular node is progressive, that means once a node has bought a product it won’t be willing to change its decision. Also, it is safe to assume that there will definitely be nodes in the network which won’t buy either of A and B. The question which will be answered assuming these settings can be viewed as a best response to competitior’s move in a Stackelberg competition.

Mathematically speaking, we are trying to find the solution to the following problem.

Where A is the company which has information about B’s initil seeds, B is A’s total budget, V is the set of vertices and IA and IB is the set of influencers chosen by A and B respectively. f(IA| IB) is the number of people that will eventually buy product A given that influencers chosen by A is IA and by B is IB.

In the model that we discussed above, we saw that the node randomly choses the company it wants to get influenced by, incase there were multiple nodes who were trying to activate it. Let us now introduce 2 more models to achieve the same task as things might not be random everytime.

Model 1: Distance based model

The distance based model tries to relate to competitive facility location concept. It tries to tell that the location of a node in the network is important to influence people. A consumer will try to adopt the technology which is closest to it and in abundance. Consider a node u is connected by “active” edges (by active edges, we mean edges which have produced the result success when a coin was flipped with success probability equal to independent cascade probability p), and the shortest distance to any active node is say x. Now the node u will accept technology A with probability = number of nodes who have accepted A and are at a distance of x from u connected by active edges divided by total nnumber of nodes who have accepted A and who have accepted B and are at a distance of x from u.

Model 2: Wave propogation model

This particular model doesn’t look at overall shortest distance but it gives more weightage to your immediate neighbours. It says each node makes a decision of which technology to accept, irrespective of the decision made by nodes who are at a distance greater than 1 unit from it.

So probability of u accepting technology A is nothing but ratio of the sum of probabilities of a node accepting technology A for all nodes which are at a distance of 1 unit from it or say x-1 unit from the active set.

It has been proved that both of these models are indeed submodular and a hill climbing approach greedy algorithm can give 1-1/e-E approximation of the optimal. Using these it has been shown that A can obtain a good chunk of market share even by chosing the seed set second.

You can go into the exact technical details and see the results of the above mentioned techniques by going through the following papers:

Cairns T., Nagarajan C., Wild S., and Zulen A., Maximizing Influence in a Competitive Social Network: A Follower’s Perspective [link]

Bharathi S., Kempe D., and Salek M., Competitive Influence Maximization in Social Networks [link]

Other good papers to read regarding this concept:

J. Kotska, Y.A. Oswald, and R. Wattenhofer. Word of Mouth: Rumor dissemination in social networks. In SIROCCO, pages 185-196, 2008

S. Goyal, M. Kearns, Competitive Contagion in Networks. In STOC 2012.

UMBC Memoirs: Research, Fun and Baltimore

INDIA2USA: Now it’s been a few weeks since I have come back from UMBC, Maryland, USA after spending the Spring, 2012 semester; and the hangover of the wonderful experience is still there ;-).  The dream started in Dec, 2011, when I first got to know about my plausible visit to UMBC. It was a mixed feeling of shock, excitement, uncertainty and apprehension and 4.5 months seemed a long time… The wait got longer because of some initial glitches like getting the wrong DS2019 documents and the drought of visa appointment dates at the US Embassy, but finally on 20th Jan, I got my golden pass to USA, I had exactly 8 days to pack for my first international trip and these days flew by in packing, packing and more packing 😉 Before I realized how much time has passed, I was there standing at the IGI airport at 11 PM, waving a teary good bye to my family and friends. Believe me the thought that I am going abroad for 4+ months had not even sunk in… It’s only when I sat on the huge Boeing 777 for the first time, and the engines roared with fuel, the realization hit me like a lightning. It was a 24+ hours long and tiring journey to reach my new abode for next few months, BALTIMORE. Having had a stop-over at New York, a few hours before, Baltimore seemed a very different but pleasing city..  It was more like a country-side with a loads of green patches, rare high rises and open roads [and from there onwards Baltimore was rechristened to “US ka Gaon”]

Work@UMBC: Research experience at UMBC was highly enriching and satisfying. This was my first experience of working in an international setting, and it was truly a unique and multi-cultural experience. I interacted with professors and students from all over the world and it was amazing to see how they worked together as a team. At UMBC, I got an opportunity to work at Ebiquity Lab under the guidance of  Dr. Anupam Joshi. Soon, I realized it is one of the best and most sought after lab among the computer science students at UMBC, with a proven track record of placements with Fortune 500 companies like Microsoft, Amazon and Google. The Research work at Ebiquity lab was always focused and well planned. There were weekly team meetings coupled with scrum meetings to discuss the progress and obstacles and regular updates on research work.

I was highly motivated for my work at UMBC as it was work in my core research domain. Along with the research work I did with Dr. Joshi, I also got the opportunity to attend a course [another meticulously planned detail by my advisor] on “Semantic Web” conducted by Dr. Tim Finin. This was like icing on the cake, as Semantic Web is a new and upcoming field, with numerous applications. In my first few weeks at the research lab, I also got an opportunity to present my work done at IIIT to the Ebiquity group. I got many more wonderful opportunities at UMBC, one such being, when our work was presented by Dr. Joshi at NIST, where I also attended a full day workshop at NIST’s nano-technology labs.

@UMBC I learned about various new domains of computer science research which I had not known earlier, and attended seminars on these topics by the experts. Another interesting milestone for me came during my stay at UMBC, when our research paper got accepted at PSOSM workshop at WWW 2012. I visited Lyon, France for a week to present my work at the conference [another blog coming up shortly about my Paris and Lyon adventures 😉 ]

Fun&Friends: Well no experience is complete without your friends 🙂 and to be honest I met some amazing people at Baltimore and will cherish their friendship throughout my life. These were the people, who made my trip memorable and helped me throughout the stay. It was amazing to see the bonding between the entire Indian community at UMBC. Midnight birthday celebrations, bus trips, movies, bowling, shopping, road-trips,.. it was a trip full of adventure and new experiences. I visited some amazing cities like New York, Washington DC, Boston and Annapolis. I also attended a lot of Indian as well multicultural programs at UMBC, which was an enriching experience too. Now some good things about my “gaon”; the weather in Baltimore was amazing and full of surprises, I experienced snowfall for the first time in my life during my first week at Baltimore, and then stayed through the beautiful spring season to watch all the cherry blossoms bloom to glory and finally bid adieu to the city amidst hurricane warnings.

At the end, I would like to thank my mentors (Dr. Joshi and Dr. PK) for giving me this opportunity to experience a new world and my friends to make it so much memorable for me.