  • Who: Paridhi Jain

  • What: Precog's second Ph.D. thesis defense

  • When: 1730 - 1900 hrs IST, April 25, 2016

  • Where: Board room, Fifth floor, IIIT-Delhi

  • Why: Precogs put a lot of effort into their work; you don't want to miss seeing it.

  • Facebook event: https://www.facebook.com/events/1523107397999127/

  • Title: Automated Methods for Identity Resolution across Online Social Networks

  • Abstract: Today, more than two hundred Online Social Networks (OSNs) exist, each offering distinct services to its users, such as easier access to news or better business opportunities. To enjoy each distinct service, a user innocuously registers herself on multiple OSNs. For each OSN, she defines her identity with a different set of attributes, genre of content and friends to suit the purpose of using that OSN. Thus, the quality, quantity and veracity of the identity vary with the OSN. This results in dissimilar identities of the same user, scattered across the Internet, with no explicit links pointing to one another. These disparate unlinked identities worry various stakeholders. For instance, security practitioners find it difficult to verify attributes across unlinked identities; enterprises fail to create a holistic overview of their customers.

    Research that finds and links disconnected identities of a user across OSNs is termed identity resolution. Access to unique and private attributes of a user, such as ‘email’, makes the task trivial; however, in the absence of such attributes, identity resolution is challenging. In this dissertation, we make an effort to leverage intelligent cues and patterns extracted from partially overlapping lists of public attributes of the compared identities. These patterns emerge due to consistent user behavior, like sharing the same mobile number, content or profile picture across OSNs. Translating these patterns into features, we devise novel heuristic, unsupervised and supervised frameworks to search and link user identities across social networks. The proposed search methods use an exhaustive set of public attributes, looking for consistent behavior patterns, and fetch the correct identity of the searched user in the candidate set for an additional 11% of users. An improvement on the proposed search mechanisms further optimizes time and space complexity. The suggested linking method compares past attribute value sets and correctly connects the identities of an additional 48% of users, earlier missed by literature methods that compare only current values. Evaluations on popular OSNs like Twitter, Instagram and Facebook demonstrate the significance and generalizability of the linking method.

    The proposed search and linking methods are applicable to users who exhibit evolutionary and consistent behavior on OSNs. To understand the dynamics of and reasons for such behavior, we conduct two independent in-depth studies. For evolutionary behavior, specifically for usernames, we observe that a username change leads to a broken link (404 page) to the user's profile. Yet, 10% of 8.7 million tracked Twitter users changed their username in two months. Investigation reveals that reasons to change include malign intentions, like fraudulent username promotion, and benign ones, like expressing support for events. We believe that Twitter can monitor frequent username changes, infer malign intentions and suspend accounts if needed. The study of information shared consistently across OSNs, e.g. a mobile number, highlights why users share personally identifiable information online and how it can be used with auxiliary information sources to derive details of a user.

    In summary, this dissertation capitalizes on previously unused public user information available on a social network for identity resolution via novel methods. The thesis work makes the following advancements: a) proposes search frameworks that aim to fetch the correct identity of a user in the candidate set by searching with public and discriminative attributes, b) proposes a supervised classification framework for linking identities that compares the respective attribute histories in situations where state-of-the-art methods fail to predict the link, c) studies username evolution on Twitter, and d) studies mobile number sharing behavior across OSNs. The proposed methods require no user authorization for data access, yet successfully leverage innocuous public user activity and details, find her accounts across OSNs and help stakeholders with better insights into a user’s likes or her suspicious intentions.
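
As a rough illustration of why comparing attribute histories can link identities that a current-value comparison misses, here is a minimal Python sketch. It is not the supervised classifier from the thesis: the function names, the choice of Jaccard similarity, and the sample username histories are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def current_value_match(history_a, history_b):
    """Baseline: compare only the most recent value of each history."""
    if not history_a or not history_b:
        return False
    return history_a[-1].lower() == history_b[-1].lower()


def history_similarity(history_a, history_b):
    """Compare every past value of the attribute, not just the current one."""
    normalized_a = {value.lower() for value in history_a}
    normalized_b = {value.lower() for value in history_b}
    return jaccard(normalized_a, normalized_b)


# Hypothetical username histories (oldest first) of one user on two OSNs.
osn_one = ["alice_reads", "alice_k", "ak2016"]
osn_two = ["alice_k", "alicek_official"]

print(current_value_match(osn_one, osn_two))  # False: current usernames differ
print(history_similarity(osn_one, osn_two))   # 0.25: one shared past username
```

In a supervised setting, a score like this would be only one feature among many; the sketch just shows that a shared past value gives a signal the current values do not.
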

  • Who: Sonal Goel

  • What: Master's thesis defense

  • When: 1600 - 1730 hrs IST, April 25, 2016

  • Where: Board room, Fifth floor, IIIT-Delhi

  • Why: Precogs put a lot of effort into their work; you don't want to miss seeing it.

  • Title: Image Search for Improved Law and Order: Search, Analyze, Predict image spread on Twitter.

  • Abstract: Social media is often used to spread images that can instigate anger among people and hurt their religious, political, caste, and other sentiments, which in turn can create a law and order situation in society. This creates a need for law enforcement agencies to inspect the spread of images related to such events on social media in real time. To help law enforcement agencies analyse the spread of images on microblogging websites, we developed an open source, real-time image search system, where the user can give an image and a supporting text related to the image, and the system finds the images that are similar to the input image, along with their count. The proposed system is robust enough to identify images that have been cropped, scaled (up to a certain factor), embedded with text, stitched with other images, or altered in brightness, as well as some combinations of these. On the input text, the system runs a text mining algorithm to extract keywords, retrieves images related to these keywords from Twitter, and uses an image comparison methodology to extract similar images. The system can analyse the users who were propagating the content, the sentiments expressed with the images, and their retweets. We found that Improved ORB (ORB + RANSAC) performs best for image similarity, and using it we are able to achieve an accuracy of above 85% in all the cases tested. The system is being used by a government security agency. In addition to identifying similar images, we also aim to predict the influence of such events on people as a diffusion rate. On microblogging sites like Twitter, information provided by tweets diffuses across users through retweets. Hence, to better understand and control the diffusion of these kinds of images, we focus on predicting the retweet count of such images by using visual cues from the images, content-based information and structure-based features. For this, we build a random forest regression model that takes some tweet, image and structural features to predict the retweet count.
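
To give a feel for what the RANSAC step adds on top of feature matching, here is a minimal Python sketch. It is not the system described in the abstract: it assumes putative keypoint matches have already been produced by some descriptor matcher, it uses a pure translation model in place of the homography a real pipeline would usually estimate, and all names, parameters and sample coordinates are hypothetical.

```python
import random


def ransac_inliers(matches, tolerance=3.0, iterations=100, seed=0):
    """Return the largest set of matches that agree on a single translation.

    matches: list of ((x1, y1), (x2, y2)) putative keypoint pairs.
    Assumption: a translation-only model, chosen to keep the sketch short.
    """
    rng = random.Random(seed)
    best = []
    if not matches:
        return best
    for _ in range(iterations):
        # Hypothesize a translation from one randomly sampled match.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Keep every match whose displacement agrees with the hypothesis.
        inliers = [
            m for m in matches
            if abs((m[1][0] - m[0][0]) - dx) <= tolerance
            and abs((m[1][1] - m[0][1]) - dy) <= tolerance
        ]
        if len(inliers) > len(best):
            best = inliers
    return best


def similarity_score(matches, **kwargs):
    """Fraction of putative matches that survive the geometric check."""
    if not matches:
        return 0.0
    return len(ransac_inliers(matches, **kwargs)) / len(matches)


# Hypothetical matches: six roughly shifted by (10, 5), plus two outliers.
matches = [
    ((10, 10), (20, 15)), ((50, 20), (61, 25)), ((30, 80), (40, 84)),
    ((70, 60), (79, 66)), ((90, 15), (100, 20)), ((25, 45), (35, 50)),
    ((60, 30), (20, 100)), ((5, 90), (95, 75)),
]

print(similarity_score(matches))  # 0.75
```

The point of the geometric check is that descriptor matches alone can be coincidental; only matches consistent with one transformation count as evidence that two images show the same content.
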

  • Who: Pradyumn Nand

  • What: Master's thesis defense

  • When: 1100 - 1230 hrs IST, April 29, 2016

  • Where: Board room, Fifth floor, IIIT-Delhi

  • Why: Precogs put a lot of effort into their work; you don't want to miss seeing it.