Privacy in Open Government Data

As they say, ideas can be life-changing… and an idea changed my life too (in a positive way, of course! :)).

It is amazing and satisfying to see ideas turn into reality! A year ago, Mayank Gupta (B.Tech, DCE) and I started working on an idea centered on open government data and its potential for malicious use. Information portals in the form of e-governance websites (e.g., voter-id, driving license, MTNL phone directory) run by the Delhi Government in India provide access to personally identifiable information (PII) of the residents of Delhi. Information like name, address, age, date of birth, voter-id, driver’s license number, and father’s name is openly and freely available. With online data theft on the rise and privacy awareness increasing among Indian citizens, we thought it would be an interesting problem to take on. And voila! It actually turned out just as we had imagined :).

The project was planned in several phases. The first phase was identifying the open government sources and going through their privacy policies to check whether data collection was permissible. The next step was to write PHP scripts to start extracting the information. Within a month, we had approximately 8 million voter-id and 2.5 million driving license records in our local repository. We also collected data from 5 popular social networking sites, viz. Facebook, Twitter, Google+, Foursquare and LinkedIn, using public API calls.

The next step was to create awareness among the masses. To make this possible, we developed a system that highlights the public availability and easy accessibility of such PII. Hence, OCEAN: Open Source Collation of eGovernment data and Networks was developed and deployed on January 21, 2013. The input to the system is the name of the individual to be searched, and the system returns a candidate set of individuals with that name, along with the personal attributes associated with each.

Interestingly, aggregating data within the voter-id database helped in creating a family tree that connects people within a family. The image accompanying this post shows the family tree of Srishti Rawat (a random name, details blacked out for privacy purposes).
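To give a rough feel for the name lookup and the family-tree aggregation described above, here is a minimal sketch in Python. It is not OCEAN's actual code (the original scripts were written in PHP, and the post does not spell out the linking logic). The `VoterRecord` schema, the function names, and the rule of linking a child to a father through a matching father's name at the same address are all assumptions made for illustration, and the sample records are entirely made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VoterRecord:
    # Hypothetical schema; field names are illustrative only.
    name: str
    father_name: str
    address: str
    age: int


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences match."""
    return " ".join(text.lower().split())


def search_by_name(records, query):
    """Return the candidate set: every record whose name matches the query."""
    key = normalize(query)
    return [r for r in records if normalize(r.name) == key]


def link_children_to_fathers(records):
    """Map each likely father to the children registered at the same address."""
    # Index records by (address, name) so that a father's name can be
    # looked up within a single household.
    by_household = defaultdict(list)
    for r in records:
        by_household[(normalize(r.address), normalize(r.name))].append(r)

    tree = defaultdict(list)
    for child in records:
        key = (normalize(child.address), normalize(child.father_name))
        for father in by_household.get(key, []):
            # Crude sanity check (an assumption): a parent is older than the child.
            if father.age > child.age:
                tree[father].append(child)
    return dict(tree)


# Entirely fictional sample data.
SAMPLE = [
    VoterRecord("Asha Example", "Ravi Example", "12 Sample Road", 24),
    VoterRecord("Ravi Example", "Mohan Example", "12 Sample Road", 52),
    VoterRecord("Mohan Example", "Unknown", "12 Sample Road", 78),
    VoterRecord("Asha Example", "Kiran Other", "99 Other Lane", 31),
]

if __name__ == "__main__":
    for candidate in search_by_name(SAMPLE, "Asha Example"):
        print(candidate)
    for father, children in link_children_to_fathers(SAMPLE).items():
        print(father.name, "->", [c.name for c in children])
```

Even this naive matching chains three generations together in the sample household, which is the worrying part: a name and a father's name at a shared address are enough to start reconstructing a family.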

The system is gaining popularity and has been talked about in the privacy research community since its deployment. Within a short span of time, the system has recorded 398 unique visitors (as of May 18, 2013). OCEAN brought a lot of accolades to me, Dr. PK and Precog.

  • Article published in national daily, Hindustan (April 16, 2013) [pic attached]
  • Best poster award at IITK Security and Privacy Symposium 2013
  • Accepted poster at IBM I-care 2012, IISc Bangalore

I would also like to thank Swetank Kumar Saha, Sudip Mittal and Daksha Yadav (B.Tech, IIITD) for the initial thinking and a simple prototype for this work.

Hopefully this effort serves as an eye-opener to the general public and other stakeholders in the country.