#TPBT: The Pin-Bang Theory

In the monsoon semester of 2012, I took a course on Privacy and Security in Online Social Media. We had to do a project on a popular online social network. Pinterest caught my eye. It was new, it was among TIME Magazine’s top 50 websites of 2011, and it then had close to 20 million users. Its growth was amazing; in a matter of 2 years it was well integrated with popular e-commerce sites like eBay, Etsy, Amazon, etc. The big white-on-red “P” next to the blue bird and the white-on-blue “f” motivated me to work on Pinterest.

Share Buttons on Amazon.

With the project proposal deadline about 30 minutes away, and without digging much into the OSN, I proudly declared that my project would entail user analysis, locating spam / malware, and also touch upon copyright issues on Pinterest.
The next time I opened my project, I got my “shock of the semester”: Pinterest had no API, and the third-party Python wrappers were all useless. I would have to scrape the whole network. Though I was able to complete only a part of my project proposal in the semester, PK sir asked me to continue working. Neha joined me on the project and Prateek started shepherding us.
We built a crawler to push data from Pinterest to our databases, starting from 5 extremely popular seed users.
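For readers curious how a seed-based crawl can be organized, here is a minimal sketch of a breadth-first traversal. It is not our actual crawler: the function name `crawl_users`, the `fetch_neighbors` callback (standing in for whatever scraping step returns the users connected to a given user) and the `max_users` limit are all assumptions made for illustration.

```python
from collections import deque


def crawl_users(seeds, fetch_neighbors, max_users=1000):
    """Visit users breadth-first, starting from a list of seed users.

    fetch_neighbors is a hypothetical callback: given a user handle, it
    returns an iterable of connected user handles (for example, scraped
    from a profile page). No network access happens in this sketch.
    """
    seen = set(seeds)        # handles already queued, to avoid revisits
    queue = deque(seeds)     # frontier of handles still to visit
    visited = []             # handles in the order they were visited

    while queue and len(visited) < max_users:
        user = queue.popleft()
        visited.append(user)
        for neighbor in fetch_neighbors(user):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return visited


# Toy usage with an in-memory graph instead of real scraping.
toy_graph = {"a": ["b", "c"], "b": ["c", "d"], "c": [], "d": ["a"]}
print(crawl_users(["a"], lambda u: toy_graph.get(u, [])))
```

Passing the fetching step in as a callback keeps the traversal logic separate from the scraping code, so the walk can be tried on a toy graph before pointing it at a real site.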

The darker blocks had the primary data from Pinterest; lighter blocks had associated data collected from many different sources.

We collected a massive data set of 17.9 million user handles, 3.3 million user profiles and about 58 million “Pins” from 26th December 2012 to 1st February 2013.
We then began our analysis. Some of our key findings were:

  • We found that the most common topics across users and pins were design, fashion, photography, food and travel.
  • User, pin, and board characterization: We analyzed various user profile attributes, their geographical distribution, top pin sources and board categories.
  • Exploring Pinterest as a possible venue for copyright infringement: We found copyrighted images being shared publicly on Pinterest and almost half of these images did not give due credit to the copyright owners.
  • Analysis of personal information and malicious content present on Pinterest: Users were voluntarily giving away a significant amount of Personally Identifiable Information (PII). We found numerous instances where users shared phone numbers, BBM pins, email IDs, marital status, and other personal information. We also found (and analyzed) traces of malware in the form of pin sources by using blacklists.
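
As a rough illustration of the blacklist check mentioned in the last finding, the sketch below flags pin source URLs whose host, or any parent domain of it, appears in a set of blacklisted domains. The function name `flag_blacklisted`, the example URLs and the blacklist contents are all made up for this example; the actual blacklists and matching rules we used are not shown here.

```python
from urllib.parse import urlparse


def flag_blacklisted(pin_sources, blacklist):
    """Return the pin source URLs whose domain is on the blacklist.

    blacklist is assumed to be a set of lowercase domain names.
    A URL matches if its host or any parent domain is in the set.
    """
    flagged = []
    for url in pin_sources:
        host = (urlparse(url).hostname or "").lower()
        parts = host.split(".")
        # Build the host and every parent domain, e.g. for
        # "shop.evil.example": that host, "evil.example", "example".
        candidates = {".".join(parts[i:]) for i in range(len(parts))}
        if candidates & blacklist:
            flagged.append(url)
    return flagged


# Toy usage with invented domains.
sources = ["http://shop.evil.example/x", "https://good.example/y"]
print(flag_blacklisted(sources, {"evil.example"}))
```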
Heat-map for user locations.

The final step was finding the title. So we called upon the highly imaginative and vocal members of Precog, who in a couple of 15-minute sessions took us from nowhere to “Pinacolada” and “Pingoo”, before finally agreeing on “The Pin-Bang Theory”. For more details, have a look at our technical report here.

Here is a picture of the discussion (a memorable moment indeed):

All said and done, working on Pinterest was indeed an amazing experience for all of us ☺

Cheers!
Sudip, Neha, Prateek