Datasets

Public figures Facebook posts

Facebook posts from six public figures in different social categories (politicians, journalists and singers). Each JSON file contains one thousand posts.

  1. Amanpour 204.3KB
  2. Macklemore 213.6KB
  3. Obama 135.8KB
  4. Renzi 569.1KB
  5. Travaglio 1.8MB
  6. Vasco 409.5KB
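
The following is a minimal sketch of how one of these files might be loaded and inspected in Python. It assumes that each file holds a top-level JSON array of post objects; this page does not document the schema, so the sketch discovers the field names instead of hard-coding them. The function names `load_posts` and `field_names` are illustrative only.

```python
import json
from pathlib import Path


def load_posts(path):
    """Load a posts file, assuming a top-level JSON array of post objects."""
    with Path(path).open(encoding="utf-8") as handle:
        posts = json.load(handle)
    if not isinstance(posts, list):
        # The array layout is an assumption; fail loudly if it does not hold.
        raise ValueError(f"expected a JSON array in {path}, got {type(posts).__name__}")
    return posts


def field_names(posts):
    """Return the sorted set of keys seen across all post objects."""
    keys = set()
    for post in posts:
        if isinstance(post, dict):
            keys.update(post)
    return sorted(keys)
```

Calling `load_posts` on a downloaded file (for example a hypothetical local copy named `Obama.json`) and passing the result to `field_names` gives a quick view of which fields are available before any analysis.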

Aviva insurance tweets

A small dataset of 399 tweets about AVIVA insurance. The CSV file includes tweets from 26/06/2014 to 27/06/2014.

  1. AVIVA tweets 114.9KB

Million graph data

The enriched graph used for the join operator. The vertices have been enriched following the LDBC Social Network Benchmark protocol. The TXT files contain the vertices and edges in comma- and tab-separated format.

  1. Enriched Graph Data 344MB
  2. LiveJournal edges
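
Since the edge files are large, it can help to stream them rather than load them whole. The sketch below shows one way to do that in Python. It rests on assumptions this page does not confirm: that each edge line starts with the source and target vertex ids, that fields are separated by a comma or a tab, and that blank lines and lines starting with `#` can be skipped. The names `read_edges` and `out_degrees` are illustrative.

```python
import re
from collections import Counter

# Assumed separators: the page says "comma and tab separated".
_SEPARATOR = re.compile(r"[,\t]")


def read_edges(lines):
    """Yield (source, target) id pairs from an iterable of text lines.

    Assumption: the first two fields of each line are the edge endpoints.
    Blank lines and lines starting with '#' are skipped.
    """
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = _SEPARATOR.split(line)
        if len(fields) < 2:
            raise ValueError(f"line {number}: expected at least two fields")
        yield fields[0], fields[1]


def out_degrees(edges):
    """Count outgoing edges per source vertex."""
    counts = Counter()
    for source, _target in edges:
        counts[source] += 1
    return counts
```

Because `read_edges` is a generator, an open file object can be passed to it directly, so only one line is held in memory at a time.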

We also provide the datasets that we used to benchmark the graph nesting operator. In particular, we offer both the gMark-generated subgraphs and the authorship subgraphs of the Microsoft Academic Graph.

  1. gMark-generated operands 1.9GB
  2. Microsoft Academic Authorship graph 5.2GB

Smartphone images

The dataset includes photos taken with 19 different smartphones, using both the front and the rear camera. For each smartphone, a subset of 100 images (50 from the front camera and 50 from the rear one) was uploaded to and then downloaded from the following social media platforms: Facebook, Flickr, Google+, GPhoto, Instagram, LinkedIn, Pinterest, QQ, Telegram, Tumblr, Twitter, Viber, VK, WeChat, WhatsApp and WordPress. The Readme.csv file summarizes the smartphones' characteristics.

  1. Images 53GB
  2. Readme

Time in Text

This dataset comprises the time intervals extracted and normalized from the temporal expressions found in text corpora.

  1. Wikipedia - 89 Million timexes 7.3GB
  2. New York Times 1987-2007 - 15 Million timexes 1.2GB