



Slides describing MITH's work archiving Ferguson Twitter data.

On GitHub: edsu/ferguson-slides

Talk about technical and ethical issues related to providing access to a Ferguson social media collection. Interested in hearing about different approaches people come up with since this is a relatively new area for archival research and practice.

the existence, preservation and availability of archives, documents, records in our society are very much determined by the distribution of wealth and power. That is, the most powerful, the richest elements in society have the greatest capacity to find documents, preserve them, and decide what is or is not available to the public. This means government, business and the military are dominant.
Zeynep Tufekci (UNC iSchool) muses on the ways that algorithms shape the media and our experience of it. Twitter, and BlackTwitter in particular, provide an unprecedented view into a community that has been denied a voice in mainstream media.
% twarc search ferguson > tweets.json
% twarc filter ferguson > stream.json

13,480,000 tweets

August 10, 2014 - August 27, 2014

63G of JSON

8.4G compressed

Technical accessibility issues. How do you provide access to this content?
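One practical angle on the access question: a collection this size can be read straight from the compressed file, one tweet at a time, without unpacking it to disk. The sketch below assumes the archive is gzip-compressed and holds one JSON tweet per line; the function names are illustrative, not part of twarc.

```python
import gzip
import io
import json

def iter_tweets(path_or_file):
    """Yield one tweet dict per non-empty line of a gzip-compressed file.

    Assumes one JSON object per line. Accepts a filename or an open
    binary file object.
    """
    with gzip.open(path_or_file, "rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)

def count_tweets(path_or_file):
    """Count tweets without holding the whole collection in memory."""
    return sum(1 for _ in iter_tweets(path_or_file))

# Tiny in-memory stand-in for a real archive (invented data).
demo = io.BytesIO(gzip.compress(b'{"id_str": "1"}\n\n{"id_str": "2"}\n'))
print(count_tweets(demo))
```

Because the generator only ever holds one line, memory use stays flat regardless of how large the archive is.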

Not just 140 characters.

  • time
  • hashtags
  • geo coordinates
  • places
  • embedded media
  • retweet
  • reply to
  • user
  • profile
  • avatar
  • follower count
  • Twitter API


Anatomy of a tweet. Twitter's documentation is quite good.
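To make the list above concrete, here is a minimal sketch that pulls those pieces out of a single tweet. It assumes the tweet is a dict in the layout of Twitter's v1.1 REST API; the `summarize_tweet` name and the sample tweet are invented for illustration.

```python
def summarize_tweet(tweet):
    """Extract commonly used fields from one tweet dict (v1.1 layout assumed)."""
    entities = tweet.get("entities") or {}
    user = tweet.get("user") or {}
    coords = tweet.get("coordinates") or {}
    place = tweet.get("place") or {}
    return {
        "id": tweet.get("id_str"),
        "time": tweet.get("created_at"),
        "text": tweet.get("text"),
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "geo": coords.get("coordinates"),  # [longitude, latitude] when present
        "place": place.get("full_name"),
        "media": [m["media_url"] for m in entities.get("media", [])],
        "is_retweet": "retweeted_status" in tweet,
        "reply_to": tweet.get("in_reply_to_status_id_str"),
        "user": user.get("screen_name"),
        "profile": user.get("description"),
        "avatar": user.get("profile_image_url"),
        "followers": user.get("followers_count"),
    }

# Invented sample tweet, not taken from the collection.
sample = {
    "id_str": "1",
    "created_at": "Sun Aug 10 00:00:00 +0000 2014",
    "text": "example #ferguson",
    "entities": {"hashtags": [{"text": "ferguson"}], "urls": []},
    "coordinates": None,
    "place": None,
    "in_reply_to_status_id_str": None,
    "user": {
        "screen_name": "example_user",
        "description": "",
        "profile_image_url": "http://example.com/a.png",
        "followers_count": 10,
    },
}
print(summarize_tweet(sample))
```

Most fields are optional in practice (few tweets carry geo coordinates, for instance), which is why every lookup falls back to an empty value.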

August 10-27, 2014

417,972 Unique Unshortened URLs
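A rough sketch of how unique URLs might be tallied from line-oriented tweet JSON. It only uses the `expanded_url` that Twitter supplies for its own t.co links (v1.1 layout assumed), so links that pass through other shorteners are not resolved here; fully unshortening them would need an extra network step that this sketch leaves out. The function name is illustrative.

```python
import json

def unique_urls(lines):
    """Collect the distinct URLs found in an iterable of JSON tweet lines."""
    urls = set()
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        entities = tweet.get("entities") or {}
        for link in entities.get("urls", []):
            # Prefer Twitter's expansion; fall back to the t.co form.
            urls.add(link.get("expanded_url") or link.get("url"))
    urls.discard(None)
    return urls

# Invented sample lines, not taken from the collection.
demo_lines = [
    '{"entities": {"urls": [{"url": "http://t.co/a", "expanded_url": "http://example.com/1"}]}}',
    '{"entities": {"urls": [{"url": "http://t.co/b", "expanded_url": "http://example.com/1"}]}}',
    '{"entities": {"urls": []}}',
]
print(len(unique_urls(demo_lines)))
```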

UMD Ferguson Town Hall Meeting

December 3, 2014

Random imagery from tweets used as a backdrop for the townhall meeting.

#BlackLivesMatter Teach-Ins

  • Organizing Meeting
  • Tweet Collection
  • Ethics, Rights and Data Management
  • Basic Navigation and Analysis
  • Advanced Analytical Techniques


Indictment Decision of Darren Wilson

November 11 - December 8, 2014

15,080,078 tweets


Department of Justice Report

February 25 - March 21, 2015

2,033,898 tweets


April 1 - April 13, 2015

846,602 tweets


April 15, 2015 - Present



April 25, 2015 - Present


Twitter likes to embellish its socially progressive image, but its Terms of Service unsurprisingly and necessarily reflect its business interests. Gnip as data reseller, purchased by Twitter. Shutting off DataSift.
Explain how only IDs can be shared, and how you can rehydrate data.
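As an illustration of the ID-only approach, the sketch below reduces line-oriented tweet JSON to a plain list of tweet IDs, one per line, which is the form that can be shared. It assumes each tweet carries an `id_str` field (v1.1 layout); the `write_ids` name is invented for this example.

```python
import io
import json

def write_ids(tweet_lines, out):
    """Write one tweet ID per line to `out`; return how many were written."""
    count = 0
    for line in tweet_lines:
        if line.strip():
            out.write(json.loads(line)["id_str"] + "\n")
            count += 1
    return count

# Invented sample lines, not taken from the collection.
demo_tweets = ['{"id_str": "101", "text": "a"}', '{"id_str": "102", "text": "b"}']
ids_file = io.StringIO()
write_ids(demo_tweets, ids_file)
print(ids_file.getvalue())
```

A file of IDs like this can then be passed to twarc's hydrate command, which asks the Twitter API for the current JSON of each tweet. Tweets that have since been deleted or protected do not come back.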

Tweets and Deletes: Silences in the Social Media Archive

Point out rate of deletes, and slight difference in delete rates.
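One way to estimate a delete rate, sketched under the assumption that you have the originally collected tweet IDs and the IDs that came back from a later rehydration. The function name is illustrative, and the figure is an upper bound on deletions: tweets from suspended or protected accounts also fail to rehydrate.

```python
def delete_rate(original_ids, hydrated_ids):
    """Fraction of originally collected tweets that no longer rehydrate."""
    original = set(original_ids)
    if not original:
        return 0.0
    missing = original - set(hydrated_ids)
    return len(missing) / len(original)

# Invented IDs for illustration: one of four tweets is gone.
print(delete_rate(["1", "2", "3", "4"], ["1", "3", "4"]))
```

Running the same calculation on two collections gives the kind of side-by-side comparison of delete rates mentioned above.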


Please get in touch!


Image Credits