As part of the US House Intelligence Committee investigation into how Russia may have influenced the 2016 US election, Twitter released the screen names of nearly 3,000 Twitter accounts tied to Russia's Internet Research Agency. These accounts were immediately suspended, which removed their data from Twitter.com and from Twitter's developer API. In this talk, we show how to reconstruct a subset of the Twitter network of these Russian troll accounts. We then apply graph analytics to the data using the Neo4j graph database to uncover how these accounts spread fake news.
This case-study-style presentation will show how we collected and munged the data, taking advantage of the flexibility of the property graph model. We'll dive into how NLP and graph algorithms such as PageRank and community detection can be applied to social media data to make sense of it. We'll show how Cypher, the query language for graphs, is used to work with graph data. We'll also show how visualization, combined with these algorithms, helps interpret the results of the analysis and share the story of the data. No familiarity with graphs or Neo4j is necessary, as we'll start with a brief overview of graph databases and Neo4j.
William Lyon is a Developer Relations Engineer at Neo4j, the open source graph database. As a software developer on the Developer Relations team, he builds tools that integrate Neo4j with other technologies and helps developers build applications with Neo4j. He also leads the Neo4j Data Journalism Accelerator Program, which helps data journalists use graphs to make sense of data. Before joining Neo4j, William worked as a software engineer for several startups, building software for the real estate, predictive analytics, and quantitative finance industries. William holds a Master's degree in Computer Science from the University of Montana.