EE380 Computer Systems Colloquium: News Diffusion - fighting misinformation

News Diffusion: Automatically scoring news articles to fight misinformation
Wednesday, March 14, 2018 - 4:30pm
Gates B03
Frédéric Filloux and Eun Seo Jo (Stanford)
Abstract / Description: The project aims to make a decisive contribution to the sustainability of the journalistic information ecosystem by addressing two problems:
1. The lack of correlation between the cost of producing great editorial content and its economic value.
2. The vast untapped potential for news editorial products.

The project will have a simple and accessible scoring system: the online platform receives a batch of news stories and scores each on a scale of 1 to 5 based on its journalistic quality, automatically and in real time. This scoring system has multiple applications.

On the business side, the greatest potential is the possibility of adjusting the price of an advertisement to the quality of the editorial context. There is room for improvement. Today, a story that required months of work and cost hundreds of thousands of dollars carries the same unitary value (a few dollars per thousand page views) as a short, gossipy article. But times are changing. In the digital ad business, indicators are blinking red: CPMs, click-through rates, and viewability are in steady decline. We believe that advertisers and marketers will inevitably seek high-quality content, as long as they can rely on a credible indicator of quality. The platform will interface with ad servers to assess the value of a story and price and serve ads accordingly. The higher a story's quality score, the pricier the ad space adjacent to it can be. This adjustment will substantially raise the revenue per page to match the quality of the news.
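The pricing idea above can be sketched in a few lines. The 1-to-5 score scale comes from the talk; the baseline CPM and the multiplier curve below are illustrative assumptions, not the project's actual pricing model.

```python
# Hypothetical sketch: scale an ad's CPM by the adjacent story's quality score.
# BASE_CPM and the linear multiplier are assumptions for illustration only.

BASE_CPM = 2.00  # dollars per thousand page views (assumed baseline)

def adjusted_cpm(quality_score: int, base_cpm: float = BASE_CPM) -> float:
    """Scale the base CPM with the 1-5 quality score."""
    if not 1 <= quality_score <= 5:
        raise ValueError("quality score must be between 1 and 5")
    # A score of 1 keeps the base price; a score of 5 triples it (assumption).
    multiplier = 1.0 + (quality_score - 1) * 0.5
    return round(base_cpm * multiplier, 2)
```

In practice the mapping from score to price would be negotiated between publishers and ad servers; the point is only that a single numeric quality score makes such a contract possible.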

On the editorial side: The ability to assess the quality of news will open up opportunities for new products and services such as:
• Recommendation engine improvement: instead of relying on keywords or frequency, the engine will surface stories based on substantive quality, which will increase the number of articles read per visit. (Currently, visitors to many news sites read fewer than two articles per visit.)
• Personalization: We believe a reader's profile should not be limited to consumption analytics but should reflect his or her editorial preferences. The project is considering a dedicated "tag" that would connect stories' metadata with a reader's affinities.
• Curation: Publishers will be able to use the platform to offer curation services, a business currently left to players like Google and Apple. By providing technology that can automatically surface the best stories from trusted websites (even small ones), the platform can help publishers expand their footprint.

The platform will be based on two ML approaches: a feature-based model and a text-content analytic model.

Using traditional ML methods, the first model assesses quality from two sets of "signals" of journalistic work: Quantifiable Signals and Subjective Signals. Quantifiable Signals, which are processed directly from the news content, include the structure and patterns of the HTML page, advertising density, use of visual elements, bylines, word count, readability of the text, and information density (number of quotes and named entities). Subjective Signals are human scores of quality based on criteria such as writing style, thoroughness, balance and fairness, and timeliness. These measures are produced by editors and experienced journalists.
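A few of the quantifiable signals named above can be sketched as a simple feature extractor. The feature list follows the talk; the crude tag-stripping and the exact formulas (e.g., counting iframes as an ad proxy, average sentence length as a readability proxy) are illustrative assumptions, not the project's implementation.

```python
import re

def quantifiable_signals(html: str) -> dict:
    """Extract illustrative quantifiable signals from a raw article page.
    A real pipeline would parse the DOM and run named-entity recognition;
    the heuristics here are assumptions for demonstration only."""
    # Strip tags to approximate the article text.
    text = re.sub(r"<[^>]+>", " ", html)
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "quote_count": text.count('"') // 2,  # paired quotation marks
        "image_count": len(re.findall(r"<img\b", html, re.I)),
        # Embedded iframes per word as a crude advertising-density proxy.
        "ad_density": len(re.findall(r"<iframe\b", html, re.I)) / max(len(words), 1),
        # Average sentence length as a crude readability proxy.
        "avg_sentence_len": len(words) / max(len(sentences), 1),
    }
```

A feature vector like this would then feed a conventional classifier or regressor trained against the editors' subjective scores.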

The second approach is based on deep learning methods. Here, the goal is to build models that can accurately classify an unseen incoming article purely on the quality of the reporting, independent of the metadata or the topic of discussion. The main challenge in many such deep learning approaches is the availability of labeled data. Nearly four million contemporary articles have been processed, drawn from sources deemed either "good" or "commodity" (with no journalistic value added). For the bulk of the data, the reputation and consistency of the news brand carried significant weight, but the objective is also to classify quality at a finer-grained level, detached from the name of the source. To this end, various models are used to capture differences in writing that are agnostic to topical differences.
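One common way to push a text classifier toward writing style rather than subject matter is to mask topic-bearing tokens before training. The talk does not specify which technique the project uses; the sketch below is an illustrative assumption of this general idea.

```python
import re

def mask_topical_tokens(text: str) -> str:
    """Replace likely topic-bearing tokens with placeholders so that a
    downstream classifier sees writing style rather than subject matter.
    This masking heuristic is an illustrative assumption, not the
    project's documented pipeline."""
    # Mask numbers (dates, statistics) first.
    text = re.sub(r"\d+(?:[.,]\d+)*", "<NUM>", text)
    # Mask capitalized words that are not sentence-initial, a crude
    # stand-in for proper named-entity recognition.
    tokens = text.split()
    masked = []
    for i, tok in enumerate(tokens):
        sentence_initial = i == 0 or tokens[i - 1].endswith((".", "!", "?"))
        if tok[:1].isupper() and not sentence_initial:
            masked.append("<ENT>")
        else:
            masked.append(tok)
    return " ".join(masked)
```

Training on text preprocessed this way encourages the model to pick up on sentence construction, quotation patterns, and vocabulary richness rather than memorizing which outlets cover which topics.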


Filloux is currently a John S. Knight Senior Research Fellow at Stanford University, where he works on an artificial intelligence project applied to journalism. The project is aimed at surfacing quality journalism from the web, in real time, at scale, and automatically. By whitelisting numerous news sources and authors, it will also fight misinformation. During the academic year 2016-2017, Frederic was an International Journalism Fellow at the JSK, along with seventeen other media professionals selected for the program. He is also the editor of the Monday Note, a newsletter/blog that has covered digital business models and technology since 2007. His co-author is Jean-Louis Gassée, a tech investor based in Palo Alto. The Monday Note is now hosted by Medium and republished by Quartz. It reaches between 30,000 and 60,000 media professionals each week. Prior to that, Frederic spent four years as head of digital for Groupe Les Echos, France's main business news provider. From 2007 until 2010, Frédéric Filloux worked as an editor for the international division of the Norwegian media group Schibsted ASA. In 2002, he was part of the managing team that launched the free daily 20 Minutes, which became the most-read newspaper in France. Before that, he spent 12 years at Libération, one of the most innovative French media outlets at the time, successively as a business reporter, New York correspondent, editor of the multimedia section, manager of online operations, and finally, editor of the paper. Frederic is a board member of the Global Editors Network and Reporters sans Frontières (Reporters Without Borders). He also has experience in the advertising business after a one-year stint at the Paris agency Groupe BDDP (now TBWA). He is a graduate of the Bordeaux school of journalism.

Eun Seo Jo is a PhD candidate in the digital humanities and history at Stanford. She has worked on large-corpus text projects with the Stanford Literary Lab, focusing on applications of machine learning methods to historical and literary questions. She is a graduate fellow at the Center for Spatial and Textual Analysis (CESTA) and a digital humanities consultant at the Center for Interdisciplinary Research (CIDR). Her dissertation is a computational linguistic analysis of Modernization Theory in modern American foreign policy.