EE380 Computer Systems Colloquium

EE380 Computer Systems Colloquium: News Diffusion - fighting misinformation

Topic: 
News Diffusion: Automatically scoring news articles to fight misinformation
Abstract / Description: 

Deepnews.ai wants to make a decisive contribution to the sustainability of the journalistic information ecosystem by addressing two problems:
1. The lack of correlation between the cost of producing great editorial content and its economic value.
2. The vast untapped potential for news editorial products.

Deepnews.ai will offer a simple and accessible scoring system: the online platform receives a batch of news stories and scores each on a scale of 1 to 5 based on its journalistic quality. This is done automatically and in real time. The scoring system has multiple applications.
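As a rough illustration of the interface just described, here is a minimal batch-scoring sketch in Python. The DummyQualityModel class, its toy heuristic, and the field names are assumptions for illustration, not Deepnews.ai's actual API or model.

```python
from typing import Dict, List

class DummyQualityModel:
    """Stand-in for the real scoring model; purely an assumption for illustration."""
    def predict(self, text: str) -> float:
        # Toy heuristic: longer, quote-rich text scores higher.
        return 1.0 + min(4.0, len(text) / 2000 + text.count('"') * 0.2)

def score_batch(articles: List[str], model=None) -> List[Dict]:
    """Score each article in a batch on the 1-to-5 journalistic-quality scale."""
    model = model or DummyQualityModel()
    results = []
    for text in articles:
        score = max(1, min(5, round(model.predict(text))))  # clamp to 1..5
        results.append({"excerpt": text[:60], "score": score})
    return results

print(score_batch(['A deeply reported piece with many "quotes" ...',
                   "Short gossip item."]))
```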

On the business side, the greatest potential lies in adjusting the price of an advertisement to the quality of the editorial context. There is room for improvement: today, a story that required months of work and cost hundreds of thousands of dollars carries the same unitary value (a few dollars per thousand page views) as a short, gossipy article. But times are changing. In the digital ad business, indicators are blinking red: CPMs, click-through rates, and viewability are in steady decline. We believe that advertisers and marketers will inevitably seek high-quality content--as long as they can rely on a credible indicator of quality. Deepnews.ai will interface with ad servers to assess the value of a story and price and serve ads accordingly. The higher a story's quality score, the pricier the ad space adjacent to it can be. This adjustment will substantially raise the revenue per page to match the quality of the news.
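A back-of-the-envelope sketch of such a pricing rule follows. The linear multiplier is purely an assumption for illustration; the abstract does not specify an actual pricing function.

```python
def adjusted_cpm(base_cpm: float, quality_score: int) -> float:
    """Scale an ad's CPM by the editorial quality of the adjacent story."""
    # Illustrative rule: score 1 keeps the base CPM, score 5 doubles it.
    return base_cpm * (1.0 + (quality_score - 1) * 0.25)

print(adjusted_cpm(2.0, 1))  # 2.0 -- commodity story
print(adjusted_cpm(2.0, 5))  # 4.0 -- months-long investigation
```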

On the editorial side: The ability to assess the quality of news will open up opportunities for new products and services such as:
• Improved recommendation engines: instead of relying on keywords or frequency, Deepnews.ai will surface stories based on substantive quality, which will increase the number of articles read per visit. (Currently, visitors to many news sites read fewer than two articles per visit.) A minimal ranking sketch follows this list.
• Personalization: We believe a reader's profile should not be limited to consumption analytics but should reflect his or her editorial preferences. Deepnews.ai is considering a dedicated "tag" that will connect stories' metadata with a reader's affinities.
• Curation: Publishers will be able to use Deepnews.ai to offer curation services, a business currently left to players like Google and Apple. By providing technology that can automatically surface the best stories from trusted websites (even small ones), Deepnews.ai can help publishers expand their footprint.
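The ranking sketch referenced above: a quality-first sort over candidate stories. The tuple layout and the quality-then-freshness ordering are illustrative choices, not Deepnews.ai's actual ranking logic.

```python
def recommend(candidates, top_k=5):
    """Rank candidate stories by quality score rather than keyword frequency.

    `candidates` holds (title, quality_score, freshness) tuples; sorting by
    quality first and freshness second is an illustrative choice.
    """
    ranked = sorted(candidates, key=lambda c: (c[1], c[2]), reverse=True)
    return [title for title, _, _ in ranked[:top_k]]

stories = [("Months-long investigation", 5, 0.2),
           ("Celebrity gossip roundup", 1, 0.9),
           ("Local council analysis", 4, 0.6)]
print(recommend(stories, top_k=2))  # quality wins over recency
```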

The platform will be based on two ML approaches: a feature-based model and a text-content analytic model.

Using traditional ML methods, the first model takes as input two sets of "signals" to assess the quality of journalistic work: Quantifiable Signals and Subjective Signals. Quantifiable Signals include the structure and patterns of the HTML page, advertising density, use of visual elements, bylines, word count, readability of the text, and information density (number of quotes and named entities). These are computed directly from the news content. Subjective Signals are human scores of quality based on criteria such as writing style, thoroughness, balance and fairness, and timeliness. These scores are produced by editors and experienced journalists.
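A minimal sketch of extracting a few of the quantifiable signals, assuming crude regex heuristics. A production pipeline would use a real HTML parser and a named-entity recognizer; the class-name patterns below are guesses at common markup, not a known schema.

```python
import re

def quantifiable_signals(html: str, text: str) -> dict:
    """Extract a few of the quantifiable signals named above from one article."""
    words = text.split()
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return {
        "word_count": len(words),
        # The class-name patterns here are guesses at common ad/byline markup.
        "ad_blocks": len(re.findall(r'class="[^"]*\bad\b', html)),
        "images": len(re.findall(r"<img\b", html)),
        "has_byline": bool(re.search(r'rel="author"|class="byline"', html)),
        "quote_count": text.count('"') // 2,          # rough proxy for quotations
        "avg_sentence_len": len(words) / sentences,   # crude readability proxy
    }
```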

The second approach is based on deep learning methods. Here, the goal is to build models that can accurately classify an unseen incoming article purely on the quality of the reporting, distinct from the metadata or the topic of discussion. The main challenge in many such deep learning approaches is the availability of labeled data. Nearly four million contemporary articles have been processed, drawn from sources deemed "good" or "commodity" (with no journalistic value added). For the bulk of the data, the reputation and consistency of the news brand carried significant weight, but the objective is also to classify quality at a finer-grained level, detached from the name of the source. To this end, various models are used to capture differences in writing that are agnostic to topical differences.
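The talk's models are deep networks; as a tiny runnable stand-in for the source-labeled setup, here is a character-n-gram classifier, which responds more to writing style than to topic vocabulary (a rough proxy for the topic-agnostic goal). The training texts and labels are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Character n-grams capture stylistic patterns more than topic words,
# a cheap approximation of the topic-agnostic goal described above.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)

texts = ["A carefully sourced account, with named officials and documents ...",
         "You won't believe what happened next ..."]
labels = ["good", "commodity"]   # source-level labels, as in the talk
clf.fit(texts, labels)
print(clf.predict(["An unseen incoming article ..."]))
```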

Date and Time: 
Wednesday, March 14, 2018 - 4:30pm
Venue: 
Gates B03

EE380 Computer Systems Colloquium: The Evolution of Public Key Cryptography

Topic: 
The Evolution of Public Key Cryptography
Abstract / Description: 

While public key cryptography is seen as revolutionary, after this talk you might wonder why it took Whit Diffie, Ralph Merkle, and Martin Hellman so long to discover it. This talk also highlights the contributions of some unsung (or "under-sung") heroes: Ralph Merkle, John Gill, Stephen Pohlig, Richard Schroeppel, Loren Kohnfelder, and researchers at GCHQ (Ellis, Cocks, and Williamson).

Date and Time: 
Wednesday, February 28, 2018 - 4:30pm
Venue: 
Gates B03

EE380 Computer Systems Colloquium: Graph Analysis of Russian Twitter Trolls using Neo4j

Topic: 
Graph Analysis of Russian Twitter Trolls using Neo4j
Abstract / Description: 

As part of the US House Intelligence Committee investigation into how Russia may have influenced the 2016 US election, Twitter released the screen names of nearly 3000 Twitter accounts tied to Russia's Internet Research Agency. These accounts were immediately suspended, removing the data from Twitter.com and Twitter's developer API. In this talk, we show how we can reconstruct a subset of the Twitter network of these Russian troll accounts and apply graph analytics to the data using the Neo4j graph database to uncover how these accounts were spreading fake news.

This case-study-style presentation will show how we collected and munged the data, taking advantage of the flexibility of the property graph. We'll dive into how NLP and graph algorithms like PageRank and community detection can be applied in the context of social media to make sense of the data. We'll show how Cypher, the query language for graphs, is used to work with graph data. And we'll show how visualization is used in combination with these algorithms to interpret results of the analysis and to help share the story of the data. No familiarity with graphs or Neo4j is necessary, as we'll start with a brief overview of graph databases and Neo4j.
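For flavor, a minimal sketch of querying such a graph from Python with the Neo4j driver. The connection details, node labels, and relationship types are illustrative assumptions, not the presenters' actual schema.

```python
from neo4j import GraphDatabase

# Credentials, labels, and relationship types below are assumptions.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

TOP_HASHTAGS = """
MATCH (t:Troll)-[:POSTED]->(:Tweet)-[:HAS_TAG]->(h:Hashtag)
RETURN t.screen_name AS troll, h.tag AS hashtag, count(*) AS uses
ORDER BY uses DESC LIMIT 10
"""

with driver.session() as session:
    for record in session.run(TOP_HASHTAGS):
        print(record["troll"], record["hashtag"], record["uses"])
driver.close()
```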

Date and Time: 
Wednesday, February 21, 2018 - 4:30pm
Venue: 
Gates B03

EE380 Computer Systems Colloquium: Tiny functions for codecs, compilation, and (maybe) soon everything

Topic: 
Tiny functions for codecs, compilation, and (maybe) soon everything
Abstract / Description: 

Networks, applications, and media codecs frequently treat one another as strangers. By expressing large systems as compositions of small, pure functions, we've found it's possible to achieve tighter couplings between these components, improving performance without giving up modularity or the ability to debug. I'll discuss our experience with systems that demonstrate this basic idea.

ExCamera (NSDI 2017) parallelizes video encoding into thousands of tiny tasks, each handling a fraction of a second of video, much shorter than the interval between key frames, and executing in parallel on AWS Lambda. This was the first system to demonstrate "burst-parallel" thousands-way computation on functions-as-a-service infrastructure.

Salsify (NSDI 2018) is a low-latency network video system that uses a purely functional video codec to explore execution paths of the encoder without committing to them, allowing it to closely match the capacity estimates from a video-aware transport protocol. This architecture outperforms more loosely coupled applications -- Skype, FaceTime, Hangouts, WebRTC -- in delay and visual quality, and suggests that while improvements in video codecs may have reached the point of diminishing returns, video systems still have low-hanging fruit.

Lepton (NSDI 2017) uses a purely functional JPEG/VP8 transcoder to compress images in parallel across a distributed network filesystem with arbitrary block boundaries. This free-software system is in production at Dropbox and has compressed more than 200 petabytes of user JPEGs by 23%.

Based on our experience, we propose a general abstraction for outsourced morsels of computation, called cloud "thunks" -- stateless closures that describe their data dependencies by content-hash. We have created a tool that uses this abstraction to capture off-the-shelf Makefiles and other build systems, letting the user treat a FaaS service like an outsourced build farm with global memoization of results. The bottom line: expressing systems and protocols as compositions of small, pure functions will lead to a new wave of "general-purpose" lambda computing, permitting us to transform many time-consuming operations into large numbers of functions executing with massive parallelism for short durations in the cloud.
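A minimal sketch of the thunk idea under stated assumptions: functions and their inputs are named by content hash, and results are globally memoized by the hash of the (function, inputs) pair. The in-memory dicts stand in for durable object storage; this is an illustration of the abstraction, not the authors' tool.

```python
import hashlib, json

STORE: dict = {}   # content-hash -> bytes; stands in for durable object storage
MEMO: dict = {}    # thunk-hash -> output-hash; global memoization of results

def put(data: bytes) -> str:
    h = hashlib.sha256(data).hexdigest()
    STORE[h] = data
    return h

def run_thunk(func, input_hashes):
    """Run a stateless closure whose data dependencies are named by content hash."""
    key = hashlib.sha256(
        json.dumps([func.__name__, input_hashes]).encode()).hexdigest()
    if key in MEMO:                        # same code + same inputs => same output
        return MEMO[key]
    result = func(*(STORE[h] for h in input_hashes))
    MEMO[key] = put(result)
    return MEMO[key]

def fake_encode(chunk: bytes) -> bytes:   # stand-in for a tiny encoding task
    return chunk.upper()

chunk = put(b"half a second of raw video ...")
out = run_thunk(fake_encode, [chunk])
print(STORE[out])
```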

Date and Time: 
Wednesday, February 7, 2018 - 4:30pm
Venue: 
Gates 403

EE380 Computer Systems Colloquium: Computational Memory - A stepping-stone to non-von Neumann computing?

Topic: 
Computational Memory: A stepping-stone to non-von Neumann computing?
Abstract / Description: 

With the advent of the data-centric AI era and the imminent end of CMOS scaling laws, the time is ripe to adopt computing units based on non-von Neumann architectures. A first step in this direction could be in-memory computing, where certain computational tasks are performed in place in a specialized memory unit called computational memory. Resistive memory devices, where information is represented in terms of atomic arrangements within tiny volumes of material, are poised to play a key role as elements of such computational memory units. I will present a few examples of how the physical attributes and dynamics of these devices can be exploited to achieve in-place computation. We expect that this co-existence of computation and storage at the nanometer scale could enable ultra-dense, low-power, and massively parallel computing systems.
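As a numerical illustration of in-place computation, here is a sketch of the canonical resistive-crossbar example: storing a matrix as device conductances so that a matrix-vector product falls out of Ohm's and Kirchhoff's laws. All values are made up, and the 5% noise term is a crude stand-in for device variability.

```python
import numpy as np

# A matrix is stored as device conductances G (siemens); applying voltages V
# to the rows yields column currents I = G.T @ V by Ohm's and Kirchhoff's
# laws -- an analog matrix-vector multiply performed inside the memory array.
rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(4, 3))   # 4x3 crossbar of conductances
V = np.array([0.2, 0.1, 0.0, 0.3])         # row voltages (volts)

I = G.T @ V                                # column read-out currents (amps)

# Device variability limits precision; here a crude 5% noise stand-in.
I_noisy = I * (1 + rng.normal(0, 0.05, size=I.shape))
print(I, I_noisy)
```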


The Stanford EE Computer Systems Colloquium (EE380) meets on Wednesdays 4:30-5:45 throughout the academic year. Talks are given before a live audience in Room B03 in the basement of the Gates Computer Science Building on the Stanford Campus. The live talks (and the videos hosted at Stanford and on YouTube) are open to the public.

Date and Time: 
Wednesday, March 7, 2018 - 4:30pm
Venue: 
Gates B03

EE380 Computer Systems Colloquium: Stopping grinding attacks in proofs of space

Topic: 
Stopping grinding attacks in proofs of space
Abstract / Description: 

The reduced power requirement of proofs of space, one of their core features, opens them up to grinding attacks, in which an attacker tries many different possible histories at once and selects the most advantageous one. I'll explain how, through extensive use of canonical primitives, the addition of verifiable delay functions, and careful hooking of everything together, it's possible to get grinding attacks under control.
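A toy sketch of the attack and the countermeasure, under loud assumptions: real verifiable delay functions operate in groups of unknown order and come with succinct proofs, whereas this one merely forces sequential squaring modulo a prime to convey the idea.

```python
import hashlib

def H(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# Grinding: when challenges are cheap to evaluate, an attacker can score many
# candidate histories and keep whichever yields the most favorable one.
best = min(range(10_000), key=lambda nonce: H(b"history|%d" % nonce))

# A toy delay function: t sequential squarings that cannot be parallelized,
# so grinding over N candidates costs N full delays instead of one.
def toy_vdf(seed: int, t: int = 100_000) -> int:
    p = 2**255 - 19
    x = seed % p
    for _ in range(t):
        x = pow(x, 2, p)
    return x

challenge = toy_vdf(H(b"history|%d" % best))
print(hex(challenge)[:18])
```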

Date and Time: 
Wednesday, February 14, 2018 - 4:30pm
Venue: 
Gates B03

EE380 Computer Systems Colloquium: Exploiting modern microarchitectures: Meltdown, Spectre, and other hardware attacks

Topic: 
Exploiting modern microarchitectures: Meltdown, Spectre, and other hardware attacks
Abstract / Description: 

Recently disclosed vulnerabilities against modern high-performance computer microarchitectures, known as 'Meltdown' and 'Spectre', are among an emerging wave of hardware-focused attacks. These include cache side-channel exploits against underlying shared resources, which arise as a result of common industry-wide performance optimizations. More broadly, attacks against hardware are entering a new phase of sophistication, and we will see more of them in the months ahead. This talk will describe several of these attacks, how they can be mitigated, and generally what we can do as an industry to deliver performance without trading away security.

Date and Time: 
Wednesday, January 31, 2018 - 4:30pm
Venue: 
Gates B03

EE380 Computer Systems Colloquium: Personal BioHacking

Topic: 
Personal BioHacking
Abstract / Description: 

"Cells Are Not Computers and DNA is Not a Programming Language and That's Ok"

Date and Time: 
Wednesday, January 24, 2018 - 4:30pm
Venue: 
Gates B03

EE380 Computer Systems Colloquium: Combining Physical and Statistical Models in Order to Narrow Uncertainty in Projections of Global Warming

Topic: 
Combining Physical and Statistical Models in Order to Narrow Uncertainty in Projections of Global Warming
Abstract / Description: 

A key question in climate science is: How much global warming should we expect for a given increase in the atmospheric concentration of greenhouse gases like carbon dioxide? One strategy for addressing this question is to run physical models of the global climate system, but these models vary in their estimates of future warming by about a factor of two. Our research has attempted to narrow this range of uncertainty around model-projected future warming and to assess whether the upper or lower end of the model range is more likely. We showed that there are strong statistical relationships between how models simulate fundamental features of the Earth's energy budget over the recent past and how much warming they simulate in the future. Importantly, we find that models that best match observations over the recent past tend to simulate more warming in the future than the average model. Thus, statistically combining information from physical models and observations tells us that we should expect more warming (with smaller uncertainty ranges) than we would if we looked at physical models in isolation and ignored observations.
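A toy version of this model-plus-observations calculation, with entirely synthetic numbers: regress the models' projected warming on an observable metric of the recent past, then read the regression off at a hypothetical observed value of that metric.

```python
import numpy as np

rng = np.random.default_rng(1)
n_models = 20
# Per-model skill metric for the recent energy budget (synthetic):
predictor = rng.normal(0.0, 1.0, n_models)
# Per-model projected warming, correlated with the predictor (synthetic):
warming = 3.0 + 0.8 * predictor + rng.normal(0.0, 0.3, n_models)

slope, intercept = np.polyfit(predictor, warming, 1)
observed = 0.9                      # hypothetical observed value of the metric
constrained = slope * observed + intercept

print(f"raw multi-model mean: {warming.mean():.2f} K")
print(f"observation-constrained estimate: {constrained:.2f} K")
```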

Date and Time: 
Wednesday, January 17, 2018 - 4:30pm
Venue: 
Gates B03

Special Seminar: Formal Methods meets Machine Learning: Explorations in Cyber-Physical Systems Design

Topic: 
Formal Methods meets Machine Learning: Explorations in Cyber-Physical Systems Design
Abstract / Description: 

Cyber-physical systems (CPS) are computational systems tightly integrated with physical processes. Examples include modern automobiles, fly-by-wire aircraft, software-controlled medical devices, robots, and many more. In recent times, these systems have exploded in complexity due to the growing amount of software and networking integrated into physical environments via real-time control loops, as well as the growing use of machine learning and artificial intelligence (AI) techniques. At the same time, these systems must be designed with strong verifiable guarantees.

In this talk, I will describe our research explorations at the intersection of machine learning and formal methods that address some of the challenges in CPS design. First, I will describe how machine learning techniques can be blended with formal methods to address challenges in specification, design, and verification of industrial CPS. In particular, I will discuss the use of formal inductive synthesis -- algorithmic synthesis from examples with formal guarantees -- for CPS design. Next, I will discuss how formal methods can be used to improve the level of assurance in systems that rely heavily on machine learning, such as autonomous vehicles using deep learning for perception. Both theory and industrial case studies will be discussed, with a special focus on the automotive domain. I will conclude with a brief discussion of the major remaining challenges posed by the use of machine learning and AI in CPS.
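To make "synthesis from examples with formal guarantees" concrete, here is a minimal counterexample-guided loop over a toy program space (a single integer threshold). The spec and search space are assumptions for illustration, far from the talk's industrial setting.

```python
def synthesize(spec, domain):
    """Counterexample-guided inductive synthesis over a one-parameter space.

    Learner: propose the smallest threshold consistent with the examples so
    far. Verifier: exhaustively check the proposal against the spec (the
    'formal guarantee' in this toy setting), returning a counterexample.
    """
    examples = []                                   # (input, expected output)
    while True:
        candidates = [t for t in domain
                      if all((x >= t) == want for x, want in examples)]
        if not candidates:
            return None                             # unrealizable in this space
        theta = candidates[0]
        cex = next((x for x in domain if (x >= theta) != spec(x)), None)
        if cex is None:
            return theta                            # verified on the whole domain
        examples.append((cex, spec(cex)))           # learn from the counterexample

spec = lambda x: x >= 42                            # desired behavior (assumed)
print("synthesized threshold:", synthesize(spec, range(100)))   # 42
```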

Date and Time: 
Monday, December 4, 2017 - 4:00pm
Venue: 
Gates 463A
