Information Systems Lab (ISL) Colloquium

ISL Colloquium: Delay, memory, and messaging tradeoffs in a distributed service system

Topic: 
Delay, memory, and messaging tradeoffs in a distributed service system
Abstract / Description: 

We consider the classical supermarket model: jobs arrive as a Poisson process of rate lambda N, with 0 < lambda < 1, and are to be routed to one of N identical servers with unit-mean, exponentially distributed processing times. We review a variety of policies and architectures that have been considered in the literature, which differ in the direction and number of messages exchanged and in the memory they employ; examples include the "power-of-d-choices" and pull-based policies. To compare policies of this kind, we focus on the resources (memory and messaging) that they use, and on whether the expected delay of a typical job vanishes as N increases.
We show that if (i) the message rate grows superlinearly, or (ii) the memory size grows superlogarithmically, as a function of N, then there exists a policy that drives the delay to zero, and we outline an analysis using fluid models. On the other hand, if neither condition (i) nor (ii) holds, then no policy within a broad class of symmetric policies can yield vanishing delay.
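As background for the fluid-model analysis mentioned above, the sketch below (my own toy code, not from the talk) integrates the standard fluid ODEs for the power-of-d-choices policy, ds_k/dt = lam*(s_{k-1}^d - s_k^d) - (s_k - s_{k+1}), where s_k is the fraction of servers with at least k jobs; the values of lam, d, K, and the step sizes are illustrative choices of mine.

```python
# Toy fluid-model integration for the power-of-d-choices supermarket model.
import numpy as np

def fluid_step(s, lam, d, dt):
    """One Euler step of ds_k/dt = lam*(s_{k-1}^d - s_k^d) - (s_k - s_{k+1})."""
    arrivals = lam * (s[:-1] ** d - s[1:] ** d)    # jobs joining queues of length k-1
    departures = s[1:] - np.append(s[2:], 0.0)     # unit-rate service completions
    s_next = s.copy()
    s_next[1:] += dt * (arrivals - departures)     # s[0] stays 1 by definition
    return s_next

lam, d, K = 0.9, 2, 30
s = np.zeros(K)
s[0] = 1.0                                         # every server has >= 0 jobs
for _ in range(50000):                             # integrate to (near) stationarity
    s = fluid_step(s, lam, d, dt=0.01)

# Known fixed point of these ODEs: s_k = lam ** ((d**k - 1) / (d - 1)),
# i.e. doubly exponential queue-length tails.
fixed = lam ** ((d ** np.arange(K) - 1) / (d - 1))
print(np.max(np.abs(s - fixed)))                   # residual should be tiny
```

The doubly exponential decay of the fixed point is the hallmark benefit of sampling d > 1 servers instead of routing at random.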

Date and Time: 
Thursday, November 9, 2017 - 4:15pm
Venue: 
Packard 101

ISL Colloquium: Dykstra’s Algorithm, ADMM, and Coordinate Descent: Connections, Insights, and Extensions

Topic: 
Dykstra’s Algorithm, ADMM, and Coordinate Descent: Connections, Insights, and Extensions
Abstract / Description: 

We study connections between Dykstra's algorithm for projecting onto an intersection of convex sets, the alternating direction method of multipliers (ADMM), and block coordinate descent. We prove that coordinate descent for a regularized regression problem, in which the penalty is a separable sum of support functions, is exactly equivalent to Dykstra's algorithm applied to the dual problem. ADMM on the dual problem is also seen to be equivalent, in the special case of two sets, one of which is a linear subspace. These connections, aside from being interesting in their own right, suggest new ways of analyzing and extending coordinate descent. For example, from existing convergence theory on Dykstra's algorithm over polyhedra, we discern that coordinate descent for the lasso problem converges at an (asymptotically) linear rate. We also develop two parallel versions of coordinate descent, based on the Dykstra and ADMM connections.
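For readers unfamiliar with Dykstra's algorithm, a minimal sketch follows (illustrative code, not from the paper): it cycles through the sets, projecting a correction-adjusted iterate onto each and maintaining one correction term per set. The two example sets (the unit ball and a halfspace) and all parameters are my own choices.

```python
# Minimal Dykstra's algorithm for projecting onto an intersection of convex sets.
import numpy as np

def proj_ball(x):                      # Euclidean projection onto the unit ball
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

def proj_halfspace(x):                 # projection onto {x : x[0] >= 0.5}
    y = x.copy()
    y[0] = max(y[0], 0.5)
    return y

def dykstra(x0, projections, iters=100):
    x = x0.astype(float)
    p = [np.zeros_like(x) for _ in projections]   # one correction term per set
    for _ in range(iters):
        for i, proj in enumerate(projections):
            y = proj(x + p[i])                    # project the corrected iterate
            p[i] = x + p[i] - y                   # update this set's correction
            x = y
    return x

x = dykstra(np.array([2.0, 2.0]), [proj_ball, proj_halfspace])
print(x)   # converges to the projection of (2, 2) onto the intersection
```

Unlike plain alternating projections, the correction terms ensure the limit is the actual Euclidean projection of the starting point onto the intersection, which is what makes the duality with coordinate descent exact.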

Date and Time: 
Wednesday, October 11, 2017 - 4:15pm
Venue: 
Packard 202

An Information Theoretic Perspective of Fronthaul Constrained Cloud and Fog Radio Access Networks [Special Seminar: ISL Colloquium]

Topic: 
An Information Theoretic Perspective of Fronthaul Constrained Cloud and Fog Radio Access Networks
Abstract / Description: 

Cloud radio access networks (C-RANs) have emerged as appealing architectures for next-generation wireless/cellular systems, whereby processing/decoding is migrated from the local base stations/radio units (RUs) to a control/central unit (CU) in the "cloud". Fog radio access networks (F-RANs) address the case where the RUs are enhanced with the ability to locally cache popular content. The network operates via fronthaul digital links connecting the CU and the RUs. In this talk we will address basic information-theoretic aspects of such networks, with emphasis on simple oblivious processing. Theoretical results illustrate the considerable performance gains to be expected for different cellular models. Some interesting theoretical directions conclude the presentation.

Date and Time: 
Wednesday, May 24, 2017 - 2:00pm
Venue: 
Packard 202

Cracking Big Data with Small Data [ISL Colloquium]

Topic: 
Cracking Big Data with Small Data
Abstract / Description: 

For the last several years, we have witnessed the emergence of datasets of an unprecedented scale across different scientific disciplines. The large volume of such datasets presents new computational challenges, as the diverse, feature-rich, and usually high-resolution data does not allow for effective data-intensive inference. In this regard, data summarization is a compelling (and sometimes the only) approach that aims both to exploit the richness of large-scale data and to remain computationally tractable: instead of operating on complex and large data directly, carefully constructed summaries not only enable the execution of various data analytics tasks but also improve their efficiency and scalability.

A systematic way to summarize data is to turn the problem into selecting a subset of data elements that optimizes a utility function quantifying the "representativeness" of the selected set. Oftentimes, these objective functions satisfy submodularity, an intuitive notion of diminishing returns stating that selecting any given element earlier helps more than selecting it later. Thus, many problems in data summarization require maximizing submodular set functions subject to cardinality constraints, and massive data means we have to solve these problems at scale.

In this talk, I will present our recent efforts in developing practical schemes for data summarization. In particular, I will first discuss the fastest centralized solution, whose query complexity is only linear in the data size. However, to truly summarize massive data we need to opt for scalable methods. I will then present a streaming algorithm that, with a single pass over the data, provides a constant-factor approximation guarantee relative to the optimal solution. Finally, I will talk about a distributed approach that summarizes tens of millions of data points in a timely fashion. I will also demonstrate experiments on several applications, including sparse Gaussian process inference and exemplar-based clustering using Apache Spark.
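As background, the workhorse for cardinality-constrained monotone submodular maximization is the classic greedy algorithm, with its (1 - 1/e) approximation guarantee due to Nemhauser, Wolsey, and Fisher. The sketch below is a generic illustration on a made-up toy coverage function, not the speaker's algorithms or data.

```python
# Classic greedy for monotone submodular maximization under |S| <= k.
def greedy_max(ground_set, f, k):
    """Repeatedly add the element with the largest marginal gain f(S+e) - f(S)."""
    S = []
    for _ in range(k):
        best = max((e for e in ground_set if e not in S),
                   key=lambda e: f(S + [e]) - f(S))
        S.append(best)
    return S

# Toy coverage function: each candidate "covers" items; f(S) = size of the union.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
f = lambda S: len(set().union(*(covers[e] for e in S)))
print(greedy_max(list(covers), f, k=2))   # -> ['a', 'c'] on this toy instance
```

The streaming and distributed methods in the talk can be viewed as ways of approximating these marginal-gain comparisons without random access to the full dataset.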

Date and Time: 
Thursday, May 11, 2017 - 4:15pm
Venue: 
Packard 101

An information-theoretic perspective on interference management [ISL Colloquium]

Topic: 
An information-theoretic perspective on interference management
Abstract / Description: 

For high data rates and massive connectivity, next-generation cellular networks are expected to deploy many small base stations. While such dense deployment provides the benefit of bringing radio closer to end users, it also increases the amount of interference from neighboring cells. Consequently, efficient and effective management of interference is expected to become one of the main challenges for high-spectral-efficiency, low-power, broad-coverage wireless communications.

In this talk, we introduce two competing paradigms of interference management and discuss recent developments in network information theory under these paradigms. In the first "distributed network" paradigm, the network consists of autonomous cells with minimal cooperation. We explore advanced channel coding techniques for the corresponding mathematical model of the "interference channel," focusing mainly on the sliding-window superposition coding scheme that achieves the performance of simultaneous decoding through point-to-point channel codes and low-complexity decoding. In the second "centralized network" paradigm, the network is a group of neighboring cells connected via backhaul links. For uplink and downlink communications over this "two-hop relay network," we develop dual coding schemes – noisy network coding and distributed decode-forward – that achieve capacity universally within a few bits per degree of freedom.

Date and Time: 
Thursday, April 20, 2017 - 4:15pm
Venue: 
Packard 101

When Exploration is Expensive -- Reducing and Bounding the Amount of Experience Needed to Learn to Make Good Decisions [ISL]

Topic: 
When Exploration is Expensive -- Reducing and Bounding the Amount of Experience Needed to Learn to Make Good Decisions
Abstract / Description: 

Understanding the limits of how much experience is needed to learn to make good decisions is both a foundational issue in reinforcement learning, and has important applications. Indeed, the potential to have artificial agents that help augment human capabilities, in the form of automated coaches or teachers, is enormous. Such reinforcement learning agents must explore in costly domains, since each experience comes from interacting with a human. I will discuss some of our recent theoretical results on sample efficient reinforcement learning.

The Information Systems Laboratory Colloquium (ISLC) is typically held in Packard 101 every Thursday at 4:15 pm during the academic year. Refreshments are usually served after the talk.

The Colloquium is organized by graduate students Martin Zhang, Farzan Farnia, Reza Takapoui, and Zhengyuan Zhou. To suggest speakers, please contact any of the students.

Date and Time: 
Thursday, April 27, 2017 - 4:15pm
Venue: 
Packard 101

Self-Driving Networks Workshop [ISL]

Topic: 
Self-Driving Networks Workshop
Abstract / Description: 

Networks have become very complex over the past decade. The users and operators of large cloud platforms and campus networks have desired a much more programmable network infrastructure to meet the dynamic needs of different applications and reduce the friction they can cause to each other. This has culminated in the Software-Defined Networking paradigm. But you cannot program what you do not understand: the volume, velocity, and richness of network applications and traffic seem beyond the ability of direct human comprehension. What is needed is a sensing, inference, and learning system that can observe the data emitted by a network during the course of its operation, reconstruct the network's evolution, infer key performance metrics, continually learn the best responses to rapidly changing load and operating conditions, and help the network adapt to them in real time. The workshop brings together academic and industry groups interested in the broad themes of this topic. It highlights ongoing research at Stanford and describes initial prototype systems and results from pilot deployments.

Date and Time: 
Wednesday, April 12, 2017 (All day)
Venue: 
Arrillaga Alumni Center

New Directions in Management Science & Engineering: A Brief History of the Virtual Lab

Topic: 
New Directions in Management Science & Engineering: A Brief History of the Virtual Lab
Abstract / Description: 

Lab experiments have long played an important role in behavioral science, in part because they allow for carefully designed tests of theory, and in part because randomized assignment facilitates identification of causal effects. At the same time, lab experiments have traditionally suffered from numerous constraints (e.g. short duration, small-scale, unrepresentative subjects, simplistic design, etc.) that limit their external validity. In this talk I describe how the web in general—and crowdsourcing sites like Amazon's Mechanical Turk in particular—allow researchers to create "virtual labs" in which they can conduct behavioral experiments of a scale, duration, and realism that far exceed what is possible in physical labs. To illustrate, I describe some recent experiments that showcase the advantages of virtual labs, as well as some of the limitations. I then discuss how this relatively new experimental capability may unfold in the future, along with some implications for social and behavioral science.

Date and Time: 
Thursday, March 16, 2017 - 12:15pm
Venue: 
Packard 101

Insensitivity of Loss Systems under Randomized SQ(d) Algorithms [ISL Colloquium]

Topic: 
Insensitivity of Loss Systems under Randomized SQ(d) Algorithms
Abstract / Description: 

In many applications, such as cloud computing and server-farm resource management, an incoming task or job has to be matched with an appropriate server in order to minimise the latency or blocking associated with its processing. Ideally, the best choice would be to match a job to the fastest available server. However, when there are thousands of servers, requiring information on all server tasks is overkill.

Pioneered in the 1990s by Vvedenskaya and Dobrushin in Russia and Mitzenmacher in the US, the idea of randomised sampling of a few servers was popularised as the "power of two" scheme: sampling two servers at random and sending the job to the "better" server (i.e. the one with the shorter queue, or more resources) provides most of the benefit of sampling all the servers.
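The "power of two" effect is easy to see even in the static balls-into-bins setting. The toy experiment below is my own illustration with arbitrary parameters, not code from the talk: each job samples d servers uniformly and joins the least loaded one.

```python
# Balls-into-bins illustration of the power of d choices.
import random

def max_load(n_bins=10000, n_balls=10000, d=1, seed=0):
    """Throw n_balls into n_bins; each ball samples d bins and joins the least loaded."""
    rng = random.Random(seed)
    load = [0] * n_bins
    for _ in range(n_balls):
        choices = [rng.randrange(n_bins) for _ in range(d)]
        best = min(choices, key=lambda i: load[i])
        load[best] += 1
    return max(load)

print("d=1 max load:", max_load(d=1))   # grows like log n / log log n
print("d=2 max load:", max_load(d=2))   # grows like log log n: noticeably smaller
```

The jump from one choice to two gives most of the improvement; further choices help only marginally, which is why d = 2 is the popular operating point.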

In the talk I will discuss multi-server loss models under the power-of-d routing scheme when service-time distributions are general with finite mean. Previous work on these models assumes that the service times are exponentially distributed, and insensitivity was suggested only through simulations; showing insensitivity to service-time distributions has remained an open problem. We address this problem by considering mixed-Erlang service-time distributions, a class that is dense in the general distributions on (0, ∞). We derive the mean field equations (MFE) of the empirical distributions for the system and establish the existence and uniqueness of the fixed point of the MFE. Furthermore, we show that this fixed point coincides with the fixed point of the MFE for a system with exponential service times, establishing that the fixed point is insensitive to the distribution. Due to the lack of uniformity in the mixed-Erlang convergence, the true general case needs to be handled differently. I will conclude with the case of the MFE with general service times, showing that the MFE is then characterized by a PDE whose stationary point coincides with the fixed point in the exponential case. The techniques developed in this paper are applicable to studying mean field limits for Markov processes on general state spaces and insensitivity properties of other queueing models.

Date and Time: 
Monday, March 20, 2017 - 3:00pm
Venue: 
Packard 202

Anonymity in the Bitcoin Peer-to-Peer Network [ISL Colloquium]

Topic: 
Anonymity in the Bitcoin Peer-to-Peer Network
Abstract / Description: 

Bitcoin enjoys a public perception of being a privacy-preserving financial system. In reality, Bitcoin has a number of privacy vulnerabilities, including the well-studied fact that transactions can be linked through the public blockchain. More recently, researchers have demonstrated deanonymization attacks that exploit a lower-layer weakness: the Bitcoin peer-to-peer (P2P) networking stack. In particular, the P2P network currently forwards content in a structured way that allows observers to deanonymize users by linking their transactions to the originating IP addresses. In this work, we first demonstrate that current protocols exhibit poor anonymity guarantees, both theoretically and in practice. Then, we consider a first-principles redesign of the P2P network, with the goal of providing strong, provable anonymity guarantees. We propose a simple networking policy called Dandelion, which achieves nearly-optimal anonymity guarantees at minimal cost to the network's utility.
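As a rough illustration of the stem/fluff idea behind Dandelion, consider the highly simplified toy below. It is not the paper's protocol: real Dandelion relays along a structured anonymity graph, whereas here each stem hop just picks a uniformly random node, and the phase-switch probability q is an arbitrary choice of mine.

```python
# Toy sketch of Dandelion's two-phase spreading (stem, then fluff).
import random

def dandelion_spread(n_nodes=100, q=0.1, seed=0):
    """Return (true origin, node where public diffusion starts, stem length)."""
    rng = random.Random(seed)
    origin = rng.randrange(n_nodes)
    node, hops = origin, 0
    while rng.random() > q:            # stay in the stem phase w.p. 1 - q
        node = rng.randrange(n_nodes)  # hand the transaction to the next relay
        hops += 1
    return origin, node, hops          # fluff (diffusion) begins at 'node'

origin, diffuser, hops = dandelion_spread()
# An eavesdropper watching the diffusion phase sees 'diffuser' as the
# apparent source, which after a geometric number of stem hops is
# typically not the true 'origin'.
```

The anonymity argument in the paper is about exactly this gap: the node where diffusion begins carries little information about where the transaction actually originated.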

Date and Time: 
Thursday, March 16, 2017 - 4:15pm
Venue: 
Packard 101
