# PODS Invited Talks

## Keynote

### What Next? A Half-Dozen Data Management Research Goals for Big Data and Cloud

Surajit Chaudhuri (Microsoft Research)

### Abstract

"Big Data" and the Cloud are two disruptions that are influencing our field today. In this talk, I will outline the nature of these disruptions. Next, following the structure of Jim Gray's Turing award lecture, I will describe six fundamental technical challenges that will be important for us, as a research community, to address in order to take advantage of these disruptions. While some of the challenges are unique to these disruptions, others are known challenges whose importance is amplified by Big Data and the Cloud. A good solution to several of these problems will require close interaction between the data management systems and theory sub-communities.

### Bio

**Surajit Chaudhuri** is a Distinguished Scientist at Microsoft
Research. His current areas of interest are
enterprise data analytics,
self-manageability and multi-tenant
technology for cloud database services.
Working with his colleagues in Microsoft
Research and the Microsoft SQL Server team,
he helped incorporate the Index Tuning
Wizard – and subsequently Database
Engine Tuning Advisor – into Microsoft
SQL Server. He initiated a project on data
cleaning at Microsoft Research whose
technology now ships in Microsoft SQL Server
Integration Services. Surajit is an ACM
Fellow, a recipient of the ACM SIGMOD Edgar
F. Codd Innovations Award, the ACM SIGMOD
Contributions Award, a VLDB 10-Year Best
Paper Award, and an IEEE Data Engineering
Influential Paper Award. He was the Program
Committee Chair for ACM SIGMOD 2006, a
Co-Chair of ACM SIGKDD 1999, and has served
on the editorial boards of ACM TODS and IEEE
TKDE. Surajit received his Ph.D. from
Stanford University and B.Tech from the Indian Institute of
Technology, Kharagpur.

## Tutorial 1

### Linguistic Foundations for Bidirectional Transformations

Benjamin Pierce (University of Pennsylvania)

### Abstract

Computing is full of situations where two different structures must be "connected" in such a way that updates to each can be propagated to the other. This is a generalization of the classical view update problem, which has been studied for decades in the database community; more recently, related problems have attracted considerable interest in other areas, including programming languages, software model transformation, user interfaces, and system configuration. Among the fruits of this cross-pollination has been the development of a linguistic perspective on the problem. Rather than taking some view definition language as fixed (e.g., choosing some subset of relational algebra) and looking for tractable ways of "inverting" view definitions to propagate updates from view to source, we can directly design new bidirectional programming languages in which every expression denotes a pair of functions mapping updates on one structure to updates on the other. Such structures are often called lenses. The foundational theory of lenses has been studied extensively, and lens-based language designs have been developed in several domains, including strings, trees, relations, graphs, and software models. These languages share some common elements with modern functional languages -- in particular, they come with very expressive type systems. In other respects, they are rather novel and surprising. This tutorial surveys recent developments in the theory of lenses and the practice of bidirectional programming languages.
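
To make the idea of a lens concrete, here is a minimal sketch in Python. It is a simplification for illustration, not the design of any language covered in the tutorial: it uses the state-based formulation, in which a lens is a `get` function from source to view paired with a `put` function that writes an edited view back into the source. The names `Lens`, `field_lens`, and `compose`, and the sample record, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

S = TypeVar("S")  # source structure
V = TypeVar("V")  # view structure


@dataclass(frozen=True)
class Lens(Generic[S, V]):
    """A state-based lens: `get` extracts a view, `put` writes a view back."""
    get: Callable[[S], V]
    put: Callable[[V, S], S]


def field_lens(key):
    """Hypothetical helper: a lens focusing on one key of a dict source."""
    return Lens(
        get=lambda source: source[key],
        # Build a new dict so the original source is left unmodified.
        put=lambda view, source: {**source, key: view},
    )


def compose(outer, inner):
    """Chain two lenses: `outer` focuses first, then `inner` within its view."""
    return Lens(
        get=lambda s: inner.get(outer.get(s)),
        put=lambda v, s: outer.put(inner.put(v, outer.get(s)), s),
    )


# Illustrative data only.
employee = {
    "name": "Ada",
    "contact": {"email": "ada@example.org", "phone": "555-0100"},
}

email = compose(field_lens("contact"), field_lens("email"))

# Edit the view and propagate the change back to the source.
updated = email.put("ada@example.com", employee)

# Round-trip laws commonly required of well-behaved lenses:
assert email.put(email.get(employee), employee) == employee  # GetPut
assert email.get(updated) == "ada@example.com"               # PutGet
```

Because both directions are derived from a single composed expression, the update path cannot drift out of step with the view definition, which is the practical appeal of designing the language bidirectionally rather than inverting a fixed view language after the fact.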

### Bio

**Benjamin Pierce** joined the CIS Department at Penn in
1998. Previously, he was on the faculty at Indiana University and held
research fellowships at Cambridge University, the University of
Edinburgh, and INRIA-Rocquencourt.
He received his Ph.D. in Computer Science at Carnegie Mellon University in
1991. His research centers on programming languages, static type systems,
concurrent and distributed programming, and synchronization technologies.
His books include the widely used graduate text **Types and Programming
Languages**. He is also the lead designer of the popular **Unison** file
synchronizer.

## Tutorial 2

### Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis

Michael Mahoney (Stanford University)

### Abstract

Database theory and database practice are typically done by computer scientists who adopt what may be termed an algorithmic perspective on their data. This perspective is very different from the perspective adopted by statisticians, scientific computers, machine learners, and others who work on what may be broadly termed statistical data analysis. I will address fundamental aspects of this algorithmic-statistical disconnect, with an eye to bridging the gap between these two very different approaches. A concept that lies at the heart of this disconnect is that of statistical regularization, a notion that has to do with how robust the output of an algorithm is to the noise properties of the input data. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely, regularization in one form or another is central to nearly every application domain that applies algorithms to noisy data. By using several case studies, I will illustrate, both theoretically and empirically, the nonobvious fact that approximate computation, in and of itself, can implicitly lead to statistical regularization. This and other recent work suggests that, by exploiting in a more principled way the statistical properties implicit in worst-case algorithms, one can in many cases satisfy the bicriteria of having algorithms that are scalable to very large-scale databases and that also have good inferential or predictive properties.
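
As a small illustration of how approximation can act as regularization, the sketch below uses early stopping of gradient descent on a least-squares problem. This is a standard textbook example chosen here for brevity; it is not one of the case studies from the talk, and the function names and the tiny data set are assumptions made for the example. Stopping the iteration early is an approximate computation, yet the result has a much smaller norm than the exact solution, which is the kind of shrinkage an explicit regularizer would impose.

```python
import math


def gradient_descent(X, y, steps, lr):
    """Run `steps` iterations of gradient descent on 0.5 * ||X w - y||^2,
    starting from w = 0, and return the final iterate."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(steps):
        # residual = X w - y
        residual = [
            sum(xij * wj for xij, wj in zip(row, w)) - yi
            for row, yi in zip(X, y)
        ]
        # gradient = X^T residual
        grad = [
            sum(row[j] * r for row, r in zip(X, residual))
            for j in range(n_features)
        ]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


def norm(w):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(wj * wj for wj in w))


# Illustrative, ill-conditioned problem: the second direction has a small
# singular value (0.1), so the exact solution w = (1, 10) is large along it.
X = [[1.0, 0.0], [0.0, 0.1]]
y = [1.0, 1.0]

w_early = gradient_descent(X, y, steps=10, lr=1.0)    # approximate answer
w_late = gradient_descent(X, y, steps=5000, lr=1.0)   # essentially exact

print("early-stopped norm:", norm(w_early))
print("converged norm:    ", norm(w_late))
```

The early-stopped iterate has already fit the well-conditioned direction but has barely moved along the poorly conditioned one, where noise in `y` would be amplified most. Running more iterations removes that implicit shrinkage.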

### Bio

**Michael Mahoney** is at Stanford University. His
research interests center around algorithms for very large-scale
statistical data analysis, including both theoretical and applied
aspects of problems in scientific and Internet domains. His current
research interests include geometric network analysis; developing
approximate computation and regularization methods for large
informatics graphs; applications to community detection, clustering,
and information dynamics in large social and information networks; and
the theory of randomized matrix algorithms and its application to
genetics, medical imaging, and Internet problems. He has been a
faculty member at Yale University and a researcher at Yahoo, and his
PhD was in computational statistical mechanics at Yale University.