PODS Invited Talks

Keynote

What Next? A Half-Dozen Data Management Research Goals for Big Data and Cloud

Surajit Chaudhuri (Microsoft Research)

Abstract

"Big Data" and the Cloud are two disruptions that are influencing our field today. In this talk, I will outline the nature of these disruptions. Next, following the structure of Jim Gray's Turing Award lecture, I will describe six fundamental technical challenges that our research community will need to address in order to take advantage of these disruptions. While some of the challenges are unique to these disruptions, others are known challenges whose importance is amplified by Big Data and the Cloud. Good solutions to several of these problems will require close interaction between the data management systems and theory sub-communities.

Bio

Surajit Chaudhuri is a Distinguished Scientist at Microsoft Research. His current areas of interest are enterprise data analytics, self-manageability, and multi-tenant technology for cloud database services. Working with his colleagues in Microsoft Research and the Microsoft SQL Server team, he helped incorporate the Index Tuning Wizard (and subsequently the Database Engine Tuning Advisor) into Microsoft SQL Server. He initiated a project on data cleaning at Microsoft Research whose technology now ships in Microsoft SQL Server Integration Services. Surajit is an ACM Fellow and a recipient of the ACM SIGMOD Edgar F. Codd Innovations Award, the ACM SIGMOD Contributions Award, a VLDB 10-Year Best Paper Award, and an IEEE Data Engineering Influential Paper Award. He was the Program Committee Chair for ACM SIGMOD 2006 and a Co-Chair of ACM SIGKDD 1999, and has served on the editorial boards of ACM TODS and IEEE TKDE. Surajit received his Ph.D. from Stanford University and his B.Tech from the Indian Institute of Technology, Kharagpur.

Tutorial 1

Linguistic Foundations for Bidirectional Transformations

Benjamin Pierce (University of Pennsylvania)

Abstract

Computing is full of situations where two different structures must be "connected" in such a way that updates to each can be propagated to the other. This is a generalization of the classical view update problem, which has been studied for decades in the database community; more recently, related problems have attracted considerable interest in other areas, including programming languages, software model transformation, user interfaces, and system configuration. Among the fruits of this cross-pollination has been the development of a linguistic perspective on the problem. Rather than taking some view definition language as fixed (e.g., choosing some subset of relational algebra) and looking for tractable ways of "inverting" view definitions to propagate updates from view to source, we can directly design new bidirectional programming languages in which every expression denotes a pair of functions mapping updates on one structure to updates on the other. Such structures are often called lenses. The foundational theory of lenses has been studied extensively, and lens-based language designs have been developed in several domains, including strings, trees, relations, graphs, and software models. These languages share some common elements with modern functional languages -- in particular, they come with very expressive type systems. In other respects, they are rather novel and surprising. This tutorial surveys recent developments in the theory of lenses and the practice of bidirectional programming languages.
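As a rough illustration of the idea, the sketch below models a lens in the classic state-based style: a `get` function that extracts a view from a source, and a `put` function that pushes an edited view back into the source. This is a simplification of the update-to-update formulation mentioned in the abstract, and the names `Lens`, `name_lens`, `get_put_holds`, and `put_get_holds`, as well as the sample record, are hypothetical and not taken from the tutorial.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

S = TypeVar("S")  # source structure
V = TypeVar("V")  # view structure


@dataclass(frozen=True)
class Lens(Generic[S, V]):
    """A pair of functions connecting a source and a view (hypothetical sketch)."""
    get: Callable[[S], V]      # source -> view
    put: Callable[[S, V], S]   # (old source, new view) -> new source


# Hypothetical example: the source is a record, the view exposes only its name.
name_lens = Lens(
    get=lambda src: src["name"],
    put=lambda src, view: {**src, "name": view},  # hidden fields are preserved
)


def get_put_holds(lens, src):
    """GetPut law: putting back an unmodified view leaves the source unchanged."""
    return lens.put(src, lens.get(src)) == src


def put_get_holds(lens, src, view):
    """PutGet law: after a put, get returns exactly the view that was put."""
    return lens.get(lens.put(src, view)) == view


source = {"name": "Ada", "email": "ada@example.org"}
updated = name_lens.put(source, "Ada L.")  # edit the view, propagate to the source
```

Here `updated` carries the new name while keeping the `email` field that the view never exposed, which is the behavior the two round-trip laws are meant to guarantee.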

Bio

Benjamin Pierce joined the CIS Department at Penn in 1998. Previously, he was on the faculty at Indiana University and held research fellowships at Cambridge University, the University of Edinburgh, and INRIA-Rocquencourt. He received his Ph.D. in Computer Science at Carnegie Mellon University in 1991. His research centers on programming languages, static type systems, concurrent and distributed programming, and synchronization technologies. His books include the widely used graduate text Types and Programming Languages. He is also the lead designer of the popular Unison file synchronizer.

Tutorial 2

Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis

Michael Mahoney (Stanford University)

Abstract

Database theory and database practice are typically done by computer scientists who adopt what may be termed an algorithmic perspective on their data. This perspective is very different from the perspective adopted by statisticians, scientific computers, machine learners, and others who work on what may be broadly termed statistical data analysis. I will address fundamental aspects of this algorithmic-statistical disconnect, with an eye to bridging the gap between these two very different approaches. A concept that lies at the heart of this disconnect is that of statistical regularization, a notion that has to do with how robust the output of an algorithm is to the noise properties of the input data. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely, regularization in one form or another is central to nearly every application domain that applies algorithms to noisy data. By using several case studies, I will illustrate, both theoretically and empirically, the nonobvious fact that approximate computation, in and of itself, can implicitly lead to statistical regularization. This and other recent work suggests that, by exploiting in a more principled way the statistical properties implicit in worst-case algorithms, one can in many cases satisfy the bicriteria of having algorithms that are scalable to very large-scale databases and that also have good inferential or predictive properties.
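To make the claim about implicit regularization concrete, here is a one-dimensional toy sketch. It is not one of the talk's case studies, and all function names and numbers are assumptions chosen for illustration. It fits a single weight by gradient descent on a least-squares objective and stops after a few iterations. The truncated (approximate) answer is shrunk toward zero in the same way an explicitly ridge-regularized solution would be.

```python
def early_stopped_gd(a, b, step, iterations):
    """Approximately minimize 0.5 * (a*w - b)**2 by a few gradient steps from w = 0."""
    w = 0.0
    for _ in range(iterations):
        w -= step * a * (a * w - b)  # gradient of the objective is a * (a*w - b)
    return w


def ridge(a, b, lam):
    """Exact minimizer of 0.5 * (a*w - b)**2 + 0.5 * lam * w**2."""
    return a * b / (a * a + lam)


# Hypothetical toy data: one feature value and one target value.
a, b = 2.0, 6.0
exact = b / a  # unregularized least-squares solution

# Approximate computation: stop gradient descent long before convergence.
approx = early_stopped_gd(a, b, step=0.05, iterations=5)

# The ridge penalty that reproduces the early-stopped answer exactly.
equivalent_lam = a * b / approx - a * a
```

In this toy setting the early-stopped weight lies strictly between zero and the exact solution, and `equivalent_lam` is positive, so stopping early behaves like adding an explicit penalty even though no penalty term appears in the algorithm.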

Bio

Michael Mahoney is at Stanford University. His research interests center around algorithms for very large-scale statistical data analysis, including both theoretical and applied aspects of problems in scientific and Internet domains. His current research interests include geometric network analysis; developing approximate computation and regularization methods for large informatics graphs; applications to community detection, clustering, and information dynamics in large social and information networks; and the theory of randomized matrix algorithms and its application to genetics, medical imaging, and Internet problems. He has been a faculty member at Yale University and a researcher at Yahoo, and his PhD was in computational statistical mechanics at Yale University.

