SIGMOD 2012: Experimental Reproducibility
The goal of establishing reproducibility is to ensure your SIGMOD 2012 research paper stands as reliable work that can be referenced by future research. The premise is that experimental papers will be most useful when their results have been tested and generalized by objective third parties.
The Review Process
The committee contacts the authors of accepted papers, who may submit their experiments for review, on a voluntary basis, from April to September 2012. Details about the submission process will be communicated directly to authors. The committee then decides whether to award the following labels:
- Reproducible Label: The experiments reproduced by the committee support the central results reported in the paper.
- Sharable Label: The experiments have been tested by the committee and are made available to the community at a provided URL.
How does the committee assess whether the reproduced experiments support the central results reported in the paper? To earn the Reproducible label, a submission must fulfill the following three criteria:
- Depth: Each submitted experiment contains:
- A prototype system provided as a white box (source, configuration files, build environment) or a fully specified commercial system
- The set of experiments (system configuration and initialization, scripts, workload, measurement protocol) used to produce the raw experimental data
- The scripts needed to transform the raw data into the graphs included in the paper
- Portability: The results can be reproduced in an environment (e.g., a different OS or machine) other than the original development environment.
- Coverage: Central results and claims from the paper are supported by the submitted experiments.
Some Guidelines
Authors should make it easy for reviewers (and the community at large) to reproduce the central experimental results reported in a paper. Here are some guidelines for authors, based on experience from previous years.
We distinguish two phases in any experimentation effort, namely, primary data acquisition and data derivation:
- Primary Data Acquisition: Here the issue is how to obtain the raw data upon which the conclusions are drawn. Sometimes the reproducibility committee can simply rerun software (e.g., rerun an existing benchmark). At other times, obtaining the raw data may require special hardware (e.g., sensors in the Arctic). In the latter case, the committee will not be able to reproduce the acquisition of the raw data; instead, authors should provide the committee with a protocol that includes detailed procedures for system set-up, experiment set-up, and measurements. Whenever the raw data can be reproduced by rerunning software, the following information should be provided:
- Environment: Authors should explicitly specify the OS and tools to be installed. The specification should include dependencies on specific hardware features (e.g., 25 GB of RAM are needed) and dependencies within the environment (e.g., the required compiler only runs on a specific version of the OS). Note that a virtual machine allows authors to distribute open-source environments for single-site systems. A minimal environment-check sketch appears after this list.
- System: System setup is the most challenging aspect of repeating an experiment, as the system must be installed and configured in a new environment before the experiments can be run. Setup is easier to conduct when it is automatic rather than manual, and authors should test that the system they distribute can actually be installed in a new environment (an install-and-verify sketch appears after this list). The documentation should detail every step of system setup:
- How to obtain the system?
- How to configure the environment if need be (e.g., environment variables, paths)?
- How to compile the system (existing compilation options should be mentioned)?
- How to use the system? (What are the configuration options and parameters to the system?)
- How to make sure that the system is installed correctly?
- Experiments: Given a system, authors provide a set of experiments to reproduce the paper's results. Typically, each experiment consists of a setup phase (where parameters are configured and data is loaded), a running phase (where a workload is applied and measurements are taken), and a clean-up phase (where the system is cleaned up so that it does not interfere with the next round of experiments). Authors should document (i) how to perform the setup, running, and clean-up phases, and (ii) how to check that these phases complete as they should. Authors should also document the expected effect of the setup phase (e.g., a cold file cache is enforced) and the different steps of the running phase (e.g., by documenting the combination of command-line options used to run a given experiment script). Needless to say, experiments are easier to reproduce when they are automatic (e.g., a script that takes a range of values for each experiment parameter as arguments, as in the driver sketch after this list) rather than manual (e.g., a script that must be edited so that a constant takes the value of a given experiment parameter).
- Data Derivation: For each graph in the paper, authors should describe how the graph is obtained from the experimental measurements. Ideally, authors release the scripts (or spreadsheets) used to compute the derived values (typically statistics) and to generate the graphs; a minimal derivation sketch appears after this list.
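To make the environment item above concrete, here is a minimal sketch of a pre-flight check that a submission could ship alongside its scripts. It only assumes Python on a POSIX system; the required OS and the 25 GB RAM figure are illustrative values taken from the example above, not actual requirements.

```python
# Minimal pre-flight check for a hypothetical environment specification.
# The required values (Linux, 25 GB of RAM) are illustrative, not prescriptive.
import os
import platform
import sys

REQUIRED_OS = "Linux"      # hypothetical: the OS the experiments were developed on
REQUIRED_RAM_GB = 25       # hypothetical: taken from the example in the guideline


def installed_ram_gb():
    """Return the amount of physical memory in GB (POSIX only)."""
    pages = os.sysconf("SC_PHYS_PAGES")
    page_size = os.sysconf("SC_PAGE_SIZE")
    return pages * page_size / (1024 ** 3)


def main():
    problems = []
    if platform.system() != REQUIRED_OS:
        problems.append(f"expected {REQUIRED_OS}, found {platform.system()}")
    if installed_ram_gb() < REQUIRED_RAM_GB:
        problems.append(f"need at least {REQUIRED_RAM_GB} GB of RAM, "
                        f"found {installed_ram_gb():.1f} GB")
    if problems:
        sys.exit("Environment check failed: " + "; ".join(problems))
    print("Environment check passed.")


if __name__ == "__main__":
    main()
```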
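For the system setup questions above, a small script that automates obtaining, building, and smoke-testing the system can replace pages of manual instructions. The sketch below is hypothetical: the repository URL, the RELEASE=1 build option, and the --version check stand in for whatever the actual system provides.

```python
# Hypothetical install-and-verify script. The repository URL, build command,
# and smoke test are placeholders, not a real system.
import subprocess
import sys

REPO_URL = "https://example.org/prototype.git"   # hypothetical repository
BUILD_DIR = "prototype"


def run(cmd):
    """Run a command, echoing it first, and stop on the first failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def main():
    # 1. Obtain the system.
    run(["git", "clone", REPO_URL, BUILD_DIR])
    # 2. Compile it with the documented options (RELEASE=1 is a placeholder).
    run(["make", "-C", BUILD_DIR, "RELEASE=1"])
    # 3. Check that the system is installed correctly (a trivial smoke test).
    run([f"./{BUILD_DIR}/prototype", "--version"])
    print("System built and responding; ready to run experiments.")


if __name__ == "__main__":
    try:
        main()
    except subprocess.CalledProcessError as err:
        sys.exit(f"Setup step failed: {err}")
```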
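The experiment phases above are easiest to hand to a reviewer as a single driver script. The following sketch assumes a hypothetical ./prototype binary with load, bench, and reset subcommands and a buffer-size parameter; it sweeps the parameter values given on the command line and writes the raw measurements to raw_results.csv.

```python
# Sketch of an automated experiment driver: setup, run, and clean-up for each
# parameter value passed on the command line. The prototype commands are placeholders.
import argparse
import csv
import subprocess


def setup(buffer_mb):
    # Load the data set for this configuration (placeholder command).
    subprocess.run(["./prototype", "load", "--buffer-mb", str(buffer_mb)], check=True)


def run(buffer_mb):
    # Apply the workload; we assume the benchmark prints a single number
    # (e.g., throughput in queries/second) on standard output.
    out = subprocess.run(["./prototype", "bench", "--buffer-mb", str(buffer_mb)],
                         check=True, capture_output=True, text=True)
    return float(out.stdout.strip())


def cleanup():
    # Restore the system so that the next run starts from a clean state.
    subprocess.run(["./prototype", "reset"], check=True)


def main():
    parser = argparse.ArgumentParser(description="Run one experiment per parameter value")
    parser.add_argument("buffer_mb", type=int, nargs="+",
                        help="buffer sizes (MB) to sweep over")
    args = parser.parse_args()
    with open("raw_results.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["buffer_mb", "throughput"])
        for value in args.buffer_mb:
            setup(value)
            writer.writerow([value, run(value)])
            cleanup()


if __name__ == "__main__":
    main()
```

A reviewer could then reproduce a whole series with a single invocation, e.g., python run_experiments.py 128 256 512.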
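Finally, a data-derivation script closes the loop from raw measurements to the figure in the paper. The sketch below assumes the raw_results.csv produced by the driver above and a single hypothetical throughput-versus-buffer-size figure; it uses matplotlib, which the environment specification would have to list.

```python
# Sketch of a derivation script: aggregate raw measurements into the numbers
# behind one (hypothetical) graph and regenerate the plot. File and column
# names match the driver sketch above and are assumptions.
import csv
from collections import defaultdict
from statistics import mean

import matplotlib.pyplot as plt

# Read the raw measurements produced by the experiment driver.
runs = defaultdict(list)
with open("raw_results.csv") as f:
    for row in csv.DictReader(f):
        runs[int(row["buffer_mb"])].append(float(row["throughput"]))

# Derive the plotted statistic (here, the mean over repeated runs).
xs = sorted(runs)
ys = [mean(runs[x]) for x in xs]

# Regenerate the figure as it would appear in the paper.
plt.plot(xs, ys, marker="o")
plt.xlabel("Buffer size (MB)")
plt.ylabel("Throughput (queries/s)")
plt.savefig("figure_3.pdf")   # hypothetical figure name
```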
The experiments published by Jens Teubner and Rene Mueller, from ETH Zurich, together with their SIGMOD 2011 article titled "How Soccer Players Would Do Stream Joins," are an excellent illustration of these guidelines.
Reproducibility Committee
Philippe Bonnet, ITU, Denmark, chair
Juliana Freire, NYU, USA, chair
Matias Bjorling, ITU, Denmark
Wei Cao, Renmin University, China
Eli Cortez, Universidade Federal do Amazonas, Brazil
Stratos Idreos, CWI, Netherlands
Ryan Johnson, University of Toronto, Canada
Martin Kaufmann, ETHZ, Switzerland
David Koop, University of Utah, USA
Lucja Kot, Cornell University, USA
Willis Lang, University of Wisconsin-Madison, USA
Mian Lu, HKUST, China
Dan Olteanu, Oxford University, UK
Paolo Papotti, Qatar Computing Research Institute (QCRI), Qatar
Ben Sowell, Cornell University, USA
Radu Stoica, EPFL, Switzerland
Dimitris Tsirogiannis, Microsoft, USA