March 2013 SIGMOD Blog features Gerhard Weikum on "Where’s the Data in the Big Data Wave?"
Visit the SIGMOD Blog (wp.sigmod.org) to read Prof. Gerhard Weikum's post, "Where’s the Data in the Big Data Wave?" Prof. Weikum writes: "Big Data should be Interesting Data! There are various definitions of Big Data; most center around a number of V’s like volume, velocity, variety, veracity – in short: interesting data (interesting in at least one aspect). However, when you look into research papers on Big Data, in SIGMOD, VLDB, or ICDE, the data that you see here in experimental studies is utterly boring. Performance and scalability experiments are often based on the TPC-H benchmark: completely synthetic data with a synthetic workload that has been beaten to death for the last twenty years. Data quality, data cleaning, and data integration studies are often based on bibliographic data from DBLP, usually old versions with less than a million publications, prolific authors, and curated records. I doubt that this is a real challenge for tasks like entity linkage or data cleaning. So where’s the – interesting – data in Big Data research?"
Visit http://wp.sigmod.org/?p=786 to read the blog posting.