Document Server@UHasselt >
Research >
Research publications >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/23344

Title: Parallel Evaluation of Multi-Semi-Joins
Authors: Daenen, Jonny
Neven, Frank
Tan, Tony
Vansummeren, Stijn
Issue Date: 2016
Citation: Proceedings of the VLDB Endowmen, 9(10), p. 732-743
Abstract: While services such as Amazon AWS make computing power abundantly available, adding more computing nodes can in- cur high costs in, for instance, pay-as-you-go plans while not always significantly improving the net running time (aka wall-clock time) of queries. In this work, we provide algo- rithms for parallel evaluation of SGF queries in MapReduce that optimize total time, while retaining low net time. Not only can SGF queries specify all semi-join reducers, but also more expressive queries involving disjunction and negation. Since SGF queries can be seen as Boolean combinations of (potentially nested) semi-joins, we introduce a novel multi- semi-join (MSJ) MapReduce operator that enables the eval- uation of a set of semi-joins in one job. We use this op- erator to obtain parallel query plans for SGF queries that outvalue sequential plans w.r.t. net time and provide addi- tional optimizations aimed at minimizing total time without severely affecting net time. Even though the latter optimiza- tions are NP-hard, we present effective greedy algorithms. Our experiments, conducted using our own implementation Gumbo on top of Hadoop, confirm the usefulness of parallel query plans, and the effectiveness and scalability of our op- timizations, all with a significant improvement over Pig and Hive.
URI: http://hdl.handle.net/1942/23344
Link to publication: http://www.vldb.org/pvldb/vol9/p732-daenen.pdf
DOI: 10.14778/2977797.2977800
ISSN: 2150-8097
Category: A1
Type: Journal Contribution
Appears in Collections: Research publications

Files in This Item:

Description SizeFormat
Published version324.84 kBAdobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.