Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/10209

Title: Scalable multi-query optimization over federated scientific databases
Authors: VAN DE CRAEN, Dieter
Advisors: Neven, Frank
Issue Date: 2009
Publisher: UHasselt Diepenbeek
Abstract: We will not focus on the actual integration problem. Instead we focus on how to efficiently evaluate distributed queries. One of the key characteristics of the databases discussed in Section 1.2 is that sources are frequently updated. Updates lead to interesting challenges even if one is interested in simple queries. Indeed, answers to queries may vary over time as more data becomes available. However, it is cumbersome to repeat all queries over time, especially if they combine information from several sources. We therefore propose a monitoring approach. Users register their queries once and these queries are then executed periodically in batch mode. Users are then notified as soon as new answers to their queries arrive. As these queries are evaluated repeatedly, it is natural to look at multi-query optimization (MQO) in this setting. An important characteristic of monitoring systems is that they typically support multiple users and therefore we must consider a large number of queries. We have chosen to focus on the optimization of the communication cost, one of the main bottlenecks in our setting with large amounts of distributed data. In the development of our systems we ensured that users need no special expertise in some query language to formulate their queries. Being non-experts in computer science, the scientists are faced with two major challenges: (i) How to express such distributed queries. Expressing distributed queries is a non-trivial task, even if we assume that scientists are familiar with query languages like SQL. Such queries can get arbitrarily complex as more sources are considered; (ii) How to efficiently evaluate such distributed queries. An efficient evaluation must account for batches of hundreds (or even thousands) of submitted queries and must optimize all of them as a whole.
Notes: Doctorate in Sciences: Computer Science
URI: http://hdl.handle.net/1942/10209
Category: T1
Type: Theses and Dissertations
Appears in Collections: PhD theses; Research publications

Files in This Item:

Description: N/A
Size: 21.24 MB
Format: Adobe PDF