Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/10652

Title: Temporal Support of Regular Expressions in Sequential Pattern Mining
Authors: VAISMAN, Alejandro
Gomez, Letitia
Issue Date: 2009
Publisher: Dagstuhl
Citation: Kuijpers, Bart & Pedreschi, Dino & Saygin, Yucel & Spaccapietra, Stefano (Eds.) Dagstuhl Seminar Proceedings 08471 "Geographic Privacy-Aware Knowledge Discovery and Delivery".
Abstract: Classic algorithms for sequential pattern discovery return all frequent sequences present in a database. Since, in general, only a few of them are interesting from a user's point of view, languages based on regular expressions (RE) have been proposed to restrict frequent sequences to those that satisfy user-specified constraints. The support of a sequence is computed as the number of data-sequences satisfying a pattern with respect to the total number of data-sequences in the database; once regular expressions come into play, however, new approaches to the concept of support are needed. For example, users may be interested in computing the support of the RE as a whole, in addition to that of a particular pattern. As a simple example, the expression (A|B).C is satisfied by sequences like A.C or B.C. Even though the semantics of this RE suggests that both of them are equally interesting to the user, if neither of them reaches the minimum support (although together they do), they would not be retrieved. Also, when the items are frequently updated, the traditional way of counting support in sequential pattern mining may lead to incorrect (or at least incomplete) conclusions. For example, if we are looking for the support of the sequence A.B, where A and B are two items such that A was created after B, all sequences in the database that were completed before A was created can never produce a match. Therefore, counting them would underestimate the support of the sequence A.B. The problem becomes more involved if we are interested in categorical sequential patterns.
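To make the (A|B).C example concrete, the following Python sketch compares the classic support of each concrete sequence the RE expands to with the support of the RE as a whole. It is only an illustration under stated assumptions, not the authors' method: the toy database is invented, the RE is encoded as a hypothetical list of per-position item sets (RE_PATTERN), and "." is read as "followed, not necessarily immediately, by".

```python
from itertools import product

# Hypothetical encoding of the RE (A|B).C: one set of allowed items per position.
# "." is assumed to mean "followed (not necessarily immediately) by".
RE_PATTERN = [{"A", "B"}, {"C"}]

# Invented toy database of data-sequences.
DATABASE = [
    ["A", "C"],
    ["B", "C"],
    ["A", "D"],
    ["B", "D", "C"],
    ["D"],
]


def matches(sequence, pattern):
    """Return True if `sequence` contains, in order, one allowed item per position."""
    remaining = iter(sequence)
    # Each any(...) consumes the shared iterator up to the first allowed item,
    # so every position is matched strictly after the previous one.
    return all(any(item in allowed for item in remaining) for allowed in pattern)


def support(database, pattern):
    """Fraction of data-sequences in `database` that satisfy `pattern`."""
    return sum(matches(seq, pattern) for seq in database) / len(database)


def per_pattern_supports(database, pattern):
    """Classic support of each concrete sequence the RE expands to (e.g. A.C, B.C)."""
    return {
        ".".join(items): support(database, [{item} for item in items])
        for items in product(*(sorted(allowed) for allowed in pattern))
    }


if __name__ == "__main__":
    print(per_pattern_supports(DATABASE, RE_PATTERN))  # {'A.C': 0.2, 'B.C': 0.4}
    print(support(DATABASE, RE_PATTERN))               # 0.6
```

With a hypothetical minimum support of 0.5, neither A.C (0.2) nor B.C (0.4) would be retrieved on its own, while the RE as a whole (0.6) would be.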
In light of the above, in this paper we propose to revise the classic notion of support in sequential pattern mining, introducing the concept of temporal support of regular expressions, intuitively defined as the number of sequences satisfying a target pattern out of the total number of sequences that could possibly have matched that pattern, where the pattern is defined as an RE over complex items (i.e., not only item identifiers, but also attributes and functions). We present and discuss a theoretical framework for this novel notion of support.
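The intuition behind temporal support can be sketched for the simplest case mentioned in the abstract: a plain sequence A.B where A was created after B. This Python sketch is an assumption-laden illustration, not the paper's framework: it ignores REs, complex items, attributes and functions; each data-sequence carries a completion date and each item a creation date; and a sequence is assumed to "could have matched" only if every item of the pattern existed by its completion date. The names ITEM_CREATED, could_match and temporal_support, and all data, are hypothetical.

```python
from datetime import date

# Hypothetical creation dates; A is created after B, as in the abstract's example.
ITEM_CREATED = {"A": date(2008, 6, 1), "B": date(2008, 1, 1), "C": date(2008, 1, 1)}

# Invented data-sequences: (completion date, ordered items).
DATABASE = [
    (date(2008, 3, 1), ["B", "C"]),       # completed before A existed
    (date(2008, 4, 1), ["C", "B"]),       # completed before A existed
    (date(2008, 7, 1), ["A", "B"]),
    (date(2008, 8, 1), ["A", "C"]),
    (date(2008, 9, 1), ["C", "A", "B"]),
]


def matches(sequence, pattern):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to and including the match.
    return all(item in remaining for item in pattern)


def could_match(completed, pattern):
    """Assumed rule: every pattern item must exist by the sequence's completion date."""
    return all(ITEM_CREATED[item] <= completed for item in pattern)


def classic_support(database, pattern):
    """Matching sequences over ALL sequences in the database."""
    return sum(matches(items, pattern) for _, items in database) / len(database)


def temporal_support(database, pattern):
    """Matching sequences over only those sequences that could have matched."""
    eligible = [items for completed, items in database if could_match(completed, pattern)]
    if not eligible:
        return 0.0  # no sequence could have matched; returning 0.0 is an assumption
    return sum(matches(items, pattern) for items in eligible) / len(eligible)


if __name__ == "__main__":
    print(classic_support(DATABASE, ["A", "B"]))             # 0.4
    print(round(temporal_support(DATABASE, ["A", "B"]), 3))  # 0.667
```

Under these assumptions, the two sequences completed before A was created are dropped from the denominator, so the support of A.B rises from 2/5 to 2/3, illustrating the underestimation the abstract describes.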
URI: http://hdl.handle.net/1942/10652
Link to publication: http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=2008
Category: C2
Type: Proceedings Paper
Appears in Collections: Research publications

Files in This Item:

Description   Size        Format
N/A           299.12 kB   Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.