Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/12737

Title: Hidden Markov Modellen voor het infereren van XSDs (English: Hidden Markov Models for Inferring XSDs)
Authors: Fonteyn, Dominique
Advisors: NEVEN, Frank
Issue Date: 2011
Publisher: tUL Diepenbeek
Abstract: XML is the most popular language for storing data on the web. Schemas specify the structure of XML documents, and their presence enables, among other things, automatic validation. However, half of the XML fragments found online do not refer to a schema, and about two-thirds of the XSDs are not valid with respect to the W3C specification. We therefore look for algorithms that infer an XSD from a set of XML fragments. In this thesis we explore such inference techniques. XSD inference essentially boils down to inferring regular expressions. However, since the full class of regular expressions cannot be learned from positive data only, we restrict ourselves to single-occurrence regular expressions (SOREs). We present iXSD for inferring local SOXSDs. Next, we consider k-occurrence regular expressions (k-OREs), in which each symbol occurs at most k times; these are harder to infer. We focus on inferring k-OREs with iDRegEx, which is based on Hidden Markov Models (HMMs). We combine these algorithms to infer local k-OXSDs. We also present a similarity measure between two XSDs, which we use to evaluate the experimental results. The results show that the combined approach does not perform well in terms of precision and generalisation, but performs rather well in terms of similarity and runtime.
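To illustrate what inferring a SORE from positive data involves, the following is a minimal sketch. It is not taken from the thesis. It shows the first step commonly used in the SORE-inference literature: building a single-occurrence automaton (SOA) in the style of the 2T-INF construction. Each distinct element name becomes exactly one state, and an edge a -> b is added whenever b directly follows a in some example. The function names `build_soa` and `accepts`, the `^`/`$` marker states, and the representation of an element's children as a list of tag names are all hypothetical choices made for this illustration. The later step that rewrites the SOA into a SORE is omitted.

```python
from collections import defaultdict

# Hypothetical marker states (assumption: no real element is named "^" or "$").
SOURCE = "^"
SINK = "$"


def build_soa(samples):
    """Build a single-occurrence automaton (2T-INF style) from positive samples.

    Each sample is the sequence of child element names of one XML element.
    Every distinct name is one state; an edge a -> b is recorded whenever
    b directly follows a in some sample (including the source/sink markers).
    """
    edges = defaultdict(set)
    for word in samples:
        path = [SOURCE, *word, SINK]
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    return {state: frozenset(succ) for state, succ in edges.items()}


def accepts(soa, word):
    """Return True if every consecutive pair in ^ word $ is an edge of the SOA."""
    path = [SOURCE, *word, SINK]
    return all(b in soa.get(a, ()) for a, b in zip(path, path[1:]))


samples = [
    ["title", "author", "author", "year"],
    ["title", "author", "year"],
]
soa = build_soa(samples)
print(accepts(soa, ["title", "author", "author", "author", "year"]))  # True
print(accepts(soa, ["title", "year"]))  # False
```

The automaton generalises beyond the examples. It accepts any number of `author` children, which corresponds to the SORE `title author+ year`. It still rejects sequences whose adjacent pairs never occurred in the data.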
Notes: master in de informatica-databases (English: Master in Computer Science, Databases)
URI: http://hdl.handle.net/1942/12737
Category: T2
Type: Theses and Dissertations
Appears in Collections: Master theses

Files in This Item:

There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.