Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/8837

Title: Unsupervised Learning of Binary Vectors
Authors: Copelli Lopes da Silva, Mauro
Advisors: Van den Broeck, Christian
Issue Date: 1999
Publisher: UHasselt Diepenbeek
Abstract: In this thesis, unsupervised learning of binary vectors from data is studied using methods from the statistical mechanics of disordered systems. In the model, data vectors are distributed according to a single symmetry-breaking direction. The aim of unsupervised learning is to provide a good approximation to this direction. The difference with respect to previous studies is the knowledge that this preferential direction has binary components. It is shown that sampling from the posterior distribution ('Gibbs learning') leads, for general smooth distributions, to an exponentially fast approach to perfect learning in the asymptotic limit of a large number of examples. If the distribution is non-smooth, then first-order phase transitions to perfect learning are expected. In the limit of poor performance, at the other end of the asymptotics, the binary nature of the preferential direction is irrelevant and the results are the same as for the spherical case: a second-order phase transition ('retarded learning') is predicted to occur if the data distribution is not biased, whereas if the distribution is biased, learning starts off immediately. Using concepts from Bayesian inference, the center of mass of the Gibbs ensemble is shown to have maximal average (Bayes-optimal) performance. This upper bound for continuous vectors is extended to a discrete space, resulting in the clipped center of mass of the Gibbs ensemble having maximal average performance among the binary vectors. In order to calculate the performance of this best binary vector, the geometric properties of the center of mass of binary vectors are first studied. The surprising result is found that the center of mass of infinite binary vectors which obey some simple constraints is again a binary vector. When disorder is taken into account in the calculation, however, the properties of the Bayes-optimal center of mass change completely, leading to a vector with continuous components.
The performance of the best binary vector is calculated and shown to always lie above that of Gibbs learning and below the Bayes-optimal performance. Making use of a variational approach under the replica-symmetric ansatz, an optimal potential is constructed in the limits of zero temperature and mutual overlap. Under these assumptions, minimization of this potential in the binary space is shown not to saturate the best binary bound, except asymptotically and for a special case. The alternative technique of transforming the components of a continuous vector is studied, showing that, asymptotically and for the same special case, saturation of both bounds can occur.
URI: http://hdl.handle.net/1942/8837
Type: Theses and Dissertations
Appears in Collections: PhD theses
Research publications

Files in This Item:

Description: N/A
Size: 1.06 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.