Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/788

Title: The Distribution of N-Grams
Authors: EGGHE, Leo
Issue Date: 2000
Citation: Scientometrics, 47(2). p. 237-252
Abstract: N-grams are generalized words consisting of N consecutive symbols, as they are used in a text. This paper determines the rank-frequency distribution for redundant N-grams. For entire texts this is known to be Zipf's law (i.e., an inverse power law). For N-grams, however, we show that the rank (r)-frequency distribution is $P_N(r) = C / (\psi_N(r))^{\beta}$, where $\psi_N$ is the inverse function of $f_N(x) = x \ln^{N-1} x$. Here we assume that the rank-frequency distribution of the symbols follows Zipf's law with exponent $\beta$.
URI: http://hdl.handle.net/1942/788
DOI: 10.1023/A:1005634925734
ISI #: 000089449100005
ISSN: 0138-9130
Category: A1
Type: Journal Contribution
Validation: ecoom, 2001
Appears in Collections: Research publications
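
Illustration (not part of the repository record): the formula in the abstract can be evaluated numerically. The sketch below is a minimal example under stated assumptions, not the paper's own procedure. It assumes f_N is inverted on x >= 1, where it is increasing, and treats C as a free constant defaulting to 1. The names f_n, psi_n and p_n are hypothetical helpers chosen for this example.

```python
import math


def f_n(x: float, n: int) -> float:
    """f_N(x) = x * (ln x)^(N-1), as given in the abstract."""
    return x * math.log(x) ** (n - 1)


def psi_n(r: float, n: int, tol: float = 1e-12) -> float:
    """Invert f_N numerically by bisection.

    Assumption: r > 0 and the inverse is taken on x >= 1,
    where f_N is increasing for N >= 2.
    """
    if n == 1:
        return r  # f_1(x) = x, so the inverse is the identity
    lo, hi = 1.0, 2.0
    while f_n(hi, n) < r:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f_n(mid, n) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_n(r: float, n: int, beta: float, c: float = 1.0) -> float:
    """P_N(r) = C / psi_N(r)^beta; c is an unnormalized constant (assumption)."""
    return c / psi_n(r, n) ** beta


if __name__ == "__main__":
    # Compare the N = 1 case (plain Zipf) with N = 2 and N = 3 for beta = 1.
    for rank in (1, 10, 100, 1000):
        print(rank, [p_n(rank, n, beta=1.0) for n in (1, 2, 3)])
```

For N = 1 the function reduces to the plain inverse power law C / r^beta, which matches the Zipf case mentioned in the abstract. Bisection is used here only because it needs no external library; any bracketing root finder would serve equally well.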

Files in This Item:

Description                    Size       Format
Published version              393.51 kB  Adobe PDF
Peer-reviewed author version   287.05 kB  Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.