Document Server@UHasselt

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/3695

Title: Dealing with missing data in cross sectional data on transport
Authors: RUMISHA, Susan Fred
Advisors: HENS, N.
Issue Date: 2007
Abstract: In sample surveys and most research work, non-response is often a major problem: the required data are sometimes not obtained for all elements selected for observation, which leads to missing data. Missingness can occur in cross-sectional, longitudinal or multivariate studies. Various imputation methods are available and have been used to fill in missing data (in either the response or the covariates), and under certain conditions the completed data are expected to lead to valid inference. This study explores the efficiency of several imputation methods, both parametric and nonparametric, for estimating the effects of covariates in linear models fitted to cross-sectional data. Both simple and more advanced methods, such as multiple imputation, were considered. Since the data came from a cross-sectional study, univariate missingness patterns were considered. Two main scenarios were studied: missingness in the response variable and missingness in a covariate. The approach was as follows: new data were generated, missingness was induced using different missingness models depending on the assumed mechanism, and the missing values were then imputed. Accuracy was assessed by comparing the results with the true estimates obtained from the original generated data. The focus was on the regression parameter estimates (with their standard errors) and on the variability introduced in the response values. Simulation studies were done to evaluate the efficiency of the methods and the variability of the parameters of interest; from the simulation runs, MASE values were calculated for each method and compared. Parametric imputation methods were found to be inadequate, especially when the proportion of missing responses is high. Nonparametric methods gave good results, despite slight over- or underestimation of the variability in the data.
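The generate, induce-missingness, impute and compare loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis code: the data-generating model (y = 1 + 2x + e), the logistic missingness model and its coefficients, and all function names are hypothetical. Regression imputation stands in for a parametric method and a nearest-neighbour hot deck for a nonparametric one; these are not necessarily the methods used in the thesis. The criterion is the mean squared deviation of the slope from the full-data slope, used here only as an illustrative substitute for the MASE, which the abstract does not define.

```python
import math
import random
import statistics


def simulate(n, beta0, beta1, sigma, rng):
    """Generate complete data from y = beta0 + beta1 * x + e, e ~ N(0, sigma^2)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def ols(xs, ys):
    """Least-squares intercept and slope for a single covariate."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def p_missing(x, y, mechanism, alpha0=-1.0, alpha1=1.5):
    """Hypothetical logistic missingness model for the response."""
    if mechanism == "MCAR":
        eta = alpha0                    # constant probability
    elif mechanism == "MAR":
        eta = alpha0 + alpha1 * x       # depends only on the observed covariate
    elif mechanism == "MNAR":
        eta = alpha0 + alpha1 * y       # depends on the value that goes missing
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return 1.0 / (1.0 + math.exp(-eta))


def make_missing(xs, ys, mechanism, rng):
    """Replace each response by None with probability p_missing(...)."""
    return [None if rng.random() < p_missing(x, y, mechanism) else y
            for x, y in zip(xs, ys)]


def regression_impute(xs, ys_obs):
    """Parametric stand-in: fill gaps with complete-case regression predictions."""
    cc = [(x, y) for x, y in zip(xs, ys_obs) if y is not None]
    b0, b1 = ols([x for x, _ in cc], [y for _, y in cc])
    return [b0 + b1 * x if y is None else y for x, y in zip(xs, ys_obs)]


def nn_hot_deck(xs, ys_obs):
    """Nonparametric stand-in: copy the response of the donor with the closest x."""
    donors = [(x, y) for x, y in zip(xs, ys_obs) if y is not None]
    return [min(donors, key=lambda d: abs(d[0] - x))[1] if y is None else y
            for x, y in zip(xs, ys_obs)]


def run(mechanism, reps=100, n=200, seed=1):
    """Mean squared deviation of the imputed-data slope from the full-data slope."""
    rng = random.Random(seed)
    sq_err = {"regression": 0.0, "nn_hot_deck": 0.0}
    for _ in range(reps):
        xs, ys = simulate(n, 1.0, 2.0, 1.0, rng)
        _, b1_full = ols(xs, ys)        # reference estimate from the complete data
        ys_obs = make_missing(xs, ys, mechanism, rng)
        for name, filled in (("regression", regression_impute(xs, ys_obs)),
                             ("nn_hot_deck", nn_hot_deck(xs, ys_obs))):
            _, b1 = ols(xs, filled)
            sq_err[name] += (b1 - b1_full) ** 2
    return {name: total / reps for name, total in sq_err.items()}


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        print(mech, run(mech))
```

Comparing the "true" estimate from the complete generated data with the estimate after imputation, rather than with the population value, isolates the error added by the missingness and imputation step, which matches the assessment strategy the abstract describes.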
For missingness in the covariate, unbiased results were obtained when data were missing completely at random (MCAR) or missing at random (MAR), and biased results when they were missing not at random (MNAR). In this scenario, however, single parametric imputation methods seemed to perform better than multiple imputation or nonparametric methods. It was also observed that the missingness mechanism could be influenced by the magnitude of the covariate's effect, either in the fitted model or in the missingness model involved. In other words, the strength of the relationship between the covariates and the response variable plays a role in determining the missingness mechanism. These observations come from a simple exploration, so more research is needed to support them.
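To make the single versus multiple imputation comparison concrete, the sketch below pools the results of several imputed data sets with Rubin's rules. The abstract does not state how the multiple imputation results were combined, so standard Rubin's rules are an assumption here, and the function name `rubin_pool` is hypothetical.

```python
import math
import statistics


def rubin_pool(estimates, variances):
    """Combine M completed-data estimates with Rubin's rules.

    estimates: point estimates Q_m, one per imputed data set
    variances: their squared standard errors U_m
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w = statistics.fmean(variances)          # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (M - 1 denominator)
    t = w + (1.0 + 1.0 / m) * b              # total variance
    return q_bar, t, math.sqrt(t)
```

For example, `rubin_pool([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])` gives a pooled estimate of 2.0 and a total variance of 0.1 + (4/3) x 1. A single imputation that treats imputed values as observed effectively keeps only the within-imputation part W and drops the between-imputation term B, so its standard errors tend to be too small.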
Notes: Master in Biostatistics
URI: http://hdl.handle.net/1942/3695
Category: T2
Type: Theses and Dissertations
Appears in Collections: Applied Statistics: Master theses

Files in This Item:

Description: N/A; Size: 4.3 MB; Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.