|
Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/3695
|
Title: | Dealing with missing data in cross sectional data on transport |
Authors: | RUMISHA, Susan Fred |
Advisors: | HENS, N. |
Issue Date: | 2007 |
Abstract: | In sample surveys and most research work, non-response is often a major problem: the required
data are sometimes not obtained for all elements selected for observation, and
this leads to missing data. Missingness can occur in cross-sectional, longitudinal or multivariate
studies. Different imputation methods are available and have been used to fill in the missing data
(either response or covariates), and the resulting data are expected, under certain conditions, to lead to
valid inference. This study explores the efficiency of several imputation methods, both parametric
and nonparametric, for cross-sectional data when estimating the effect of covariates in linear models.
Both simple and advanced imputation methods, such as multiple imputation, were considered. Since the data
came from a cross-sectional study, univariate patterns and behaviors of missingness were used. Two main
scenarios were considered: missingness in the response variable and missingness in a covariate.
The approach was as follows: a new data set was generated,
missingness was induced using different missingness models depending on the assumed
mechanism, and the missing values were then imputed. Accuracy was assessed
by comparing the results with the true estimates, which were obtained from the original generated data.
The focus was on the regression model parameter estimates (with their SE) and the variability
introduced in the response values. Simulation studies were carried out to evaluate the efficiency of the
methods and the variability of the parameters of interest. From the simulation runs, MASE values were
calculated for each method and compared. Parametric imputation methods were found to be inadequate,
especially when the proportion of missing responses is high. Nonparametric methods gave good results
despite slight over- or underestimation of the variability in the data. For missingness in the
covariate, unbiased results were obtained under MCAR and MAR and biased results under MNAR.
In this case, however, single parametric methods seemed to perform better than multiple imputation
methods or nonparametric ones. It was observed that the missingness mechanism could be influenced by
the magnitude of the covariate effect in the fitted model or in the missingness model involved. In
other words, the strength of the relationship between the covariates and the response
variable plays a role in shaping the missingness mechanism. These results were obtained from
simple exploration, so more research is needed to support them. |
Notes: | Master in Biostatistics |
URI: | http://hdl.handle.net/1942/3695 |
Category: | T2 |
Type: | Theses and Dissertations |
Appears in Collections: | Applied Statistics: Master theses
|
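The simulation design described in the abstract (generate data, induce missingness in the response under an assumed mechanism, impute, then compare estimates with those from the complete data) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the single-covariate linear model, the coefficient values, the logistic missingness models, the two simple imputation methods, and the use of the mean absolute error of the slope as the summary measure (it is not claimed to be the MASE criterion used in the thesis).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only


def simulate(n=500, beta0=1.0, beta1=2.0):
    """Generate one complete data set from y = beta0 + beta1*x + noise (hypothetical values)."""
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)
    return x, y


def missing_mask(x, y, mechanism, rate=0.3):
    """Return a boolean mask that is True where the response is made missing.

    `rate` is the missingness probability when the driving variable equals 0
    (and the overall probability under MCAR). The logistic form is an assumption.
    """
    base = np.log(rate / (1 - rate))
    if mechanism == "MCAR":      # independent of the data
        p = np.full(len(y), rate)
    elif mechanism == "MAR":     # depends on the observed covariate only
        p = 1 / (1 + np.exp(-(base + x)))
    elif mechanism == "MNAR":    # depends on the (unobserved) response itself
        p = 1 / (1 + np.exp(-(base + y)))
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return rng.random(len(y)) < p


def ols(x, y):
    """Least-squares fit of y on x; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mean_impute(x, y, miss):
    """Replace missing responses by the mean of the observed responses."""
    y_imp = y.copy()
    y_imp[miss] = y[~miss].mean()
    return y_imp


def regression_impute(x, y, miss):
    """Replace missing responses by predictions from a fit on the observed cases."""
    b0, b1 = ols(x[~miss], y[~miss])
    y_imp = y.copy()
    y_imp[miss] = b0 + b1 * x[miss]
    return y_imp


def run(n_runs=200):
    """Mean absolute error of the imputed-data slope relative to the complete-data slope."""
    imputers = (("mean", mean_impute), ("regression", regression_impute))
    results = {}
    for mech in ("MCAR", "MAR", "MNAR"):
        for name, imputer in imputers:
            errors = []
            for _ in range(n_runs):
                x, y = simulate()
                true_slope = ols(x, y)[1]          # estimate from the complete data
                miss = missing_mask(x, y, mech)
                est_slope = ols(x, imputer(x, y, miss))[1]
                errors.append(abs(est_slope - true_slope))
            results[(mech, name)] = float(np.mean(errors))
    return results


if __name__ == "__main__":
    for key, value in run().items():
        print(key, round(value, 3))
```

Calling `run()` returns one error summary per combination of mechanism and imputation method, which is the kind of side-by-side comparison the abstract describes. A multiple-imputation or nonparametric method could be added as a further entry in `imputers` without changing the rest of the loop.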
Files in This Item:
|
Description |
Size | Format |
 | N/A | 4.3 MB | Adobe PDF |
|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
|