Two-Part Models and Quantile Regression for the Analysis of Survey Data With a Spike. The Example of Satisfaction With Health Care
Background: Results of patient satisfaction questionnaires can contain a spike at the value corresponding to complete satisfaction. A possible interpretation is that there are two types of respondents: those who are willing to give a negative evaluation of one or more items in the questionnaire, and those who will always give a completely positive evaluation irrespective of the item. The aim of the present study is to compare various statistical approaches to the analysis of such data, using data from a rehabilitation patient survey of the German Statutory Pension Insurance Scheme as an example.
Method: We used data from 272,806 respondents who participated in the survey from 2008 to 2011. We illustrate four models: linear regression; logistic regression; a two-part model, based on the assumption of two underlying populations; and quantile regression, which does not require any distributional assumptions. For each model we consider the relationship of the satisfaction score with two covariates.
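As a rough illustration of why quantile-based methods need no distributional assumptions, the sketch below finds a sample quantile by minimizing the check (pinball) loss, which is the criterion that quantile regression minimizes. It is an intercept-only toy version with no covariates, not the model fitted in the study. The score scale (5.0 meaning completely satisfied), the made-up data, and the function names `pinball_loss` and `sample_quantile` are all assumptions introduced for this example.

```python
def pinball_loss(q, ys, tau):
    """Check loss of candidate value q for the tau-quantile of ys."""
    return sum(tau * (y - q) if y >= q else (1 - tau) * (q - y) for y in ys)


def sample_quantile(ys, tau):
    """Return the observed value that minimizes the check loss.

    A minimizer of the check loss is always attained at one of the
    observed values, so searching over them is sufficient.
    """
    return min(sorted(set(ys)), key=lambda q: pinball_loss(q, ys, tau))


# Made-up scores with a spike at the hypothetical maximum of 5.0
scores = [5.0, 5.0, 5.0, 4.0, 3.0, 2.0]

lower = sample_quantile(scores, 0.25)  # lies below the spike
upper = sample_quantile(scores, 0.75)  # sits on the spike
print(lower, upper)
```

With these made-up data the upper quartile falls on the spike while the lower quartile does not, which is why the choice of quantile matters when a distribution has a spike.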
Results: Although the linear model provides correct estimates of the mean values (marginal effects), its assumptions are violated, which can lead to false interpretations. A two-part regression, which consists of a logistic regression followed by a linear regression conditional on not being fully satisfied, is a useful alternative. For research questions focusing on specific parts of the distribution, logistic regression and quantile regression should be considered.
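The two-part idea can be sketched with a single binary covariate. In that special case the logistic part reduces to the share of fully satisfied respondents in each group, and the conditional linear part reduces to the mean score among the remaining respondents. The code below is a minimal sketch under that assumption and not the analysis performed in the study. The score scale (5.0 meaning completely satisfied), the two groups, their values, and the name `two_part_summary` are hypothetical.

```python
from statistics import mean

FULL = 5.0  # hypothetical score meaning "completely satisfied"


def two_part_summary(scores, full=FULL):
    """Return (share at the spike, mean of the rest, recombined overall mean)."""
    p_full = sum(1 for s in scores if s == full) / len(scores)
    rest = [s for s in scores if s != full]
    # If everyone is at the spike, the conditional mean is undefined;
    # falling back to `full` keeps the recombined mean correct.
    mean_rest = mean(rest) if rest else full
    overall = p_full * full + (1 - p_full) * mean_rest
    return p_full, mean_rest, overall


# Made-up scores for two hypothetical groups of respondents
group_a = [5.0, 5.0, 5.0, 4.0, 3.0, 2.0]
group_b = [5.0, 4.5, 4.0, 4.0, 3.5, 3.0]

for name, scores in (("A", group_a), ("B", group_b)):
    p_full, mean_rest, overall = two_part_summary(scores)
    print(name, round(p_full, 3), round(mean_rest, 3), round(overall, 3))
```

In this constructed example both groups have the same overall mean, yet they differ in the share of fully satisfied respondents and in the mean score of the others. A comparison of means alone would not show that difference, whereas the two parts do.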
Discussion: Data with a spike represent a statistical challenge, but a range of modeling approaches is available to provide sound interpretations and correct answers to research questions.
7
Frontiers Media
1