# Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants

Sauzet O, Peacock JL (2017) *BMC Medical Research Methodology* 17: 110.

*Journal Article* | *Original Article* | *Published* | *English*

Author

Sauzet, Odile (UniBi); Peacock, Janet L.

Abstract

Background
The analysis of perinatal outcomes often involves datasets that include some multiple births. Such datasets consist mostly of independent observations, together with a limited number of clusters of size two (twins) and possibly of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants, we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes, but very little is known about their reliability when only a limited number of small clusters are present.
Methods
Using simulated data based on a dataset of preterm infants, we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for logistic random intercept models, and generalised estimating equations were compared.
Results
The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters. A logistic random intercept model, however, fails to estimate the correlation between siblings if the percentage of twins is too small, and then provides estimates similar to those of logistic regression. The method that seems to provide the best balance between estimation of the standard error and of the parameter, for any percentage of twins, is generalised estimating equations.
Conclusions
This study has shown that the number of covariates and the level-two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins. However, when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
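
The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation code, and every name and parameter value in it (`simulate_clusters`, `intercept_with_ses`, `beta0 = -1.0`, `sigma_u = 2.0`, 2000 mothers) is an assumption chosen for illustration. The sketch draws binary outcomes from a logistic random intercept model in which a given fraction of mothers have twins, fits an intercept-only logistic model, and compares the naive standard error with a cluster-robust (sandwich) standard error. The latter corresponds to generalised estimating equations with an independence working correlation, a simplification of the GEE setting studied in the paper.

```python
import math
import random


def simulate_clusters(n_mothers, twin_fraction, beta0, sigma_u, rng):
    """Simulate binary outcomes for singletons and twins.

    Each mother has a random intercept u ~ N(0, sigma_u^2) shared by her
    infants, so twins are correlated while singletons are independent.
    Returns a list of clusters, each a list of 0/1 outcomes.
    """
    clusters = []
    for _ in range(n_mothers):
        size = 2 if rng.random() < twin_fraction else 1
        u = rng.gauss(0.0, sigma_u)
        p = 1.0 / (1.0 + math.exp(-(beta0 + u)))
        clusters.append([1 if rng.random() < p else 0 for _ in range(size)])
    return clusters


def intercept_with_ses(clusters):
    """Fit an intercept-only logistic model.

    Returns (estimate, naive_se, cluster_robust_se) for the log-odds.
    """
    n = sum(len(c) for c in clusters)
    p_hat = sum(sum(c) for c in clusters) / n
    beta_hat = math.log(p_hat / (1.0 - p_hat))
    # Fisher information of the intercept (the "bread" of the sandwich).
    info = n * p_hat * (1.0 - p_hat)
    # "Meat": squared sums of the score contributions within each cluster.
    meat = sum(sum(y - p_hat for y in c) ** 2 for c in clusters)
    naive_se = math.sqrt(1.0 / info)
    robust_se = math.sqrt(meat) / info
    return beta_hat, naive_se, robust_se


if __name__ == "__main__":
    rng = random.Random(2017)  # arbitrary seed, for reproducibility only
    for frac in (0.0, 0.1, 0.5, 1.0):
        data = simulate_clusters(2000, frac, -1.0, 2.0, rng)
        est, naive, robust = intercept_with_ses(data)
        print(f"twins={frac:.0%}  beta0_hat={est:.3f}  "
              f"naive_se={naive:.4f}  robust_se={robust:.4f}")
```

When there are no twins, every cluster has size one and the two standard errors coincide algebraically. As the fraction of twins grows, the cluster-robust standard error will typically exceed the naive one, because the naive formula treats siblings as independent observations.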

Financial disclosure

Article Processing Charge funded by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.

### Cite this

Sauzet, O., & Peacock, J. L. (2017). Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants. *BMC Medical Research Methodology*, *17*, 110. doi:10.1186/s12874-017-0369-6

**All files available under the following license(s):**

**Copyright Statement:**

**This Item is protected by copyright and/or related rights.**[...]

**Main File(s)**

File Name

bmc.s12874-017-0369-6.sauzet.pdf
1.17 MB

Access Level

Open Access

Last Uploaded

2017-09-19T07:20:03Z

### 1 Citation in Europe PMC

Data provided by Europe PubMed Central.

Systematic review and simulation study of ignoring clustered data in surgical trials.

Dell-Kuster S, Droeser RA, Schäfer J, Gloy V, Ewald H, Schandelmaier S, Hemkens LG, Bucher HC, Young J, Rosenthal R. *Br J Surg* 105(3), 2018. PMID: 29405280

### Sources

PMID: 28728549