Analyzing Dyadic Sequence Data - Comparing Different Statistical Models with Respect to their Applicability and Interpretation
This thesis compares several statistical modeling techniques for interactions between
two persons across time, in which the dependent variable is categorical (*dyadic sequence
data*). This is done by linking each presented model to prototypical research questions that
might arise in psychological research, applying it to example datasets, and then translating
the results back into the context of psychological research. Finally, simulation studies are
conducted so that recommendations on sample size can be made. These simulations also revealed potential biases, and post-hoc simulations are used to investigate when such biases occur and how they can be avoided. An R package and an accompanying R script are provided in the appendix so that the results can be reproduced.
Two empirical datasets were used for demonstrating how to apply the presented models. One, the couples-cope dataset, stems from the field of relationship research and the
other, the give-some dataset, stems from the field of social dilemmas. The couples-cope
dataset is used as the leading example and contains observational data about romantically
involved couples right after stress was induced in one partner. The data describes whether,
and when, the stressed partner communicates his or her stress, and also whether, and when, the other partner reacts with a certain type of coping response. The following research questions were investigated: A) How can one get an overview of dyadic sequence data? B) How long does a certain behavior, such as stress communication, last, and does its duration depend on time-independent or time-dependent covariates? C) Can the observed behavior of both partners be described by a process that is not directly observable (a latent process), such as stress solving, that might explain the observed behavior? D) How stable is a behavior? E) Does the behavior of one partner trigger immediate responses from the other, and vice versa? F) Is it possible to identify subgroups that differ in their interaction patterns? The following models and approaches were compared: data visualization and descriptive statistics for research question (A); time-to-event analysis, such as Cox regression and a shared frailty model, for (B); aggregated logit models, multilevel models, and basic Markov models for (D) and (E); and OM-clustering and mixture Markov models for (F). A hidden Markov model was used to answer research question (C).
The frailty model performed poorly in the simulation studies; the remaining models, however,
performed well overall. All models could be applied to datasets with N < 100 under the right
conditions. The multilevel model is recommended for most applications regarding research
questions (D) and (E). OM-clustering worked better than mixture Markov models for detecting
latent groups when sequences were long and sample sizes were large (both > 100).
Markov modeling proved to be the most versatile modeling technique, but also the most
challenging one, because estimates can become unstable when models are too complex
or when true values are near their natural boundary. In these cases, careful consideration
must be made regarding model specification and optimization. A summary of all findings
regarding the model applications and insights into the couples-cope data can be
found in the final chapter.
Universität Bielefeld