Moments that Matter? On the Complexity of using Triggers Based on Skin Conductance to Sample Arousing Events within an Experience Sampling Framework

To sample situations that are psychologically arousing in daily life, we implemented an experience sampling strategy in which 82 Dutch young adults (Mage = 20.73) were triggered based on random time intervals and based on physiological skin conductance scores across a period of 5 days. When triggered, participants had to fill in short surveys on affect, situational characteristics and event characteristics on their smartphone. We found theoretically expected relationships between the skin conductance signal on the one hand and self-reported arousal and positive energy (e.g. energetic and enthusiastic) on the other hand, although effect sizes were small. Unexpectedly, none of the negative affective scales (i.e. irritation, anxiety, and negative valence) were predicted by skin conductance levels. Despite the (partial) validity of the signal, a simple algorithm that triggered the survey based on relative increases of skin conductance levels produced counterintuitive results due to a dependence between level and slope. Additional exploratory analyses highlighted other skin conductance signal characteristics (i.e. autocorrelation, number of peaks, and change points) that might be worth examining when designing future algorithms to sample arousing moments. Overall, our experiences highlight not only the promise but also the complexity of real-time measurement of physiological processes in daily life. © 2020 The Authors. European Journal of Personality published by John Wiley & Sons Ltd on behalf of European Association of Personality Psychology

Our daily lives are filled with routines. We regularly spend our time doing familiar activities within highly repetitive and similar contexts (Wood & Neal, 2007; Wood, Quinn, & Kashy, 2002). Although the systematic patterns of our routines are informative and predictive of several psychological well-being outcomes (Csikszentmihalyi & Hunter, 2003; Quinn, Pascoe, Wood, & Neal, 2010), every now and then something exciting occurs that awakens our spirit, be it positive or negative. For instance, an argument with our partner, an important job interview, or watching the final episode of our favourite show might rattle our routines. These events tap into key goals and motivations of humans, either because they allow important rewards to be obtained or because they threaten important resources (Carver & White, 1994; Frijda, 2004; Kuppens, 2008; Sideridis, 2008). The arousal associated with these moments in turn fuels some of the most consequential human responses, such as romantic behaviour, exploration, aggression, and flight. Identifying these arousing situations as well as people's responses to them therefore has relevance for educational, clinical, social, and personality psychology. For example, in clinical psychology, identifying which situations trigger anxiety can facilitate the training of skills to cope with the associated demands. Moreover, personality psychologists might investigate individual differences in the types of situations that trigger arousing responses, for example, individual differences in affective responses to situations of affiliation (cf. Dufner, Arslan, Hagemeyer, Schönbrodt, & Denissen, 2015). The current study explores a relatively new method of capturing these arousing moments in daily life and summarizes first findings on its feasibility, validity, and usefulness.
Psychological arousal is understood as a dimension denoting the level of a person's activation or alertness, ranging from states of low arousal (described by words such as relaxed, calm, depressed, or bored) to states of high arousal (described by words such as fear, anger, excitement, or pure joy; Russell, 2003). Because our daily lives are mostly composed of routine activities (Wood et al., 2002; Wood & Neal, 2007), moments in which people are psychologically aroused constitute only a small part of daily life and are therefore relatively rare. This makes it difficult to filter these moments of psychological arousal from a participant's day-to-day routine. Experience sampling methodology (ESM; Csikszentmihalyi & Larson, 1987; Larson & Csikszentmihalyi, 1983) might be a useful approach to study arousing moments. Studies might, for example, ask participants to selectively report on arousing events every time they occur, which is referred to as event-contingent sampling (Dimotakis, Ilies, & Judge, 2013; Reis & Gable, 2000; Scollon, Prieto, & Diener, 2009). However, participants are usually busy during these moments, and they might not always consciously experience that they are aroused or might forget soon thereafter, resulting in missing data. Alternatively, using fixed or random time intervals in experience sampling designs might work relatively well for sampling routine activities (Neal, Wood, & Quinn, 2006) but might sample only a few arousing moments. Some earlier approaches have asked participants end-of-day questions to describe a conflict or a stressful episode that took place during the day (e.g. Nezlek & Plesko, 2003), but this approach suffers from retrospective bias. A more objective assessment method is therefore needed.
As an alternative, to sample certain contextual information more selectively, studies across different psychological fields have started to use different types of sensors, such as GPS, accelerometer, and heart rate sensors (for applications, see Ebner-Priemer, Koudela, Mutz, & Kanning, 2013; Myrtek et al., 1988; Pejovic, Lathia, Mascolo, & Musolesi, 2016). Skin conductance sensors appear particularly suited for sampling moments of psychological arousal more objectively. By applying a continuous electrical current to the skin, researchers can measure the electrical resistance, or conversely the conductance, of the skin, which varies in response to sweat secretion (Boucsein, 2012; Boucsein et al., 2012). Increases in sweat secretion are a common physical marker for autonomic activation and have been related to arousal, attention, and emotional responses (e.g. Boucsein, 2012; Boucsein et al., 2012; Kreibig, 2010). Kreibig (2010) reviewed 134 published studies to provide more insight into the nature of the relationship between autonomic response patterns, such as skin conductance, and emotions. Specifically, some typical low arousal emotions such as (non-crying) sadness, contentment, and relief were related to a decreased skin conductance level (SCL), whereas almost all other typical high arousal emotions, such as anger, anxiety, happiness, amusement, and joy, were accompanied by increases in SCL (Kreibig, 2010). The Kreibig (2010) review concluded that relationships between specific emotions and skin conductance overall followed a general dimension denoting a person's level of activation or tendency for action (see Russell, 2003). Consequently, measures of skin conductance could be used to objectively sample moments of psychological arousal from individuals' daily life.
Nevertheless, it is unclear how the lab findings from studies in the Kreibig (2010) review translate to measurements of emotions as reported in everyday life. Autonomic response patterns in the 134 studies included by Kreibig (2010) were mostly elicited by experimental tasks such as looking at pictures, watching movies, reading text, or recalling certain events. In the lab, researchers can standardize emotional stimuli. In most cases, researchers have exact information on when the stimulus starts and can therefore easily track the skin conductance response during and after the stimulus. This is obviously different in daily life, in which there is no control over a participant's environment.
Although some studies have recorded skin conductance in daily life over longer periods of time (e.g. Doberenz, Roth, Wollburg, Breuninger, & Kim, 2010; Hoehn-Saric, McLeod, Funderburk, & Kowalski, 2004), we are aware of only one study that specifically attempted using skin conductance sensor data in daily life to sample arousing moments (i.e. Westerink et al., 2009). In a sample of 31 participants who were measured across a period of eight hours, Westerink et al. (2009) compared momentary self-reports of arousal between fixed time interval triggers (i.e. every two hours) and triggers based on continuous scores of skin conductance. To trigger questionnaires based on skin conductance, Westerink et al. (2009) took a dynamic approach. For any given moment, the algorithm generated a distribution of the last half hour of SCL scores. If the live SCL score surpassed the 95th percentile of this distribution, a questionnaire was sent to the participant's phone. This way the algorithm ensured the cut-off value was person specific and adaptive across time. Westerink et al. (2009) found that self-reported arousal was higher during triggers based on skin conductance compared with fixed time interval triggers. Although the results of Westerink et al. (2009) are promising, there are some limitations to their design. Participants were aware that they would receive triggers based on their physiological skin conductance responses. This could have created demand characteristics towards high self-reports of arousal. In addition, Westerink et al. (2009) only reported on self-reported arousal in their study, but there is a range of other variables that could be relevant in relation to these triggers. It is for instance questionable whether such a triggering approach works equally well for positive and negative affective scales. Finally, in the study of Westerink et al. (2009), participants wore skin conductance wristbands with dry electrodes located at the underside of the wrist. As the device is almost identical to a common watch, this makes it convenient to wear for longer periods of time. Nonetheless, the inside of the wrist is not a preferred location for measuring skin conductance (Van Dooren & Janssen, 2012). Additionally, using dry electrodes is generally not recommended because of the slow humidity buildup under the metal plate, which can result in extended periods of unstable electrodes (Fowles et al., 1981). Doberenz, Roth, Wollburg, Maslowski, and Kim (2011) have shown that it is also feasible to use skin conductance measures with gel-based electrodes for longer periods of time (i.e. 24 hours). Westerink et al. (2009) acknowledged that the quality of the data signal could influence the appropriateness of their signal. Nonetheless, despite these limitations, the study by Westerink et al. (2009) provides at least some support for the potential of this methodology.
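The dynamic rule attributed to Westerink et al. (2009) above can be illustrated with a minimal Python sketch. It is not their implementation: the function names, the linear-interpolation percentile, and the choice to exclude the live value from its own reference window are assumptions made for illustration, and the window is expressed in samples rather than minutes.

```python
from collections import deque
import math


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty sequence")
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return ordered[lower]
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def rolling_percentile_triggers(scl, window, q=95.0):
    """Return the indices at which the live SCL value exceeds the q-th
    percentile of the preceding `window` samples.

    Assumption: the live value is compared with the window that ends just
    before it, and nothing is triggered until the window is full.
    """
    history = deque(maxlen=window)
    hits = []
    for i, value in enumerate(scl):
        if len(history) == window and value > percentile(history, q):
            hits.append(i)
        history.append(value)
    return hits
```

Because the reference distribution moves with the signal, the cut-off adapts to each person and to slow drifts over the day, which is the property the text describes as person specific and adaptive across time.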
The present study implements an ESM sampling strategy in which participants are triggered based on their continuous physiological SCL scores. First, following the recommendations of Boucsein et al. (2012), to ensure the quality of the signal, we used a skin conductance device that relies on gel-based electrodes on the palmar site. Second, by collecting continuous scores of skin conductance alongside a variety of momentary self-reports in the daily life of participants, we could validate previous lab findings on these relationships (Kreibig, 2010). Based on the results of these studies, we expected to find a relationship between skin conductance and self-reports of psychological arousal. Finally, following previous studies that have used contextual sensors to selectively sample specific moments in the daily life of participants (e.g. Ebner-Priemer et al., 2013; Myrtek et al., 1988; Pejovic et al., 2016), the present study explored whether the skin conductance signal can be used to specifically sample moments that are psychologically arousing.

Sample
For this study, we aimed to have a sample of at least 60 participants. The study of Westerink et al. (2009) found a mean self-reported arousal difference between skin conductance triggers and random triggers of d = 1.10. With a sample of 60 participants, the power to detect such an effect is greater than 0.99. In total, 84 participants signed up for our study. As two participants did not show up for the ESM part of the study, the final sample consisted of 82 young adults, aged between 18 and 32 years old (M = 20.73, SD = 2.65). In this sample, 52 were female (63%), and for 67 participants (82%), both parents were born in the Netherlands. All participants in the current study were first-year psychology students at Tilburg University and were recruited through the university's online participant system.

Procedure
Two weeks before the start of our study, we conducted a pilot study to test and optimize our study design. In the pilot study, we also collected surveys and continuous skin conductance data among a sample of 10 participants (Mage = 19.67, SDage = 2.55, 90% female, and 90% with two Dutch parents). On the first day, we tested whether the skin conductance set-up would work as intended for the entire day. For many participants, the electrodes fell off during this first day. After adjustments on the second day, the electrodes held. The pilot data are not part of the current study.
After the pilot, we started the data collection of the present study, which took place across 10 weeks between March and June 2019. The data collection consisted of two parts: a baseline questionnaire followed by five consecutive days of momentary assessments. The online baseline questionnaire included demographic questions and several psychological questionnaires to measure personality, emotional clarity, depression, and so forth. For the daily assessments, participants completed several state questionnaires on their smartphones while wearing a skin conductance wristband. If the phone was compatible with the experience sampling mobile app, participants used their own smartphone. In all other cases, a phone was provided by the experimenters (movisensXS is not compatible with iPhones or with Android phones that have a MediaTek chipset). The momentary assessments always took place between Monday and Friday (10:00 a.m.-5:00 p.m.), and each weekly measurement cycle included roughly 6 to 12 participants. We trained 13 student assistants, who were divided into groups of two or three students that were each responsible for 1 day of the week.
For the present study, participants were required to visit the lab twice a day: once in the morning between 9 and 10 a.m. and once in the afternoon between 5 and 6 p.m. Between 9 and 10 a.m., a research assistant would apply the skin conductance device and couple the ESM app to the participant's smartphone. Hereafter, participants could go about their days as usual while filling in ESM surveys when notified. On the first morning of data collection, we briefed participants to fill in the ESM questionnaires directly after being notified. We specifically told participants to report on their mood and whereabouts focused on the moment leading up to the notification. Participants were notified by means of the smartphone's alarm that lasted maximally 10 seconds, after which participants had a maximum of 20 minutes to fill in the questionnaire before becoming unavailable. Each day at 4:55 p.m., participants would also fill in a small questionnaire that provided insight about the most important event of the day as well as about their experience of wearing the skin conductance device. The day would end between 5 and 6 p.m. when participants came back to return the devices, and their data were saved.
To measure skin conductance, there are many options available, such as the ring-mounted Moodmetric (Vigofere Oy) or the wrist-worn E4 (Empatica) (Cowley et al., 2016). We chose to use the EdaMove 3 (movisens GmbH, see www.movisens.com), which is a wrist-worn skin conductance sensor that contains wired gelled electrodes. We preferred to use gelled electrodes as these came recommended over using dry electrodes in the review by Boucsein et al. (2012). In addition, movisens already had the option of sending triggers based on sensor data preprogrammed in their ESM environment, which made it convenient for us to use the EdaMove 3 compared with other sensors. The EdaMove 3 sensor has a sampling rate of 32 Hz and applies a constant voltage of 0.5 V on the sensor's silver-chloride electrodes. The two electrodes are filled with electrode gel before they are attached to the skin of the palmar surface using adhesive tape rings. To ensure that the device would hold for the period of the study, we applied kinesiology tape on the electrodes, and participants wore special gloves that were made of a breathable fabric (i.e. bamboo viscose), had good stretch, and fitted tightly. The research assistant applied the device to the participant's non-dominant hand. Please see Figure 1 for an image of this set-up.
We programmed two different triggers for this study: random triggers and triggers based on skin conductance. For the triggers, we had the following general restrictions: questionnaires would only be sent when participants wore the skin conductance device, that is, between 10 a.m. and 5 p.m. Triggers were sent at least 15 minutes apart, and participants would receive a maximum of five randomly triggered questionnaires. For triggers based on skin conductance, real-time skin conductance scores were averaged across one minute (i.e. SCL) and compared with the SCL score of the preceding minute. If the relative increase surpassed a threshold that was set at the start of the day (see below), a trigger was sent to the smartphone. To control for influences of physical activity, we suppressed triggers when SCL increases were accompanied by increases in step count, which is an established procedure (Myrtek, Aschenbrenner, & Brügner, 2005).
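The skin conductance trigger described in this paragraph can be sketched as follows. This is an illustrative reading of the text and not the movisens algorithm, whose internals are not given here: the function name, the per-minute input format, and the exact form of the step-count suppression (suppress when the step count is higher than in the preceding minute) are assumptions. The time-of-day restriction and the random triggers are left out.

```python
def scl_triggers(minute_scl, minute_steps, threshold, min_gap=15):
    """Return the minute indices at which a trigger would be sent.

    minute_scl   -- mean skin conductance level per minute
    minute_steps -- step count per minute (same length as minute_scl)
    threshold    -- required relative increase, e.g. 0.193 for 19.3%
    min_gap      -- minimum number of minutes between two triggers
    """
    triggers = []
    last_trigger = None
    for t in range(1, len(minute_scl)):
        previous, current = minute_scl[t - 1], minute_scl[t]
        if previous <= 0:
            continue  # relative change is undefined for a non-positive base
        relative_increase = (current - previous) / previous
        if relative_increase <= threshold:
            continue
        if minute_steps[t] > minute_steps[t - 1]:
            continue  # assumed suppression rule: physical activity rose too
        if last_trigger is not None and t - last_trigger < min_gap:
            continue  # triggers must be at least min_gap minutes apart
        triggers.append(t)
        last_trigger = t
    return triggers
```

Note that the rule works on relative rather than absolute increases, so the same absolute rise counts for more when the preceding minute was low.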
The triggering algorithm was provided to us by movisens. As we were limited in adjusting the algorithm, our algorithm deviated from the personalized algorithm of Westerink et al. (2009). To determine the threshold for sending a trigger, we therefore had to account for personal differences in SCL, because some people are more reactive or have higher baseline skin conductance scores (Boucsein, 2012). Using the same threshold for the entire sample might mean different things for each participant. We took the following steps to realize a more personalized approach:
1. Day 1. For the first day, we did not have any information on the person's skin conductance scores. We therefore used prior information from our pilot study to formulate an average threshold that would be the same for all participants on the first day. For each day of skin conductance data that was available for the participants of our pilot study, we created a distribution of the relative changes in skin conductance, excluding the moments in which an increase in step count was also observed. To determine what would be a substantial increase for that particular day, we used the value that corresponded to two standard deviations above the mean in this distribution. The Day 1 threshold was the average of all these cut-off scores across the pilot measurements and equalled an increase of 19.3% relative to the preceding minute. We used this threshold on the first weekday for all participants in the study.
2. Day 2-Day 5. For the other days, we used the same principle to define the cut-offs but then for each person individually. Again, we defined two standard deviations above the mean in the distribution of relative changes as substantial. The cut-off value was then averaged with the cut-offs that were set on the previous days for the participant, giving each day an equal weight in the equation. As we obtained more personalized information with each day, we let the weight of the general sample cut-off that was used for Day 1 (i.e. 19.3%) decrease with each day of data collection. The weights for this value were as follows: 1 on Day 2, 0.75 on Day 3, 0.5 on Day 4, and 0.25 on Day 5.
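The threshold procedure can be sketched in Python as below. The text does not give the exact equation, so the combination rule is an assumption: each completed personal day is taken to weigh 1 and the general 19.3% cut-off to weigh the reported value for that day, combined as a weighted mean. The use of the sample standard deviation and all function names are likewise illustrative.

```python
from statistics import mean, stdev

GENERAL_CUTOFF = 0.193  # Day 1 threshold reported in the text (19.3%)
GENERAL_WEIGHTS = {2: 1.0, 3: 0.75, 4: 0.5, 5: 0.25}  # reported in the text


def daily_cutoff(minute_scl, minute_steps):
    """Mean + 2 SD of minute-to-minute relative SCL changes, skipping
    minutes in which the step count also increased."""
    changes = []
    for t in range(1, len(minute_scl)):
        if minute_scl[t - 1] <= 0 or minute_steps[t] > minute_steps[t - 1]:
            continue
        changes.append((minute_scl[t] - minute_scl[t - 1]) / minute_scl[t - 1])
    return mean(changes) + 2 * stdev(changes)


def threshold_for_day(day, personal_cutoffs):
    """Threshold applied on `day` (1-5), given the participant's daily
    cut-offs from all earlier days (a list of length day - 1)."""
    if day == 1:
        return GENERAL_CUTOFF
    if len(personal_cutoffs) != day - 1:
        raise ValueError("need one personal cut-off per completed day")
    w = GENERAL_WEIGHTS[day]
    # Assumed combination: weighted mean with weight 1 per personal day
    # and weight w for the general pilot-based cut-off.
    return (w * GENERAL_CUTOFF + sum(personal_cutoffs)) / (w + len(personal_cutoffs))
```

Under this reading, a participant whose first-day cut-off was 10% would start Day 2 with a threshold halfway between 10% and 19.3%, and the pilot-based value would fade as personal days accumulate.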
Our study was approved by the Ethics Review Board (EC-2018.83) of the School of Social and Behavioral Sciences of Tilburg University, and all participants provided active consent for participation. Participants received 10 participation hours when they completed the baseline questionnaire and at least 75% of the ESM surveys. The participation hours were part of a first-year academic course in which students were required to participate in different studies for a total of 20 hours. Additionally, we provided personalized feedback reports to the participants when they indicated they were interested. These reports included several descriptive statistics on personality, emotions, and situations.

Self-reported arousal
To operationalize self-reported momentary arousal, we used two measures that were administered in the experience sampling part of the study. First, we used the item "I feel active" as a direct indication of psychological arousal without reference to valence. Second, we relied on 19 affective adjectives, which were selected to cover both dimensions of the circumplex model of affect (e.g. Russell, 2003), that is, valence and arousal. All items were administered on a visual analogue scale (VAS) ranging from 0 to 100. Because this specific set of items had not been used previously, we first explored the underlying factor structure using multilevel exploratory factor analyses to reduce the data and possibly extract an arousal scale from scores of self-reported affect. The analyses and items are reported in the Results section.

Situational characteristics
To measure situational characteristics, we used 14 items that in pairs represented the seven CAPTIONs dimensions. The CAPTIONs represents a situational taxonomy that was developed by Parrigon, Woo, Tay, and Wang (2017) using a lexical approach. They used an extensive sample of adjectives and reduced them to seven dimensions representing different psychological situation characteristics: Complexity, Adversity, Positivity, Typicality, Importance, humOr, and Negative valence. The items were phrased as "the current situation is …" and were administered on a VAS (0-100). The specific situational adjectives of the CAPTIONs as well as the inter-item correlations on both the between-person level and within-person level are reported in Table 1.

Event characteristics
We used three VAS (0-100) items to ask participants about the event that took place. Participants were asked to rate whether the event that took place on the moment of the trigger was expected, important, and pleasant.

Analytical strategy
We used multilevel exploratory factor analyses to obtain an indication of how our affect items clustered together and to determine whether it was justified to formulate different scale scores from the 19 items that were included in our questionnaire. For our factor structure, we focused only on the within-person variance-covariance matrix. The between-person variance-covariance matrix was estimated without any restrictions. Additionally, to prevent our factor structure from being influenced by demand characteristics, we conducted the factor analyses on the input from the random triggers only. After determining the best model fit using absolute fit indices (i.e. comparative fit index, Tucker-Lewis index, root mean square error of approximation, and standardized root mean square residual) and model tests based on the deviance statistics, we computed the scale scores by aggregating the items that were indicative for each factor in the analysis. For each scale, we selected the items that had a loading of at least 0.40 and did not cross-load more than 0.10 on any of the other factors. We estimated these models using Mplus version 6 (Muthén & Muthén, 2010).
To examine how skin conductance relates to affective and situational self-reports in everyday life, we examined how SCL related to our affect dimensions, situational characteristics, and event characteristics, accounting for variance between individuals. We used the lme4 package version 1.1-21 (Bates, Mächler, Bolker, & Walker, 2015) in R version 3.6.0 (R Development Core Team, 2016) to estimate multiple separate multilevel models with skin conductance as independent variable. Both the intercept of the model and the slope of skin conductance with the outcome variable (i.e. affect, situation, and event) were allowed to vary across individuals. We centred our predictor variables in these models within each person before running the analyses.
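The within-person centring step mentioned above can be shown with a small Python sketch (the analyses themselves were run in R; the function and variable names here are hypothetical). In lme4 notation, a model with a random intercept and a random slope of this kind is commonly written as `outcome ~ scl_centred + (1 + scl_centred | person)`, although the exact model call used in the study is not given in the text.

```python
from collections import defaultdict


def centre_within_person(person_ids, values):
    """Subtract each person's own mean from that person's observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pid, value in zip(person_ids, values):
        totals[pid] += value
        counts[pid] += 1
    means = {pid: totals[pid] / counts[pid] for pid in totals}
    return [value - means[pid] for pid, value in zip(person_ids, values)]
```

After centring, a positive score means that SCL was higher than usual for that person at that moment, so the slope reflects within-person fluctuation rather than stable differences between people.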
In the final part of our analyses, we directly examined differences between the triggering conditions (that is, random triggers versus triggers based on skin conductance) in self-reported affect, situational characteristics, and event characteristics. The intercepts of the models and slopes of the triggering condition on the self-reported outcome variables (i.e. affect, situation, and event) were estimated freely across individuals. The predictor variable (i.e. type of trigger) was not centred. Instead, we upheld its dichotomous coding to facilitate a condition-relative interpretation. We pre-registered these final analyses before collecting the data (see footnote 1).
For all our multilevel regressions, explained variances were estimated using the guidelines of Nakagawa and Schielzeth (2013), who provided an R function for calculating both the explained variance of a model's fixed effects, as well as the explained variance of the random effects. To obtain this statistic, the function divides the variance of the fixed effects by the total variance.
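In the notation commonly used for this approach, the two statistics can be written as follows. The symbols are introduced here for illustration: \sigma^2_f is the variance of the fixed-effect predictions, \sigma^2_r the summed variance of the random effects, and \sigma^2_\varepsilon the residual variance.

```latex
R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_r + \sigma^2_\varepsilon},
\qquad
R^2_{\text{conditional}} = \frac{\sigma^2_f + \sigma^2_r}{\sigma^2_f + \sigma^2_r + \sigma^2_\varepsilon}
```

The marginal value corresponds to the statistic described above (fixed-effect variance divided by total variance), whereas the conditional value additionally credits the random effects.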

1 For our pre-registration, see https://osf.io/prs7j/?view_only=a6b1bfebe7fb49f1970b8372fc334bbb. Some things were not included in the preregistration but added or changed during the study. First, initially we assumed that we could use only the first day of data collection to estimate the personalized thresholds. We reconsidered this strategy after having collected and examined the pilot data. Additionally, we pre-registered the multilevel analyses in which we examined the differences between both triggers (skin conductance triggers and random trigger) on our outcome variables. The analyses with the raw SCL data were not included before collecting the data but nonetheless planned before analysing the data. Finally, the exploratory analyses were added during the review process. Our data and scripts are openly available. See https://osf.io/v4qh9/?view_only=054234b95e454a7db553e38d0490f686.
Note: Situational items were administered as "The current situation is …." N = 82, surveys = 3868.

Exploratory analyses
Guided by suggestions that were made during the review process, we also ran additional exploratory analyses. These analyses were conducted after having obtained the outcomes of the planned analyses presented above and were not preregistered. For these analyses, we set out to examine the skin conductance features of arousing moments in more detail. To do so, we extracted people's most and least arousing moments, to then further compare and characterize the skin conductance signatures of each category.
To obtain a large enough sample for 'high' and 'low' arousing moments, we selected the two most and two least arousing and emotional moments of each individual. For this purpose, we used both self-reports of arousal as well as each of our four affective scales: positive energy, irritation, anxiety, and negative valence. This resulted in a total of five categorizations. Ties for the top two highest or lowest scores were handled randomly. With 82 participants for each split off, this resulted in a sample of 164 high arousing moments and 164 low arousing moments. After making these selections, we extracted the skin conductance signal of each of these moments across three different timeframes: 1 minute, 10 minutes, and 30 minutes before the trigger.
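The selection of the two highest and two lowest moments per person, with random handling of ties, could look like the Python sketch below. It is an illustration only: the function name and the shuffle-then-stable-sort approach to tie-breaking are assumptions, since the text says only that ties were handled randomly.

```python
import random


def extreme_moments(scores, k=2, seed=None):
    """Return (lowest_k, highest_k) as lists of indices into `scores`
    for one participant.

    Ties are broken at random: the indices are shuffled first, and the
    subsequent stable sort keeps that shuffled order among equal scores.
    If fewer than 2 * k scores are given, the two lists can overlap.
    """
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    rng.shuffle(indices)
    ordered = sorted(indices, key=lambda i: scores[i])
    return ordered[:k], ordered[-k:]
```

Applied to each of the 82 participants with k = 2, this yields the 164 high and 164 low moments per categorization mentioned in the text.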
We used multilevel regression analyses to compare both categories on five different outcome measures related to the skin conductance signal: (i) mean, (ii) standard deviation, (iii) autocorrelation, (iv) number of peaks, and (v) number of change points. Going beyond simply extracting mean levels of the signal, these parameters together better capture the complex and dynamic characteristics of physiological time series data. For the autocorrelation measure, to avoid ceiling effects (e.g. autoregressive correlations approaching 1), we used relatively broad lags of four seconds to calculate the autocorrelation. To estimate the number of peaks in the signal, we first centred the data within timeframes of four seconds. This way the tonic signal was largely removed from the data. Next, to smooth the signal and remove background noise, we used a 500-ms frame Savitzky-Golay filter. This signal was used to detect the peaks using the findpeaks function in the pracma package in R. We specified a peak when a consistent increase of 0.5 seconds would follow a consistent decrease of 0.5 seconds. Our method of identifying peaks is comparable with standard methods of extracting frequencies of non-specific skin conductance responses, although in contrast these methods include absolute cut-off values, such as minimum amplitudes of 0.05 microsiemens. Finally, we calculated the number of change points using the changepoint package in R. A change point is defined by a substantial change in the distributional properties of the skin conductance data in terms of mean and variance. We used the cpt.meanvar function with a penalty value of 10 000 (Killick & Eckley, 2014).
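Two of these signal features can be illustrated with a short Python sketch; the original analyses were run in R, so this is a re-expression under stated assumptions and not the authors' script. At the reported sampling rate of 32 Hz, a lag of four seconds corresponds to 128 samples. The function names and the block-wise form of the centring are hypothetical, and the smoothing, peak detection, and change point steps are not reproduced.

```python
from statistics import mean

SAMPLING_RATE_HZ = 32  # sampling rate reported for the EdaMove 3


def lagged_autocorrelation(signal, lag_seconds=4, rate=SAMPLING_RATE_HZ):
    """Pearson correlation between the signal and a copy of itself
    shifted by the given lag."""
    lag = int(lag_seconds * rate)
    if lag <= 0 or len(signal) <= lag + 1:
        raise ValueError("signal too short for the requested lag")
    x, y = signal[:-lag], signal[lag:]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a flat signal
    return cov / (var_x * var_y) ** 0.5


def centre_in_blocks(signal, block_seconds=4, rate=SAMPLING_RATE_HZ):
    """Subtract the block mean within consecutive blocks of the signal.

    Assumption: non-overlapping four-second blocks, as a simple way to
    remove most of the slowly changing tonic level.
    """
    size = int(block_seconds * rate)
    centred = []
    for start in range(0, len(signal), size):
        block = signal[start:start + size]
        block_mean = mean(block)
        centred.extend(value - block_mean for value in block)
    return centred
```

With a lag of a single sample, a smooth 32 Hz signal would correlate almost perfectly with itself, which is the ceiling effect the broader lag is meant to avoid.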

RESULTS
In total, 3868 questionnaires were sent, which on average is 47.17 questionnaires per participant (SD = 8.62). Of these questionnaires, 1516 (39%) were randomly triggered and 2356 questionnaires (61%) were triggered by our skin conductance algorithm. The response rate for this study was 89%, and overall, 3432 questionnaires were completed. On average, it took participants 77 seconds to start the questionnaire after being notified and two minutes and 27 seconds to finish the entire survey of 40 items. Overall, participants did not feel bothered by wearing the skin conductance devices. When we asked participants about their experiences wearing the skin conductance device, on a VAS (1-100), they reported an average of M = 30.42 (SD = 19.24) on whether they were annoyed by the device. We found an average of M = 29.34 (SD = 23.28) on whether the device had impeded them in their daily routines and an average of M = 19.32 (SD = 18.15) on whether they thought the device influenced their behaviour. The means of all three items suggest that participants experienced the EdaMove as relatively unobtrusive. We visually inspected the physiological skin conductance data each day and made a note when we suspected that electrodes might have shifted or become loose during the day. On 17 (4.21%) occasions, we flagged the data as probably containing artefacts from loose electrodes. Most of the time this occurred during the final hours of the day. As this was only a small proportion of the total data, we kept these data in the sample. Nevertheless, we conducted the analyses of this manuscript with and without these cases included. Overall, removing the data with the potential artefacts resulted in almost identical results.

Affective structure
We used multilevel exploratory factor analyses to explore the underlying within-person structure of the 19 affect items. The models were estimated on the data that were collected with random triggers, using an unrestricted between-person variance covariance matrix. Fit indices of acceptable models ranged from a one-factor solution to a five-factor solution, which are depicted in Table 2. Based on these indices, a four-factor solution fitted the data best. Both the root mean square error of approximation and standardized root mean square residual for this solution were below 0.05 (i.e. both equalled 0.03), and the Tucker-Lewis index and comparative fit index were 0.96 and 0.85, respectively. Table 3 shows the four-factor solution including the factor loadings of the items on each of the four factors. By selecting the items with a primary loading of at least 0.40 and without any substantial cross-loadings (i.e. secondary loadings higher than 0.40 or a difference with the primary loading of less than 0.10), the first factor was marked by five items: energetic, enthusiastic, cheerful, happy, and lifeless-the latter having a negative loading. We labelled this factor (1) positive energy. The second factor had four indicators: irritated, agitated, calm, and relaxed, which we labelled as (2) irritation. The third factor was marked by the items nervous, worried, fearful, stressed, and insecure and was labelled (3) anxiety. The final factor was marked by the items angry, gloomy, and sad and was labelled as (4) negative valence.
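The item-selection rule stated in this paragraph (a primary loading of at least 0.40, no secondary loading above 0.40, and a gap of at least 0.10 between primary and secondary loading) can be expressed as a small Python function. The function name is hypothetical, and comparing absolute values is an assumption made so that negatively loading items such as lifeless can qualify.

```python
def assign_item(loadings, min_primary=0.40, max_secondary=0.40, min_gap=0.10):
    """Return the index of the factor that an item marks, or None if the
    item is dropped.

    loadings -- the item's loadings on each factor (at least two factors);
                absolute values are compared.
    """
    sizes = sorted(((abs(v), i) for i, v in enumerate(loadings)), reverse=True)
    (primary, factor), (secondary, _) = sizes[0], sizes[1]
    if primary < min_primary:
        return None  # no sufficiently strong primary loading
    if secondary > max_secondary or primary - secondary < min_gap:
        return None  # substantial cross-loading
    return factor
```

An item with loadings of 0.72, 0.05, -0.10, and 0.02 (made-up values) would be assigned to the first factor, whereas one with 0.45 and 0.38 on two factors would be dropped because the gap is below 0.10.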
After reverse coding the negative items and constructing the scale scores of the four constructs by aggregating the scores of the indicators of each scale, we used McDonald's omega total to examine the reliability of the scales, using the data that were collected by both random triggers and skin conductance triggers. To estimate the between-person and within-person omega coefficients, for each scale, we fitted a multilevel confirmatory factor analysis in Mplus (see Geldhof, Preacher, & Zyphur, 2014). The between-person (Ωb) and within-person (Ωw) omegas for the scales were as follows: positive energy (Ωb = 0.84 and Ωw = 0.74), irritation (Ωb = 0.82 and Ωw = 0.77), anxiety (Ωb = 0.95 and Ωw = 0.83), and negative valence (Ωb = 0.94 and Ωw = 0.81).
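As an illustration of the reliability coefficient used above, the following minimal sketch computes omega for a single-factor model from standardized loadings and residual variances, assuming the factor variance is fixed to 1. In a multilevel setting, the same formula would be applied separately to the within-person and between-person estimates. The function name and the loadings are hypothetical and are not the study's estimates.

```python
def omega_total(loadings, residual_variances):
    """Omega for a one-factor model with the factor variance fixed to 1.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)
    """
    common = sum(loadings) ** 2
    unique = sum(residual_variances)
    return common / (common + unique)


# Hypothetical standardized loadings for a four-item scale (illustration only).
loadings = [0.7, 0.6, 0.8, 0.5]
# With standardized items, each residual variance is 1 minus the squared loading.
residuals = [1 - l ** 2 for l in loadings]
omega = omega_total(loadings, residuals)
```

Higher loadings relative to the residual variances push the coefficient towards 1, which is why scales with strongly loading items reach higher omegas.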

Descriptive analyses
We calculated the intercorrelations of all continuous variables that were included in the present study. Both the within-person and between-person correlations are depicted in Table 4 together with the means and standard deviations of the variables. Although correlations at the within-person level were somewhat weaker than the correlations at the between-person level, the direction of the relationships was consistent across both levels. All four affective scales were correlated with one another, with positive energy having negative relationships with all three other affective scales. Concerning the situational characteristics, there were relationships between importance, complexity, and, to a lesser extent, adversity on both levels. Additionally, at both levels, adversity was related to negative valence. Medium correlations were found between typicality of the situation and the expectedness of the event that took place. Surprisingly, humour was mainly related to negatively valenced items. Instead of referring to funny situations, this construct could refer more to weird or odd situations. Despite this surprising relation, all other variables mostly behaved as expected within this nomological network.

Skin conductance signal and triggering algorithm

Skin conductance level
The average SCL value across participants in the minute preceding the trigger was M = 6.89 (SD = 3.42), expressed in microsiemens (μS). As can be expected, this is considerably higher than the averages (between 3 and 4 μS) reported by Westerink et al. (2009), who used dry electrodes. Table 5 provides insight into how these SCL values relate to our other study variables in daily life. Relationships were mostly in the expected direction. Higher SCL values in the minute preceding the trigger were related to moments in which participants reported being more active. Also, higher levels of positive energy were reported during moments of higher SCL. We did not find a relationship between SCL and irritation, anxiety, or negative valence, however. Finally, looking at the situations and events, situations of high SCL were rated as more complex and generally occurred during events that were rated as less expected. The effect sizes of the significant fixed relationships, reported in terms of explained variance (R²), ranged between 0.005 and 0.008. This means that SCL explained between 0.5% and 0.8% of the variance of the outcome variable. We found no relationships for any of the other situation and event variables. Nonetheless, for all affect variables and most of the other situational and event characteristic variables, there was a significant variance around the slope, indicating that the effects differed among individuals.

Triggering algorithm
To have an indication of stability of the personalized threshold that we used, we estimated the intraclass correlations across days. The intraclass correlation was 0.67, which indicated that the personalized thresholds were relatively consistent across days.
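To make the stability index concrete, here is a minimal sketch of a one-way random-effects intraclass correlation, ICC(1), for a balanced participants-by-days table of thresholds. The text does not state which ICC variant was estimated, so the choice of ICC(1), the function name, and the example values are assumptions made for illustration.

```python
from statistics import mean


def icc_oneway(scores):
    """ICC(1) for balanced data: scores[i][j] is the value of person i on day j."""
    n = len(scores)      # number of persons
    k = len(scores[0])   # number of days per person
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    # Between-person and within-person mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(scores, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical thresholds for three participants across five days.
thresholds = [
    [0.10, 0.12, 0.11, 0.09, 0.10],
    [0.20, 0.18, 0.22, 0.21, 0.19],
    [0.15, 0.14, 0.16, 0.15, 0.17],
]
icc = icc_oneway(thresholds)
```

Values close to 1 indicate that most of the variance in thresholds lies between persons rather than between days within a person.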
To examine whether our algorithm successfully selected moments of relative increase, we extracted skin conductance values of the 668 seconds preceding a trigger (i.e. 10 minutes + 68 seconds) and 600 seconds following a trigger (i.e. 10 minutes). Figure 2 shows the skin conductance values leading up to either a random trigger or a trigger based on skin conductance. The two minutes in which the algorithm determines an increase took place between 480 and 600 seconds, denoted by the grey areas. The other 68 seconds before the trigger represent the delay between detecting the increase and sending a trigger. This delay was not intentional but was necessary for the algorithm to process the data and send out the trigger. There are three things to note based on this graph. First, it shows that the random triggers on average had a relatively flat approach towards the trigger, whereas for the trigger based on skin conductance, a clear positive slope can be observed. In line with this observation, multilevel regressions showed that the relative increase in SCL across the two minutes preceding a trigger was significantly higher for triggers based on skin conductance (B = 0.31, 95% CI [0.29, 0.34], p < .001, R² = .134), which is consistent with the way our algorithm was programmed. A second point that stands out from the graph is that the skin conductance values of the triggers sent by our algorithm generally had a lower baseline value than the random trigger. Although there was no significant difference in SCL in the minute before the trigger (i.e. the dark grey area; B = 0.24, 95% CI [−0.02, 0.49], p = .067, R² < .001), the overall SCL value across the entire 10 minutes of Figure 2 preceding the trigger turned out to be significantly lower for the trigger based on skin conductance (B = −0.95, 95% CI [−1.24, −0.67], p < .001, R² = .013). It therefore seems that there is a dependency between level and slope. Moments during which SCLs showed a steep increase occurred mostly when participants had low SCLs to begin with. Finally, both types of triggers sparked a spike in skin conductance that overall led up to the highest value of skin conductance in the graph.
Figure 2. Aggregated skin conductance scores of the 10 minutes preceding the trigger and the 10 minutes following the trigger split by random triggers and triggers based on skin conductance. The vertical line represents the trigger. The two grey surfaces represent the two minutes that were used by the skin conductance algorithm. We standardized the skin conductance scores within-person, within the day.
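The dependency between level and slope can be illustrated with a minimal sketch of a relative-increase rule. It assumes the rule compares the mean SCL of the current minute with that of the preceding minute against a person-specific threshold and suppresses triggers when step count increased. The function names, the threshold of 0.10, and the sample values are hypothetical and do not reproduce the actual implementation.

```python
from statistics import mean


def relative_increase(prev_minute, curr_minute):
    """Relative change in mean SCL from the preceding minute to the current one."""
    prev_mean = mean(prev_minute)
    return (mean(curr_minute) - prev_mean) / prev_mean


def should_trigger(prev_minute, curr_minute, threshold, steps_increased=False):
    """Fire when the relative increase exceeds a person-specific threshold
    and the increase does not coincide with a rise in step count."""
    if steps_increased:
        return False
    return relative_increase(prev_minute, curr_minute) > threshold


# The same absolute rise of 0.5 microsiemens at a low and at a high baseline
# (one sample per second, hypothetical values).
low_baseline = should_trigger([2.0] * 60, [2.5] * 60, threshold=0.10)   # +25%
high_baseline = should_trigger([8.0] * 60, [8.5] * 60, threshold=0.10)  # +6.25%
```

In this sketch, the identical absolute rise fires a trigger only at the low baseline, which is one way a relative rule can oversample moments of low overall SCL.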

Differences between random triggers and triggers based on skin conductance
We examined if both types of triggers differed in step count, the time of the day that the survey was administered, and whether there were differences on how much time had passed since the last trigger was sent. We found that in the minute preceding the trigger (i.e. the dark grey area in Figure 2), participants reported to have taken more steps during the random trigger (B = −1.79, 95% CI [−3.54, −0.05], p = .046, R² = .001). This corresponds with the fact that we controlled for step count increases in our algorithm. Additionally, no differences were found between both types of triggers for the time of the day the trigger was sent (M random = 13.53, M SCL = 13.43, t = 1.47, p = .142), but the time that passed (in hours) since the last trigger was significantly shorter for skin conductance-based triggers (M random = 0.90, M SCL = 0.57, t = 20.76, p < .001).
Multilevel regressions, as reported in Table 6, indicated that participants reported being less aroused before skin conductance-based triggers and felt less positively energized. There were no differences on the negative affective scales irritation, anxiety, or negative valence. Concerning the contextual variables about the situational and event characteristics, we found that participants generally reported the situation and the event to be less important during triggers that were based on increased skin conductance. The differences we found were of small effect size, with explained variances ranging from 0.001 to 0.002. With the exception of negative valence, for which there was significant variance around the slope, none of the random variance estimates indicated that the effects were different across individuals.

Exploratory analyses
To examine the skin conductance features of arousing moments in more depth, we used multilevel regressions to compare the skin conductance signatures of participants' two most and two least arousing and emotional moments. We compared both categories (i.e. high and low) on five different outcome measures related to the skin conductance signal: (i) mean, (ii) standard deviation, (iii) autocorrelation, (iv) number of peaks, and (v) number of change points. The results of these additional analyses are presented across five tables, each using a different self-report measure: arousal, positive energy, irritation, anxiety, and negative valence.
In general, we found most of our effects for self-reported arousal (Table 7: M high arousal = 79.19; M low arousal = 19.51) and positive energy (Table 8: M high positive energy = 80.77; M low positive energy = 37.33). During moments in which individuals reported relatively high levels of arousal, the skin conductance signal generally also had a higher mean, a lower autocorrelation, and fewer change points than in the low condition. For positive energy, we found similar results. During moments in which individuals were relatively positively energized, their skin conductance was overall higher, and the signal contained fewer change points. Across 30 minutes, we also found that the autocorrelation was lower. Finally, in contrast to arousal, for positive energy, peaks were more numerous during moments of high positive energy. It is notable that most of the effects that we found in Tables 7 and 8 were present across all three timeframes. For the negative affective scales, irritation (Table 9: M high irritation = 54.51; M low irritation = 11.49), anxiety (Table 10: M high anxiety = 41.50; M low anxiety = 4.65), and negative valence (Table 11: M high negative valence = 33.16; M low negative valence = 1.66), relationships were mostly absent, in line with our previous findings (see Table 5).
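The signal characteristics compared above can be sketched in a few lines. The example below computes the mean, standard deviation, lag-1 autocorrelation, and a naive peak count for a window of SCL samples. The peak definition (a strict local maximum) and the choice of lag 1 are simplifying assumptions, since the exact definitions are not given in this section, and change-point detection is omitted because it depends on a dedicated method. All function names are hypothetical.

```python
from statistics import mean, pstdev


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a series; returns 0.0 for a constant series."""
    m = mean(x)
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1))
    return num / denom


def count_peaks(x):
    """Count strict local maxima (a sample larger than both of its neighbours)."""
    return sum(1 for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1])


def scl_features(x):
    """Summarize a window of SCL samples with four simple descriptors."""
    return {
        "mean": mean(x),
        "sd": pstdev(x),
        "autocorrelation": lag1_autocorrelation(x),
        "peaks": count_peaks(x),
    }


# Hypothetical window of SCL samples in microsiemens.
features = scl_features([1.0, 3.0, 1.0, 3.0, 1.0])
```

A practical peak detector would typically add a minimum amplitude criterion so that sensor noise is not counted as a skin conductance response.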

DISCUSSION
To capture moments that were psychologically arousing, the present study adopted a sampling technique in which 5 days of experience sampling self-reports were linked to physiological measures of skin conductance. Our study indicates that although it is feasible to obtain continuous measurements of skin conductance in a naturalistic setting, an integration of skin conductance signals with momentary self-reports in daily life appears challenging. Specifically, sampling situations based on relative increases of skin conductance in this study did not actually result in a sample of more psychologically arousing situations. In fact, because of an unexpected dependency between level and slope, an inverse pattern was found. Our additional exploratory analyses show that there might be additional skin conductance features related to arousal in daily life. In what follows, we first discuss measuring skin conductance outside the lab in real life as well as the relationships between skin conductance measures and self-reports. Second, we turn our attention to our findings concerning the triggering algorithm and inform future studies that aim to use physiological signals to sample arousing situations.

Skin conductance and momentary self-reports in daily life
Our study is among the few to have linked physiological measures of skin conductance to momentary self-reports as they occurred in everyday life. Based on our experiences and the physiological output that was recorded by the skin conductance devices, measuring skin conductance for longer periods of time throughout the day outside the lab appeared feasible. Participants were not too bothered by the device, and in only a small proportion of our measurements did we have to deal with loose electrodes. This gives rise to some optimism regarding future use of continuous measures of skin conductance in daily life, particularly in the case of the EdaMove, which uses gel-based electrodes that are fixed to the skin and that are recommended over rings or bracelets that use dry electrodes (Fowles et al., 1981). Nonetheless, there are some practical challenges to consider when using skin conductance devices in daily life. To make sure the electrodes would stick for an entire day, we used additional tape. Although this worked well, it can also produce possible artefacts in the data. It might for instance further induce sweating, the pressure on the palmar surface could elicit changes in blood circulation, and changes in pressure could produce local changes in skin resistance (i.e. Ebbecke waves; Boucsein, 2012). Also, participants in our study had to come in twice a day to have a device administered, which not only intensified the design for participants but also made it challenging to test many participants concurrently or to measure during evenings and weekends. Students are in this regard a convenient population, as in many cases it was easy for them to adjust their schedule. This is something to bear in mind when testing among other, possibly less flexible populations. With the help of 13 student assistants, it took us a total of 10 weeks to gather the data for 82 participants.
Note: Each row represents a separate multilevel model with a random intercept. The high category was coded as 0, and the low category was coded as 1. B, unstandardized regression coefficient of the fixed effect; CI, confidence interval; R², explained variance for the fixed effect. We have displayed significant p values in bold.
In addition to these practical considerations, it is important to consider the complexity of skin conductance data. While skin conductance data are generally sensitive to arousing events, the signal lacks specificity as it depends on many non-psychological factors such as digestion, movement, and temperature. It is therefore at least promising that when we examined the raw skin conductance signal for the minute preceding a trigger (i.e. both random triggers and triggers based on skin conductance), we found several relationships with self-reported affect, situations, and events that were in the expected direction. In line with skin conductance being indicative of psychological arousal (Boucsein, 2012), we found that self-reported arousal was positively related to SCL in the minute preceding a questionnaire. Additionally, higher levels of positive energy were related to higher levels of skin conductance, which aligns with findings of previous lab studies that found that experimentally induced positive emotions such as happiness, joy, and amusement generally coincide with elevated skin conductance responses (Kreibig, 2010). Also, we found relationships with the complexity of the situation, as well as the expectedness of the event, which is again consistent with levels of skin conductance denoting moments of psychological significance. Accordingly, we collected some evidence that skin conductance in daily life is meaningfully associated with experience sampling measures of emotions. It was surprising that we did not find any relationships between skin conductance and the negative affective dimensions: anxiety, irritation, and negative valence.
We expected anxiety and irritation to be accompanied by states of high physiological activation. Anxiety, for example, is a common emotion that has been related to elevated skin conductance responses in previous lab studies (Kreibig, 2010). The negative valence scale comprised sadness, gloominess, and anger. Previous lab studies found sadness to be one of the emotions that was accompanied by decreases in skin conductance but found the opposite for anger (Kreibig, 2010). It could be that for negative valence, the low arousal of sadness and gloominess and the high arousal of anger balanced each other out. Post hoc, we looked at each of the three items separately, but no relationship was found for any of them. Looking at the significant random variance around the slopes of anxiety, irritation, and negative valence, we conclude that the absence of these effects might not hold for the entire sample. Also, based on the low means that we found for the three negative affect dimensions, moments of severe anxiety, negative valence, or irritation were relatively rare. As a result, it could be that for many participants, we did not have enough variance on the higher end of the scale to find a potential effect.
Of note, the effect sizes of the relationships that we found were relatively small (between 0.5% and 0.8% explained variance), and they were only significant because of the large sample size. The small effect size is not completely unexpected given the common discrepancies between two converging operationalizations of emotional experiences. First, for the skin conductance signal, this might be due to the complex nature of everyday physiological information. In the lab, participants are generally requested to remain as still as possible while working through an experimental task in a controlled environment. In daily life, we cannot, and do not desire to, constrain participants similarly because participants must be able to go about their day undisturbed. Because of this, there are many factors that can cause artificial jumps in the data. Participants could for instance squeeze their hands or grab something. To deal with this, researchers could consider triangulating the skin conductance data with other relevant sensor data, such as heart rate or respiratory rate. If only one of the sensors shows an increase due to noise in the data, researchers are less likely to misinterpret this increase as potentially relevant. In addition to these artificial jumps, skin conductance is not an exclusive indicator of emotional experiences. Skin conductance is a complex physiological signal that relates to non-psychological processes as well. It, for instance, also varies as a function of digestion, homeostasis, medicine intake, and/or temperature (Boucsein, 2012;Mauss & Robinson, 2009). Because in our study design the signal had to be processed instantly, we were limited in our options for data filtering. In this particular design, we only filtered the data on movement, but future research should consider incorporating additional features to ensure the algorithm processes a clean signal.
For self-reports of emotions, although a common, fast, and relatively effective method for assessing subjective emotional experiences, researchers are dependent on subjective judgements of participants. In the moment, participants might not always consciously experience their feelings. They might also lack the knowledge to cluster their feelings into emotional adjectives, over-rationalize their feelings, or even report about their feelings more desirably (Mauss & Robinson, 2009). Previous research indicates that people tend to respond to items of affective scales more as continuous valence dimensions, instead of pertaining to concrete situations that elicit certain discrete emotions (Barrett, 1998). As these general dimensions are not necessarily bound to specific situations, it might be harder to relate this to the specific timeframe of the skin conductance signal (i.e. a minute before the trigger). Our exploratory analyses further underline this as we found relationships between self-reports and skin conductance which were generally present across three different time spans: 1 minute, 10 minutes, and 30 minutes. Finally, what further complicates relating skin conductance signal dynamics with self-reported measures is that in ESM, participants usually get some time to fill in the questionnaire. Although our participants were generally fast responders, we recommend that future studies use stricter timeframes for filling in the questionnaire to ensure that participants report in the moment.

Using the skin conductance signal to sample arousing moments
Our study indicates that sampling arousing situations based on physiological skin conductance data is more complex than simply computing the relative increase from the preceding minute to the current minute. We found that selectively sampling situations based on relative increases in skin conductance actually resulted in a sample of less psychologically arousing situations as experienced by participants, the opposite of what we tried to achieve. In our case, the algorithm successfully selected moments of skin conductance increases, but the overall SCL in the 10 minutes approaching the trigger was lower in our triggering condition. This general level-slope dependency was not anticipated but of course makes sense if relative increases are the focus. After all, as the algorithm selected moments on the basis of relative increase, lower values required a less substantial increase to be selected. As a result, these moments had an increased chance to trigger a questionnaire, yet participants in these moments were in a relatively low arousal mood. The negative relation between level and slope has earlier been described as the law of initial values (Wilder, 2014). This law predicts that low initial values are generally related to higher difference scores. Our additional analyses indicated that there might be other skin conductance characteristics worth looking into when designing algorithms to sample arousing moments besides relative increases. Particularly in the case of positive arousal, we found that mean levels, autocorrelations, number of peaks, and number of change points were all in some way related to highly arousing moments. Our results indicate that people's skin conductance signature may change in many different and complex ways when an arousing situation is encountered.
For future studies, it could be interesting to create an algorithm that draws upon an even more extensive list of time series parameters. Ideally, such an algorithm also contains an extensive learning period. Participants should first wear the skin conductance device for some time and regularly provide feedback when they are in an arousing situation. The algorithm could then adjust the parameters each time the participant provides feedback. With each round of feedback, the predictive capabilities of the algorithm should then become more precise.
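One way such a feedback-driven learning period might look is sketched below, assuming an online logistic regression that takes time series features of the signal as input and is updated by one gradient step each time a participant indicates whether a moment was arousing. The class, its parameters, and the feature values are hypothetical and serve only to illustrate the idea of adjusting parameters with each round of feedback.

```python
import math


class OnlineArousalClassifier:
    """Logistic regression updated one feedback at a time (illustrative sketch)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Estimated probability that the current moment is arousing."""
        z = self.bias + sum(w * f for w, f in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, aroused):
        """One stochastic gradient step on the log loss for a single feedback."""
        error = self.predict_proba(features) - (1.0 if aroused else 0.0)
        self.weights = [
            w - self.learning_rate * error * f
            for w, f in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


# Hypothetical standardized features, e.g. mean level and number of peaks.
model = OnlineArousalClassifier(n_features=2)
moment = [1.0, 0.5]
before = model.predict_proba(moment)
for _ in range(20):
    model.update(moment, aroused=True)  # participant confirms arousal
after = model.predict_proba(moment)
```

After repeated confirmations for similar feature values, the estimated probability for such moments rises, so a trigger rule based on it would become more person-specific over time.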

CONCLUSION
Overall, in this study we found that relationships between skin conductance and momentary self-report measures in daily life were not straightforward. We found that we were able to use a skin conductance device (EdaMove 3) that makes use of gel-based electrodes which were stuck to the skin for longer periods of time outside of the lab. Participants overall did not experience much discomfort, and electrodes overall remained relatively well attached to the skin. Nonetheless, bearing the complexity of the skin conductance signals in mind, an algorithm that was concerned with selecting relative increases of SCL across intervals of a minute appeared oversimplified. Before we can confidently answer if and how it will be possible to use the skin conductance measures to select situations of psychological arousal, we need to better understand the discrepancies between skin conductance measures and self-reports. Overall, we found some indication of relationships between skin conductance and self-reported arousal in daily life, but effect sizes were small, and for many of our negatively valenced constructs, we did not find a relationship at all. Additional analyses showed that it is important for future studies to also account for other relevant parameters of skin conductance (i.e. autocorrelation, number of peaks, and number of change points) and also consider multiple and broader timeframes. Having exact information on where both measures deviate can further inform future studies that intend to use both physiological skin conductance and self-reports in their design. For additional validations, future studies might also investigate other psychological correlates of physiological arousal in daily life (e.g. personality; Eysenck, 1967;Fowles, 1980) or use multiple indicators of arousal, such as heart rate or respiratory rate.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article.