This appendix details the process of creating the simulated MICS data that is employed in the examples on this website.

MICS data are freely available, but usage of MICS requires completing a user agreement, and registering for a user account, on the MICS website, and thus MICS data should not be shared openly on a public website.

This Appendix is highly technical. It is not necessary to understand this Appendix to benefit from the rest of this website. However, the details of creating this simulated data may be of interest to some users.

B.1 Call Relevant Libraries

We need to call a number of relevant R libraries to simulate the data.

Because simulation is a random process, we set a random seed so that the simulation produces the same data set each time it is run.

We are going to simulate data with 30 countries, and 100 individuals per country.

Show the code

set.seed(1234) # random seedN_countries <-30# number of countriesN <-100# sample size / country

B.3 Simulate Data Based on MICS

This is multilevel data where individuals are nested, or clustered, inside countries. Excellent technical and pedagogical discussions of multilevel models can be found in Raudenbush & Bryk (2002), Singer & Willett (2003), Rabe-Hesketh & Skrondal (2022), Luke (2004), and Kreft & de Leeuw (1998).

B.3.1 Level 2

Simulating the second level of the data is relatively easy. We simply need to provide the number of countries, and then generate random effects for each country. Random effects are discussed in the above references, but essentially represent country level differences in the data.

country <-seq(1:N_countries) # sequence 1 to 30GII <-rbinom(N_countries, 100, .25) # gender inequality indexHDI <-rbinom(N_countries, 100, .25) # Human Development Indexu0 <-rnorm(N_countries, 0, .25) # random interceptu1 <-rnorm(N_countries, 0, .05) # random sloperandomeffects <-data.frame(country, GII, HDI, u0, u1) # dataframe of random effects

B.3.2 Level 1

Simulating the Level 1 data is more complex.

We uncount the data by 100 to create 100 observations for each country. We then create an id number.

We create randomly simulated parental discipline variables with proportions similar to those in MICS.

Lastly, we need to create the dependent variable. Because this is a dichotomous outcome, the process is somewhat complex. We need to craete a linear combination z, using regression weights derived from MICS. We then calculate predicted probabilities, and lastly generate a dichotomous aggression outcome from those probabilities.

pander(labelled::look_for(MICSsimulated)[1:4]) # list out variable labels

Table B.1: Variable Labels

pos

variable

label

col_type

1

id

id

int

2

country

country

int

3

GII

Gender Inequality Index

int

4

HDI

Human Development Index

int

5

cd1

spank

int

6

cd2

beat

int

7

cd3

shout

int

8

cd4

explain

int

9

aggression

aggression

int

B.4 Explore The Simulated Data With A Graph

Exploring the simulated data with a graph helps us to ensure that we have simulated plausible data.

Show the code

ggplot(MICSsimulated,aes(x = cd1, # x is spankingy = aggression, # y is aggressioncolor =factor(country))) +# color is countrygeom_smooth(method ="glm", # glm smoothermethod.args =list(family ="binomial"),alpha = .1) +# transparency for CI'slabs(title ="Aggression as a Function of Spanking",x ="spank",y ="aggression") +scale_color_viridis_d(name ="Country") +# nice colorstheme_minimal()

B.5 Explore The Simulated Data With A Logistic Regression

Similarly, exploring the data with a logistic regression confirms that we have created plausible data.

Show the code

fit1 <-glmer(aggression ~ cd1 + cd2 + cd3 + cd4 + GII + (1| country), family ="binomial",data = MICSsimulated,control =glmerControl(optimizer ="bobyqa"))summary(fit1) # table for PDF

Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: aggression ~ cd1 + cd2 + cd3 + cd4 + GII + (1 | country)
Data: MICSsimulated
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
4043.0 4085.1 -2014.5 4029.0 2993
Scaled residuals:
Min 1Q Median 3Q Max
-2.3105 -1.0575 0.6729 0.8799 1.3032
Random effects:
Groups Name Variance Std.Dev.
country (Intercept) 0.04616 0.2148
Number of obs: 3000, groups: country, 30
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.40705 0.35298 -1.153 0.24883
cd1 0.21337 0.07766 2.748 0.00600 **
cd2 0.58546 0.17948 3.262 0.00111 **
cd3 0.45124 0.07778 5.801 6.57e-09 ***
cd4 -0.36836 0.09156 -4.023 5.74e-05 ***
GII 0.02269 0.01388 1.635 0.10201
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) cd1 cd2 cd3 cd4
cd1 -0.070
cd2 -0.022 -0.010
cd3 -0.157 0.002 0.020
cd4 -0.204 -0.009 -0.004 -0.020
GII -0.954 -0.011 -0.003 0.023 0.004

B.6 Write data to various formats

Lastly, we write the data out to various formats: R, Stata, and SPSS.

Rabe-Hesketh, S., & Skrondal, A. (2022). Multilevel and longitudinal modeling using Stata. In Stata Press (4th ed.). Stata Press.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (pp. xxiv, 485 p.). Sage Publications.

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis : Modeling change and event occurrence. In Applied longitudinal data analysis : modeling change and event occurrence. Oxford University Press.