Exploratory Data Analysis in R (edar)

Introduction

The package Exploratory Data Analysis in R (edar) enables efficient exploratory data analysis with a few lines of code. It contains functions to:

  • overview and summarise the data set
  • check balance of covariates among control and treatment groups
  • create organized, ready-to-export tables (LaTeX, HTML, etc.) with the results of model estimation
  • easily create plots with fitted values comparing one or more models under different treatment conditions
  • create plots with point estimates and their intervals (dot-and-whisker plots)
  • conduct robustness checks of the model results (multiple imputation, post-stratification, etc).

Quantitative researchers conduct these tasks repeatedly. The package provides functions to do them more efficiently and with minimal code. A typical analysis cycles through the following steps:

  1. Check the numerical and categorical variables of the data set
    • Look for outliers and missing values
    • Check distribution of the variables
  2. Fit a multivariate regression model
  3. Display and check the results
  4. Do multiple imputation and post-stratification (in surveys)
  5. Repeat 2 and 3
  6. Recode some variables or change model specifications
  7. Repeat

Except for item 6, the package edar can speed up all of those tasks. For instance, suppose data contains the data set. Those tasks can be performed with a few lines of code:

# summary tables
data %>% summarise_alln(.) # summarise all numerical variables of the data in a table
data %>% summarise_allc(.) # summarise all categorical variables of the data in a table 
data %>% summarise_allcbundle(.) # summarise all categorical variables, bundling those that share the same categories
data %>% ebalance(., treatmentVar="treat") ## summary of numerical variables for different levels of "treat"

# summary plots
data %>% gge_describe(.)  ## marginal distribution of all variables
data %>% gge_density(.) ## marginal distribution of numerical variables only
data %>% gge_histogram(.)## marginal distribution of numerical variables only using histograms
data %>% gge_barplot(.) ## marginal distribution of non-numerical variables

# after fitting the models model1, model2, etc ...
tidye(model1, hc=T) ## put the model summary in a tidy data.frame (using robust std. errors)
tidye(list(model1,model2)) ## same, but summarise both models at once

# plots
gge_coef(model1) ## dot-and-whisker plot (plot with coefficients and std. errors)
model1 %>%  gge_fit(., data, "y", "x1") ## plot with fitted values as function of covariate x1

# multiple imputation and post-stratification
emultimputation(data, formula,  dep.vars = c(...), ind.vars=c(...)) 
epoststrat(data, population.proportion, strata = ~ stratification.variable1 + stratification.variable2...) 

Workflow

Data

Here is an example workflow with edar. We will use the data set edar_survey, which comes with the package:

library(magrittr)
library(edar)

data(edar_survey)
help(edar_survey)

data = edar_survey

A National Survey from Brazil
Description:

The data set is a subset of a national survey conducted in Brazil
in 2013. The survey measures preferences of individuals for
interpersonal and interregional redistribution of income as well
as preferences for centralization of political authority.

Usage:

data(edar_survey)

Format:

A data frame with 700 rows and 16 columns:

gender factor with “man” and “woman”

educ factor with “high” if the individual completed high school or
more, and “low” otherwise

age integer with age in years

yi numeric variable with household income per capita

yi.iht inverse hyperbolic transformation of yi

state factor with the state in which the individual lives

region factor with macroregion

ys.mean average household percapita income in the state, computed
using the 2013 Brazilian National Household Survey (PNAD)

trust factor, “high” or “low” trust in the federal government

treat numeric, 0 for control group or 1 for treatment group. It is
a randomly generated variable used for illustration in the
examples and vignettes only

ys.gini numeric, Gini coefficient of the state computed using the
2013 Brazilian National Household Survey (PNAD)

racial.frag.ratio numeric, racial fractionalization at the state
over racial fractionalization at the national level

reduce.income.gap factor, “A”=Agree, “A+”=Strongly Agree,
“D”=Disagree, “D+”=Strongly Disagree, “N”=Neither Agree or
Disagree that “Government should reduce income gap between
rich and poor”

transfer.state.tax factor, “A”=Agree, “A+”=Strongly Agree,
“D”=Disagree, “D+”=Strongly Disagree, “N”=Neither Agree or
Disagree that the “Government should redistribute resources
from rich to poor states”

minimum.wage factor, captures the answer to “Who should decide
about the minimum wage policy?”. The levels are “Each city
should decide”, “Each state should decide”, “Should be the
same accros the country”

unemployment.policy factor, captures the answer to “Who should
decide about the unemployment policy?”. The levels are “Each
city should decide”, “Each state should decide”, “Should be
the same accros the country”

red.to.poor factor, captures the answer to “Who should decide
about policies to redistribute income to poor?”. The levels
are “Each city should decide”, “Each state should decide”,
“Should be the same accros the country”

Source:

options(crayon.enabled = FALSE)
options(tibble.width=100)
options(tibble.digits=4)
options(dplyr.width=100)
options(width=70)
options(scipen=999)
options(digits=4)
options(knitr.kable.NA ="")

Summary tables

First, we can get a quick overview of the data set using the functions summarise_alln and summarise_allc provided by the edar package. They summarise the numerical and categorical variables in the data set, respectively:

NOTE: throughout this document, I use the package kableExtra for better visualization of some tables

data %>% 
    summarise_alln(., digits=2) %>%
    kableExtra::kable(., "latex", booktabs = T ) %>%
    kableExtra::kable_styling(latex_options = c("scale_down"))

data %>% summarise_allc(.) %>% print(width=70)

# A tibble: 10 x 7
   var        N   NAs Categories Frequency     Table Categories.Labe…
                                  
 1 educ     700     0          2 high  (39.71… <dat… high, low       
 2 gender   700     0          2 man   (41.29… <dat… man, woman      
 3 minim…   695     5          4 Each  (8.63 … <dat… Each city shoul…
 4 red.t…   679    21          4 Each  (11.78… <dat… Each city shoul…
 5 reduc…   700     0          5 A     (72.43… <dat… A, A+, D, D+, N 
 6 region   700     0          5 CO    (6.14 … <dat… CO, NE, NO, SE,…
 7 state    700     0         27 AC    (0.29 … <dat… AC, AL, AM, AP,…
 8 trans…   700     0          5 A     (70.14… <dat… A, A+, D, D+, N 
 9 trust    694     6          3 high  (56.92… <dat… high, low       
10 unemp…   699     1          4 Each  (9.59 … <dat… Each city shoul…

The summary of categorical variables produced by summarise_allc contains a column named Table. Each of its entries is a table with the counts for each category of the corresponding variable.

tab = data %>% summarise_allc(.)
tab$Table[[6]]
Variable CO NE NO SE SU NA
region 43 333 46 174 104 0
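
For readers who want to see what such a count table involves, here is a minimal base-R sketch. It is not edar's implementation: the helper name count_categories and the toy vector are hypothetical, and the only assumption is that missing values are reported as their own category, as in the output above.

```r
# Hypothetical helper (not part of edar): counts per category,
# with missing values reported under the label "NA".
count_categories <- function(x) {
  counts <- table(x, useNA = "always")
  labels <- ifelse(is.na(names(counts)), "NA", names(counts))
  setNames(as.integer(counts), labels)
}

# Toy factor with one missing value
region <- factor(c("NE", "SE", "NE", NA, "CO"))
tab_region <- count_categories(region)
print(tab_region)
```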

It is common to have data sets in which many categorical variables have the same categories. The function summarise_allcbundle provides a summary of all categorical variables of the data set and aggregates those with the same categories. The output contains columns named Table, Tablep, and Tablel. Table contains a table with the counts of the categories of the variables. Tablep presents the same information, but in percentages. Tablel presents both the counts and the percentages, and can be exported directly to reports and articles. The column Variables in the output contains the names of all the variables that have the same Categories.Labels.

data %>% summarise_allcbundle(.)  %>% print(., width=70)
# A tibble: 6 x 6
  N.Variables Variables Categories.Labels      Table   Tablep  Tablel
        <int> <list>    <chr>                  <list>  <list>  <list>
1           2 <chr [2]> A, A+, D, D+, N        <data.… <data.… <data…
2           1 <chr [1]> AC, AL, AM, AP, BA, C… <data.… <data.… <data…
3           1 <chr [1]> CO, NE, NO, SE, SU     <data.… <data.… <data…
4           3 <chr [3]> Each city should deci… <data.… <data.… <data…
5           2 <chr [2]> high, low              <data.… <data.… <data…
6           1 <chr [1]> man, woman             <data.… <data.… <data…

tab = data %>% summarise_allcbundle(.)
tab$Table[[5]]
Variable high low NA
educ 278 422 0
trust 395 299 6
tab$Tablep[[5]]
Variable high low NA
educ 39.71 60.29 0
trust 56.43 42.71 0.86
tab$Tablel[[5]]
Variable high low NA
educ 39.71 % (N=278) 60.29 % (N=422) 0 % (N=0)
trust 56.43 % (N=395) 42.71 % (N=299) 0.86 % (N=6)
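
The bundling idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the code behind summarise_allcbundle: the toy data frame, the key construction, and the helper bundle_counts are all hypothetical.

```r
# Toy data: educ and trust share the levels "high"/"low"; gender does not.
toy <- data.frame(
  educ   = factor(c("high", "low", "low", "high")),
  trust  = factor(c("low", "low", "high", NA), levels = c("high", "low")),
  gender = factor(c("man", "woman", "woman", "man"))
)

# Key each variable by its sorted, comma-separated set of levels.
level_key <- vapply(toy, function(v) paste(sort(levels(v)), collapse = ", "),
                    character(1))

# Variables with the same key belong to the same bundle.
bundles <- split(names(level_key), level_key)

# Hypothetical helper: stack the counts of all variables in one bundle,
# one row per variable, one column per category (plus NA).
bundle_counts <- function(vars) {
  t(sapply(toy[vars], function(v) table(v, useNA = "always")))
}

high_low <- bundle_counts(bundles[["high, low"]])
print(bundles)
print(high_low)
```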

Checking balance of covariates

We can easily check the distribution of covariates across two levels of a factor. Consider the variable treat, which represents the treatment condition (1=treatment, 0=control). We can describe the distribution of covariates in each group using ebalance(). The table follows the recommendations in Imbens and Rubin (2015) [imbens2015causal].

data %>% ebalance(., treatmentVar='treat') %>% print(., digits=2)

Variable mut st muc sc NorDiff lnRatioSdtDev pit pic
age 45.83 16.61 44.9 16.27 0.06 0.02 0.03 0.06
yi 946.36 1671.84 916.04 1418.32 0.02 0.16 0.03 0.07
yi.iht 6.95 1.07 6.98 1.08 -0.03 -0.01 0.03 0.07
ys.mean 981.54 297.97 966.04 302.6 0.05 -0.02 0.04 0.02
ys.gini 0.52 0.03 0.53 0.03 -0.16 -0.1 0.03 0.05
racial.frag.ratio 0.87 0.13 0.87 0.14 0.02 -0.07 0 0.05
MahalanobisDist nil nil nil nil 0.22 nil nil nil
pscore 0.5 0.5 0.46 0.5 0.07 0 0.02 0.04
LinPscore -0.09 26.61 -1.92 26.54 0.07 0 0.04 0.07
N 337 nil 363 nil nil nil nil nil
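
Two of the columns above can be reproduced by hand. The sketch below assumes that NorDiff is the normalized difference in means and lnRatioSdtDev the log ratio of standard deviations, as defined by Imbens and Rubin; the function names norm_diff and log_ratio_sd are hypothetical and are not edar functions. The check at the end uses the rounded means and standard deviations from the age row.

```r
# Normalized difference: difference in group means divided by the
# square root of the average of the two group variances.
norm_diff <- function(x, treat) {
  xt <- x[treat == 1]
  xc <- x[treat == 0]
  (mean(xt) - mean(xc)) / sqrt((var(xt) + var(xc)) / 2)
}

# Log ratio of standard deviations (treated over control).
log_ratio_sd <- function(x, treat) {
  log(sd(x[treat == 1]) / sd(x[treat == 0]))
}

# Toy example: means 5 vs 2, both variances equal to 1
x     <- c(1, 2, 3, 4, 5, 6)
treat <- c(0, 0, 0, 1, 1, 1)
nd  <- norm_diff(x, treat)      # 3
lrs <- log_ratio_sd(x, treat)   # 0

# Check against the age row of the table (mut, st, muc, sc)
nd_age  <- round((45.83 - 44.90) / sqrt((16.61^2 + 16.27^2) / 2), 2)  # 0.06
lrs_age <- round(log(16.61 / 16.27), 2)                               # 0.02
```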

Summary plots

The package also provides functions to visualise the marginal distributions of many variables at once. The marginal densities can be grouped by a factor using the parameter group. When the marginal densities are presented by group, the plot includes the p-value of the Kolmogorov-Smirnov test.

g = data[,1:8] %>% gge_describe(.)
print(g)

gge_describe.png

g = data[,1:9] %>% gge_describe(., group='educ')
print(g)

gge_describe_group.png
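
The p-value reported in the grouped plot can be understood through the standard two-sample Kolmogorov-Smirnov test. The sketch below uses stats::ks.test on simulated data; whether gge_describe calls this exact function is an assumption, and the variable names are made up for illustration.

```r
# Two simulated groups whose distributions differ by a location shift
set.seed(1)
x_low  <- rnorm(100, mean = 0)
x_high <- rnorm(100, mean = 1)

# Two-sample Kolmogorov-Smirnov test: the statistic D is the largest
# vertical distance between the two empirical distribution functions.
ks <- ks.test(x_low, x_high)
print(ks$statistic)
print(ks$p.value)
```

A small p-value suggests that the variable is distributed differently in the two groups.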

Other similar functions provided by the package are:

  • gge_barplot()
  • gge_density()
  • gge_histogram()

Analyzing output of model estimation

Fitting models

The package edar makes it easy to display estimation results with minimal code. Suppose we estimated several different models:

set.seed(77)
## tibble() replaces the deprecated data_frame()
data = tibble::tibble(n    = 300,
                      x1   = rnorm(n,3,1),
                      x2   = rexp(n),
                      cat1 = sample(c(0,1), n, replace=TRUE),
                      cat2 = sample(letters[1:4], n, replace=TRUE),
                      y    = -10*x1*cat1 + 10*x2*(3*(cat2=='a') -3*(cat2=='b') +1*(cat2=='c') -1*(cat2=='d')) +
                          rnorm(n,0,10),
                      y.bin = ifelse(y < mean(y), 0, 1),
                      y.mul = 1+ifelse( - x1 - x2 + rnorm(n,sd=10) < 0, 0,
                                ifelse( - 2*x2 + rnorm(n,sd=10) < 0, 1, 2))
                      )

formula1    = y ~ x1
formula2    = y ~ x1 + x2
formula3    = y ~ x1*cat1 + x2*cat2
formula4bin = y.bin ~ x1+x2*cat2
formula4bin1 = y.bin ~ x1+x2
formula4bin2 = y.bin ~ x1*cat1+x2*cat2
formula5mul = y.mul ~ x1 + x2

model.g1    = lm(formula1, data)
model.g2    = lm(formula2, data)
model.g3    = lm(formula3, data)
model.bin   = glm(formula4bin, data=data, family='binomial')
model.bin1  = glm(formula4bin1, data=data, family='binomial')
model.bin2  = glm(formula4bin2, data=data, family='binomial')
model.mul   = nnet::multinom(formula5mul, data)

Tables

We want to visualize the model estimates. The function tidye creates tidy summary tables with the output. It is a wrapper function for broom::tidy(), and it works with lists of models. Here are some examples:


tidye(model.g3)

## works with other types of dependent variables
# tidye(model.bin)
# tidye(model.mul)

term estimate std.error conf.low conf.high statistic p.value
(Intercept) 3.6042 3.0375 -2.3742 9.5826 1.1866 0.2364
x1 -0.9053 0.8167 -2.5126 0.7021 -1.1085 0.2686
cat1 -2.2011 3.6151 -9.3164 4.9142 -0.6089 0.5431
x2 28.0061 1.3544 25.3403 30.6719 20.6774 0
cat2b -0.1835 2.3532 -4.8151 4.4481 -0.078 0.9379
cat2c -0.9414 2.2746 -5.4184 3.5355 -0.4139 0.6793
cat2d -1.4556 2.4636 -6.3044 3.3932 -0.5909 0.5551
x1:cat1 -9.2755 1.1527 -11.5442 -7.0069 -8.0471 0
x2:cat2b -58.1667 1.8639 -61.8352 -54.4982 -31.2071 0
x2:cat2c -17.6127 1.7246 -21.0071 -14.2183 -10.2125 0
x2:cat2d -38.3783 2.0687 -42.4499 -34.3068 -18.5523 0

We can request robust standard errors and, optionally, keep the uncorrected values for comparison.

## with robust std.errors
tidye(model.g3, hc=T)

term estimate std.error conf.low conf.high statistic p.value
(Intercept) 3.6042 3.2952 -2.8544 10.0628 1.0938 0.275
x1 -0.9053 0.8481 -2.5676 0.7571 -1.0673 0.2867
cat1 -2.2011 3.7761 -9.6023 5.2001 -0.5829 0.5604
x2 28.0061 1.5784 24.9124 31.0998 17.7432 0
cat2b -0.1835 2.5577 -5.1965 4.8295 -0.0717 0.9429
cat2c -0.9414 2.4039 -5.6531 3.7703 -0.3916 0.6956
cat2d -1.4556 2.691 -6.7299 3.8187 -0.5409 0.589
x1:cat1 -9.2755 1.2346 -11.6953 -6.8558 -7.5131 0
x2:cat2b -58.1667 1.8969 -61.8846 -54.4488 -30.664 0
x2:cat2c -17.6127 1.8342 -21.2077 -14.0176 -9.6023 0
x2:cat2d -38.3783 2.3255 -42.9364 -33.8203 -16.5029 0
tidye(model.g3, hc=T, keep.nohc=T)  %>%
    kableExtra::kable(., "latex", booktabs = T ) %>%
    kableExtra::kable_styling(latex_options = c("scale_down"))
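
To make the idea of heteroskedasticity-robust standard errors concrete, here is a base-R sketch of the simplest sandwich estimator (HC0) for a linear model. Which HC variant tidye uses with hc=T is not stated here, so HC0 is an assumption chosen for brevity; the simulated data and object names are illustrative only.

```r
# Simulated data with heteroskedastic errors
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = exp(x / 2))
fit <- lm(y ~ x)

X <- model.matrix(fit)   # n x k design matrix
e <- residuals(fit)      # OLS residuals

# Sandwich estimator: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1
bread    <- solve(crossprod(X))
meat     <- crossprod(X * e)   # row i of X is scaled by e_i
vcov_hc0 <- bread %*% meat %*% bread

se_classic <- sqrt(diag(vcov(fit)))
se_hc0     <- sqrt(diag(vcov_hc0))
print(cbind(estimate = coef(fit), se_classic, se_hc0))
```

The point estimates are unchanged; only the standard errors, and therefore the intervals and p-values, differ between the two columns.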

Finally, we can create tables from a list of models.


## list of models
tidye(list(Gaussian=model.g3, Binomial=model.bin, Multinomial=model.mul)) %>%
    kableExtra::kable(., "latex", booktabs = T ) %>%
    kableExtra::kable_styling(latex_options = c("scale_down"))

The results can easily be exported in a standard publication format using the package kableExtra or the function etab() provided by edar:

list(Binomial=model.bin, Multinomial=model.mul,Gaussian=model.g3) %>%
    etab %>%
    kableExtra::kable(., "latex", booktabs = T , align = c("l",rep('c',4))) %>%
    kableExtra::kable_styling(latex_options = c("scale_down"))

Plot fitted values

After estimation, plots of fitted values are a good way to visualize and present marginal effects. They are easy to produce with the edar package.

model.g1 %>% gge_fit(., data, 'y', "x1")

fig-fitted-value-1.png
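
The data behind a fitted-value plot can be built by hand with predict() on a grid of covariate values. The sketch below is a base-R illustration with simulated data; it does not reproduce how gge_fit works internally, and the object names are hypothetical.

```r
# Simulated data and a simple linear model
set.seed(7)
d <- data.frame(x1 = rnorm(100, 3, 1))
d$y <- 2 - 3 * d$x1 + rnorm(100)
fit <- lm(y ~ x1, data = d)

# Grid over the observed range of the covariate
grid <- data.frame(x1 = seq(min(d$x1), max(d$x1), length.out = 50))

# Fitted values with 95% confidence bands
pred <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
fitted_curve <- cbind(grid, pred)   # columns: x1, fit, lwr, upr
print(head(fitted_curve))
```

Plotting fit against x1, with lwr and upr as a ribbon, gives the kind of figure shown above.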

There are many options available with the gge_fit() function. We can at once:

  • Compare fitted values for different groups
  • Compare fitted values for different model specifications, given a list of models
  • Create a grid of plots with fitted values for different groups and model specifications
  • Fitted values for different groups
    model.g3 %>% gge_fit(., data, 'y', "x2", cat.values=list(cat2=c('a',"b")))
    

    fig-fiited-cat-1.png

    g1 = model.g3 %>% gge_fit(., data, 'y', "x2",  cat.values=list(cat2=c('a')), title='Variable cat2 fixed at a')
    g2 = model.g3 %>% gge_fit(., data, 'y', "x2",  cat.values=list(cat2=c('b')), title='Variable cat2 fixed at b')
    ggpubr::ggarrange(g1,g2)
    

    fig-fiited-cat-2.png

    model.g3 %>% gge_fit(., data, 'y', "x2", facets='cat2' )
    

    fig-fiited-cat-3.png

    model.g3 %>% edar::gge_fit(., data, 'y', 'x1', facets='cat2', pch.col.cat='cat1', pch.col.palette=c(brewer="Set2"))
    

    fig-fitted-4.png

    We can also compare a list of models:

    formulas = list("Model 1" = formula1, "Model 2" = formula2, "Model 3" = formula3)
    models   = list("Model 1" = model.g1, "Model 2" = model.g2, "Model 3" = model.g3)
    
    models %>%  gge_fit(., data, "y", "x2", formulas)
    
    

    fig-fitted-many-models-1.png

    formulas = list("Model 1" = formula1, "Model 2" = formula2, "Model 3" = formula3)
    models   = list("Model 1" = model.g1, "Model 2" = model.g2, "Model 3" = model.g3)
    
    models %>%  gge_fit(., data, "y", "x2", formulas,  legend.ncol.fill=3, facets='cat2')
    
    

    fig-fitted-many-models-1.png

    The same applies for logistic regressions.

    formula.bin1 = y.bin ~ x1+x2
    formula.bin2 = y.bin ~ x1+x2*cat2
    model.bin1   = glm(formula.bin1, data=data, family='binomial')
    model.bin2   = glm(formula.bin2, data=data, family='binomial')
    
    formulas = list("Model 1" = formula.bin1, "Model 2" = formula.bin2)
    models   = list("Model 1" = model.bin1, "Model 2" = model.bin2)
    
    models %>%  gge_fit(., data, "y.bin", "x1", formulas)
    
    
    

    fig-fitted-many-models-bin.png

    formula.bin1 = y.bin ~ x1+x2
    formula.bin2 = y.bin ~ x1+x2*cat2
    model.bin1   = glm(formula.bin1, data=data, family='binomial')
    model.bin2   = glm(formula.bin2, data=data, family='binomial')
    
    formulas = list("Model 1" = formula.bin1, "Model 2" = formula.bin2)
    models   = list("Model 1" = model.bin1, "Model 2" = model.bin2)
    models %>%  gge_fit(., data, "y.bin", "x2", formulas, facets='cat2')
    
    

    fig-fitted-many-models-bin-2.png

Plot with coefficients (dotwhisker)

The edar package also provides a wrapper function for the dot-and-whisker plot from the dotwhisker package. As before, the function accepts a list of models or tidy summaries of the estimation. There are also options to use robust standard errors in the plot.

models=tidye(list('Standard Model'=model.bin2)) %>%
    dplyr::bind_rows(tidye(list('Robust std. error'=model.bin2), hc=T) )
gge_coef(models, model.id='model')

dotwisker-1.png
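
A dot-and-whisker plot displays one point estimate and one interval per coefficient. The sketch below shows, in base R, how such a table of estimates and 95% Wald intervals can be computed for a logistic regression; the simulated data and the name coef_table are hypothetical, and this is not the code used by gge_coef.

```r
# Simulated binary outcome
set.seed(3)
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
d$y.bin <- rbinom(300, 1, plogis(0.5 * d$x1 - d$x2))
fit <- glm(y.bin ~ x1 + x2, data = d, family = binomial)

# Point estimates (dots) and 95% Wald intervals (whiskers)
est <- coef(summary(fit))
z   <- qnorm(0.975)
coef_table <- data.frame(
  term      = rownames(est),
  estimate  = est[, "Estimate"],
  conf.low  = est[, "Estimate"] - z * est[, "Std. Error"],
  conf.high = est[, "Estimate"] + z * est[, "Std. Error"],
  row.names = NULL
)
print(coef_table)
```

Replacing the standard errors with robust ones changes only the width of the whiskers, not the position of the dots.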

Multiple-imputation and post-stratification

Multiple imputation and post-stratification are easy to conduct, although the options are limited. The packages survey and mice provide more options.

Here is an example of multiple imputation for two models with different dependent variables.

## tibble() replaces the deprecated data_frame()
data = tibble::tibble(x1 = rnorm(200,3,1),
                      x2 = rexp(200),
                      cat.var  = sample(c(0,1), 200, replace=TRUE),
                      cat.var2 = sample(letters[1:4], 200, replace=TRUE),
                      y1 = 10*x1*cat.var+rnorm(200,0,10) +
                          3*x2*(6*(cat.var2=='a') -3*(cat.var2=='b') +
                                1*(cat.var2=='c') +1*(cat.var2=='d')),
                      y2 = -10*x1*cat.var+rnorm(200,0,10) +
                          10*x2*(3*(cat.var2=='a') -3*(cat.var2=='b') +
                                 1*(cat.var2=='c') -1*(cat.var2=='d'))
                      )  %>%
    dplyr::mutate(cat.var=as.factor(cat.var))
data$x1[sample(1:nrow(data), 10)] = NA


formula = "x1*cat.var+x2*cat.var2"
imp = emultimputation(data, formula,  dep.vars = c("y1", "y2"), ind.vars=c("x1", "x2", "cat.var", "cat.var2"))
imp$y1 %>%
    kableExtra::kable(., "latex", booktabs = T ) %>%
    kableExtra::kable_styling(latex_options = c("scale_down"))

imp$y2 %>%
    kableExtra::kable(., "latex", booktabs = T ) %>%
    kableExtra::kable_styling(latex_options = c("scale_down"))
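
After a model is fitted to each imputed data set, the results are usually combined with Rubin's rules. The sketch below implements those rules in base R for a single coefficient. Whether emultimputation pools its estimates exactly this way is an assumption; the function name pool_rubin and the input numbers are made up for illustration.

```r
# Hypothetical helper: pool m estimates of one coefficient.
#   estimates: point estimate from each imputed data set
#   variances: squared standard error from each imputed data set
pool_rubin <- function(estimates, variances) {
  m     <- length(estimates)
  q_bar <- mean(estimates)           # pooled point estimate
  w     <- mean(variances)           # within-imputation variance
  b     <- var(estimates)            # between-imputation variance
  total <- w + (1 + 1 / m) * b       # total variance
  c(estimate = q_bar, std.error = sqrt(total))
}

# Three imputations of the same coefficient (made-up numbers)
pooled <- pool_rubin(estimates = c(1.9, 2.1, 2.0),
                     variances = c(0.04, 0.05, 0.03))
print(pooled)
```

The between-imputation term is what makes the pooled standard error larger than the average of the per-imputation standard errors, reflecting the uncertainty due to the missing values.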

Post-stratification for a simple probability sample is also straightforward.

## tibble() replaces the deprecated data_frame()
data = tibble::tibble(educ = sample(c("Low", "High"), 200, TRUE),
                      gender = sample(c('Man', "Woman"), 200, TRUE),
                      other.variable = rnorm(200))
pop.prop = tibble::tibble(educ = c("Low", "High"))  %>%
    tidyr::crossing(gender=c("Man", "Woman")) %>%
    dplyr::mutate(Freq = 100*c(.3,.25,.3,.15))

epoststrat(data, pop.prop, strata = ~educ+gender) 
$weights
  [1] 0.5455 0.2542 0.7895 0.2542 0.5455 0.7895 0.5208 0.7895 0.2542
 [10] 0.2542 0.5208 0.5455 0.7895 0.5455 0.5455 0.2542 0.5208 0.7895
 [19] 0.5455 0.5455 0.5208 0.5455 0.5208 0.2542 0.5208 0.2542 0.7895
 [28] 0.7895 0.5455 0.7895 0.5208 0.5455 0.2542 0.7895 0.2542 0.5208
 [37] 0.7895 0.2542 0.7895 0.5208 0.2542 0.2542 0.5455 0.5208 0.5455
 [46] 0.5208 0.5455 0.5455 0.7895 0.5208 0.7895 0.2542 0.5455 0.2542
 [55] 0.5455 0.7895 0.5208 0.7895 0.2542 0.2542 0.5455 0.2542 0.5455
 [64] 0.2542 0.5455 0.2542 0.2542 0.5208 0.2542 0.2542 0.2542 0.5455
 [73] 0.5208 0.2542 0.5208 0.5455 0.2542 0.5455 0.2542 0.5455 0.5455
 [82] 0.7895 0.7895 0.2542 0.2542 0.7895 0.2542 0.7895 0.5208 0.5455
 [91] 0.5208 0.7895 0.5208 0.5455 0.5208 0.7895 0.5455 0.2542 0.5455
[100] 0.7895 0.5208 0.5208 0.2542 0.5208 0.2542 0.2542 0.7895 0.5208
[109] 0.2542 0.7895 0.5455 0.7895 0.5455 0.5455 0.5455 0.2542 0.2542
[118] 0.7895 0.5208 0.2542 0.5455 0.2542 0.2542 0.5208 0.2542 0.5208
[127] 0.7895 0.2542 0.5455 0.7895 0.5455 0.5455 0.2542 0.5455 0.5455
[136] 0.2542 0.5455 0.2542 0.2542 0.5208 0.5455 0.2542 0.5208 0.2542
[145] 0.5455 0.5455 0.5208 0.7895 0.2542 0.2542 0.5208 0.5455 0.5208
[154] 0.5208 0.5455 0.5208 0.7895 0.7895 0.5208 0.5455 0.5208 0.7895
[163] 0.7895 0.7895 0.5455 0.7895 0.5208 0.5455 0.5208 0.2542 0.2542
[172] 0.2542 0.5455 0.5208 0.2542 0.5455 0.5208 0.2542 0.5208 0.5208
[181] 0.5455 0.5208 0.5455 0.5208 0.7895 0.5208 0.2542 0.5455 0.5208
[190] 0.2542 0.2542 0.5208 0.5455 0.5455 0.2542 0.5208 0.5455 0.2542
[199] 0.7895 0.7895

$weights.trimmed
  [1] 0.5455 0.2542 0.7895 0.2542 0.5455 0.7895 0.5208 0.7895 0.2542
 [10] 0.2542 0.5208 0.5455 0.7895 0.5455 0.5455 0.2542 0.5208 0.7895
 [19] 0.5455 0.5455 0.5208 0.5455 0.5208 0.2542 0.5208 0.2542 0.7895
 [28] 0.7895 0.5455 0.7895 0.5208 0.5455 0.2542 0.7895 0.2542 0.5208
 [37] 0.7895 0.2542 0.7895 0.5208 0.2542 0.2542 0.5455 0.5208 0.5455
 [46] 0.5208 0.5455 0.5455 0.7895 0.5208 0.7895 0.2542 0.5455 0.2542
 [55] 0.5455 0.7895 0.5208 0.7895 0.2542 0.2542 0.5455 0.2542 0.5455
 [64] 0.2542 0.5455 0.2542 0.2542 0.5208 0.2542 0.2542 0.2542 0.5455
 [73] 0.5208 0.2542 0.5208 0.5455 0.2542 0.5455 0.2542 0.5455 0.5455
 [82] 0.7895 0.7895 0.2542 0.2542 0.7895 0.2542 0.7895 0.5208 0.5455
 [91] 0.5208 0.7895 0.5208 0.5455 0.5208 0.7895 0.5455 0.2542 0.5455
[100] 0.7895 0.5208 0.5208 0.2542 0.5208 0.2542 0.2542 0.7895 0.5208
[109] 0.2542 0.7895 0.5455 0.7895 0.5455 0.5455 0.5455 0.2542 0.2542
[118] 0.7895 0.5208 0.2542 0.5455 0.2542 0.2542 0.5208 0.2542 0.5208
[127] 0.7895 0.2542 0.5455 0.7895 0.5455 0.5455 0.2542 0.5455 0.5455
[136] 0.2542 0.5455 0.2542 0.2542 0.5208 0.5455 0.2542 0.5208 0.2542
[145] 0.5455 0.5455 0.5208 0.7895 0.2542 0.2542 0.5208 0.5455 0.5208
[154] 0.5208 0.5455 0.5208 0.7895 0.7895 0.5208 0.5455 0.5208 0.7895
[163] 0.7895 0.7895 0.5455 0.7895 0.5208 0.5455 0.5208 0.2542 0.2542
[172] 0.2542 0.5455 0.5208 0.2542 0.5455 0.5208 0.2542 0.5208 0.5208
[181] 0.5455 0.5208 0.5455 0.5208 0.7895 0.5208 0.2542 0.5455 0.5208
[190] 0.2542 0.2542 0.5208 0.5455 0.5455 0.2542 0.5208 0.5455 0.2542
[199] 0.7895 0.7895
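
The logic of post-stratification weights can be sketched in base R: each observation receives the population frequency of its cell divided by the number of sampled units in that cell. The printed weights above appear consistent with this rule (for example, 0.5455 would arise from a frequency of 30 spread over 55 sampled units), but that reading is an inference from the output, not documented behaviour of epoststrat. All object names below are hypothetical.

```r
# Simulated sample and population frequencies per educ x gender cell
set.seed(11)
smp <- data.frame(educ   = sample(c("Low", "High"), 200, TRUE),
                  gender = sample(c("Man", "Woman"), 200, TRUE))
pop <- data.frame(educ   = c("High", "High", "Low", "Low"),
                  gender = c("Man", "Woman", "Man", "Woman"),
                  Freq   = c(30, 25, 30, 15))

# One key per cell, for the sample and for the population table
key_smp <- paste(smp$educ, smp$gender)
key_pop <- paste(pop$educ, pop$gender)

# Number of sampled units in each cell
n_cell <- table(key_smp)

# Weight = population frequency of the cell / sample count of the cell
smp$weight <- pop$Freq[match(key_smp, key_pop)] / as.numeric(n_cell[key_smp])

# The weighted cell totals reproduce the population frequencies
weighted_totals <- tapply(smp$weight, key_smp, sum)
print(weighted_totals)
```

Cells that are under-represented in the sample receive larger weights, and cells that are over-represented receive smaller ones.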

Bibliography

  • [imbens2015causal] Imbens & Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, Cambridge University Press (2015).