Gentle introduction to DeclareDesign
DVM Bishop
GitHub: oscci
Twitter: deevybee
22nd September 2018
Huge thanks to Dorothy Bishop for writing this guide. Dorothy gave the DeclareDesign and DesignLibrary packages a spin and noticed how much of the action goes on behind the scenes. DeclareDesign uses a pipe structure to pass dataframes along from step to step, but this pipe structure is mostly invisible making it very hard to see what DeclareDesign is actually doing. Very kindly, in this guide Dorothy walks you through the pipe structure, showing what happens when a design gets “run” and a diagnosis gets implemented.
For getting set up to use DeclareDesign, see this website.
We’ll be looking at a simplified version of the code that you can find here. For the sake of clarity, I’ve modified the script in parts.
The script analyses the effectiveness of an experimental design that is used to assess an intervention effect by comparing the pre vs posttest difference score for treated vs untreated cases. For instance, you could think of this as testing the effectiveness of brain training on a memory test. We take a sample of people and measure their memory at time 1. Half of them are then given the braintraining and the others have a control intervention. Memory is measured again at time 2. If the training is effective, we expect the improvement in memory from time 1 to time 2 to be higher for the trained than for the untrained people.
Note that using change scores is not the optimal way to analyse intervention data, but it provides a simple example of how the package works. Later on we shall see how DeclareDesign makes it easy to compare different approaches to analysing trial data, in terms of their efficiency and power.
The user is asked to estimate certain features of the results of the study, so that data can be simulated with those characteristics. This can then be used to evaluate how effective the design is in testing the hypothesis of interest  in this case the treatment effect.
Before we go further, you should include the R packages you need. If you have not already installed these, you will need to do that first. Go to ToolsInstall Packages on the menu bar and type the names of the three packages that have ‘library’ commands below. Then run the lines with those commands.
library(DeclareDesign)
library(tidyverse)
library(DesignLibrary)
library(beeswarm)
library(knitr)
DeclareDesign differentiates four aspects of experimental design that are not always distinguished. The first of these is the Model of the world (M). In our example, the Model for the study requires us to specify a sample size (N), and pre and posttest scores for those who do and don’t have the intervention.
It makes life simpler if we think of the test scores as normal random deviates  i.e., zscores  where the initial pretest values have mean of zero and SD of one.
Regardless of intervention, we expect people’s scores to correlate between time 1 and time 2: i.e., there will some stability in rank ordering of people’s memory skills across the two measurement occasions. You have to provide an estimate of this relationship as a correlation coefficient.
In addition, an intervention effect will be expected just for the treated group: you have to provide an estimate of this in standard deviation units  i.e. as Cohen’s d. In the script this is referred to as the average treatment effect (ate
).
In intervention studies, it is very rare for all the participants to remain in the study up to followup. A final parameter to be estimated is the attrition rate  i.e. the proportion of cases that will be lost to followup.
Let’s start by specifying these sample characteristics for our Model.
N < 100 # sample size
ate < 0.25 # average treatment effect
sd_1 < 1 # SD for group 1
sd_2 < 1 # SD for group 2
rho < 0.5 # correlation between time1 and time2
attrition_rate < .1 # no outcome data for N*attrition_rate subjects
In the next chunk of code, we input these parameters into our model. When we do this using the declare_population()
function, we create a data frame which contains simulated data. Usually, when we run scripts from DeclareDesign, we would not see the data frame that is gradually built up as the script proceeds, but to improve understanding of the functions, we will inspect the simulated data for each chunk, looking just at the first few rows that are created.
# M: Model
population < declare_population(
N = N,
u_t1 = rnorm(N)*sd_1,
u_t2 = rnorm(N, rho * u_t1, sqrt(1  rho^2))*sd_2
)
df1 < population()
kable(head(df1))
ID  u_t1  u_t2 

001  0.3722855  0.4981104 
002  0.4700278  0.3207423 
003  0.7181537  2.0456900 
004  0.7210037  1.4360184 
005  1.3461157  1.6070232 
006  0.2275517  0.3872682 
You will see that we have a data frame with four columns. There are 100 rows (or as many as you have specified in N
), but we just display the first six. The first column is just a subject identifier. We then have the two columns u_t1
and u_t2
. You can think of these as the underlying variables from which the observed test scores will be derived.
We specified that there should be a correlation of rho
between u_t1
and u_t2
. Let’s see what this looks like:
mytitle < paste0('r = ', round(cor(df1$u_t1, df1$u_t2), 3))
plot(df1$u_t1, df1$u_t2, main = mytitle, col = 2, pch = 16)
It is unlikely that the correlation will be exactly the one you specified. That is because we are taking a sample from a population. You can think of the population as an enormous number of paired observations of u_t1
and u_t2
from which we are just selecting a subset of pairs. The larger our sample, the closer the observed correlation will be to the value of rho
that was specified. Unless you set the random number generator to give the same answer every time (see set.seed command), the specific values of u_t1
and u_t2
will change every time you run the script.
Next, we need to define the potential outcomes of the study – i.e., the outcome that unit 1 exhibits when assigned to the treatment versus when they are assigned to the control. We first define the treatment and control potential outcomes at the pretest phase, (Y_t1). In the chunk below, you’ll notice that our pretest score potential outcomes, Y_t1_Z_0
and Y_t1_Z_1
, are identical to u_t1
. Although this may seem redundant, there are two good reasons to specify the outcomes in that way. First, specifying the pretreatment outcomes as identical in the pretest period makes explicit our assumption that the treatment only affects outcomes in the second period. Thus, we’ve explicitly ruled out anticipation effects.
Second, in general it is helpful to keep separate the original underlying variables that are generated and the observed variables that we will enter into the analysis. This provides the flexibility to have observed variables that are not normally distributed. We could, for example, generate Y_t1
as a binary variable through a probit transformation of u_t1
by changing the declaration of Y_t1
in our population to draw_binary(N = N,latent = u_t1,link = "probit")
.[1]
Y_t2
is not a simple function of u_t2
, because it depends on the intervention. For those who have the intervention, we need to add the average treatment effect, ate
, to u_t2
whenever the treatment condition, Z
, is specified as 1
. We have not yet specified which individuals have treatment, but we create two new columns giving potential outcomes that tell us what would happen if a case were treated or untreated.
Again, we will inspect the first few rows, and see the new columns created by the data frame. We’ll also plot the values for the potential outcomes to observe the treatment effect. Note that the points for the treated condition are simply increased by an amount equivalent to ate
. When we come to the data strategy step below, we will allocate cases as treated or not treated, but at the model stage, we are just representing potential outcomes.
potential_outcomes < declare_potential_outcomes(
Y_t1_Z_0 = u_t1,
Y_t1_Z_1 = u_t1,
Y_t2_Z_0 = u_t2,
Y_t2_Z_1 = u_t2 + ate)
df1 < potential_outcomes(df1)
kable(head(df1))
ID  u_t1  u_t2  Y_t1_Z_0  Y_t1_Z_1  Y_t2_Z_0  Y_t2_Z_1 

001  0.3722855  0.4981104  0.3722855  0.3722855  0.4981104  0.2481104 
002  0.4700278  0.3207423  0.4700278  0.4700278  0.3207423  0.0707423 
003  0.7181537  2.0456900  0.7181537  0.7181537  2.0456900  1.7956900 
004  0.7210037  1.4360184  0.7210037  0.7210037  1.4360184  1.6860184 
005  1.3461157  1.6070232  1.3461157  1.3461157  1.6070232  1.3570232 
006  0.2275517  0.3872682  0.2275517  0.2275517  0.3872682  0.1372682 
We now move to the Inquiry part of the model. Here we just have to specify the true underlying quantity we want to estimate. In this case, we specify two. The first is the average treatment effect in the posttest period (ATE
), defined as the true mean posttest difference between the treatment and control potential outcomes of the dependent variable, Y
. We also define the true underlying differenceindifferences (DiD
) as an estimand, i.e., the true average difference in prepost differences between treatment and control. In fact, because potential outcomes are identical in the pretest period, these estimands are equivalent to one another. If we incorporated a violation of the noanticipation effects assumption, for example, the estimands would no longer be equivalent. Specifying our inquiry thus helps clarify the assumptions on which our inferences are based.
# I: Inquiry
estimand < declare_estimand(ATE = mean(Y_t2_Z_1  Y_t2_Z_0),
DiD = mean((Y_t2_Z_1  Y_t1_Z_1)  (Y_t2_Z_0  Y_t1_Z_0)))
kable(estimand(df1))
estimand_label  estimand 

ATE  0.25 
DiD  0.25 
The next step is termed the Data Strategy. At this step we determine how individuals are assigned to groups, and we also take into account any sample attrition. In effect, we determine which cases will feature in which groups in the analysis.
By looking at how df1
changes at each step of the script, we can see what is being achieved.
First, we declare how cases are assigned to treatments (Z
). The variable Z_cond_prob
is the conditional probability that a given case is in the treatment group, given it was assigned to treatment (and vice versa for control).[2] The default is for 50% to be assigned to each of two groups. In effect, the script acts to toss a coin that determines assignment of each case to Z = 1
or Z = 0
.
# D: Data Strategy
assignment < declare_assignment(prob = .5)
df1 < assignment(df1)
kable(head(df1))
ID  u_t1  u_t2  Y_t1_Z_0  Y_t1_Z_1  Y_t2_Z_0  Y_t2_Z_1  Z  Z_cond_prob 

001  0.3722855  0.4981104  0.3722855  0.3722855  0.4981104  0.2481104  0  0.5 
002  0.4700278  0.3207423  0.4700278  0.4700278  0.3207423  0.0707423  1  0.5 
003  0.7181537  2.0456900  0.7181537  0.7181537  2.0456900  1.7956900  1  0.5 
004  0.7210037  1.4360184  0.7210037  0.7210037  1.4360184  1.6860184  1  0.5 
005  1.3461157  1.6070232  1.3461157  1.3461157  1.6070232  1.3570232  0  0.5 
006  0.2275517  0.3872682  0.2275517  0.2275517  0.3872682  0.1372682  0  0.5 
Next, we take into account the attrition rate. New columns are added to the simulation: R
specifies whether or not the case is lost to followup, according to the attrition rate we specified. Since attrition is set to only 10%, you probably will only see cases where R = 1
in the brief part of the data frame that is shown below, but 10% of cases will have R
set to zero. (If you wish you can inspect all values by typing df1$R
at the console).
report < declare_assignment(prob = (1  attrition_rate),
assignment_variable = R)
df1 < report(df1)
kable(head(df1))
ID  u_t1  u_t2  Y_t1_Z_0  Y_t1_Z_1  Y_t2_Z_0  Y_t2_Z_1  Z  Z_cond_prob  R  R_cond_prob 

001  0.3722855  0.4981104  0.3722855  0.3722855  0.4981104  0.2481104  0  0.5  1  0.9 
002  0.4700278  0.3207423  0.4700278  0.4700278  0.3207423  0.0707423  1  0.5  1  0.9 
003  0.7181537  2.0456900  0.7181537  0.7181537  2.0456900  1.7956900  1  0.5  1  0.9 
004  0.7210037  1.4360184  0.7210037  0.7210037  1.4360184  1.6860184  1  0.5  1  0.9 
005  1.3461157  1.6070232  1.3461157  1.3461157  1.6070232  1.3570232  0  0.5  1  0.9 
006  0.2275517  0.3872682  0.2275517  0.2275517  0.3872682  0.1372682  0  0.5  1  0.9 
Next, we need to create the actual values of Y_t1
and Y_t2
that apply with this assignment  for cases where Z
is 0 we use the values of Y_t1_Z_0
and Y_t2_Z_0
, and for cases where Z
is 1 we use the value of Y_t1_Z_1
and Y_t2_Z_1
. The declare_reveal function automatically reveals the potential outcomes corresponding to a given assignment of treatment.
reveal_t1 < declare_reveal(Y_t1)
reveal_t2 < declare_reveal(Y_t2)
df1 < reveal_t1(df1)
df1 < reveal_t2(df1)
kable(head(df1))
ID  u_t1  u_t2  Y_t1_Z_0  Y_t1_Z_1  Y_t2_Z_0  Y_t2_Z_1  Z  Z_cond_prob  R  R_cond_prob  Y_t1  Y_t2 

001  0.3722855  0.4981104  0.3722855  0.3722855  0.4981104  0.2481104  0  0.5  1  0.9  0.3722855  0.4981104 
002  0.4700278  0.3207423  0.4700278  0.4700278  0.3207423  0.0707423  1  0.5  1  0.9  0.4700278  0.0707423 
003  0.7181537  2.0456900  0.7181537  0.7181537  2.0456900  1.7956900  1  0.5  1  0.9  0.7181537  1.7956900 
004  0.7210037  1.4360184  0.7210037  0.7210037  1.4360184  1.6860184  1  0.5  1  0.9  0.7210037  1.6860184 
005  1.3461157  1.6070232  1.3461157  1.3461157  1.6070232  1.3570232  0  0.5  1  0.9  1.3461157  1.6070232 
006  0.2275517  0.3872682  0.2275517  0.2275517  0.3872682  0.1372682  0  0.5  1  0.9  0.2275517  0.3872682 
Now we compute the difference score for each individual, which is the statistic we are using in our analysis. This is computed as the difference between Y
at time 1 and Y
at time 2. The beeswarm plot below shows the scores for all cases in the two groups. Our specified attrition rate will determine the proportion of cases lost to followup, shown in the plot as black dots. The mean effect size of treatment is shown both for the complete sample and the available sample after excluding those lost to attrition. Note this is unlikely to be exactly equal to the average treatment effect (ate
), because the estimate is based on a sample, whereas ate
is defined in terms of the population.
manipulation < declare_step(difference = (Y_t2  Y_t1), handler = fabricate)
df1 < manipulation(df1)
kable(head(df1))
ID  u_t1  u_t2  Y_t1_Z_0  Y_t1_Z_1  Y_t2_Z_0  Y_t2_Z_1  Z  Z_cond_prob  R  R_cond_prob  Y_t1  Y_t2  difference 

001  0.3722855  0.4981104  0.3722855  0.3722855  0.4981104  0.2481104  0  0.5  1  0.9  0.3722855  0.4981104  0.8703959 
002  0.4700278  0.3207423  0.4700278  0.4700278  0.3207423  0.0707423  1  0.5  1  0.9  0.4700278  0.0707423  0.3992855 
003  0.7181537  2.0456900  0.7181537  0.7181537  2.0456900  1.7956900  1  0.5  1  0.9  0.7181537  1.7956900  1.0775362 
004  0.7210037  1.4360184  0.7210037  0.7210037  1.4360184  1.6860184  1  0.5  1  0.9  0.7210037  1.6860184  0.9650147 
005  1.3461157  1.6070232  1.3461157  1.3461157  1.6070232  1.3570232  0  0.5  1  0.9  1.3461157  1.6070232  0.2609075 
006  0.2275517  0.3872682  0.2275517  0.2275517  0.3872682  0.1372682  0  0.5  1  0.9  0.2275517  0.3872682  0.6148200 
mydiff < round(mean(df1$difference[df1$Z==1])mean(df1$difference[df1$Z==0]),3)
mytemp < filter(df1, R==1)
mean1 < mean(mytemp$difference[mytemp$Z==1])
mean0 < mean(mytemp$difference[mytemp$Z==0])
mydiff2 < round((mean1mean0),3)
mytitle < paste0('Average effect without attrition= ', mydiff,'\nAverage effect with attrition=',mydiff2)
beeswarm(df1$difference~df1$Z,xlab='Treatment',pwcol=(df1$R+1),pch=16,main=mytitle,ylab='Memory score difference')
segments(.8, mean0, x1=1.2,y1 = mean0,lty=1,col=1,lwd=2) #add line for mean
segments(1.8, mean1, x1=2.2,y1 = mean1,lty=1,col=1,lwd=2)
text(1.5,1,'Black dots are\n cases lost to \nfollowup\n(R = 0)')
The final stage is the Answer strategy. This specifies the statistical procedure used to evaluate the inquiry, using the data simulated from the model. In the chunk below, we can see the output from the Answer strategy. You can if you wish view the computed variable mylm
, which shows that the output from estimated_effect(df1)
is identical to the result obtained by running the code for the robust regression analysis on df1
, after excluding those with values of R = 0
(i.e., the cases lost to followup).
# A: Answer Strategy
# Use difference score between pretest and posttest
estimated_effect < declare_estimator(
difference ~ Z,
model = lm_robust,
estimand = c("ATE", "DiD"),
subset = R == 1,
label = "Change score"
)
kable(estimated_effect(df1)) # print the results of the regression
estimator_label  term  estimate  std.error  statistic  p.value  conf.low  conf.high  df  outcome  estimand_label 

Change score  Z  0.4070568  0.1904039  2.137859  0.0353006  0.028669  0.7854446  88  difference  ATE 
Change score  Z  0.4070568  0.1904039  2.137859  0.0353006  0.028669  0.7854446  88  difference  DiD 
mylm < lm_robust(formula = difference ~ Z, data = subset(df1,R == 1))
Evaluating the design
We’ve worked through the analysis building up simulated data for one run, but the principal purpose of this exercise is to evaluate the design by repeatedly running the simulation so that the distribution of statistics can be obtained.
To do this, you specify the design by creating a list of all the processes that were involved above, joined by +
.
The diagnose_design()
function then computes various diagnostic statistics for that design. The default number of simulations is 500. This chunk of code takes time to run, with the amount of time dependent on the number of simulations.
We can then see the diagnostic statistics for the design. In this script we’ve assigned the diagnostic information to a variable, diagnosis1
, so we can display it differently, but usually you’d just have this displayed automatically on the console, using the command diagnose_design(pretest_posttest_design)
.
# Design
pretest_posttest_design < population + potential_outcomes + estimand +
assignment + reveal_t1 + reveal_t2 + report + manipulation +
estimated_effect
diagnosis1 < diagnose_design(pretest_posttest_design)
kable(reshape_diagnosis(diagnosis1))
Design Label  Estimand Label  Estimator Label  Term  N Sims  Bias  RMSE  Power  Coverage  Mean Estimate  SD Estimate  Mean Se  Type S Rate  Mean Estimand 

pretest_posttest_design  ATE  Change score  Z  500  0.02  0.22  0.25  0.93  0.27  0.22  0.21  0.01  0.25 
(0.01)  (0.01)  (0.02)  (0.01)  (0.01)  (0.01)  (0.00)  (0.01)  (0.00)  
pretest_posttest_design  DiD  Change score  Z  500  0.02  0.22  0.25  0.93  0.27  0.22  0.21  0.01  0.25 
(0.01)  (0.01)  (0.02)  (0.01)  (0.01)  (0.01)  (0.00)  (0.01)  (0.00) 
With the design in hand, we can also use it to quickly generate data, estimates, and estimands. You can try running the code below multiple times. You’ll get different answers because the experiment is being simulated from start to finish each time.
kable(head(draw_data(pretest_posttest_design)))
ID  u_t1  u_t2  Y_t1_Z_0  Y_t1_Z_1  Y_t2_Z_0  Y_t2_Z_1  Z  Z_cond_prob  Y_t1  Y_t2  R  R_cond_prob  difference 

001  0.3536787  0.8582277  0.3536787  0.3536787  0.8582277  1.1082277  0  0.5  0.3536787  0.8582277  1  0.9  1.2119065 
002  0.4119265  0.5881840  0.4119265  0.4119265  0.5881840  0.3381840  0  0.5  0.4119265  0.5881840  1  0.9  1.0001106 
003  1.1111596  0.2637439  1.1111596  1.1111596  0.2637439  0.0137439  1  0.5  1.1111596  0.0137439  1  0.9  1.0974157 
004  0.1743786  0.2856948  0.1743786  0.1743786  0.2856948  0.0356948  1  0.5  0.1743786  0.0356948  1  0.9  0.1386838 
005  0.9590143  0.7943895  0.9590143  0.9590143  0.7943895  1.0443895  1  0.5  0.9590143  1.0443895  1  0.9  0.0853752 
006  0.7166857  0.1869223  0.7166857  0.7166857  0.1869223  0.0630777  1  0.5  0.7166857  0.0630777  0  0.1  0.6536080 
kable(head(get_estimates(pretest_posttest_design)))
estimator_label  term  estimate  std.error  statistic  p.value  conf.low  conf.high  df  outcome  estimand_label 

Change score  Z  0.495138  0.2046965  2.418889  0.0176274  0.0883468  0.9019292  88  difference  ATE 
Change score  Z  0.495138  0.2046965  2.418889  0.0176274  0.0883468  0.9019292  88  difference  DiD 
kable(head(get_estimands(pretest_posttest_design)))
estimand_label  estimand 

ATE  0.25 
DiD  0.25 
Interpreting diagnostic information
To understand the output, it helps to read the working paper.
I’ve transposed the output and added some explanation in the chunk below.
myoutput<data.frame(t(diagnosis1$diagnosands_df[1,]))
#Add a column for explanation
myoutput$explanation<'.'
colnames(myoutput)[1]<'Estimate'
myoutput$explanation[5]<'Expected difference between estimate and estimand'
myoutput$explanation[7]<'Root mean squared error'
myoutput$explanation[9]<'Probability of rejecting null hypothesis of no effect'
myoutput$explanation[11]<'Probability that conf. interval contains estimand'
myoutput$explanation[19]<'Probability that a significant estimate has incorrect sign'
kable(myoutput)
Estimate  explanation  

design_label  pretest_posttest_design  . 
estimand_label  ATE  . 
estimator_label  Change score  . 
term  Z  . 
bias  0.02072121  Expected difference between estimate and estimand 
se(bias)  0.009920455  . 
rmse  0.2217726  Root mean squared error 
se(rmse)  0.007138123  . 
power  0.254  Probability of rejecting null hypothesis of no effect 
se(power)  0.01851247  . 
coverage  0.932  Probability that conf. interval contains estimand 
se(coverage)  0.01172167  . 
mean_estimate  0.2707212  . 
se(mean_estimate)  0.009920455  . 
sd_estimate  0.2210236  . 
se(sd_estimate)  0.00714128  . 
mean_se  0.2101448  . 
se(mean_se)  0.0006704137  . 
type_s_rate  0.007874016  Probability that a significant estimate has incorrect sign 
se(type_s_rate)  0.007321023  . 
mean_estimand  0.25  . 
se(mean_estimand)  0  . 
n_sims  500  . 
Design library
The goal of this explanation is to give you some understanding of the processes involved in DeclareDesign; this will make it easier for you to modify scripts for specific purposes. In practice, however, for the main types of common design, you can run the same processes with minimal scripting by using the functions in the DesignLibrary package, which simplifies the process of using DeclareDesign.
In effect, you just have to specify the key parameters, and the rest is done for you by the pretest_posttest_designer()
function. For example, if you wanted to see what would happen to power if the attrition rate increased to 20%, you could look at diagnose_design(pretest_posttest_designer(attrition_rate = .20))
. Or you could compare power under three different assumptions about how well your baseline predicts the endline using diagnose_design(expand_design(pretest_posttest_designer,rho = c(.2,.5,.8)))
.
Note that with the pretest_posttest_designer()
function, you get diagnoses for three possible ways of running the analysis. The first is ‘Change score’, which is the same as we have used above  you compare the pretestposttest difference for the two treatment conditions. An alternative approach is to compare the groups on the posttest score, treating the pretest score as a covariate: this is labelled ‘Condition on pretest’. The final approach is to just use posttest scores, ignoring pretest data: this is ‘Posttest only’.
The Design Library is very helpful for understanding strengths and weaknesses of different approaches.
mydes < pretest_posttest_designer(N = 100, ate = 0.25, sd_1 = 1, sd_2 = 1,
rho = 0.5, attrition_rate = 0.1)
mydeslib < diagnose_design(mydes)
kable(data.frame(t(mydeslib$diagnosands_df)))
X1  X2  X3  

design_label  mydes  mydes  mydes 
estimand_label  ATE  ATE  ATE 
estimator_label  Change score  Condition on pretest  Posttest only 
term  Z  Z  Z 
bias  0.005895015  0.001104015  0.004828581 
se(bias)  0.008142163  0.008124608  0.010158817 
rmse  0.2058560  0.1850635  0.2137697 
se(rmse)  0.006757644  0.005390701  0.006427614 
power  0.220  0.262  0.248 
se(power)  0.01881289  0.02170804  0.01898218 
coverage  0.95  0.95  0.93 
se(coverage)  0.009686312  0.009628640  0.011327610 
mean_estimate  0.2558950  0.2511040  0.2451714 
se(mean_estimate)  0.008142163  0.008124608  0.010158817 
sd_estimate  0.2059776  0.1852456  0.2139292 
se(sd_estimate)  0.006735176  0.005396290  0.006391084 
mean_se  0.2122948  0.1844499  0.1997280 
se(mean_se)  0.0007206330  0.0006652485  0.0007557927 
type_s_rate  0  0  0 
se(type_s_rate)  0  0  0 
mean_estimand  0.25  0.25  0.25 
se(mean_estimand)  0  0  0 
n_sims  500  500  500 
Session information
Session information is included for technical reasons: it can help diagnose issues if a script does not run in a given environment.
sessionInfo()
## R version 3.5.1 (20180702)
## Platform: x86_64appledarwin15.6.0 (64bit)
## Running under: macOS High Sierra 10.13.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF8/en_US.UTF8/en_US.UTF8/C/en_US.UTF8/en_US.UTF8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bindrcpp_0.2.2 knitr_1.20 beeswarm_0.2.3
## [4] DesignLibrary_0.1.1 forcats_0.3.0 stringr_1.3.1
## [7] dplyr_0.7.6 purrr_0.2.5 readr_1.1.1
## [10] tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0
## [13] tidyverse_1.2.1 DeclareDesign_0.11.0 estimatr_0.11.0
## [16] fabricatr_0.5.0 randomizr_0.16.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.4 listenv_0.7.0 haven_1.1.2.9000
## [4] lattice_0.2035 colorspace_1.32 htmltools_0.3.6
## [7] yaml_2.2.0 rlang_0.2.1 pillar_1.3.0
## [10] glue_1.3.0 withr_2.1.2 modelr_0.1.2
## [13] readxl_1.1.0 bindr_0.1.1 plyr_1.8.4
## [16] munsell_0.5.0 gtable_0.2.0 cellranger_1.1.0
## [19] rvest_0.3.2 future_1.9.0 codetools_0.215
## [22] evaluate_0.11 parallel_3.5.1 highr_0.7
## [25] broom_0.5.0 Rcpp_0.12.18 scales_0.5.0
## [28] backports_1.1.2 jsonlite_1.5 hms_0.4.2
## [31] digest_0.6.15 stringi_1.2.4 grid_3.5.1
## [34] rprojroot_1.32 cli_1.0.0 tools_3.5.1
## [37] magrittr_1.5 lazyeval_0.2.1 Formula_1.23
## [40] future.apply_1.0.0 crayon_1.3.4 pkgconfig_2.0.1
## [43] xml2_1.2.0 lubridate_1.7.4 assertthat_0.2.0
## [46] rmarkdown_1.10 httr_1.3.1 rstudioapi_0.7
## [49] globals_0.12.1 R6_2.2.2 nlme_3.1137
## [52] compiler_3.5.1

Note that, generally speaking, nonlinear transformations of
u_t1
andu_t2
might lead to correlations betweenY_t1
andY_t2
that
are not equal torho
in expectation. 
For example, if the probability of assignment was .8, then units
assigned to the control would haveZ_cond_prob = .2
and those
assigned to treatment would haveZ_cond_prob = .8
. This is useful
for designs where you want to weight observations by their
assignment probabilities.