Here is a simple within-subjects design.
The idea is that you care about the effect of Z on Y at time 1:
- Here Z is randomly assigned, though that’s not always the case in a within-subjects design.
- You apply Z and measure Y1 = Y(Z).
- You could just analyze the results as a between-subjects design, but you decide to apply the opposite condition (1-Z) at time 2 and measure Y2.
- You now want to use Y2 as a measure of the unobserved time 1 potential outcome Y(1-Z) and do a within-subjects analysis.
- We allow for the possibility, however, that responses at time 2 are not Y(1-Z) but rather a mixture of Y(1-Z) and the response subjects just gave, Y(Z). In other words, there is consistency bias.
library(DeclareDesign)

N <- 100  # study size
b <- 0.5  # weight on the new condition (1-Z) in the time 2 response; (1 - b) is the weight on the first response

within_subjects <-
  declare_population(N = N, u = rnorm(N)) +
  declare_potential_outcomes(Y1 ~ Z + u) +
  declare_assignment() +
  declare_reveal(Y1) +
  # Y2 combines the outcome under condition 1-Z with the time 1 response Y1 = Y(Z)
  declare_potential_outcomes(Y2 ~ b * (1 - Z) + (1 - b) * Y1 + u) +
  # the estimand is the effect of Z on Y1
  declare_estimand(ate = mean(Y1_Z_1 - Y1_Z_0)) +
  declare_reveal(Y2) +
  declare_estimator(Y1 ~ Z, label = "between") +
  # stack both periods and cluster standard errors by subject
  declare_estimator(c(Y1, Y2) ~ c(Z, 1 - Z), model = lm_robust, cluster = c(ID, ID), label = "within")
Diagnosis:
> diagnose_design(within_subjects)
Research design diagnosis based on 500 simulations. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).
Design Label     Estimand Label  Estimator Label  Term         N Sims  Bias    RMSE    Power   Coverage  Mean Estimate  SD Estimate  Mean Se  Type S Rate  Mean Estimand
within_subjects  ate             between          Z            500     0.00    0.20    1.00    0.94      1.00           0.20         0.20     0.00         1.00
                                                                       (0.01)  (0.01)  (0.00)  (0.01)    (0.01)         (0.01)       (0.00)   (0.00)       (0.00)
within_subjects  ate             within           c(Z, 1 - Z)  500     -0.50   0.50    1.00    0.00      0.50           0.05         0.05     0.00         1.00
                                                                       (0.00)  (0.00)  (0.00)  (0.00)    (0.00)         (0.00)       (0.00)   (0.00)       (0.00)
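To see where the -0.50 bias of the within estimator comes from, here is a rough base-R sketch of the same data-generating process. It is not the DeclareDesign code itself: simulate_within() is a hypothetical helper, it assumes complete random assignment with half the subjects treated, and it uses plain differences in means, which give the same point estimates as the regressions. Under the Y2 formula in the design, the stacked difference in means works out to b in expectation rather than the true effect of 1, so the bias is b - 1.

```r
# Base-R sketch of the design above (no DeclareDesign needed).
# simulate_within() is a hypothetical helper, not part of any package.
simulate_within <- function(N = 100, b = 0.5) {
  u  <- rnorm(N)                              # unit-level noise
  Z  <- sample(rep(c(0, 1), length.out = N))  # assumed: complete random assignment, half treated
  Y1 <- Z + u                                 # time 1 outcome; true effect of Z is 1
  Y2 <- b * (1 - Z) + (1 - b) * Y1 + u        # time 2 outcome, same formula as in the design

  # Between-subjects estimate: difference in means of Y1 by Z
  between <- mean(Y1[Z == 1]) - mean(Y1[Z == 0])

  # Within-subjects estimate: stack both periods; the period 2 condition is 1 - Z
  y      <- c(Y1, Y2)
  cond   <- c(Z, 1 - Z)
  within <- mean(y[cond == 1]) - mean(y[cond == 0])

  c(between = between, within = within)
}

set.seed(1)
sims_half <- replicate(2000, simulate_within(b = 0.5))
sims_full <- replicate(2000, simulate_within(b = 1))

rowMeans(sims_half)  # within should center near b = 0.5, between near 1
rowMeans(sims_full)  # with b = 1, both should center near 1
```

With b = 1 the time 2 response is exactly Y(1-Z) and the within estimate is unbiased. With b = 0.5 it centers on 0.5, which matches the Mean Estimate reported in the diagnosis.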
The interesting bit of this is how you model Y2 ~ …