Power for a design where a treatment alters an effect on continuous variables

The question (first part of this: https://twitter.com/annemscheel/status/1062333771716927488):

  • X affects Y1 (both X and Y1 continuous)
  • Z (“feedback”) is then applied randomly (dichotomous)
  • X then affects Y2, but its effect depends on Z
  • What is the power for assessing the effect of Z on the effect of X on Y?

This has a diff-in-diff feel to it so maybe worth looking at the DesignLibrary::pretest_posttest_designer and at Dorothy Bishop’s post. Gentle introduction to DeclareDesign (by Dorothy Bishop)

Lots of interesting features to this and many ways you could take it. One tricky feature is how to think about the average effect of X on Y given that X is continuous. Let’s assume you want something like an “average average causal effect” which averages over units the average marginal effect of X on Y, over the range of X. This is a scalar even if the relationship between X and Y is very nonlinear.

Here is a design that simulates a wide dataset and then at the estimation stage stacks and runs an interacted model to get a coefficient for the effect of Z on the effect of X on Y. Arguments you can change easily include the N, the baseline effect (assumed to differ across units), a shift in the effect, and a function relating X and Z to Y. In the example below the effect of Z depends on the unit specific effect of X on Y1, but you could take this many different ways. In this design we assume that X is randomly assigned, though you could add in confounding easily enough by introducing a correlation between X and b.

There is a custom function to stack and estimate but otherwise the design declaration is quite straightforward.

N <- 100
b <- 1
d <- 1
f <- function(X, u, Z, b, d)  (Z*d + 1)*b*X + u  # unit/period shock, effect augmented if Z=1

# Custom estimator stacks data and runs model with an interaction between X and Z
my_estimator <- function(data){
  with(data, {
    df <- data.frame(YL = c(Y1, Y2), ZL = c(Z0, Z), XL = c(X,X))
    tidy(lm_robust(YL ~ XL*ZL, data = df))[4,]

# The design
design <- 
  declare_population(N, u1 = rnorm(N), u2 = rnorm(N), X = runif(N), b0 = runif(N)) +
  declare_estimand(shift = mean((f(1, u2, 1, b, d) - f(0, u2, 1, b, d)) - 
                                (f(1, u1, 0, b, d) - f(0, u1, 0, b, d))))  +
  declare_assignment(prob = .5)  +
  declare_reveal(Z0 = 0,  Y1 = f(X, u1, 0, b, d), Y2 = f(X, u2, Z, b, d), handler = fabricate) + 
  declare_estimator(handler = tidy_estimator(my_estimator),  estimand = "shift")

You can get power and other diagnosands like this:

> diagnose_design(design, sims = 2000)

Research design diagnosis based on 2000 simulations. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

 Design Label Estimand Label Estimator Label  Term N Sims   Bias   RMSE  Power Coverage Mean Estimate SD Estimate Mean Se
       design          shift       estimator XL:ZL   2000  -0.00   0.59   0.42     0.93          1.00        0.59    0.57
                                                          (0.01) (0.01) (0.01)   (0.01)        (0.01)      (0.01)  (0.00)
 Type S Rate Mean Estimand
        0.00          1.00
      (0.00)        (0.00)

You can get sample data from the design like this:

df <- draw_data(design)
kable(head(df), digits = 2)

Or do estimation like this: