DeclareDesign Community

Power for a continuous interaction

This document shows one way to do simulations to do power for a continuous interaction (inspired by this tweet https://twitter.com/NathanKalmoe/status/1063839116742283264) Here we’re simulating an average treatment effect of 0.2 SDs and an interaction of 0.05SDs. That means people who score “3” on our covariate have an effect of 0.35 SDs and those who score “-3” have an effect of 0.05 SDs.We do the simulations in DeclareDesign. The power for the interaction term’s not great at small sample sizes, even though this is a pretty hefty interaction effect.

Load packages

library(DeclareDesign)
library(tidyverse)

Declare your design

design <-
  declare_population(
    N = N,
    pid_7 = sample(-3:3, size = N, replace = TRUE), 
    noise = rnorm(N)
  ) +
  declare_potential_outcomes(Y ~ 0.2 * Z + 0.2 * pid_7 + 0.05 * pid_7 * Z + noise) +
  declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0), interaction = 0.05) +
  declare_assignment(prob = 0.5) +
  declare_reveal(Y, Z) +
  declare_estimator(
    Y ~ Z + pid_7 + Z * pid_7,
    model = lm_robust,
    term = c("Z", "Z:pid_7"),
    estimand = c("ATE", "interaction")
  )

Simulate at varying sample sizes

designs <- redesign(design, N = c(500, 1000, 3000, 5000))
simulations <- simulate_design(designs, sims = 1000)

Summarize simulations

summary_df <-
  simulations %>%
  group_by(N, estimand_label) %>%
  summarize(power = mean(p.value <= 0.05),
            mean_estimate = mean(estimate))  

Power plot

ggplot(summary_df, aes(N, power, group = estimand_label, color = estimand_label)) +
  geom_point() +
  geom_line() +
  geom_hline(yintercept = 0.8) +
  theme_bw()

image

2 Likes

I am currently running a power analysis for an interaction between a treatment condition and a continuous variable. I used this example as a template for my analysis. Since I have existing data, instead of creating a new variable (pid_7 here), I used a variable from my data. I rescaled it to -3:3 so that the values of interaction used above mean the same. I get a flat curve instead of one that looks like you got. I am unable to figure out if I am doing something wrong or if this is something about my data.

Rplot

Hi! My guess is that there’s a small bug in your code. I don’t think there could be anything wrong with your data, since you build in an interaction, then as the sample size grows, you should have better power to detect it.

Did you modify both the declare_potential_outcomes and the declare_estimator calls to use that variable from your data?

Hi! I modified the variable in both. So basically where it says pid_7, I replaced it with the variable from my data. Everything else is pretty much the same.

ok darn that was my best guess without seeing your code – can you post a MWE?

Here’s the code I used. The variable I am interested in is X1 which ranges from -3 to 3 and interacts with the treatment condition Z. This is part of my existing dataset (named data below). The rest looks similar to the code above.

design <-
declare_population(
data,
noise = rnorm(N)
) +
declare_potential_outcomes(Y ~ 0.2 * Z + 0.2 * X1 + 0.05 * X1 * Z + noise) +
declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0), interaction = 0.05) +
declare_assignment(prob = 0.5) +
declare_reveal(Y, Z) +
declare_estimator(
Y ~ Z + X1 + Z * X1,
model = lm_robust,
term = c(“Z”, “Z:X1”),
estimand = c(“ATE”, “interaction”)
)

designs <- redesign(design, N = c(500, 1000, 3000, 5000))
simulations <- simulate_design(designs, sims = 1000)

summary_df <-
simulations %>%
group_by(N, estimand_label) %>%
summarize(power = mean(p.value <= 0.05),
mean_estimate = mean(estimate))

ggplot(summary_df, aes(N, power, group = estimand_label, color = estimand_label)) +
geom_point() +
geom_line() +
geom_hline(yintercept = 0.8) +
ylim(0,1) +
theme_bw()

Ok! Apologies for the delay, I had to track down a feature I’d forgotten about! The issue is that your code was not resampling from your data at size N. Here’s some code that will do that.

rm(list = ls())

library(DeclareDesign)
library(tidyverse)


data <- tibble(X1 = sample(-3:3, 100, replace = TRUE))

N <- 200

design <-
  declare_population(
    handler = resample_data,
    N = N,
    data
  ) +
  declare_step(noise = rnorm(N), handler = fabricate) +
  declare_potential_outcomes(Y ~ 0.2 * Z + 0.2 * X1 + 0.05 * X1 * Z + noise) +
  declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0), interaction = 0.05) +
  declare_assignment(prob = 0.5) +
  declare_reveal(Y, Z) +
  declare_estimator(
    Y ~ Z + X1 + Z * X1,
    model = lm_robust,
    term = c("Z", "Z:X1"),
    estimand = c("ATE", "interaction")
  )
designs <- redesign(design, N = c(500, 1000, 3000, 5000))
simulations <- simulate_design(designs, sims = 100)

summary_df <-
  simulations %>%
  group_by(N, estimand_label) %>%
  summarize(power = mean(p.value <= 0.05),
            mean_estimate = mean(estimate))

ggplot(summary_df, aes(N, power, group = estimand_label, color = estimand_label)) +
  geom_point() +
  geom_line() +
  geom_hline(yintercept = 0.8) +
  ylim(0,1) +
  theme_bw()

Thanks for clarifying the issue and changing the code. It works now!