How can I do an a priori power analysis to calculate the sample size for a study with DeclareDesign based on Cohen’s d? Most social science journals now request this.

For example, suppose I have an effect size as Cohen’s d (e.g. d = .4) from a meta-analysis and want to run a two-arm experiment with a covariate (e.g. a covariate that has not been tested with the treatment before). How can I translate Cohen’s d into the parameters DeclareDesign needs (e.g. mean scores and the ATE)?

# A Priori (Power) Sample Size Analysis with Cohen's d

Cohen’s `d` is unitless, so multiply it by the standard deviation of the outcome to get the treatment effect in the outcome’s units. Just remember to be careful with units.
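As a small illustration of that conversion, here is a sketch in R. The value `sd_y` is a placeholder assumption, not something given in the thread; with a standardized outcome it is 1, so the ATE equals d. The SD of 15 in the second step is likewise hypothetical.

```r
# Convert a unitless Cohen's d into a treatment effect in outcome units.
d    <- 0.4   # effect size from the meta-analysis (value from the question)
sd_y <- 1     # ASSUMED outcome SD; equals 1 if the outcome is standardized

ate <- d * sd_y   # average treatment effect in the units of the outcome
ate

# With a different (hypothetical) outcome SD of 15, the same d implies
# a larger raw effect.
ate_raw <- d * 15
ate_raw
```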

But usually I have neither an SD nor a mean. If I ran several simulations over a range of SDs and means, I would end up with a distribution of power values. In the end, journals want to see a single value: the maximum N that needs to be sampled for a given effect size and a power of .8, without assuming any SD or means.

It sounds like you just want a classical power analysis - all you need is a t-table, there’s no need to use DeclareDesign to numerically approximate it.
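For reference, a classical calculation along these lines might look as follows in base R. It assumes a two-sided two-sample t-test at the 5% level and a standardized outcome (`sd = 1`), so that `delta` is Cohen’s d itself; those settings are assumptions made for this sketch.

```r
# Classical a priori power analysis: solve for the sample size per group.
d <- 0.4  # Cohen's d from the question

res <- power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")

n_per_group <- ceiling(res$n)   # round up to whole participants
n_total     <- 2 * n_per_group  # two arms of equal size
c(per_group = n_per_group, total = n_total)
```

Because the outcome is expressed on the SD scale, no raw means or SDs have to be assumed.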

I think my view differs somewhat from Neal’s.

Imagine that your potential outcomes function were something like this:

`declare_potential_outcomes(Y ~ tau*Z + rnorm(N, 0, 1))`

Here the control group’s outcome is normally distributed with mean zero and variance 1 (which is close to what happens when you standardize a variable). The treatment group’s outcome is also normally distributed with variance 1, but with a higher mean, `tau`, so `tau` plays the role of Cohen’s d.

If you use DeclareDesign, then you can *ALSO* build in other design features like blocking, clustering, or covariate adjustment. Classical power analysis doesn’t let you do that.
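To make the idea of simulated power concrete without depending on any particular DeclareDesign version, here is a minimal base-R sketch for the same kind of data-generating process, with a covariate added. It is not DeclareDesign’s implementation; the function name `sim_power`, the correlation `r = 0.5`, and the number of simulations are all chosen for illustration only.

```r
# Simulation-based power for a two-arm design with one covariate (base R only).
set.seed(1)

sim_power <- function(N, tau, r, sims = 500, alpha = 0.05) {
  rejections <- replicate(sims, {
    X <- rnorm(N)                          # covariate
    Z <- sample(rep(0:1, length.out = N))  # complete random assignment
    # Control outcome has unit variance and correlation r with X;
    # treatment shifts the outcome by tau.
    Y <- tau * Z + r * X + rnorm(N, 0, sqrt(1 - r^2))
    p_unadj <- summary(lm(Y ~ Z))$coefficients["Z", "Pr(>|t|)"]
    p_adj   <- summary(lm(Y ~ Z + X))$coefficients["Z", "Pr(>|t|)"]
    c(unadjusted = p_unadj < alpha, adjusted = p_adj < alpha)
  })
  rowMeans(rejections)  # share of simulations in which the null is rejected
}

pw <- sim_power(N = 100, tau = 0.8, r = 0.5)
pw
```

Changing the assignment line or the regression formula is all it takes to try a different design, which is the flexibility a closed-form power formula lacks.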

Thanks for your question!

This example is solved perfectly well by `power.t.test()`, though. Do you have a case of actually doing a covariate adjustment (without means and SDs assumed, as @colonus asked above)? I suppose you would have a *d* for both the treatment variable and the covariate, or maybe a correlation if the covariate were continuous? Not sure.

Thank you both for taking this issue on. Yes, I would assume that with the covariate I would get higher power and would need a smaller sample size, so `power.t.test()` does not suffice. I do have a d-score for the treatment effect and the correlation of the covariate with the outcome.

To the best of my understanding of your question, I think you could use the two arm with covariate design from the design library. I’ve adapted it a little to match your parameters, *d* and *r*:

```
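library(DeclareDesign)

d <- .8   # Cohen's d for the treatment effect
r <- .2   # correlation between covariate X and the outcome
N <- 20   # total sample size (both arms)

a_design <-
  declare_population(N = N, X = rnorm(N)) +
  declare_estimand(d = d) +
  # Control outcome has unit variance and correlation r with X; treatment adds d
  declare_potential_outcomes(
    Y_Z_0 = rnorm(n = length(X), mean = r * X, sd = sqrt(1 - r^2)),
    Y_Z_1 = Y_Z_0 + d
  ) +
  declare_assignment() +
  declare_reveal() +
  declare_estimator(Y ~ Z, model = lm, label = "no covariate", estimand = "d") +
  declare_estimator(Y ~ Z + X, model = lm, label = "with covariate", estimand = "d")

diagnose_design(a_design)

# Re-diagnose the design for N = 10, 20, ..., 100
power <- diagnose_design(redesign(a_design, N = as.list(1:10 * 10)))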
```

However, we can compare your results to the t table:

```
library(ggplot2)

# Must divide N by two, since power.t.test() expects n *per group*
t_power <- transform(
  data.frame(N = 10:100),
  no_cov   = power.t.test(n = N / 2, delta = d, sd = 1, type = "two.sample")$power,
  with_cov = power.t.test(n = N / 2, delta = d, sd = sqrt(1 - r^2), type = "two.sample")$power
)

# Solid lines: simulated power; dashed lines: t-table power
ggplot() +
  geom_line(aes(x = N, y = power, col = estimator_label), data = power$diagnosands_df) +
  geom_line(aes(x = N, y = no_cov), data = t_power, linetype = "dashed", color = "pink") +
  geom_line(aes(x = N, y = with_cov), data = t_power, linetype = "dashed", color = "cyan")
```

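Which yields a plot of power against N: one solid line per estimator from the simulation, with the two t-table curves dashed.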

EDIT:

- I would tend to trust the t-table in this case, at least to the extent that the assumptions of Gaussianity are reasonable.
- You might need to add 1 to the N for the t-table above, since it doesn’t know you are fitting a three-parameter model.
- On the other hand, the `lm` model as above is using pooled variance, and the t-test can use unequal variance, so maybe the t-table is more conservative.
- If getting samples is easy / cheap, just get 100 and there’ll be plenty of power.

Final thought: as @Alex_Coppock said above, if you wanted to mess around with the assignment (e.g. not do a 50/50 split), DeclareDesign will make that easier to diagnose. On the other hand, I would expect such a design to perform worse / be less data-efficient than a 50/50 split. DeclareDesign also removes the burden of counting degrees of freedom.

Thank you so much! This helps a lot, even though I am surprised that adding the covariate does not noticeably increase power.
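A likely reason is that `r = .2` is small: under the approximation used in the t-table comparison above, adjusting for the covariate shrinks the residual SD only from 1 to `sqrt(1 - r^2)`, which is about 0.98. The sketch below shows how the required sample size would change for stronger covariates. Only `r = 0.2` comes from the thread; the other correlations are hypothetical, and the calculation ignores the degree of freedom spent on the covariate.

```r
# Required n per group for 80% power as the covariate-outcome correlation grows.
d <- 0.8                    # effect size used in the design above
r <- c(0, 0.2, 0.5, 0.7)    # 0.2 is from the thread; the others are hypothetical

resid_sd <- sqrt(1 - r^2)   # residual SD after adjusting for the covariate

n_per_group <- sapply(resid_sd, function(s) {
  ceiling(power.t.test(delta = d, sd = s, power = 0.80,
                       type = "two.sample")$n)
})

data.frame(r = r, resid_sd = round(resid_sd, 3), n_per_group = n_per_group)
```

Under these assumptions, a weakly correlated covariate barely changes the required N, while a strongly correlated one reduces it substantially.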