DeclareDesign Community

Power calculations in non-standard design

Hey team,

I’ve been racking my brain for the last couple of hours trying to understand how to apply DeclareDesign to my specific situation. I was hoping somebody could at least point me in the right direction.

I’m running an experiment where I randomly pair participants up with one another to play a game. Participants can be Republican or Democrat. By definition, you can be paired either with a member of your in-group (so Republican-Republican, Democrat-Democrat), or a member of your out-group (Republican-Democrat, Democrat-Republican).

Some of the DVs are before and after measures that tracks, for example, openness to the political outgroup, whereas other DVs are behavioral games (so post-measures only).

My base model looks like the following:

model_base <- lm_robust(outcome_of-interest ~ politics*outgroup_pairing, 
                 cluster = teamID,
                 se = "stata",
                 data = data)
summary(model_base)

Where politics represents the politics of a given respondent, and outgroup_pairing is a dummy variable that represents 1 if the participant was paired with a member of the outgroup. In other specifications, I add a vector of controls to increase precision. I’m clustering standard errors at the “team” level to resolve inter-team correlations (because the team plays a game together, they will have identical scores on the game, and this may influence outcomes).

We’ve already run a pilot consisting of 70 DD pairings, 80 “mixed pairings”, and 80 RR pairings, and find that the coefficient on “outgroup_pairing” is large and significant, and the interaction is as well – so Republicans are more affected by an outgroup_pairing than Democrats. We now want to run a pre-registered version, and want to do power calculations to have a more accurate estimate of the sample size needed to detect an effect at least as large as the one detected in the pilot.

I couldn’t find anything in the documentation that sheds light on what to do here as this isn’t exactly a pre-post design with a clear-cut treatment condition. Another wrinkle here is that we will be running multiple games with the same pairing philosophy (in-group vs out_group), and will be comparing the effects in one game with the effects of another game. So we want to ask, for example, does a Democrat paired with a Republican in Game 1 show a larger change on the DV than a Democrat paired with a Republican in Game 2?

I’m not expecting a comprehensive answer, but I was hoping I could at least get some direction as to how to do power analysis in this context.

This sounds like an interesting design - here’s how I might set up a very basic system for simulating:

Start by describing the game (I’ll use rock paper scissors):

game <- function(a, b){
  ifelse(a==b, 0, ifelse( (a == 'R' & b == 'S') 
                          | (a == 'P' & b == 'R' )
                          | (a == 'S' & b == 'P'), -1, 1))
  
}

The population is three parties, D, R, and I, with some people in each party. We model the assumption that people of party I always play R. (You can elide the nesting part for performance if need be)

pop <- declare_population(parties=add_level(N=3, p=c("D", "R", "I"), n=rpois(N,20)),
                          people=add_level(N=n, rps=ifelse(p == 'I', 'R', sample(c("R", "P", "S"),N,TRUE)))

Sample 50% of each party in the population:

samp <- declare_sampling(strata=p)

This is the tricky part - generate 200 potential pairing via the under-appreciated link_level function in fabricate. Because it prevents you from linking a level to itself,
we instead make a clone of people (people2), drop the original ID, and then
proceed with the linking.

pair_off <- declare_step(handler = function(data){
  fabricate(list(people=data),
    people2=add_level(N=1), people2=modify_level(people=NULL),
    rounds=link_levels(N=200, by=join(people,people2))) 
})

You might also want to filter the above to elimate games where a person plays themself. All the variables for player 2 will be given a .1 suffix.

The potential outcome is the result of a game between two people.

p_outcome <- declare_potential_outcomes(Y=game(rps, rps.1))

Form a design:

design <- pop + samp + pair_off + p_outcome

You’d then proceed with estimands / estimators like in more typical designs. For example, if you were interested in how often there’s a tie when different combinations of parties play each other, you would see a difference in between I and everyone else, because I’s always play R in this design.

You can also build in repeated measurements by shimming in time variables along the DGP.

Hope that gets you unblocked.

2 Likes

Thanks a lot, Neal. Somehow I never got a notification about this message. I’ve spent some time delving into your response, and there’s still quite a bit I’m not getting. Perhaps there is a way to simplify before getting caught up in the complexities.

In the actual experiment, I discard independents; so it’s safe to assume we have a population that is split 50/50 Republicans and Democrats (I’ve constructed this to be so).

Second, I don’t think we need a simulation (or data of) the “game.” The outcomes of interest are all individual-level change scores of some measure. And the interesting question is whether being paired with a member of the outgroup to play the game is a significant predictor of those change scores.

As mentioned, nearly all the specifications are of the form:

model_base <- lm_robust(outcome_of-interest ~ politics*outgroup_pairing, 
                 cluster = teamID,
                 se = "stata",
                 data = data)
summary(model_base)

But the main issues I’m struggling with are:

(1) How to account for the “clustered” nature of the standard errors in power analysis. You play this game on a team with a member of either the in-group or the out-group. So every “dyad” in a dataset will share certain game-level variables (e.g. their total score, their total payout), and thus may have correlated affective outcome measures as well.

(2) Once accounting for the clustered standard errors, what does a basic power analysis look like? Specifically, I can’t figure out what the appropriate “Data Strategy” and “Answer Strategy” look like.

Thanks again!

EDIT: To be clear, I’ve already run a pilot of this study, so I’d like to be able to include the estimates from this pilot into my power calculations. What is the best way to do so?

These are tricky, but I’ll take a guess at them:

  1. So assumming something like

     delta(Personal  Opinion) = alpha + Politics_ego + Politics_alter + Politics_ego*Politics_alter
    

    and testing if there is a difference in parties for a greedy strategy, for example, or maybe testing if there’s different directions in interaction pairs? Where the response variable is separate from the actual game played?

If you have enough replicates, I would probably recommend clustering on the team_ego:team_alter instead of just team_ego.

Here’s some thoughts:

  • I might try controlling for Team effects explicitly or otherwise reparameterizing the model, perhaps using an offset, so that it appears on the RHS explicitly somehow, instead of getting absorbed it into the error term.
  • Another good option would be trying a hierarchical model instead of clustered SEs. Team_ego could be a random intercept, for example as could Team_alter. lme4::simulate makes post-hoc power calculations super easy.
  1. You will need to be careful about specifying what N means in your analysis - is it number of players? Number of players on each team? Number of games played? Your power curve also picks up an extra dimension and the surface will depend on the within- and between- cluster variances. But otherwise, it has the same flavor as https://declaredesign.org/library/articles/block_cluster_two_arm.html where the blocks are party and the treatment is reverse coded for one block.

Hey Neal,

Thank you SO much for all your help. I have been trying to conduct appropriate power analysis for the last couple of months, and am at a total loss for what to do (as our my colleagues), so please know that your help is extremely appreciated. Let me try to me crystal clear just one more time about what the design set up looks like (apologies for what will be a very long post).


I have teams of individuals (who are either Republicans or Democrats) play a game. For the purpose of the main specification, the content of the game is irrelevant. For all participants, I ask them both before and after the game their feelings towards the political outgroup. The main dependent variable of interest is a change score on these before and after measures, let’s call it feelings_change. Because individuals can either be paired (randomly) with a member their political INGROUP or a member of their political OUTGROUP, you can think of there being four types of dyads:

Democrat-Democrat (DD)
Republican-Republican (RR)
Republican-Democrat (RD) and
Democrat-Republican (DR)

All analyses are at the individual level, so that ordering just means that the individual of party 1 is paired with an individual from party 2. As mentioned above, we run the following specification:

lm_robust(feelings_change ~ politics* outgroup_pairing, 
                 cluster = team_id,
                 se = "stata",
                 data = data_clean)

Where politics is the political identification of the participant (either R or D) and outgroup_pairing is a binary variable that is 1 if the participant was paired with somebody from their political outgroup (so Democrat if it’s a Republican, and vice versa). Here are the coefficients from that regression:

                                      Estimate Std. Error    t value     Pr(>|t|)   CI Lower  CI Upper  DF
(Intercept)                         -0.8254286   1.168932 -0.7061389 0.4817274414  -3.144276  1.493419 101
politicsRepublican                   3.1265880   2.091738  1.4947323 0.1381022219  -1.022857  7.276033 101
outgroup_pairing                    11.6638661   3.330378  3.5022650 0.0006886152   5.057292 18.270440 101
politicsRepublican:outgroup_pairing -5.0025255   4.645316 -1.0768968 0.2840919026 -14.217582  4.212531 101

We ran out first pilot with 80 participants in each “cell”, like so:

                    0  1
  Republican       80 80
  Democrat         80 80

Where a 0 indicates that they were paired with a member of their ingroup, and a 1 indicates they were paired with a member of their outgroup. So I believe the appropriate way to conceptualize “N” is the number in each cell. I want to be able to answer questions about the size of the effect we can expect to detect, given certain quantities in each cell.


That being said, the specification you listed involving politics_ego and politics_alter is not strictly true, because we have a dummy variable that soaks up the “treatment effect” that we care about (being paired with a member of the outgroup. We then run contrasts like so:

library(emmeans)
rg = qdrg(object = fit_outgroup_warmth, data = data_epi)
outgroup_means = emmeans(rg, ~ outgroup_pairing | politics)
contrast(outgroup_means, method = "pairwise")

Which yields the following kind of output:

politics = Democrat:
 contrast estimate   SE  df t.ratio p.value
 0 - 1      -11.66 3.33 199 -3.502  0.0006 

politics = Republican:
 contrast estimate   SE  df t.ratio p.value
 0 - 1       -6.66 3.27 199 -2.040  0.0427 ```

So we are essentially comparing the outcome measure from a Democrat paired with a Republican with the outcome of a Democrat paired with a Democrat. (And the same for Republican).

Now, to get to your comments:

  1. What do you mean by “controlling for team effects explicitly?” Adding a fixed effect for team? What is your intuition for why that would be necessary? What do you mean by “an offset?”

  2. I have tried fitting hierarchical models, but often get convergence issues, so I had to switch to clustered standard errors for stability sake. Whenever I could run mixed effect models, the results were identical to clustered standard errors. Is there any reason in this case to use one model over the other?

  3. The “N” in this analysis should be the number of participants in each “cell” (see above). In the pilot I had 80 in each cell, for a total of 240. Does this sound right to you?

  4. Is there any chance you can go into a little bit more detail on this point: “But otherwise, it has the same flavor as https://declaredesign.org/library/articles/block_cluster_two_arm.html where the blocks are party and the treatment is reverse coded for one block.” What do you mean “treatment is reverse coded for one block?” I’ve taken a look at that template before, but it’s pretty opaque. Is ‘party’ is the block, what is the cluster? It would help me tremendously if you could sketch up a very rough idea of what this looks like in DeclareDesign.

Thanks so much again for all your help.

I’m still not exactly sure what your game is - it doesn’t sound like teams play each other (like soccer?). Is it adversarial? A prisoner’s dilemma? I would expect different types of games to have very different potential outcomes.

But it sounds like you are ignoring the game and just want to group people randomly?

In football, for example, you might control for offense and defense separately using dummy coded cross effects . In prisoner’s dilemma, you might control for the result of the previous game.

If you get the same results, that’s good - if you saw very different results it would probably point towards bad point estimates coming out of lm.

80*4 = 320 ?

Assumming these are two player games, you had 40 DD games, 80 DR games, and 40 RR games => 160 games?

I meant the “treatment” of being paired with a R is coded as opposite_party=1 for Ds (one block) and opposite_party=0 for Rs.

Anyways, my current best guess of what you did is something like this:

pop <- declare_population(N=320, party=rep(c('D','R'), each=N/2), 
                          skill=rnorm(N), 
                          opinion=rnorm(N))

You had 320 people, half D, half R, each with a skill and prior opinion score.

assn_team <- declare_assignment(handler=function(data) {
  N <- nrow(data)
  D_teams <- c(paste("D", rep(1:(N/8), times=2)), paste("X", 1:(N/4)))
  R_teams <- c(paste("R", rep(1:(N/8), times=2)), paste("X", 1:(N/4)))

  data$team <- NA
  data$team[data$party == 'D'] <- D_teams
  data$team[data$party == 'R'] <- R_teams
  
  data$Z <- +grepl("X", data$team)
  data
})

You formed 40 DD teams, 40 RR teams, and 80 DR teams (tagged as X in this example), and randomly assigned each subject to teams. We’ll do this through a custom assignment handler, because randomizr is not built to have 160 arms in an assignment function. You might need to adjust these numbers.

Then create a Z dummy variable that is on when the pair is opposite party.

pos <- declare_potential_outcomes(Y~ave(skill, team) - opinion + 3*Z)

Your final measured outcome is the difference between some function of the teams skill at the game, the individual’s prior opinion, and whether the team was an outgroup party pair (Z) or not. You probably will want to throw in main effects for party as well.

If the above is similar to what you did, I’m not too surprised you had convergence problems with a hierarchical model - I’ve had problems with paired data before, but usually only with binary response variables.

Neal,

Thank you so much for all your help on this. I think I’m finally wrapping my head around DeclareDesign thanks to your code. To answer your questions:

Sorry – I now understand the confusion. You and your partner play a collaborative quiz game where you answer questions together (you must come to a consensus about the correct answer). As such, it’s not adversarial. There are, however, three “phases” of questions. Your comments later lead me to believe that I might want to control for performance in each phase of the game, which I was not doing (I was controlling for an overall point total).

Yes, sorry. Basic arithmetic error. Your assumption about the quantity in each cell is correct.

Now, onto your DeclareDesign code.

Can you explain this component a bit more? Why are you subtracting out “opinion”, and why are you multiplying the treatment binary by 3?

Based on your code, I have edited the simulation to incorporate data from my pilot. However, I’m not sure if I’m approaching this the right way, and would appreciate any feedback.

First, I define a list of variables based upon results from the pilot test:

ate <- 9.16 # size of coefficient on outgroup from from lm_robust(feelings_change ~ outgroup_pairing, data = data, team = team_id)

# Mean and Standard Deviation for the "pre" feelings measure for control condition
mean_pre_control = 19.6 
sd_pre_control = 22.5

# Mean and Standard Deviation for the "post" feelings measure for control condition
mean_post_control =  20.3 
sd_post_control = 23.0

# Mean and Standard Deviation for the "pre" feelings measure for treatment condition
mean_pre_treatment = 13.9
sd_pre_treatment = 18.6

# Mean and Standard Deviation for the "post" feelings measure for treatment condition
mean_post_treatment = 23.8
sd_post_treatment = 24.8

mean = 0.727 # mean of change in feelings for people NOT in treatment
sd = 13.2 # actual sd

I now generate the population. I’m interested in knowing the power of my specification if I were to get 400 participants (100 in each cell: 100 DD, 100 RR, 100 RD, and 100 DR). Notice that I draw from a truncated normal because the outcome variable is bounded from 0-100. Is this the right approach?

pop <- declare_population(N=400, 
                          party=rep(c('D','R'), each=N/2), 
                          
                          # generate draws from a truncated normal since these are "feelings thermometers"
                          # bounded between 0 and 100  
                          feelings_toward_outgroup_pre = msm::rtnorm(N, 
                                                     mean = mean_pre_control, 
                                                     sd = sd_pre_control,
                                                     lower = 0, upper = 100),
                          feelings_toward_outgroup_post = msm::rtnorm(N, 
                                                                mean = mean_post_control, 
                                                                sd = sd_post_control,
                                                                lower = 0, upper = 100)
                         )

I use your assignment function as is, because it’s perfect!

assn_team <- declare_assignment(handler=function(data) {
  N <- nrow(data)
  D_teams <- c(paste("D", rep(1:(N/8), times=2)), paste("X", 1:(N/4)))
  R_teams <- c(paste("R", rep(1:(N/8), times=2)), paste("X", 1:(N/4)))
  
  data$team <- NA
  data$team[data$party == 'D'] <- D_teams
  data$team[data$party == 'R'] <- R_teams
  
  data$Z <- +grepl("X", data$team)
  data$partner_party =  with(data, ave(party, team, FUN = rev))
  
  data
})

I declare potential outcomes as simply the control change score, and the change score + the average treatment effect. I am defining the average treatment effect as the coefficient on outgroup_pairing in a simple regression of change_score ~ outgroup_pairing.

potential_outcomes <- declare_potential_outcomes(Y_Z_0 = feelings_toward_outgroup_post - feelings_toward_outgroup_pre, 
                                                 Y_Z_1 = (feelings_toward_outgroup_post - feelings_toward_outgroup_pre) + ate)
estimand <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)
reveal <- declare_reveal(Y, Z)
estimator <- declare_estimator(Y ~ Z, estimand = estimand, 
                                   model = lm_robust, clusters = team)

design <- pop + assn_team + potential_outcomes+ estimand  + reveal + estimator
diagnosis <- diagnose_design(design)

This yields the following output:

                 Design Label Estimand Label Estimator Label Term N Sims   Bias   RMSE  Power Coverage Mean Estimate SD Estimate Mean Se Type S Rate Mean Estimand
 block_cluster_two_arm_design            ATE       estimator    Z    500   0.17   2.59   0.94     0.93          9.33        2.59    2.47        0.00          9.16
                                                                         (0.12) (0.08) (0.01)   (0.01)        (0.12)      (0.08)  (0.01)      (0.00)        (0.00)

Which seems great! I have sufficient power to detect a main effect of outgroup_pairing.. However, let’s say I want to run the same calculation with an interaction effect (so I can see if Republicans have a differential effect when paired with the outgroup than Democrats). So I do the following:

My intuition is that I change the following the estimator to the following, and re-run:

estimator <- declare_estimator(Y ~ party*Z, estimand = estimand, 
                               model = lm_robust, clusters = team)

However, this leads to a remarkable loss in power:

 Design Label Estimand Label Estimator Label   Term N Sims   Bias   RMSE  Power Coverage Mean Estimate SD Estimate Mean Se Type S Rate Mean Estimand
       design            ATE       estimator partyR    500  -9.22   9.90   0.06     0.26         -0.06        3.61    3.47        0.47          9.16
                                                           (0.16) (0.16) (0.01)   (0.02)        (0.16)      (0.11)  (0.01)      (0.10)        (0.00)

Am I doing something wrong, or this the expected decline in power? Is it necessary to also change the potential outcome to account for the interaction?

Thanks again so much for your help. You’ve been incredible!

This was just a lazy example of POs with a post, pre, and Z+coefficient. You are on the right track.

I would be careful of plugging in mean(data$pre_control)from your actual data into msm::rtnorm - double check if it uses the truncated mean or the latent mean. The truncation can have ripples through the entire rest of the design.

You probably want to be careful about boundary effects here also - would the treatment affect feelings_toward_outgroup_post before truncation?
Is the truncation bias bad enough to need to use a tobit instead of lm_robust? (you could actually check this in DeclareDesign with a bit more bookkeeping)

Here, the “term” of the estimator has changed from “Z” to “partyR” - since you didn’t have a main effect for party in your data generating process, the “power” drops to the significance level, .05. You can also see the Bias is large because it is joining the partyR coefficient with the ATE estimand.

If you set the term on the estimator to Z, it should look more reasonable - but be careful if you update the PO to include an interaction as well, because in Y~party*Z Z is the main effect only whereas ATE=mean(Y_Z_1 - Y_Z_0) would absorb the interaction term as well.

1 Like