DeclareDesign

Estimating a regression slope, specifying assignment variable


#1

I’m having trouble finding examples on your site where there is no random assignment. This is not a problem in itself, but I’m finding that I’m having trouble adapting the logic to the question I want to ask.

That is to say, I’m adding my predictor of interest under declare_population because that’s where I’d expect to be able to add the complex contingencies I expect to hold (omitted in example below).

However, later declaring the same variable as the “assignment_variable” in declare_assignment leads to an error.

Error in eval(predvars, data, env) : object ‘Y’ not found

Leaving declare_assignment with the default of Z works, but this seems superfluous. Am I misunderstanding something and not using DD the way it’s intended?

Simplified reproducible example

design <-
  # simulate data
  declare_population(
    obs = add_level(N = 100,
                     midcycle = draw_binary(N, prob = 0.2),
                     noise = rnorm(N)
    )
  ) +
  # simulate real relationship
  declare_potential_outcomes(Y ~ 0.5 * midcycle + noise) +
  
  declare_assignment(m = 50, assignment_variable = "midcycle") +
    # simulate how we estimate relationship
  declare_estimator(
    Y ~ midcycle,
    estimand = estimands_regression,
    model = lm,
    term = TRUE
  )

Speciyfing estimand

Then, later I’m surprised I have to declare a custom estimand just to get the slope of a variable in a regression. Can I not just specify the term? This seems like it will get confusing quickly, when I’m looking at a higher-order interaction.

estimands_regression <- declare_estimand(
    `midcycle` = mean(Y_midcycle_1 - Y_midcycle_0),
    term = TRUE,
    label = "Regression_Estimands"
  )

#2

For the first question, you should always think about potential_outcomes, assignment and reveal steps as a group.

  • the PO step will create a response for each level of the assignment factor, eg Y_0, Y_1 or Y_treatment, Y_control, etc.
  • The assignment step will create a variable (eg Z) which can take values 0/1, treatment/control, etc
  • The reveal step creates a single column Y which takes the value Y_1 when Z == 1, Y_Z when Z == 0, etc

There’s a feature, where if the design contains PO and assignment steps that are linked, the reveal step can be inserted automatically. This is getting tripped up for you when the assignment variable in the PO step (assignment_variable = “midcycle”) does not match the one declared in the potential outcome step (eg the default Z).

Since you aren’t using the assingment functionality anyway, you might as well not use the PO step either - you can generate Y directly (eg in your population step) without needing a Y_0, Y_1 because midcycle is already assigned.

Re estimands - these are indeed unwieldly. I would probably do

declare_estimand(midcycle= .5)

to cut down on the boiler plate. See also the DesignLibrary package on github, which has many basic designs like this pre-written - you can just plug in the N=100, p=.2 beta=.5 to the corresponding function and get the design that way.


#3

Yes, I could considerably simplify things by staying in declare_population (which was easiest to understand). But I’d like to learn to think in this framework. This was a highly simplified example with just one level. I expect that I might later use DD to specify a sampling schedule and a randomised missing design for the item level.

Given your explanation (I didn’t know declare_PO also takes an assignment_variables arg), I expected this to work, but it doesn’t.

estimands_regression <- declare_estimand(
    `midcycle` = mean(Y_midcycle_1 - Y_midcycle_0),
    term = TRUE,
    label = "Regression_Estimands"
  )
design <-
  # simulate data
  declare_population(
    obs = add_level(N = 100,
                     midcycle = draw_binary(N, prob = 0.2),
                     noise = rnorm(N)
    )
  ) +
  # simulate real relationship
  declare_potential_outcomes(Y ~ 0.5 * midcycle + noise, assignment_variables = "midcycle") +

  declare_assignment(m = 50, assignment_variable = "midcycle") +
    # simulate how we estimate relationship
  declare_estimator(
    draw_likert(Y) ~ midcycle,
    estimand = estimands_regression,
    model = lm,
    term = TRUE
  ) +
  declare_reveal(outcome_variables = c("Y"), assignment_variables = c("midcycle"))

design

Measurement model for the dependent variable
#4

So, thanks to the response to my other question, I now know that I simply had my steps out of order (declaring reveal last).

I’m still running into trouble with specifying the estimands though. get_estimands simply gives me an empty data.frame without an error. I hope it’s ok that this time I don’t show a less complicated design, because some of the advice I got earlier maybe only made sense for my simplified example. This is still slightly simpler than what we’re planning (smaller sample, no randomised missings, no attrition modelling), but what I have so far.

At this page you can find a non-working example. I have some open questions, but the biggest problem right now, is how I can get the estimand working again. I had it working a while ago, but then I tinkered with the level nesting and and now the estimands are gone.

I’ll copy the code here, but maybe some of my diagnosis attempts at the link are also useful.

round_to_at_least_1 = function(x) { if_else(x < 1, 1, round(1) ) }
library(tidyverse)
library(DeclareDesign)
library(lme4)
estimands_regression <- declare_estimand(
    `midcycle` = mean(Y_midcycle_1 - Y_midcycle_0),
    # `midcycle_mod` = mean(Y_moderator_1_midcycle_1 - Y_moderator_3_midcycle_1),
    term = TRUE,
    label = "Regression_Estimands"
  )
design <-
  # simulate data
  declare_population(
    # we're measuring the outcome with up to ten items
  item = add_level(N = 10, nest = FALSE,
                   item_difficulties = rnorm(N)),
    # add the level of the participant, we simulate 
      # interindidual differences in the intercept
      # interindidual variation in cycle length
      # a between-subjects moderator
      # hormonal contraception
  woman = add_level(N = 40, nest = FALSE,
                    interind_diff = rnorm(N),
                    avg_cycle_length = round(rnorm(N, 28.8, 2.1)),
                    avg_period_length = round(rpois(N, 16) / 4),
                    moderator = rnorm(N, mean = 2),
                    moderator_m = draw_likert(moderator + rnorm(N), type = 7, N = N),
                    hormonal_contraception = draw_binary(N, prob = 0.5)),
      # at the cycle level, we simulate two cycles per woman and allow for cycle-to-cycle
      # variability in length
  cycle = add_level(N = 2, nest = TRUE,
                    cycle_length = round(rnorm(N, avg_cycle_length, 2.9)),
                    period_length = round_to_at_least_1(rnorm(N, avg_period_length, 1.3))
            ),
      # at the day level
  day = add_level(N = cycle_length, nest = TRUE,
                  # ugly code to get sequential day numbers
                  day_nr = ave(1:N, rep(1:length(cycle_length), times = cycle_length), FUN = seq_along),
                  menstruation = if_else(day_nr <= period_length, 1, 0),
                  midcycle = if_else(day_nr <= cycle_length - 14 |
                                       day >= cycle_length - 10, 1, 0),
                  midcycle_m = rnorm(N, midcycle + rnorm(N, sd = 0.5)),
                  daily_noise = rnorm(N)),
  obs = cross_levels(by = join(item, day),
                     noise = rnorm(N))
  ) +
  # simulate real relationship
  declare_potential_outcomes(Y ~ -0.2 * menstruation + 0.2 * moderator + 0.1 * moderator * midcycle * (1 - hormonal_contraception) + item_difficulties + interind_diff + noise,
                             assignment_variables = c("midcycle")) +
  
 declare_reveal(outcome_variables = "Y", assignment_variables = c("midcycle")) +
    # simulate how we estimate relationship
  declare_estimator(
    as.numeric(draw_likert(Y)) ~ midcycle * hormonal_contraception + (1 | woman) + (1 | item) + (1 | woman:day),
    estimand = estimands_regression,
    model = lmer,
    term = TRUE
  )
get_estimands(design)

#5

It looks like you hadn’t included the estimand in the design -

> str(design)
List of 4
 $ population        :design_step:	 declare_population(item = add_level(N = 10, nest = FALSE, item_difficulties = rnorm(N)), woman = add_level(N = 40, nest = FALSE, interind_diff = rnorm(N), avg_cycle_length = round(rnorm(N, 28.8, 2.1)), avg_period_length = round(rpois(N, 16)/4), moderator = rnorm(N, mean = 2), moderator_m = draw_likert(moderator + rnorm(N), type = 7, N = N), hormonal_contraception = draw_binary(N, prob = 0.5)), cycle = add_level(N = 2, nest = TRUE, cycle_length = round(rnorm(N, avg_cycle_length, 2.9)), period_length = round_to_at_least_1(rnorm(N,     avg_period_length, 1.3))), day = add_level(N = cycle_length, nest = TRUE, day_nr = ave(1:N, rep(1:length(cycle_length), times = cycle_length), FUN = seq_along), menstruation = if_else(day_nr <= period_length, 1, 0), midcycle = if_else(day_nr <= cycle_length - 14 | day >= cycle_length - 10, 1, 0), midcycle_m = rnorm(N, midcycle + rnorm(N, sd = 0.5)), daily_noise = rnorm(N)), obs = cross_levels(by = join(item, day), noise = rnorm(N))) 
 $ potential_outcomes:design_step:	 declare_potential_outcomes(Y ~ -0.2 * menstruation + 0.2 * moderator + 0.1 * moderator * midcycle * (1 - hormonal_contraception) + item_difficulties + interind_diff + noise, assignment_variables = c("midcycle")) 
 $ reveal            :design_step:	 declare_reveal(outcome_variables = "Y", assignment_variables = c("midcycle")) 
 $ estimator         :design_step:	 declare_estimator(as.numeric(draw_likert(Y)) ~ midcycle * hormonal_contraception + (1 | woman) + (1 | item) + (1 | woman:day), estimand = estimands_regression, model = lmer, term = TRUE) 
 - attr(*, "call")= language construct_design(steps = steps)
 - attr(*, "class")= chr [1:2] "design" "dd"

Easy to fix, just plus it on:

> get_estimands(design + estimands_regression)
        estimand_label     term     estimand
1 Regression_Estimands midcycle 0.1063525534

#6

Thanks, my bad. I now realise it was this way in the example where I copied it from too.
I guess what confused me is that this (when I had already specified it as an argument to declare_estimator) and the repeated assignment_variable(s) felt redundant.


#7

Also double check your day_nr, they might be misalligned. did you mean ave(day, cycle, FUN = seq_along), ?


#8

Cheers, you’re right. I didn’t know I could use the day variable already there.