DeclareDesign, dplyr, & blocking on multiple vars in an elegant way

I want to try using DeclareDesign for blocking on multiple variables. One way to do this is shown here.

Say my population is 100 people, and I want to block on eye_color and gender:

p <- declare_population(N = 100,
                        gender = rbinom(N, 1, prob = .5),
                        eye_color = sample(c("blue", "green", "brown"),
                                           size = N, replace = TRUE,
                                           prob = c(.2, .2, .6)))

One way to block on both vars at the same time is to create a single variable that encapsulates both vars.

My intuition as a dplyr user is to do something like:

population <- p() %>% unite(block, gender, eye_color, remove = FALSE)

but I realize that this doesn’t follow the DeclareDesign framework. A more DeclareDesign approach would be to do something like:

d <- declare_step(block = paste(gender, eye_color, sep = "_"), 
                  handler = fabricate)

From here on declare_assignment is straightforward…

However, I also realize from Jasper’s other answer that blockTools might be a superior solution here, but from my cursory understanding I am not sure how it fits into the DeclareDesign framework.

My questions are:

  1. Is it bad to tackle this question the dplyr way, or is there some reason I should really be using declare_step?

  2. Is the method of creating one variable eventually going to limit my designs, and should I be using blockTools instead? If so, how would I use blockTools together with DeclareDesign?

I think your approach is fine, but I’d recommend using interaction instead of pasting text together, since it plays nicer with factors. You can also just include it as a final variable in the population step.
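To make the paste versus interaction difference concrete, here is a small base R sketch. The toy vectors are made up for illustration and are not the population from the question; they are chosen so that one gender and eye color combination never occurs.

```r
# Toy data (hypothetical): the combination gender 0 / green eyes is absent.
gender    <- c(0, 0, 1, 1, 0, 1)
eye_color <- c("blue", "brown", "brown", "green", "brown", "blue")

# paste() returns a plain character vector.
block_chr <- paste(gender, eye_color, sep = "_")

# interaction() returns a factor whose levels are every combination
# of the inputs, including combinations that never occur in the data.
block_fct <- interaction(gender, eye_color)

# drop = TRUE keeps only the combinations that actually appear.
block_fct_used <- interaction(gender, eye_color, drop = TRUE)

class(block_chr)         # character
nlevels(block_fct)       # 2 genders x 3 eye colors
nlevels(block_fct_used)  # only the observed combinations
```

Whether empty levels are wanted depends on what consumes the blocking variable, so drop = TRUE is probably the safer choice when the factor is handed to an assignment function.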

Ideally you would do something like:

assn <- declare_assignment(blocks=interaction(gender,eye_color), prob=.5)

but we have some overly pedantic error checking on that kind of assignment right now.

Probably you would implement a custom assignment handler for declare_assignment. There are lots of examples of custom handlers for estimators, and assignment is even easier because there’s less bookkeeping involved. If you do get it working, please consider sharing it.
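A rough sketch of what such a custom handler might look like follows. It is not an official recipe: the function name block_assignment and the output column Z are my own choices, and it assumes the handler receives the data frame and must return it with the assignment column added. The randomization itself is delegated to randomizr::block_ra.

```r
library(DeclareDesign)  # randomizr is called explicitly below

population <- declare_population(
  N = 100,
  gender = rbinom(N, 1, prob = .5),
  eye_color = sample(c("blue", "green", "brown"),
                     size = N, replace = TRUE,
                     prob = c(.2, .2, .6))
)

# Hypothetical handler: build the blocking factor inside the handler,
# then randomize within blocks.
block_assignment <- function(data) {
  blocks <- interaction(data$gender, data$eye_color, drop = TRUE)
  data$Z <- randomizr::block_ra(blocks = blocks, prob = .5)
  data
}

assn <- declare_assignment(handler = block_assignment)

# Steps are functions, so they can be checked by hand before
# being added to a design.
dat <- assn(population())
table(dat$gender, dat$eye_color, dat$Z)
```

Building the blocking factor inside the handler sidesteps the argument checking mentioned above, at the cost of hard-coding the variable names.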


blockTools creates optimal blocks by minimizing within-block Mahalanobis distance on the variables you give it to block on. Here’s what it might look like to include it using declare_step:

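library(DeclareDesign)
library(blockTools)

declare_population(
  N = 100,
  gender = rbinom(N, 1, prob = .5),
  eye_color = sample(
    c("blue", "green", "brown"),
    size = N,
    replace = TRUE,
    prob = c(.2, .2, .6)
  ),
  # indicator variables so eye color can enter the distance calculation
  blue_eyes = eye_color == "blue",
  green_eyes = eye_color == "green"
) + declare_step(
  handler = function(data) {
    data$blocks <- createBlockIDs(
      obj = block(
        data = data,
        # the argument name was lost in the post; n.tr (number of
        # treatment conditions) is assumed here
        n.tr = 2,
        id.vars = "ID",
        block.vars = c("gender", "blue_eyes", "green_eyes")
      ),
      data = data,
      id.var = "ID"
    )
    data  # a step handler must return the data
  }
) + 
  declare_assignment(blocks = blocks, prob = .5)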