Hi JH,
Typically, declare_population() describes the data-generating process that takes place before any researcher intervention, such as random assignment of a treatment. add_level() gives you more control over making your data hierarchical, and really serves that purpose only. So, both of these give you a dataset with four groups and forty individuals:
declare_population(groups = add_level(N = 4, letter = LETTERS[1:4]), individuals = add_level(N = 10, noise = rnorm(N)))
declare_population(N = 40, letter = sample(LETTERS[1:4],40,TRUE), noise = rnorm(N))
but the second version, as you point out, won’t have precisely controlled group sizes.
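To see the difference in group sizes, here is a minimal sketch. It calls fabricatr's fabricate() directly, on the assumption that declare_population() passes its arguments through to it; the object names hier and flat are made up for illustration.

```r
library(fabricatr)

set.seed(42)

# Hierarchical version: 4 groups, each with exactly 10 individuals
hier <- fabricate(
  groups = add_level(N = 4, letter = LETTERS[1:4]),
  individuals = add_level(N = 10, noise = rnorm(N))
)

# Flat version: 40 individuals, each assigned a letter at random
flat <- fabricate(
  N = 40,
  letter = sample(LETTERS[1:4], 40, TRUE),
  noise = rnorm(N)
)

table(hier$letter)  # group sizes are fixed by design
table(flat$letter)  # group sizes depend on the random draw
```

Both data frames have 40 rows, but only the first is guaranteed to have ten rows per letter.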
This is separate from the question of blocked random assignment. When you say declare_assignment(blocks = female), you’re telling DD to randomize once among men and once among women. Not every level of groups added with add_level() is a block, and not every blocking scheme conditions on variables added to your data using add_level(). So, the following design is completely permissible:
declare_population(N = 10, female = rbinom(N, 1, .5)) + declare_assignment(blocks = female)
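As a rough illustration of what blocking on a non-hierarchical variable does, here is a sketch that uses randomizr's block_ra() directly. I'm assuming this is the procedure that declare_assignment(blocks = female) ends up running; the variable names are illustrative.

```r
library(randomizr)

set.seed(1)

# A flat population of 10 people with a binary covariate; no add_level() involved
female <- rbinom(10, 1, 0.5)

# Randomize separately within each level of `female`
Z <- block_ra(blocks = female)

# Cross-tabulate to see how treatment is split within each block
table(female, Z)
```

Within each block, the treated and control counts should differ by at most one, whatever the block sizes turn out to be.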
My impression is that the advantage of add_level allows me to:
- precisely decide the size of my blocks
Yes. But you could also precisely determine block sizes without add_level(), e.g. declare_population(N = 10, blocks = rep(c(1, 2), c(5, 5))) + declare_assignment(blocks = blocks). Also, add_level() is helpful for determining the size of groups in general, not just ones you plan to block on.
- block on multiple vars
I don’t think this is an advantage of add_level(). This sounds like a job for some other function or package, such as blockTools, which forms blocks for the random assignment based on multiple variables. add_level() won’t do this; it just adds a hierarchical level to your data (e.g. students within classes within schools).
- Define certain block level characteristics e.g., u_b = rnorm(N) * sd_block
Yes! Block or group level characteristics. This is probably the most helpful thing about add_level(). If you want to say that students in class j get a common shock because they share the same teacher, add_level() makes parameterizing this so much easier.
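Here is a small sketch of that class-level shock, again using fabricatr's fabricate() directly. The names sd_class, u_class, and score, as well as the sizes, are made up for illustration.

```r
library(fabricatr)

set.seed(7)

sd_class <- 2  # hypothetical standard deviation of the class-level shock

dat <- fabricate(
  # One shock per class, drawn at the class level
  classes = add_level(N = 3, u_class = rnorm(N) * sd_class),
  # Each student inherits the class shock and adds individual noise
  students = add_level(N = 4, score = u_class + rnorm(N))
)

# Every class should carry exactly one distinct value of u_class
tapply(dat$u_class, dat$classes, function(x) length(unique(x)))
```

Because u_class is defined at the classes level, every student in a class shares the same value, which is the common shock described above.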
Is this all correct? Am I missing something? Does it ever make sense to just do something like declare_assignment(blocks = female)?
Yes, when you want to ensure that treatment is balanced across men and women, i.e. that roughly the same share of each group is assigned to treatment. As I hope is clear from the above, that is a separate question from that of adding hierarchy to the data.