declare_population() describes the data-generating process that takes place before any researcher intervention, such as random assignment of a treatment.
add_level gives you more control over making your data hierarchical, and really serves that purpose only. So, both of these give you a dataset with four groups and forty individuals:
declare_population(groups = add_level(N = 4, letter = LETTERS[1:4]), individuals = add_level(N = 10, noise = rnorm(N)))
declare_population(N = 40, letter = sample(LETTERS[1:4], 40, replace = TRUE), noise = rnorm(N))
but the second version, as you point out, won’t have precisely controlled group sizes.
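To see the difference in group sizes, here is a minimal sketch that calls fabricatr's fabricate() directly. I'm assuming here that declare_population() hands its arguments to fabricate(), which is my understanding of its default behavior; the object names nested and flat and the seed are just for illustration.

```r
library(fabricatr)

set.seed(42)  # arbitrary seed, only so the draws are reproducible

# Version 1: two nested levels, so every group gets exactly 10 individuals
nested <- fabricate(
  groups      = add_level(N = 4, letter = LETTERS[1:4]),
  individuals = add_level(N = 10, noise = rnorm(N))
)

# Version 2: one flat level, with each row's letter drawn at random
flat <- fabricate(
  N      = 40,
  letter = sample(LETTERS[1:4], N, replace = TRUE),
  noise  = rnorm(N)
)

table(nested$letter)  # always 10 per letter
table(flat$letter)    # sizes vary from draw to draw
```

Both data frames have forty rows, but only the nested one fixes the number of individuals per group.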
This is separate from the question of blocked random assignment. When you say
declare_assignment(blocks = female), you’re telling DD to randomize once among men and once among women. Not every level added with
add_level() is a block, and not every blocking scheme conditions on variables added to your data using
add_level. So, the following design is completely permissible:
declare_population(N = 10, female = rbinom(N, 1, .5)) + declare_assignment(blocks = female).
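Here is a rough sketch of what that blocked assignment does, using randomizr's block_ra() on its own. I'm assuming that this is the function doing the work behind declare_assignment(blocks = female); the variable names and the seed are only illustrative.

```r
library(randomizr)

set.seed(1)  # arbitrary seed for reproducibility

# A non-hierarchical covariate: no add_level() involved
female <- rbinom(10, 1, 0.5)

# Randomize separately within each level of `female`
Z <- block_ra(blocks = female)

# Treated and control counts within each block
table(female, Z)
```

Within each block, about half of the units end up treated (exactly half when the block size is even), even though female was never declared as a level of the data.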
My impression is that
add_level allows me to:
- precisely decide the size of my blocks
Yes. But you could also precisely determine block sizes without add_level() (e.g.
declare_population(N = 10, blocks = rep(c(1, 2), c(5, 5))) + declare_assignment(blocks = blocks)). add_level is also helpful for setting group sizes in general, not just for groups you plan to block on.
- block on multiple vars
I don’t think this is an advantage of
add_level – this sounds like a job for some other function or package, such as
blockTools, which forms blocks for the random assignment based on multiple variables.
add_level won’t do this – it just adds a hierarchical level to your data (e.g. students within classes within schools).
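For the simple case of a few discrete covariates, one option that needs neither add_level nor blockTools is to cross the variables into a single blocking factor and pass that to randomizr's block_ra(). This is only a sketch of that simpler technique, not of what blockTools does, and the data frame and the urban variable are made up for illustration.

```r
library(randomizr)

set.seed(3)  # arbitrary seed for reproducibility

# Hypothetical data: two binary covariates, 10 units in each of the 4 cells
dat <- data.frame(
  female = rep(c(0, 1), each = 20),
  urban  = rep(c(0, 1), times = 20)
)

# Cross the two covariates into one blocking factor
dat$block <- interaction(dat$female, dat$urban)

# Randomize separately within each female-by-urban cell
dat$Z <- block_ra(blocks = dat$block)

table(dat$block, dat$Z)
```

With continuous covariates or many variables this crossing breaks down quickly, which is where a dedicated tool like blockTools comes in.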
- Define certain block level characteristics e.g.,
u_b = rnorm(N) * sd_block
Yes! Block or group level characteristics. This is probably the most helpful thing about
add_level. If you want to say that students in class
j get a common shock because they share the same teacher,
add_level makes parameterizing this so much easier.
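As a sketch of that common-shock setup, again calling fabricatr's fabricate() directly: the level names, sd_class, and the outcome y are hypothetical choices for illustration, not anything from your design.

```r
library(fabricatr)

set.seed(7)    # arbitrary seed for reproducibility
sd_class <- 2  # hypothetical standard deviation of the class-level shock

dat <- fabricate(
  # One shock per class, drawn at the class level
  classes  = add_level(N = 5, u_class = rnorm(N) * sd_class),
  # Students inherit their class's shock and add their own noise
  students = add_level(
    N = 20,
    u_student = rnorm(N),
    y = u_class + u_student
  )
)

# Every student in a class shares a single value of u_class
tapply(dat$u_class, dat$classes, function(x) length(unique(x)))
```

Because u_class is defined at the class level, it is constant within a class and only u_student varies across students.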
Is this all correct? Am I missing something? Does it ever make sense to just do something like
declare_assignment(blocks = female) ?
Yes, when you want to ensure that the same share of men and of women is assigned to treatment – which, as I hope is clear from the above, is a separate question from that of adding hierarchy to the data.