DeclareDesign Community

Cluster assignment with different cluster sizes

Hi,
I would like to do a cluster assignment. The number of units within each cluster varies. In the baseline data there is hardly any correlation of variables (including the DV) within clusters. So for the final analysis I will probably be able to ignore clusters. Clusters are only necessary for practical reasons, to distribute the intervention.
I would therefore like to have an equal number of units (or as close to equal as possible) in each condition. How can I do a complete random assignment at the unit level? As far as I understand it, cluster_ra() can only do so at the cluster level.

Best,

Sascha

I think this is a cluster variant on the long-standing load balancing ticket - it might be worth revisiting at some point.

@colonus - to the extent that the clustering is a nuisance anyway, the easiest thing to do for now is generate a set of assignments by cluster, and reject any where the balance is out of whack - you shouldn’t need to reweight your analysis in that case. Other strategies could include a non-randomized assignment strategy that optimizes for balance, or a sequential design that dynamically preserves balance based on previous assignments and cluster size.
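
A minimal sketch of the "draw by cluster, reject if unbalanced" idea, in plain Python rather than randomizr. The cluster sizes, the `tolerance` value, and all function names here are hypothetical and only for illustration; the imbalance measure (difference in unit counts between arms) is one possible choice.

```python
import random

def cluster_assign(cluster_sizes, rng):
    """Complete random assignment of clusters: half treated, half control."""
    ids = list(cluster_sizes)
    treated = set(rng.sample(ids, len(ids) // 2))
    return {c: int(c in treated) for c in ids}

def imbalance(assignment, cluster_sizes):
    """Absolute difference in unit counts between treated and control arms."""
    treated_units = sum(n for c, n in cluster_sizes.items() if assignment[c] == 1)
    return abs(2 * treated_units - sum(cluster_sizes.values()))

def balanced_cluster_assign(cluster_sizes, tolerance, rng, max_tries=10_000):
    """Redraw cluster assignments until the unit imbalance is within tolerance."""
    for _ in range(max_tries):
        candidate = cluster_assign(cluster_sizes, rng)
        if imbalance(candidate, cluster_sizes) <= tolerance:
            return candidate
    raise RuntimeError("no acceptable assignment found; loosen the tolerance")

# Hypothetical cluster sizes (units per cluster), 136 units in total.
sizes = {"A": 12, "B": 30, "C": 8, "D": 25, "E": 17, "F": 21, "G": 9, "H": 14}
rng = random.Random(2024)  # fixed seed so the draw is reproducible
assignment = balanced_cluster_assign(sizes, tolerance=6, rng=rng)
print(assignment, imbalance(assignment, sizes))
```

Because the first acceptable draw is kept, rather than the best of many, the result is still a random draw from the set of acceptable assignments.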

Am I right in thinking that you want to do a cluster assignment such that the total number of treated units stays close to fixed?

@nfultz’s approach will work, though perhaps an easier way would be to do a block-and-cluster random assignment (see ?randomizr::block_and_cluster_ra), where you block on cluster size (https://gking.harvard.edu/files/cluster.pdf) or on a combination of cluster size and cluster characteristics.
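
To make the blocking idea concrete, here is a rough sketch in plain Python (not randomizr's API) of one way to block on cluster size: sort the clusters by size, pair neighbours, and treat one cluster per pair at random. The cluster sizes and the function name are hypothetical, and pairing is only one possible blocking scheme.

```python
import random

def paired_cluster_assign(cluster_sizes, rng):
    """Pair clusters of adjacent size, then treat one cluster per pair at random."""
    ordered = sorted(cluster_sizes, key=cluster_sizes.get)
    assignment = {}
    for i in range(0, len(ordered) - 1, 2):
        pair = ordered[i:i + 2]
        treated = rng.choice(pair)
        for c in pair:
            assignment[c] = int(c == treated)
    if len(ordered) % 2 == 1:
        # An unpaired leftover cluster is assigned by a fair coin flip.
        assignment[ordered[-1]] = rng.randint(0, 1)
    return assignment

# Hypothetical cluster sizes (units per cluster), 136 units in total.
sizes = {"A": 12, "B": 30, "C": 8, "D": 25, "E": 17, "F": 21, "G": 9, "H": 14}
rng = random.Random(11)  # fixed seed so the draw is reproducible
assignment = paired_cluster_assign(sizes, rng)
treated_units = sum(n for c, n in sizes.items() if assignment[c] == 1)
print(assignment, treated_units)
```

With this scheme the difference in unit counts between arms can never exceed the sum of the within-pair size differences, so no assignment needs to be rejected.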

I went with something like @nfultz’s approach, which worked quite well. I reran the assignment 10,000 times, calculated the SD of the number of assigned units per group for each run, and then selected the assignment with the smallest SD.

I’ll just mention that the procedure you describe (selecting the “best” random assignment from the set of 10,000) is not actually random! Consider blocking, or establishing an “acceptable” threshold and then choosing one of the acceptable assignments at random. From this, you can also calculate the probabilities of assignment, which this restricted procedure might cause to differ across units.
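
A small sketch of the threshold version, in plain Python with hypothetical cluster sizes, tolerance, and function names. With only a handful of clusters the acceptable set can be enumerated exactly, which gives the assignment probabilities directly; with many clusters you would estimate them by simulation instead.

```python
import random
from itertools import combinations

def acceptable_assignments(cluster_sizes, n_treated, tolerance):
    """List every set of treated clusters whose unit imbalance is within tolerance."""
    total = sum(cluster_sizes.values())
    keep = []
    for treated in combinations(sorted(cluster_sizes), n_treated):
        treated_units = sum(cluster_sizes[c] for c in treated)
        if abs(2 * treated_units - total) <= tolerance:
            keep.append(frozenset(treated))
    return keep

def treatment_probabilities(cluster_sizes, acceptable):
    """Share of acceptable assignments in which each cluster is treated."""
    return {c: sum(c in t for t in acceptable) / len(acceptable)
            for c in cluster_sizes}

# Hypothetical cluster sizes (units per cluster), 136 units in total.
sizes = {"A": 12, "B": 30, "C": 8, "D": 25, "E": 17, "F": 21, "G": 9, "H": 14}
acceptable = acceptable_assignments(sizes, n_treated=4, tolerance=6)

rng = random.Random(7)  # fixed seed so the draw is reproducible
chosen = rng.choice(acceptable)  # uniform draw from the acceptable set
probs = treatment_probabilities(sizes, acceptable)
print(sorted(chosen), probs)
```

In this particular setup (exactly half of the clusters treated, symmetric imbalance measure) every acceptable assignment has an acceptable mirror image, so each cluster is treated with probability 0.5. With other splits or other balance criteria the probabilities can differ by cluster, and the analysis would need to account for that.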