DeclareDesign Community

Randomly assigning individuals to groups?

I’m interested in the impact of gender diversity in small groups. My question is: how do I assign students to clusters based on a demographic?

To begin, let’s say I have 5 classes, each with between 12 and 15 students. Within each class, every student has a 0.3 chance of being female.

pop <- declare_population(class = add_level(N = 5),
                          students = add_level(N = c(12, 13, 12, 12, 15),
                                               is_female = draw_binary(prob = .3, N = N)))

I now want to assign the students to groups. A group will be considered “treated” if it has at least 1 female, and otherwise will be a control group. I want to maximize the number of treated groups, so almost every treated group should be some combination of MMF.

For example, a class with 12 students and 3 women will have a total of 4 triads, of which 3 will be treated.

The groups should be triads where possible. Where the class size is not divisible by 3, I should create as many triads as possible and make up the rest in dyads. To do this I wrote the following function, which may or may not be helpful:

find_group_size <- function(students){
  if(students %% 3 == 1){
    n_triads <- floor(students / 3) - 1
    n_dyad <- 2
  } else if(students %% 3 == 2){
    n_triads <- floor(students / 3)
    n_dyad <- 1
  } else {
    n_dyad <- 0
    n_triads <- students / 3
  }
  return(c(rep(3, n_triads), rep(2, n_dyad)))
}
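
As a quick sanity check on the class sizes from the question, here is a compact restatement of the same rule. The name `group_sizes` is hypothetical, and the sketch assumes at least 2 students per class (a class of 1 cannot be split into dyads and triads).

```r
# Hypothetical helper: same rule as find_group_size, written in closed form.
group_sizes <- function(n) {
  stopifnot(n >= 2)
  n_dyads <- (3 - n %% 3) %% 3        # 0, 2 or 1 dyads for remainders 0, 1, 2
  n_triads <- (n - 2 * n_dyads) / 3   # everyone else goes into triads
  c(rep(3, n_triads), rep(2, n_dyads))
}

group_sizes(12)  # 3 3 3 3
group_sizes(13)  # 3 3 3 2 2
group_sizes(14)  # 3 3 3 3 2

# Group sizes for the five classes in the question
lapply(c(12, 13, 12, 12, 15), group_sizes)
```

Note that with these class sizes only the class of 13 produces any dyads.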

How can I assign students to groups as part of declare_assignment in a way that makes sure all the female students are in different groups? This may be easier to do as part of declare_population. Thank you for your help.

omigosh, this is a hard one. It’s not a built-in design in randomizr. I made this function once to kinda do it; maybe it will help. Here I’m assigning men and women to tables, and the tables have a randomly assigned gender composition target. The trouble is that my sample might not have a cooperative number of each kind!

library(tidyverse)
library(DeclareDesign)
composition_ra <-
  function(genre) {
    # set up
    table_sizes <-
      table(complete_ra(N = length(genre), conditions = paste0("table_", 1:4)))
    N_men <- sum(genre == "homme")
    N_women <- sum(genre == "femme")
    
    # assign to homogeneous or heterogeneous
    block_ra(
      genre,
      conditions = c("homogeneous", "heterogeneous"),
      block_m_each =
        rbind(
          c(table_sizes[1], N_women - table_sizes[1]),
          c(table_sizes[4], N_men - table_sizes[4])
        )
    )
  }

dat <-
  fabricate(
    N = 25,
    prenom = "prenom",
    nom_de_famille = "nom_de_famille",
    genre = rep(c("homme", "femme"), c(10, N - 10))
  )

dat <-
  dat %>%
  mutate(Z_composition = composition_ra(genre),
         gender_composition_block = paste0(genre, "_", Z_composition),
         Z_table =
           block_ra(
             blocks = gender_composition_block,
             block_prob_each = rbind(
               c(0.0, 0.5, 0.5, 0.0),
               c(1.0, 0.0, 0.0, 0.0),
               c(0.0, 0.5, 0.5, 0.0),
               c(0.0, 0.0, 0.0, 1.0)
             ),
             conditions = paste0("table_", 1:4)
           ))


with(dat, table(Z_table, genre))

Here is how I would approach this. First, is it reasonable to say an FMM triad is equivalent to an MFM or MMF? If yes, I would suggest assigning the groups by gender separately: females first, and then males.

In R:

assn_handler <- function(data) {
  
  A <- matrix(NA_character_, ceiling(nrow(data) / 3), 3)
  
  ids_by_female <- split(data$students, data$is_female)
  
  j <- 0
  # Note the ordering below, so that females are assigned to groups first.
  for(ids in ids_by_female[c("1", "0")]) {
    A[j + seq_along(ids)] <- ids
    j <- j + length(ids)
  }
  
  grouping <- na.omit(data.frame(students=as.vector(A), group=as.vector(row(A))))
  
  data <- merge(data, grouping)

  # Mark treated (for whole group) if there is a female in the group
  data$Z <- ave(data$is_female, data$group, FUN = function(x) 1 %in% x)
  
  data
}

This is 80% there, but it doesn’t assign by class. You can call the function from a handler that does a split/apply/combine over classes:

assn_handler_by_class <- function(data) {
  do.call("rbind.data.frame", by(data, data$class, assn_handler))
}

And make it into a DD step:

assn <- declare_assignment(handler = assn_handler_by_class)

pop+assn
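
Two things to watch in the handler above: it has no random step, so which students end up together is fixed by row order, and group ids restart at 1 in every class. Below is a minimal sketch of a randomized variant for a single class, using base R only. The helper name `assign_groups` and the toy class are hypothetical; dealing ids out round-robin is meant to mirror filling `A` column by column.

```r
# Hypothetical helper (not part of DeclareDesign or randomizr).
# ids: student ids in one class; is_female: 0/1 vector of the same length.
assign_groups <- function(ids, is_female) {
  shuffle <- function(x) x[sample.int(length(x))]  # safe for length 0 or 1
  n_groups <- ceiling(length(ids) / 3)
  # Women first, each gender in random order
  ordered_ids <- c(shuffle(ids[is_female == 1]), shuffle(ids[is_female == 0]))
  # Round-robin dealing: student j goes to group ((j - 1) %% n_groups) + 1
  data.frame(students = ordered_ids,
             group = rep_len(seq_len(n_groups), length(ordered_ids)))
}

set.seed(1)
# Toy class: 13 students, the first 3 of them women
one_class <- data.frame(students = sprintf("%02d", 1:13),
                        is_female = rep(c(1, 0), c(3, 10)))
grouping <- assign_groups(one_class$students, one_class$is_female)
merged <- merge(one_class, grouping)
table(table(merged$group))  # number of groups of each size
```

When combining classes, something like `paste(class, group, sep = "_")` would give each group an id that is unique across the whole sample.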

BTW, my intuition is that if you do assignment that way, there may be a potential confound between an effect of being in a group with or without a female, and being in a triad vs dyad - just something to keep in mind, especially with small sample sizes. If the true female ratio were 25% (anything less than 1/3, really), the dyad assignment would be biased to MM.
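
The size of this imbalance can be worked out directly. Under the females-first fill, group k in a class contains a woman exactly when the class has at least k women, so the treatment probability of each group is a binomial tail. The sketch below assumes independent draws with a fixed female probability, as in the question; the function name `p_treated` is hypothetical.

```r
# Hypothetical helper: P(group k is treated) for one class of n students,
# assuming women are placed one per group starting from group 1.
p_treated <- function(n, p_female) {
  n_groups <- ceiling(n / 3)
  k <- seq_len(n_groups)
  data.frame(
    group = k,
    # the first n - 2 * n_groups groups get a third member
    size = ifelse(k <= n - 2 * n_groups, 3, 2),
    # P(at least k women) = 1 - P(at most k - 1 women)
    p_treated = 1 - pbinom(k - 1, size = n, prob = p_female)
  )
}

res <- p_treated(13, 0.3)
res
tapply(res$p_treated, res$size, mean)  # average by group size
```

For a class of 13, the dyads sit at the end of the fill order and come out noticeably less likely to be treated than the triads.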


Thanks so much to both of you! I know I post on the forum often, but this was so helpful for me!

@nfultz I think your last point is really important, I have two questions:

  1. Thinking about it more, it might be preferable to keep the group sizes random as well. Do you know how I would go about randomly assigning groups of 2 or 3 instead?

  2. If I assign males first, would that solve the confound problem?

I did write a draft of a function to create a vector of random group sizes that fit the class, in case it’s helpful:

random_sizes <- function(total, sample_sizes){
  sample.vec <- function(x, ...) x[sample(length(x), ...)] # special sample function for single ints
  sample_sizes <- sort(sample_sizes)
  running_size = total
  out = list()
  i = 1
  
  # -----------------------------------
  while(running_size != 0){
    # drop numbers if they're bigger than total remaining
    while(running_size < sample_sizes[length(sample_sizes)]){
      sample_sizes = sample_sizes[-length(sample_sizes)]
    }
    # draw initial size
    group_size <- sample.vec(x = sample_sizes, size = 1)
    
    # if nothing will fit, draw again (note: this line needs fixing still)
    while(!any(((running_size - group_size) %% sample_sizes) == 0)){
      group_size <- sample.vec(sample_sizes, size = 1)
    }
    # include group and save to list
    running_size <- running_size - group_size
    out[[i]] <- group_size
    i <- i + 1
  }
  unlist(out)
}
random_sizes(total = 13, sample_sizes = c(2,3))
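
For the specific case of dyads and triads, one way around the feasibility check is to draw the number of triads first. This is a sketch of an alternative, not a fix of the draft above: the name `random_group_sizes` is hypothetical, and drawing the triad count uniformly is an assumption that gives a different distribution from drawing sizes one at a time.

```r
# Hypothetical helper: random mix of dyads and triads that sums to `total`.
random_group_sizes <- function(total) {
  stopifnot(total >= 2)
  candidates <- 0:(total %/% 3)
  # A triad count is feasible when the students left over pair up evenly
  feasible <- candidates[(total - 3 * candidates) %% 2 == 0]
  n_triads <- feasible[sample.int(length(feasible), 1)]
  n_dyads <- (total - 3 * n_triads) / 2
  sizes <- c(rep(3, n_triads), rep(2, n_dyads))
  sizes[sample.int(length(sizes))]  # shuffle the order of the groups
}

set.seed(1)
random_group_sizes(13)
```

For a class of 13 the only feasible mixes are one triad with five dyads, or three triads with two dyads.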

You can take students in blocks of six, and randomly subdivide each block into two triads or three dyads.

At the end, if one person is left over, you need to assign them to an existing dyad; otherwise the split of the remaining students is determined by N mod 6.
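
A minimal sketch of the blocks-of-six idea follows. The helper name `six_block_sizes` is hypothetical, the 50/50 split per block is an assumption, and so is the fallback for a single leftover student when every block happened to split into triads (there is then no dyad to join, so one triad plus the leftover is turned into two dyads).

```r
# Hypothetical helper: group sizes for n students via blocks of six.
six_block_sizes <- function(n) {
  stopifnot(n >= 2)
  # Each full block of six becomes two triads or three dyads (assumed 50/50)
  sizes <- unlist(lapply(seq_len(n %/% 6), function(i) {
    if (runif(1) < 0.5) c(3, 3) else c(2, 2, 2)
  }))
  leftover <- n %% 6
  if (leftover == 1) {
    dyads <- which(sizes == 2)
    if (length(dyads) > 0) {
      sizes[dyads[1]] <- 3         # the extra student joins an existing dyad
    } else {
      sizes <- c(sizes[-1], 2, 2)  # ASSUMPTION: triad + 1 becomes two dyads
    }
  } else {
    # Remainders 2 to 5 have only one split into dyads and triads
    extra <- switch(as.character(leftover),
                    "0" = numeric(0), "2" = 2, "3" = 3,
                    "4" = c(2, 2), "5" = c(3, 2))
    sizes <- c(sizes, extra)
  }
  sizes
}

set.seed(1)
six_block_sizes(13)
```

Students would still need to be shuffled before being dealt into groups of these sizes.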

If you instead fill an assignment matrix A by column, leading with Ms, you still get MMs at the bottom:

M M M
M M F
M M F
M M F
M M

With whatever assignment you do decide on, you can also generate a variable for dyad/triad and set up a potential outcome with Z, triad, and their interaction: Y ~ beta_z*Z + beta_t*triad + beta_zt*Z*triad. Then you can figure out whether the confound matters, and how badly, by simulation.
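
A rough sketch of such a simulation in base R is below. Everything in it is an assumption for illustration: outcomes are generated at the group level, women are placed one per group starting from the first group, there are 40 classes of 12 to 15 students, and the beta values are arbitrary. The function name `simulate_study` is hypothetical.

```r
set.seed(42)

# One simulated study. Group k of a class is treated when the class has
# at least k women; the first n - 2 * n_groups groups are triads.
simulate_study <- function(class_sizes, p_female = 0.3,
                           beta_z = 0.5, beta_t = 1, beta_zt = 0) {
  groups <- do.call(rbind, lapply(class_sizes, function(n) {
    n_groups <- ceiling(n / 3)
    k <- seq_len(n_groups)
    n_women <- rbinom(1, n, p_female)
    data.frame(Z = as.integer(k <= n_women),
               triad = as.integer(k <= n - 2 * n_groups))
  }))
  groups$Y <- with(groups, beta_z * Z + beta_t * triad + beta_zt * Z * triad) +
    rnorm(nrow(groups))
  # naive ignores group size; adjusted is the Z effect among dyads,
  # which equals beta_z here because beta_zt = 0
  c(naive = unname(coef(lm(Y ~ Z, data = groups))["Z"]),
    adjusted = unname(coef(lm(Y ~ Z * triad, data = groups))["Z"]))
}

sims <- replicate(200, simulate_study(rep(12:15, each = 10)))
rowMeans(sims)  # compare both averages with the true beta_z = 0.5
```

With a positive `beta_t`, the naive estimate should drift above `beta_z`, because treated groups are more often triads. Setting `beta_t = 0` is a useful check that the gap disappears.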
