DeclareDesign Community

TukeyHSD Pairwise Comparisons with lm_robust


#1

I am using your wonderful lm_robust command in order to generate cluster robust standard errors. I’m running an experiment where I have four treatment conditions within one experiment (let’s label the possible treatment conditions as MM, FM, MF, and FF) and the same treatment conditions in two ‘control’ experiments.

I have coded a variable called ‘treatment’ to be a factor variable with those four levels, and a factor variable called ‘condition’ to take on one of three values: ‘Main_Experiment, Control_1, and Control_2.’ I want to test whether, for example, FM in Main_Experiment is > FM in Control_1 or Control_2.

My current set up looks like the following:

lm_robust(outcome ~ treatment:condition + education + income + sex + birth + race + trust_daily,
  data =  all_data,
  clusters = team,
  se = "stata"
)

I’d like to be able to compare specific levels of factors across conditions using something like TukeyHSD, as in the following:

aov_model <- aov(outcome~ treatment*condition + education + income + sex + birth + race + trust_daily + turk_experience, data = all_data)
res <- TukeyHSD(my_lm, which = c('treatment:condition'))
as.data.frame(tidy(res,my_lm)) %>% 
    filter(adj.p.value < .05)

However, TukeyHSD doesn’t accept lm_robust models. Trying to use something like linearHypothesis in the car library is also difficult because lm_robust will drop one of the levels due to perfect collinearity, so I can’t get multiple-comparisons adjusted pairwise mean t-tests without cycling through every possible reference group.

Is there a better way of doing this kind of comparison within the lm_robust framework? I’ve been working on this problem for over a month, and would really appreciate it.


#2

Sadly, it appears that TukeyHSD only takes aov objects, and that aov() is hardcoded to only work with stats::lm

> methods(TukeyHSD)
[1] TukeyHSD.aov*
see '?methods' for accessing help and source code
> head(deparse(stats::aov), 30)
 [1] "function (formula, data = NULL, projections = FALSE, qr = TRUE, "                                 
 [2] "    contrasts = NULL, ...) "                                                                      
 [3] "{"                                                                                                
 [4] "    Terms <- if (missing(data)) "                                                                 
 [5] "        terms(formula, \"Error\")"                                                                
 [6] "    else terms(formula, \"Error\", data = data)"                                                  
 [7] "    indError <- attr(Terms, \"specials\")$Error"                                                  
 [8] "    if (length(indError) > 1L) "                                                                  
 [9] "        stop(sprintf(ngettext(length(indError), \"there are %d Error terms: only 1 is allowed\", "
[10] "            \"there are %d Error terms: only 1 is allowed\"), length(indError)), "                
[11] "            domain = NA)"                                                                         
[12] "    lmcall <- Call <- match.call()"                                                               
[13] "    lmcall[[1L]] <- quote(stats::lm)"                                                             
[14] "    lmcall$singular.ok <- TRUE"

Here is as far as I got to hacking around that:

df <- fabricate(N=10000, 
                   g=gl(GROUPS, N/GROUPS), 
                   Z1=rlnorm(N), 
                   Z2=rlnorm(N,1), 
                   Y=ifelse(g %in% 2:4, 3, 0) + .8*Z1 + 3.2 * Z2 + rnorm(N), 
                   X=matrix(rnorm(N*NOISE), N, NOISE))

m <- lm_robust(Y~g, df)

TukeyHSD(m, "g")

# Edit stats::lm -> estimatr::lm_robust, remove singular.ok = TRUE
my_aov <- edit(aov)
debug(my_aov)

ma <- my_aov(Y~g, df)

# > TukeyHSD(ma)
# Error in proj.default(x, unweighted.scale = TRUE) : 
#   argument does not include a 'qr' component
# > traceback()
# 9: stop("argument does not include a 'qr' component")
# 8: proj.default(x, unweighted.scale = TRUE)
# 7: NextMethod("proj")
# 6: proj.aov(x, unweighted.scale = TRUE)
# 5: proj(x, unweighted.scale = TRUE)
# 4: model.tables.aov(x, "means")
# 3: model.tables(x, "means")
# 2: TukeyHSD.aov(ma)
# 1: TukeyHSD(ma) 

lm_robust will likely never return the QR factorization because there is more going on in it’s estimation than just QR, so it can’t use the default proj method - however, if you or Luke implemented either model.tables or proj() methods for lm_robust objects, I don’t see why this couldn’t work using the existing code for TukeyHSD in the stats package.