
Correcting p-values with the Holm Correction

In a diagnosis I’m conducting that involves more than one estimator, I have several estimators whose p-values I would like to adjust with the Holm correction. This correction involves a calculation across the p-values of all of the estimators.

I presume that the best place to implement this is in the declare_diagnosands() step? But do diagnosands have access to the full set of estimates?

Presumably another option would be to report the mean p-value as an additional diagnosand and then conduct the Holm correction on a subset of estimators returned by diagnose_design?


Hi Nathan,

Great question! Thanks for this. There are a few ways of doing it, but I think the method below is easiest. There is some weird code in there due to a problem with dplyr::mutate() stripping the "parameters" attribute from the simulations data.frame (see here; we’re working on fixing it). But basically, it leverages the fact that diagnose_design() can also just take a data.frame of simulations and use that for diagnosis.

library(DeclareDesign)
library(tidyverse)
multiple_comparisons <- 
	declare_population(
		N = 100, 
		X1 = rnorm(N),
		X2 = rnorm(N),
		X3 = rnorm(N),
		X4 = rnorm(N)
	) +
	declare_estimand(true_effect = 0) +
	declare_estimator(X1 ~ X2, model = lm_robust, estimand = "true_effect", label = "X1 on X2") +
	declare_estimator(X2 ~ X3, model = lm_robust, estimand = "true_effect", label = "X2 on X3") +
	declare_estimator(X3 ~ X4, model = lm_robust, estimand = "true_effect", label = "X3 on X4") +
	declare_estimator(X1 ~ X4, model = lm_robust, estimand = "true_effect", label = "X1 on X4") 

simulations <- 
	simulate_design(multiple_comparisons, sims = 20) %>% 
	group_by(sim_ID) %>%
	mutate(p.value = p.adjust(p.value, method = "holm")) %>%
	as.data.frame()

# re-attach the "parameters" attribute that mutate() strips (see the note above)
attr(simulations, "parameters") <- data.frame(design_label = "multiple_comparisons")

diagnose_design(simulations)

Note that, if you use a reasonable number of simulations (e.g., simulate_design(multiple_comparisons, sims = 500) %>%), you’ll see that the power for all estimators is 2%. Since the true effect is zero, that "power" is really the rate of false positives, so it is now 2% even when using an alpha of .05. In other words, the testwise false-positive rate is correctly being adjusted downwards.


Also, this is going outside DD a bit (you could do it within DeclareDesign using a custom diagnosand), but if you want the family-wise error rate (FWER), that’s pretty easy too:

simulations %>% 
	group_by(sim_ID) %>% 
	summarize(any_rejection = any(p.value < .05)) %>% 
	with(., mean(any_rejection))

Generally I would recommend adjusting p-values inside a custom estimator handler - see https://github.com/DeclareDesign/DeclareDesign/blob/nfultz/postestimation/vignettes/postestimation.Rmd#L225 for an example. This lets you build the adjustment into the design itself instead of having to mess with the simulations post hoc.
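For the design above, a rough sketch of the idea is a single estimation step that fits all four models and adjusts their p-values as one family. (This is just an illustrative sketch, not the code from that vignette; holm_estimates is a made-up name, and depending on your DeclareDesign version you may need to wrap the function with tidy_estimator() or label_estimator() to get the labeling right.)

holm_estimates <- function(data, ...) {
	fits <- list(
		"X1 on X2" = lm_robust(X1 ~ X2, data = data),
		"X2 on X3" = lm_robust(X2 ~ X3, data = data),
		"X3 on X4" = lm_robust(X3 ~ X4, data = data),
		"X1 on X4" = lm_robust(X1 ~ X4, data = data)
	)
	# tidy() here is the estimatr tidier for lm_robust fits
	bind_rows(lapply(fits, tidy), .id = "estimator_label") %>%
		filter(term != "(Intercept)") %>%
		mutate(p.value = p.adjust(p.value, method = "holm"),
		       estimand_label = "true_effect")
}

# then the four declare_estimator() calls above become one step, e.g.
# ... + declare_estimator(handler = holm_estimates)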


Hi Jasper, Neal, I just realized that I failed to respond to this thread. Both of your responses have been super helpful, and I’m planning to come back with a follow-up and a report of how things went in a couple of weeks.


Do any of you know how you would go about doing this by writing a custom diagnosand?

Using Jasper’s approach, I’m not sure how to write a diagnosand that uses group_by(sim_ID), and using Neal’s approach I think it’s limited to correcting the p-values within each model separately rather than across all of the estimators together (although maybe correcting each model individually is mathematically equivalent, depending on the method?)

I would be really surprised if the two approaches gave comparable results - Bonferroni / Holm adjustments are based on the ranking of the p-values, and the two approaches have different sets to rank.

I think the correct choice depends on what the multiple models are being used for. E.g., if you are running five models and will then pick one using some model-selection objective (AICc or cross-validated R^2 are two I’ve used, for example), then you probably don’t want to inflate the final p-values differently depending on how many dead ends you explored, only on how many variables the selected model used.

If you wanted to use all of the models, for example fitting a set of path models by hand, then I think there’s an argument that per-model adjustment is not conservative enough. In that case, I would recommend writing a (single) custom estimator function, not a custom diagnosand. You can fit and return multiple models from a single estimation step (raw and adjusted for each model). There’s an old, rough example of fitting multiple models in one custom estimator step here - https://github.com/DeclareDesign/DeclareDesign/blob/nfultz/postestimation/vignettes/postestimation.Rmd#L417-L422

Another good example of wanting to use two models in one step would be IPSW methods where the fits of one model are used as weights in another.
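As an illustrative sketch of that pattern (with hypothetical variables Z, Y, and X that are not part of the design above, a made-up name ipsw_estimator, and an assumed estimand label "ATE"), the first model’s fitted probabilities become the weights of the second:

ipsw_estimator <- function(data, ...) {
	# model 1: propensity of treatment Z given covariate X
	pscore <- fitted(glm(Z ~ X, family = binomial(), data = data))
	# inverse-probability weights built from the first model's fits
	data$ipw <- ifelse(data$Z == 1, 1 / pscore, 1 / (1 - pscore))
	# model 2: weighted outcome regression that reuses those weights
	tidy(lm_robust(Y ~ Z, weights = ipw, data = data)) %>%
		filter(term == "Z") %>%
		mutate(estimator_label = "ipsw", estimand_label = "ATE")
}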

The third way is adjusting the diagnostic output - I’m not a fan of this, because if you are going to be doing adjustments, it’s better to build them into the design and make the design the source of truth, rather than the simulation and diagnostics scripts. There is, however, some limited functionality for attaching custom diagnosands to a design, although I haven’t touched that code path in a long time.

The trick here is that because you are operating at the data.frame level rather than the column level, you want a custom diagnosand handler rather than a custom diagnosand. One really nice thing about Jasper’s example is that, because it is dplyr, you can stuff the pipeline into the handler argument of declare_diagnosands() and call it good:

E.g., from this:

simulations <- 
	simulate_design(multiple_comparisons, sims = 20) %>% 
	group_by(sim_ID) %>%
	mutate(p.value = p.adjust(p.value, method = "holm")) %>%
	as.data.frame()

to this:

diagnose_design(multiple_comparisons, diagnosands = declare_diagnosands(handler =
	function(data) {
		data %>%
			group_by(sim_ID) %>%
			mutate(p.value = p.adjust(p.value, method = "holm")) %>%
			ungroup() %>%
			DeclareDesign:::diagnosand_handler() # delegating to the default handler - important!
	}
))

I’m like 75% sure that does equivalent logic / calculations.
