Covariates in estimator


This package is a fantastic resource, thanks so much for creating it.

I’m just wondering if it’s possible to incorporate prespecified covariates into the estimator? For example, it would be great to use the lm_lin style specification with covariates to see how this performs against vanilla OLS. When I try to do this currently, I get an error message telling me that I can only include a treatment indicator on the right-hand side of the equation. Am I doing this wrong, or is this not a feature?

Thanks again,

Hi Scott,

Could you copy-paste the R code you’re running so we can get a handle on your syntax? The lm_lin syntax is a little different from R’s base OLS. In conventional OLS, every covariate on the right-hand side is treated the same way (so a “control” covariate and the covariate of interest are mathematically doing the same thing). The Lin estimator first processes the pre-treatment covariates (it centers them and interacts them with the treatment) and then estimates the treatment effect. Because of this, we have separate arguments for treatment and covariates so that we know which is which and don’t have to guess, or give you an estimate that is not what you expect. You can check Lin 2012 if you want an in-depth statistical discussion of this, but in the meantime…

Checking the documentation in estimatr, I can see that the function definition for lm_lin is:

lm_lin(formula, covariates, data, weights, subset, clusters, se_type = NULL, ci = TRUE, alpha = 0.05, return_vcov = TRUE, try_cholesky = FALSE)

Where the arguments of interest here are:
formula: an object of class formula, as in lm, such as Y ~ Z with only one variable on the right-hand side, the treatment
covariates: a right-sided formula with pre-treatment covariates on the right hand side, such as ~ x1 + x2 + x3.
data: A data.frame

Note that here, as opposed to in R’s base lm, the covariates and the treatment are specified in separate arguments.

I used estimatr’s companion package fabricatr to generate some mock data for you to show an example. In this example, Ti is assignment to treatment, Xi is some covariate, and Yi is the dependent variable:

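# Load the packages used below
library(fabricatr)
library(estimatr)

# Synthetic data: Ti = treatment assignment, Xi = covariate, Yi = outcome
df <- fabricate(N = 100, Ti = draw_binary(N, prob = 0.5), Xi = rnorm(N), Yi = 3 * Ti + rnorm(N))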

# Base R OLS
lm(Yi ~ Ti + Xi, data = df)

# estimatr's lm_robust (robust standard errors built-in)
lm_robust(Yi ~ Ti + Xi, data = df)

# Lin estimator (be sure to specify pre-treatment covariates separately)
lm_lin(formula = Yi ~ Ti, 
       covariates = ~ Xi, 
       data = df)

# Of course we can also do this without argument names
lm_lin(Yi ~ Ti, ~ Xi, df)
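If it helps to see what the covariate processing amounts to, here is a rough base-R sketch of the same idea: center the covariate at its sample mean, then regress the outcome on the treatment, the centered covariate, and their interaction. This is an illustration under my own assumptions, not a copy of estimatr's internals: the data are simulated with base R rather than fabricatr, the helper column Xi_c is a name I made up for the example, and plain lm does not give you the robust standard errors that lm_lin reports.

```r
# Illustrative sketch only: simulated data, variable names follow the thread
set.seed(42)
n  <- 100
df <- data.frame(Ti = rbinom(n, 1, 0.5), Xi = rnorm(n))
df$Yi <- 3 * df$Ti + rnorm(n)

# Step 1: center the pre-treatment covariate at its sample mean
# (Xi_c is a hypothetical helper column, not something estimatr creates for you)
df$Xi_c <- df$Xi - mean(df$Xi)

# Step 2: regress on treatment, centered covariate, and their interaction
fit <- lm(Yi ~ Ti + Xi_c + Ti:Xi_c, data = df)

# The coefficient on Ti is the covariate-adjusted treatment effect estimate
coef(fit)["Ti"]

# With estimatr installed, the point estimate for Ti from
#   estimatr::lm_lin(Yi ~ Ti, covariates = ~ Xi, data = df)
# should match the value above; the standard errors will differ from summary(fit).
```

Because the covariate is centered before it is interacted, the coefficient on Ti can be read as an average effect rather than the effect at Xi = 0, which is why the treatment and the covariates have to be kept apart in the call.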

I suspect what you did was something like this, which would give you an error:

lm_lin(Yi ~ Ti + Xi, data=df)

I’ll ping the estimatr package author to let him know about your post, and we can probably also kick around a slightly better error message for when other users run into this.

Best regards
Aaron Rudkin
Package author, fabricatr
DeclareDesign team

Hi Aaron,

Thanks very much for the reply. I should have been clearer that I was using the declare_estimator() command and trying to incorporate covariates into it via lm_lin. Your reply has made me realise what I was doing wrong:

I was declaring my estimator like this (with age as a covariate):

estimator_lm <-
  declare_estimator(Y ~ Z + age,
                    model = lm_lin,
                    estimand = estimand)

Based on your reply I tried this, and it worked nicely:

estimator_lm <-
  declare_estimator(Y ~ Z,
                    model = lm_lin,
                    covariates = ~ age,
                    estimand = estimand)

The error message I reported, “‘formula’ must have only one variable on the right-hand side: the treatment variable.”, came when I tried to do it this way:

estimator_lm <-
  declare_estimator(Y ~ Z + age, estimand = estimand)

Simply calling lm via “model = lm” seems to be enough to stop this error.

Thanks again, and sorry if my message is badly formatted; I wasn’t sure how to incorporate the R code into the body text.

Also, congratulations again on a great set of packages.