Observational Panel Data - Using Elections as Exogenous Shocks


I recently discovered your collection of packages and think this is a wonderful project.

I looked through the pre-existing sample designs and did not find any that would match a relatively common (yet fraught) concern:

Detecting causal effects with panel data

Currently, most people use the plm package for estimation - which is fantastic in its own right. However, I was wondering if the DeclareDesign suite can handle this concern, since the combination of fabricatr, estimatr, and DeclareDesign would be fantastic.

For example, I am assessing how elections impact trust in government at the county level with 27 years of data.

In your framework, I can see the following parameters for the setup:

  1. Population: There is a stable population of 3082 counties

  2. Repeated Measurement: There is a balanced panel with data for 27 years per county

  3. Population: Assume this entire population can be split into either Republican or Democrat counties (where assignment stays constant, i.e. a county is considered either Republican or Democrat for all 27 years)

  4. Random Assignment: Assume every four years an election occurs which is considered a truly “exogenous shock”, i.e. all counties of a given party are randomly assigned as “winning the election” (e.g. Dem County - Dem President) or “losing the election” (e.g. Rep County - Dem President) [1]

  5. Causal Model to be Tested: Right after winning an election, “trust in government” increases - and then slowly falls back to baseline (and, vice versa for losing an election).

[1] Of course, incumbency effects would change the probability of being assigned to winning or losing, but presumably this can be added at a later stage.
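To make the setup concrete, here is a minimal base-R sketch of that data-generating process. All specifics — the 50/50 party split, the effect size of 0.5, the 0.6 decay rate, and the variable names — are illustrative assumptions, not part of the original design; fabricatr's `add_level()` / `cross_levels()` could build the same structure natively:

```r
set.seed(42)

# County level: a fixed party label for all 27 years
counties <- data.frame(
  county = 1:3082,
  party  = sample(c("R", "D"), 3082, replace = TRUE)  # assumed 50/50 split
)

# Year level: an election every four years, with one common coin flip
# per four-year cycle standing in for the exogenous shock
years <- data.frame(year = 1:27)
years$cycle <- ceiling(years$year / 4)
pres <- rbinom(max(years$cycle), 1, 0.5)   # 1 = Dem president this cycle
years$dem_president <- pres[years$cycle]

# Cross counties with years to get the balanced 3082 x 27 panel
panel <- merge(counties, years)
panel$win <- as.numeric((panel$party == "D") == (panel$dem_president == 1))

# Outcome: trust jumps after a win and decays back toward baseline
panel$years_since <- (panel$year - 1) %% 4
panel$trust <- 0.5 * panel$win * 0.6^panel$years_since +
  rnorm(nrow(panel), sd = 1)
```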

Based on the above, would you say that:

  • This kind of setup is easily / natively modeled with the DeclareDesign framework?
  • If so, once modeled, am I correct that the fabricatr package could be used for power calculations?
  • And, could the estimatr package be used for robust estimation of effects?
  • Finally, can the estimatr package be used to address Difference-in-Differences models under such dependency constraints?

If the above is actually true, would you mind pointing me in the right direction on how to tackle this setup?

I would be happy to share the final outcome as a “Design Template” - since such setups are increasingly common in political science and policy frameworks and hopefully could be useful to others as well.

Highlighting some points of concern regarding panel data:

  • For accurate inference, standard errors must be clustered at the appropriate level (e.g., County-level or State-level clusters)
  • In most panel data, there are time-fixed effects (due to external factors, e.g. Hurricane Katrina or 2007 banking crisis)
  • It is necessary to account for cross-sectional dependencies
  • It is necessary to account for serial correlation at the county level (as well as differing levels of geographical hierarchy, e.g. State, Census Region) and block level (e.g. Republican / Democrat counties may experience unique changes over time)
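As a hedged illustration of the first two bullets, estimatr's `lm_robust()` can combine year fixed effects with county-clustered standard errors in one call; the variable names (`trust`, `win`, `county`, `year`, `panel`) are placeholders from the setup above:

```r
library(estimatr)

fit <- lm_robust(
  trust ~ win,
  fixed_effects = ~ year,   # time fixed effects
  clusters = county,        # cluster standard errors at the county level
  se_type = "stata",        # Stata-style clustered standard errors
  data = panel
)
summary(fit)
```

Clustering at a higher level (e.g. state) would only require swapping the `clusters` variable.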

Replies inline:

This is exactly the kind of thing the DD stack was built for.

fabricatr only generates synthetic data, but DeclareDesign can calculate power via Monte Carlo simulation. Your scenario is likely too complicated for analytic (closed-form) power calculations.
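A Monte Carlo power check might look like the following sketch, using the current DeclareDesign API. The model here is a deliberately simplified placeholder (a single cross-section with an assumed effect of 0.2), not the full panel design:

```r
library(DeclareDesign)

design <-
  declare_model(
    N = 100,
    U = rnorm(N),
    potential_outcomes(Y ~ 0.2 * Z + U)   # assumed effect size
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = complete_ra(N)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ATE")

# Simulated power appears in the "power" column of the diagnosis
diagnose_design(design, sims = 500)
```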


For this I think you would use estimatr to fit a regression model, and then car::linearHypothesis to get the actual estimate.

You mentioned using plm - if you prefer it, there’s no reason you can’t use fabricatr + randomizr + plm + DeclareDesign; the stack can (be made to) work with most other R packages. The difficulty of that really depends on how bespoke or standardized plm's output is.

Right now we support one-way clustered standard errors, and may or may not add two-way clustering at some point (e.g. both county and year at once).

This should be fine.

This is essentially the same procedure as with time (or any other dimension you would want to index on).

If you can specify all those correlations up front, the DD stack should be able to simulate them. You may run into some attenuation issues if your data is substantially non-Gaussian, but please let us know if you do. However, it’s often easier to write designs when you specify them in terms of latent variables rather than correlations.

This is fantastic news. I will examine how to model the issues with the DeclareDesign package.

I have encountered some trouble while trying to predict from the output of the plm package, so I would be most interested in using estimatr - to ensure consistency across the stack.

There are other packages that support two-way clustering (e.g. sandwich and clubSandwich). Would you recommend either of these, in case two-way clustering is required?

I will attempt to do so and share what I find.

Thank you for your insightful responses. I hope to be able to share a template of this framework with the community shortly.


clubSandwich does not currently support two-way clustering; sandwich does. You might also check out the lfe package (https://CRAN.R-project.org/package=lfe), which can be used to estimate linear models with high-dimensional fixed effects. Clustered SEs (including multi-way clustered) can be specified as part of the model-fitting call, so everything comes out in one step. (In contrast, sandwich requires a post-estimation step.)
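For reference, lfe's `felm()` uses a four-part formula that puts the fixed effects and the cluster variables directly in the fitting call. A sketch with the same illustrative names as above (`trust`, `win`, `county`, `year`, `panel`):

```r
library(lfe)

# outcome ~ covariates | fixed effects | instruments | cluster variables
fit <- felm(trust ~ win | county + year | 0 | county + year, data = panel)
summary(fit)   # reports two-way (county and year) clustered SEs
```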

lfe looks pretty nice too. It looks like it already has a tidy() method in broom, so you should be able to plug it straight into DeclareDesign’s estimation step without writing any wrapper code.

I’m not sure if it uses rlang or base R NSE, so let us know if you encounter any weird bugs around model formulas.