Time-to-event data

Hi team, great package.

Is there currently a way to set up a time-to-event type of experiment? Thanks.

I’m not personally familiar with time-to-event experiments. Is the idea that you randomly assign units to treatment conditions, and the outcome variable is the number of days (or minutes, or whatever) until something happens?

Could you also say more about what the estimand in such a study is? Is it the average treatment effect on time-to-event, or is it something else?

Dear Aexl,

Thank you for responding.

Yes, exactly as you described. Time-to-event analysis (aka survival analysis) is generally defined as a set of methods for analyzing data where the outcome variable is the time until the occurrence of an event of interest. The event can be death, occurrence of a disease, marriage, divorce, etc. The time to event, or survival time, can be measured in days, weeks, years, etc. For example, if the event of interest is a heart attack, then the survival time can be the time in years until a person has a heart attack. In survival analysis, subjects are usually followed over a specified time period and the focus is on the time at which the event of interest occurs.

The estimand in such a study is usually the median time-to-event when a non-parametric model is used. Sometimes the average treatment effect on time-to-event is also used, when there is a justified parametric form for the time-to-event data.
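
In potential-outcomes terms, these two estimands could be written roughly as follows. The notation is my own shorthand for illustration: $T_i(z)$ is unit $i$'s time-to-event under treatment condition $z \in \{0, 1\}$, and $S_z(t)$ is the survival function under condition $z$.

$$
S_z(t) = \Pr\big(T_i(z) > t\big), \qquad
m_z = \inf\{\, t : S_z(t) \le 0.5 \,\}, \qquad
\tau_{\text{median}} = m_1 - m_0
$$

$$
\tau_{\text{ATE}} = \mathbb{E}\big[\, T_i(1) - T_i(0) \,\big]
$$

The first line is the difference in median time-to-event between conditions; the second is the average treatment effect on time-to-event.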

This is a pretty widely used type of design in epidemiology, particularly clinical epi.

This intro paper might help: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065034/

Let me know if there is something that I can contribute here.

Regards, Foo

I’ve tried out this type of simulation in DeclareDesign a little bit (in a labor statistics context) - here are a couple comments / notes:

  • You will most likely need to write some wrapper functions for using survival models with declare_estimator - we don’t have an example with a Cox model yet AFAIK, but we really should add one at some point.
  • The data format that the estimator requires will determine how you set up the DGP portion of the design. If you are using fancier extensions (time-varying covariates), you may find it easier to write the DGP steps in “long format” (multiple rows per subject). Pivoting data inside a design is possible but not fun.
    • The Potential Outcomes step in this case would require some customization to correctly aggregate per subject. Handlers are dplyr-compatible, so you can write it in terms of a pipeline.
  • If there are no time varying covariates, you can stick to one row / subject and life is good :slight_smile:
  • If you want to simulate censored data, first simulate the complete data, and then do the censoring as a custom step. You can then parameterize the censoring window to do power analysis with respect to study length as well as number of subjects - my client found that quite slick.
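
To make the notes above concrete, here is a minimal sketch of the one-row-per-subject case. It is not an official DeclareDesign example. The exponential event times, the rates, and the names `cox_estimator`, `n_subjects`, `rate_control`, `rate_treated`, and `study_length` are all assumptions made for illustration. The censoring is written as its own `declare_measurement` step, standing in for the custom step mentioned above, and the Cox model is wrapped with `label_estimator`.

```r
library(DeclareDesign)  # attaches fabricatr, randomizr and estimatr as well
library(survival)

# Assumed design parameters (illustrative values only)
n_subjects   <- 300
rate_control <- 0.10   # monthly event rate without treatment
rate_treated <- 0.05   # monthly event rate with treatment
study_length <- 12     # censoring window, in months

# Hypothetical wrapper: fit a Cox model and return one row with the
# estimate / std.error / p.value columns used in diagnosis
cox_estimator <- function(data) {
  fit <- coxph(Surv(time_obs, event) ~ Z, data = data)
  coefs <- summary(fit)$coefficients
  data.frame(
    term      = "Z",
    estimate  = coefs["Z", "coef"],
    std.error = coefs["Z", "se(coef)"],
    p.value   = coefs["Z", "Pr(>|z|)"]
  )
}

design <-
  # Complete (uncensored) potential event times, one row per subject
  declare_model(
    N = n_subjects,
    time_Z_0 = rexp(N, rate = rate_control),
    time_Z_1 = rexp(N, rate = rate_treated)
  ) +
  # With exponential times the hazard ratio is constant, so the
  # log hazard ratio is a well-defined target for the Cox estimate
  declare_inquiry(log_hr = log(rate_treated / rate_control)) +
  declare_assignment(Z = complete_ra(N)) +
  declare_measurement(time_full = reveal_outcomes(time ~ Z)) +
  # Censoring as a separate step: follow-up stops at study_length
  declare_measurement(
    time_obs = pmin(time_full, study_length),
    event    = as.integer(time_full <= study_length)
  ) +
  declare_estimator(
    handler = label_estimator(cox_estimator),
    inquiry = "log_hr",
    label   = "cox"
  )

# Power as a function of study length (sims kept small for speed)
designs    <- redesign(design, study_length = c(6, 12, 24))
power_only <- declare_diagnosands(power = mean(p.value <= 0.05))
diagnose_design(designs, diagnosands = power_only, sims = 200)
```

Because `n_subjects` and `study_length` are ordinary variables referenced in the declarations, `redesign` can vary either or both, e.g. `redesign(design, n_subjects = c(100, 300), study_length = c(6, 12))`. Time-varying covariates would need the long-format setup described above and are not covered by this sketch.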