
In this vignette we start with an empty R project and walk through a full working example analysis.

Our goal for this example analysis is to report the number of subjects experiencing a mild adverse event in each treatment arm stratified by a custom age grouping. For this example we use the ADAE dataset provided in the {pharmaverseadam} package. To run this example you will need the following packages installed:
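
The code in this vignette calls functions from {chef}, {chefStats}, {pharmaverseadam}, {data.table} and {targets}. The snippet below is a small base R sketch, not part of {chef}, that reports which of these packages are missing from your library. It does not install anything, because the right installation source (for example CRAN or a Git repository) depends on your setup.

```r
# Packages whose functions are called in this vignette
pkgs <- c("chef", "chefStats", "pharmaverseadam", "data.table", "targets")

# requireNamespace() quietly returns FALSE when a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```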

The outline of the workflow will be:

  1. Set up our project infrastructure by running chef::use_chef()
  2. Specify our endpoint
  3. Define a function to produce our ADaM data
  4. Define a function to calculate the statistics we want as results (i.e. the number of events)
  5. Run the pipeline and inspect the results

1. Set up project infrastructure

This assumes you have set up an RStudio project (or equivalent). If you have not done so, do that first.

To set up a chef project you need:

  • An R/ directory where all project-specific R code will be stored. This will include:
    • Any functions used to make the ADaM data ingested by the chef pipeline
    • The R function that produces the endpoint specification object
    • Any analysis/statistical functions that are not sourced from other R packages (like chefStats or chefCriteria)
    • A script containing library() calls for every package the pipeline needs to run
  • A pipeline/ directory where the targets pipelines are defined
  • A _targets.yaml file tracking the different pipelines

The file structure should look like this:

<R-project dir>/
    |-- R/
        |--- mk_endpoint_definition.R
        |--- mk_adam.R
        |--- packages.R
    |-- pipeline/
        |--- pipeline_01.R
    |-- _targets.yaml
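
To see this layout without {chef}, the sketch below recreates the same directories and empty files with base R in a temporary directory. It only illustrates the structure: chef::use_chef() presumably also writes template content into these files, which this sketch does not attempt.

```r
# Build the chef project layout by hand in a throwaway directory
proj <- file.path(tempdir(), "chef_demo")

dir.create(file.path(proj, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "pipeline"), recursive = TRUE, showWarnings = FALSE)

# Empty placeholder files matching the tree above
files <- c(
  "R/mk_endpoint_definition.R",
  "R/mk_adam.R",
  "R/packages.R",
  "pipeline/pipeline_01.R",
  "_targets.yaml"
)
file.create(file.path(proj, files))

# List the created files, relative to the project directory
list.files(proj, recursive = TRUE)
```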
    

{chef} has a convenience function to set up this infrastructure for you:

library(chef)
chef::use_chef(
  pipeline_id = "01"
)

This creates the file structure shown above.

For now, we only need to know what the files in R/ do. For an explanation of _targets.yaml and pipeline_01.R, see vignette("pipeline").

2. Specify an endpoint

Endpoint specifications need to be created inside a function, in this case the function defined in R/mk_endpoint_definition.R.

An endpoint is created by using the chef::mk_endpoint_str() function. For an explanation of how to specify endpoints, see vignette("endpoint_definitions").

Here we specify a minimal working endpoint based on the ADAE dataset supplied by {pharmaverseadam}. We do this by modifying the R/mk_endpoint_definition.R file so that it looks like this:

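mk_endpoint_def <- function() {
  chef::mk_endpoint_str(
    study_metadata = list(),
    # Analysis population: subjects with SAFFL == "Y"
    pop_var = "SAFFL",
    pop_value = "Y",
    # Treatment variable and the reference treatment arm
    treatment_var = "TRT01A",
    treatment_refval = "Xanomeline High Dose",
    # Stratify by the custom age grouping derived in mk_adae()
    stratify_by = list(c("AGEGR2")),
    # Function that builds the input (ADaM) data
    data_prepare = mk_adae,
    endpoint_label = "A",
    # Further restrict the population to two of the treatment arms
    custom_pop_filter = "TRT01A %in% c('Placebo', 'Xanomeline High Dose')",
    # Statistic computed for each treatment arm x stratum combination
    stat_by_strata_by_trt = list("N_subj_event" = c(chefStats::n_subj_event))
  )
}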

You might notice a couple of things about this specification:

  • Even though we are using the ADAE dataset from {pharmaverseadam}, there is no reference to it in the endpoint specification. This is because the input clinical data is created via the data_prepare field, so in this case the reference to the ADAE dataset lives inside the mk_adae function (see next section).
  • In the stratify_by field we refer to a variable called AGEGR2, although the ADAE dataset from {pharmaverseadam} does not contain any such variable. This is because we will derive this variable inside mk_adae (see next section).
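
To make the two population restrictions concrete, the base R sketch below applies them to a small made-up data frame: pop_var and pop_value keep rows where SAFFL == "Y", and custom_pop_filter further restricts TRT01A. It assumes the filter string is evaluated as an ordinary R expression against the data. This mirrors the intent of the specification, not {chef}'s internal code.

```r
# Hypothetical toy data; column names match the endpoint specification
dat <- data.frame(
  USUBJID = c("01", "02", "03", "04"),
  SAFFL = c("Y", "Y", "N", "Y"),
  TRT01A = c("Placebo", "Xanomeline High Dose", "Placebo", "Xanomeline Low Dose"),
  stringsAsFactors = FALSE
)

pop_var <- "SAFFL"
pop_value <- "Y"
custom_pop_filter <- "TRT01A %in% c('Placebo', 'Xanomeline High Dose')"

# Population flag: keep rows where the population variable equals pop_value
in_pop <- dat[[pop_var]] == pop_value

# Custom filter: parse the string and evaluate it with the data's columns in scope
in_custom <- eval(parse(text = custom_pop_filter), envir = dat)

analysis_pop <- dat[in_pop & in_custom, ]
analysis_pop$USUBJID # "01" "02"
```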

3. Define the input dataset

We also need to provide chef with the ADAE input data set that corresponds to the endpoint specified above. To read more about making these data sets, see vignette("mk_adam"). Our strata are based on AGEGR2, which can be derived from the AGE variable in ADSL. For now, we write a simple ADaM function, mk_adae, that merges the ADSL data set (enriched with AGEGR2) with the ADAE data set, thereby creating the input data set.

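mk_adae <- function(study_metadata) {
  adae <- data.table::as.data.table(pharmaverseadam::adae)
  adsl <- data.table::as.data.table(pharmaverseadam::adsl)
  # Derive the custom age grouping used as the stratification variable
  adsl[, AGEGR2 := data.table::fcase(
    AGE < 70, "AGE < 70",
    AGE >= 70, "AGE >= 70"
  )]
  # Keep only the ADAE columns not already in ADSL, plus the key USUBJID;
  # all = TRUE also keeps subjects without any AEs
  adae_cols <- c(setdiff(names(adae), names(adsl)), "USUBJID")
  adae_out <- merge(adsl, adae[, adae_cols, with = FALSE],
    by = "USUBJID", all = TRUE
  )
  adae_out[]
}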
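
The merge logic in mk_adae can be checked on a tiny made-up example. The sketch below is a base R analogue (ifelse() in place of data.table::fcase(), base merge() in place of the data.table method) showing two consequences of all = TRUE: a subject with several AEs gets one row per AE, and a subject with no AEs is kept with NA in the ADAE-only columns.

```r
# Hypothetical toy ADSL: one row per subject
adsl <- data.frame(
  USUBJID = c("01", "02", "03"),
  AGE = c(65, 72, 80),
  TRT01A = c("Placebo", "Placebo", "Xanomeline High Dose"),
  stringsAsFactors = FALSE
)
# Hypothetical toy ADAE: one row per adverse event; TRT01A duplicates ADSL
adae <- data.frame(
  USUBJID = c("01", "01", "03"),
  TRT01A = c("Placebo", "Placebo", "Xanomeline High Dose"),
  AEDECOD = c("HEADACHE", "NAUSEA", "DIZZINESS"),
  stringsAsFactors = FALSE
)

# Same age grouping as in mk_adae
adsl$AGEGR2 <- ifelse(adsl$AGE < 70, "AGE < 70", "AGE >= 70")

# Keep only ADAE columns not already in ADSL, plus the merge key
adae_cols <- c(setdiff(names(adae), names(adsl)), "USUBJID")
adae_out <- merge(adsl, adae[, adae_cols], by = "USUBJID", all = TRUE)

adae_out[, c("USUBJID", "AGEGR2", "AEDECOD")]
```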

4. Define the analysis methods

Now that we have specified the endpoint to be analyzed, and defined the analysis data set for {chef}, we need to define the analysis itself.

Our goal for this analysis is to count the number of subjects experiencing an event. We need to define a function that makes this calculation and give that function to chef. Because we want one result per combination of treatment arm and stratum, we must provide the function in the stat_by_strata_by_trt argument of the endpoint specification. We have already set this argument to chefStats::n_subj_event in the example endpoint specification above.
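
To illustrate the quantity being computed, the base R sketch below counts distinct subjects with at least one event per treatment arm and AGEGR2 stratum on a toy merged data set, treating a row with a non-missing AEDECOD as an event. That definition of an event is an assumption made for illustration; this is not the source of chefStats::n_subj_event, whose handling of events and strata may differ.

```r
# Hypothetical merged data: subject 02 has no AEs (AEDECOD is NA)
adae_out <- data.frame(
  USUBJID = c("01", "01", "02", "03", "04"),
  TRT01A = c("Placebo", "Placebo", "Placebo",
             "Xanomeline High Dose", "Xanomeline High Dose"),
  AGEGR2 = c("AGE < 70", "AGE < 70", "AGE >= 70", "AGE >= 70", "AGE >= 70"),
  AEDECOD = c("HEADACHE", "NAUSEA", NA, "DIZZINESS", "RASH"),
  stringsAsFactors = FALSE
)

# Keep event rows only
events <- adae_out[!is.na(adae_out$AEDECOD), ]

# Count distinct subjects per stratum x arm (a subject with 2 AEs counts once)
n_subj <- tapply(
  events$USUBJID,
  list(AGEGR2 = events$AGEGR2, TRT01A = events$TRT01A),
  function(id) length(unique(id))
)

# tapply() returns NA for cells with no event rows; report those as 0
n_subj[is.na(n_subj)] <- 0
n_subj
```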

5. Run the analysis pipeline

Now that all the inputs are defined, we can run the pipeline. This is achieved by a call to tar_make() from the {targets} package.

targets::tar_make()

Targets will show you which steps in the pipeline are executed and how long each step took:

## Loading required package: targets
## ▶ dispatched target ep
## ● completed target ep [0.03 seconds]
## ▶ dispatched target ep_id
## ● completed target ep_id [0.009 seconds]
## ▶ dispatched branch ep_fn_map_c85cfecb
## ● completed branch ep_fn_map_c85cfecb [0.026 seconds]
## ● completed pattern ep_fn_map
## ▶ dispatched target user_def_fn
## ● completed target user_def_fn [0.003 seconds]
## ▶ dispatched target study_data
## ● completed target study_data [0.04 seconds]
## ▶ dispatched target fn_map_tibble
## ● completed target fn_map_tibble [0.004 seconds]
## ▶ dispatched branch ep_and_data_c85cfecb
## ● completed branch ep_and_data_c85cfecb [0.01 seconds]
## ● completed pattern ep_and_data
## ▶ dispatched branch fn_map_5091fc6f
## ● completed branch fn_map_5091fc6f [0 seconds]
## ● completed pattern fn_map
## ▶ dispatched branch analysis_data_container_a4b422b1
## ● completed branch analysis_data_container_a4b422b1 [0 seconds]
## ● completed pattern analysis_data_container
## ▶ dispatched branch ep_with_data_key_a4b422b1
## ● completed branch ep_with_data_key_a4b422b1 [0 seconds]
## ● completed pattern ep_with_data_key
## ▶ dispatched branch ep_expanded_e741f3f8
## ● completed branch ep_expanded_e741f3f8 [0.023 seconds]
## ● completed pattern ep_expanded
## ▶ dispatched branch ep_event_index_5f7976c8
## ● completed branch ep_event_index_5f7976c8 [0.004 seconds]
## ● completed pattern ep_event_index
## ▶ dispatched branch ep_crit_endpoint_3dc85872
## ● completed branch ep_crit_endpoint_3dc85872 [0.003 seconds]
## ● completed pattern ep_crit_endpoint
## ▶ dispatched branch ep_crit_by_strata_by_trt_57df2f72
## ● completed branch ep_crit_by_strata_by_trt_57df2f72 [0.009 seconds]
## ● completed pattern ep_crit_by_strata_by_trt
## ▶ dispatched branch ep_crit_by_strata_across_trt_9f8f25f5
## ● completed branch ep_crit_by_strata_across_trt_9f8f25f5 [0.003 seconds]
## ● completed pattern ep_crit_by_strata_across_trt
## ▶ dispatched branch ep_prep_by_strata_across_trt_ca9d0fd8
## ● completed branch ep_prep_by_strata_across_trt_ca9d0fd8 [0.002 seconds]
## ● completed pattern ep_prep_by_strata_across_trt
## ▶ dispatched branch ep_prep_across_strata_across_trt_ca9d0fd8
## ● completed branch ep_prep_across_strata_across_trt_ca9d0fd8 [0.001 seconds]
## ● completed pattern ep_prep_across_strata_across_trt
## ▶ dispatched branch ep_prep_by_strata_by_trt_ca9d0fd8
## ● completed branch ep_prep_by_strata_by_trt_ca9d0fd8 [0.036 seconds]
## ● completed pattern ep_prep_by_strata_by_trt
## ▶ dispatched target ep_rejected
## ● completed target ep_rejected [0.001 seconds]
## ▶ dispatched branch ep_stat_by_strata_across_trt_e550b4a6
## ● completed branch ep_stat_by_strata_across_trt_e550b4a6 [0.001 seconds]
## ● completed pattern ep_stat_by_strata_across_trt
## ▶ dispatched branch ep_stat_across_strata_across_trt_3b54a938
## ● completed branch ep_stat_across_strata_across_trt_3b54a938 [0 seconds]
## ● completed pattern ep_stat_across_strata_across_trt
## ▶ dispatched branch ep_stat_by_strata_by_trt_c23c2985
## ● completed branch ep_stat_by_strata_by_trt_c23c2985 [0.011 seconds]
## ● completed pattern ep_stat_by_strata_by_trt
## ▶ dispatched target ep_stat_nested
## ● completed target ep_stat_nested [0.001 seconds]
## ▶ dispatched target ep_stat
## ● completed target ep_stat [0.004 seconds]
## ▶ ended pipeline [0.52 seconds]
## 

Then, to see the results, load the cached target that holds them. In our case this is ep_stat, so to load it into the session as an object we call

targets::tar_load(ep_stat)

Now ep_stat is an R object like any other. Thus we can look at our results simply by running

ep_stat

However, the object contains many extra columns, so let's look at a subset of the columns for the first six rows:

ep_stat[, .(
  treatment_var,
  treatment_refval,
  strata_var,
  stat_filter,
  stat_result_label,
  stat_result_description,
  stat_result_qualifiers,
  stat_result_value
)] |> head()
##    treatment_var     treatment_refval strata_var
##           <char>               <char>     <char>
## 1:        TRT01A Xanomeline High Dose     TOTAL_
## 2:        TRT01A Xanomeline High Dose     TOTAL_
## 3:        TRT01A Xanomeline High Dose     AGEGR2
## 4:        TRT01A Xanomeline High Dose     AGEGR2
## 5:        TRT01A Xanomeline High Dose     AGEGR2
## 6:        TRT01A Xanomeline High Dose     AGEGR2
##                                                 stat_filter stat_result_label
##                                                      <char>            <char>
## 1:                  TOTAL_ == "total" & TRT01A == "Placebo"                 n
## 2:     TOTAL_ == "total" & TRT01A == "Xanomeline High Dose"                 n
## 3:               AGEGR2 == "AGE < 70" & TRT01A == "Placebo"                 n
## 4:              AGEGR2 == "AGE >= 70" & TRT01A == "Placebo"                 n
## 5:  AGEGR2 == "AGE < 70" & TRT01A == "Xanomeline High Dose"                 n
## 6: AGEGR2 == "AGE >= 70" & TRT01A == "Xanomeline High Dose"                 n
##           stat_result_description stat_result_qualifiers stat_result_value
##                            <char>                 <char>             <num>
## 1: Number of subjects with events                   <NA>                86
## 2: Number of subjects with events                   <NA>                72
## 3: Number of subjects with events                   <NA>                22
## 4: Number of subjects with events                   <NA>                64
## 5: Number of subjects with events                   <NA>                18
## 6: Number of subjects with events                   <NA>                54
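
Before TFL formatting, long results like these are often easier to read as a stratum-by-arm table. The sketch below re-types the six values printed above into a small data frame (stratum labels taken from stat_filter) and cross-tabulates them with base R xtabs(); with the real object you would build the same columns from ep_stat instead of typing them. It also checks that the two age groups add up to the total in each arm.

```r
# Values copied from the printed ep_stat output above
res <- data.frame(
  stratum = c("total", "total", "AGE < 70", "AGE >= 70", "AGE < 70", "AGE >= 70"),
  arm = c("Placebo", "Xanomeline High Dose", "Placebo", "Placebo",
          "Xanomeline High Dose", "Xanomeline High Dose"),
  n = c(86, 72, 22, 64, 18, 54),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per stratum, one column per treatment arm
tab <- xtabs(n ~ stratum + arm, data = res)
tab

# Sanity check: the age strata sum to the total in each arm
age_sum <- tab["AGE < 70", ] + tab["AGE >= 70", ]
all(age_sum == tab["total", ]) # TRUE
```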

6. Pass the data on to TFL formatting

Now that the data is produced, you can pass it on for TFL formatting (outside the scope of {chef}).