In this vignette we start with an empty R project and walk through a full working example analysis.
Our goal for this example analysis is to report the number of subjects experiencing a mild adverse event in each treatment arm stratified by a custom age grouping. For this example we use the ADAE dataset provided in the {pharmaverseadam} package. To run this example you will need the following packages installed:
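The exact installation route depends on your environment, but based on the code used throughout this vignette you will need at least {chef}, {chefStats}, {pharmaverseadam}, {targets}, and {data.table}:

# Packages used in this vignette (list inferred from the examples below)
library(chef)
library(chefStats)
library(pharmaverseadam)
library(targets)
library(data.table)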
The outline of the workflow will be:
- Set up our project infrastructure by running chef::use_chef()
- Specify our endpoint
- Define a function to produce our ADaM data
- Define a function to calculate the statistics we want as results (i.e., the number of subjects with events)
- Run the pipeline and inspect the results
1. Set up project infrastructure
This assumes you have set up an RStudio project (or equivalent). If you have not done so, do that first.
To set up a chef project you need:
- An R/ directory where all project-specific R code will be stored. This will include:
  - Any functions used to make the ADaM data ingested by the chef pipeline
  - The R function that produces the endpoint specification object
  - Any analysis/statistical functions that are not sourced from other R packages (like chefStats or chefCriteria)
  - A script containing library() calls to any package needed for the pipeline to run
- A pipeline/ directory where the targets pipeline(s) are defined
- A _targets.yaml file tracking the different pipelines
The file structure should look like this:
<R-project dir>/
|-- R/
|--- mk_endpoint_definition.R
|--- mk_adam.R
|--- packages.R
|-- pipeline/
|--- pipeline_01.R
|-- _targets.yaml
{chef} has a convenience function to set up this infrastructure for you.
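The exact arguments and their defaults may differ between chef versions, so treat the call below as a sketch and consult ?chef::use_chef:

# Scaffold the project infrastructure (see ?chef::use_chef for arguments and defaults)
chef::use_chef()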
This sets up the file structure shown above. For now we just need to know what the files in R/ do. For an explanation of _targets.yaml and pipeline_01.R, see vignette("pipeline").
2. Specify an endpoint
Endpoint specifications need to be created inside a function, in this case the function defined in the mk_endpoint_definition.R file. An endpoint is created by using the chef::mk_endpoint_str() function. For an explanation of how to specify endpoints, see vignette("endpoint_definitions").

Here we specify a minimal working endpoint based on the adae dataset supplied by {pharmaverseadam}. We do this by modifying the R/mk_endpoint_definition.R file so that it looks like this:
mk_endpoint_def <- function() {
chef::mk_endpoint_str(
study_metadata = list(),
pop_var = "SAFFL",
pop_value = "Y",
treatment_var = "TRT01A",
treatment_refval = "Xanomeline High Dose",
stratify_by = list(c("AGEGR2")),
data_prepare = mk_adae,
endpoint_label = "A",
custom_pop_filter = "TRT01A %in% c('Placebo', 'Xanomeline High Dose')",
stat_by_strata_by_trt = list("N_subj_event" = c(chefStats::n_subj_event))
)
}
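As an optional sanity check (not required by the pipeline), you can source this file and call the function to confirm that the specification builds without errors. Depending on how chef resolves the data_prepare reference, you may also need mk_adae (defined in the next section) available in your session:

# Optional check: should return an endpoint specification object without erroring
ep_spec <- mk_endpoint_def()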
You might notice a couple of things about this specification:
- Even though we are using the ADAE dataset from {pharmaverseadam}, there is no reference to it in the endpoint specification. This is because the input clinical data is created via the data_prepare field, so in this case the reference to the ADAE data set will be inside the mk_adae function (see the next section).
- In the stratify_by field we refer to a variable called AGEGR2, however the ADAE dataset from {pharmaverseadam} does not contain any such variable. This is because we will derive this variable inside mk_adae (see the next section).
3. Define the input dataset
We also need to provide chef with the ADAE input data set that corresponds to the endpoint specified above. To read more about making these data sets, see vignette("mk_adam"). We can see that we have strata based on AGEGR2, which can be derived from the AGE variable in ADSL. For now, we write a simple ADaM function mk_adae that merges the ADSL data set (enriched with AGEGR2) onto the ADAE data set, thereby creating the input data set.
mk_adae <- function(study_metadata) {
  # Load the example ADaM data sets from {pharmaverseadam}
  adae <- data.table::as.data.table(pharmaverseadam::adae)
  adsl <- data.table::as.data.table(pharmaverseadam::adsl)

  # Derive the custom age grouping used for stratification
  adsl[, AGEGR2 := data.table::fcase(
    AGE < 70, "AGE < 70",
    AGE >= 70, "AGE >= 70"
  )]

  # Merge ADSL (now including AGEGR2) onto ADAE, keeping only the ADAE
  # columns that are not already present in ADSL (plus the merge key)
  adae_out <-
    merge(adsl,
          adae[, c(setdiff(names(adae), names(adsl)), "USUBJID"), with = FALSE],
          by = "USUBJID", all = TRUE)
  adae_out[]
}
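As a quick, optional check outside the pipeline, you can call the function directly and confirm that the derived AGEGR2 grouping looks sensible; study_metadata is not used by this simple function, so an empty list suffices:

# Optional interactive check of the derived age grouping
adae_check <- mk_adae(study_metadata = list())
adae_check[, .N, keyby = .(AGEGR2, TRT01A)]  # record counts per age group and arm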
4. Define the analysis methods
Now that we have specified the endpoint to be analyzed, and defined the analysis data set for {chef}, we need to define the analysis itself.
Our goal for this analysis is to count the number of subjects experiencing an event. We need to define a function that makes those calculations and give that function to chef. Because we want a result per treatment-arm/strata combination, we must provide the function in the stat_by_strata_by_trt argument of the endpoint specification. We have already set this argument to chefStats::n_subj_event in the example endpoint specification above.
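To make the quantity concrete, here is an informal cross-check done directly with data.table: it counts unique subjects with at least one adverse-event record per arm and age group. This is only an illustration of the statistic, not the chefStats implementation, and it ignores the population flag and custom filter that the pipeline applies, so the numbers need not match the pipeline output exactly.

# Rough illustration only: unique subjects with >= 1 AE record,
# by treatment arm and age group (pipeline filters not applied)
adae_check <- mk_adae(study_metadata = list())
adae_check[!is.na(AETERM), data.table::uniqueN(USUBJID), by = .(TRT01A, AGEGR2)]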
5. Run the analysis pipeline
Now that all the inputs are defined, we can run the pipeline. This is achieved by a call to tar_make() from the {targets} package.
targets::tar_make()
Targets will show you which steps in the pipeline are executed and how long each step took:
## Loading required package: targets
## ▶ dispatched target ep
## ● completed target ep [0.03 seconds]
## ▶ dispatched target ep_id
## ● completed target ep_id [0.009 seconds]
## ▶ dispatched branch ep_fn_map_c85cfecb
## ● completed branch ep_fn_map_c85cfecb [0.026 seconds]
## ● completed pattern ep_fn_map
## ▶ dispatched target user_def_fn
## ● completed target user_def_fn [0.003 seconds]
## ▶ dispatched target study_data
## ● completed target study_data [0.04 seconds]
## ▶ dispatched target fn_map_tibble
## ● completed target fn_map_tibble [0.004 seconds]
## ▶ dispatched branch ep_and_data_c85cfecb
## ● completed branch ep_and_data_c85cfecb [0.01 seconds]
## ● completed pattern ep_and_data
## ▶ dispatched branch fn_map_5091fc6f
## ● completed branch fn_map_5091fc6f [0 seconds]
## ● completed pattern fn_map
## ▶ dispatched branch analysis_data_container_a4b422b1
## ● completed branch analysis_data_container_a4b422b1 [0 seconds]
## ● completed pattern analysis_data_container
## ▶ dispatched branch ep_with_data_key_a4b422b1
## ● completed branch ep_with_data_key_a4b422b1 [0 seconds]
## ● completed pattern ep_with_data_key
## ▶ dispatched branch ep_expanded_e741f3f8
## ● completed branch ep_expanded_e741f3f8 [0.023 seconds]
## ● completed pattern ep_expanded
## ▶ dispatched branch ep_event_index_5f7976c8
## ● completed branch ep_event_index_5f7976c8 [0.004 seconds]
## ● completed pattern ep_event_index
## ▶ dispatched branch ep_crit_endpoint_3dc85872
## ● completed branch ep_crit_endpoint_3dc85872 [0.003 seconds]
## ● completed pattern ep_crit_endpoint
## ▶ dispatched branch ep_crit_by_strata_by_trt_57df2f72
## ● completed branch ep_crit_by_strata_by_trt_57df2f72 [0.009 seconds]
## ● completed pattern ep_crit_by_strata_by_trt
## ▶ dispatched branch ep_crit_by_strata_across_trt_9f8f25f5
## ● completed branch ep_crit_by_strata_across_trt_9f8f25f5 [0.003 seconds]
## ● completed pattern ep_crit_by_strata_across_trt
## ▶ dispatched branch ep_prep_by_strata_across_trt_ca9d0fd8
## ● completed branch ep_prep_by_strata_across_trt_ca9d0fd8 [0.002 seconds]
## ● completed pattern ep_prep_by_strata_across_trt
## ▶ dispatched branch ep_prep_across_strata_across_trt_ca9d0fd8
## ● completed branch ep_prep_across_strata_across_trt_ca9d0fd8 [0.001 seconds]
## ● completed pattern ep_prep_across_strata_across_trt
## ▶ dispatched branch ep_prep_by_strata_by_trt_ca9d0fd8
## ● completed branch ep_prep_by_strata_by_trt_ca9d0fd8 [0.036 seconds]
## ● completed pattern ep_prep_by_strata_by_trt
## ▶ dispatched target ep_rejected
## ● completed target ep_rejected [0.001 seconds]
## ▶ dispatched branch ep_stat_by_strata_across_trt_e550b4a6
## ● completed branch ep_stat_by_strata_across_trt_e550b4a6 [0.001 seconds]
## ● completed pattern ep_stat_by_strata_across_trt
## ▶ dispatched branch ep_stat_across_strata_across_trt_3b54a938
## ● completed branch ep_stat_across_strata_across_trt_3b54a938 [0 seconds]
## ● completed pattern ep_stat_across_strata_across_trt
## ▶ dispatched branch ep_stat_by_strata_by_trt_c23c2985
## ● completed branch ep_stat_by_strata_by_trt_c23c2985 [0.011 seconds]
## ● completed pattern ep_stat_by_strata_by_trt
## ▶ dispatched target ep_stat_nested
## ● completed target ep_stat_nested [0.001 seconds]
## ▶ dispatched target ep_stat
## ● completed target ep_stat [0.004 seconds]
## ▶ ended pipeline [0.52 seconds]
##
Then, to see the results, you load the cached step of the pipeline corresponding to the results. In our case this is ep_stat, so to load it into the session as an object we call:
targets::tar_load(ep_stat)
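Alternatively, targets::tar_read() returns the cached object without assigning it in the global environment, which is handy for a quick look:

# Equivalent quick inspection without tar_load()
targets::tar_read(ep_stat) |> head()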
Now ep_stat is an R object like any other. Thus we can look at our results simply by running:

ep_stat
However, there is a lot of extra data included in the object, so let's look at a column subset of the first few rows:
ep_stat[, .(
treatment_var,
treatment_refval,
strata_var,
stat_filter,
stat_result_label,
stat_result_description,
stat_result_qualifiers,
stat_result_value
)] |> head()
## treatment_var treatment_refval strata_var
## <char> <char> <char>
## 1: TRT01A Xanomeline High Dose TOTAL_
## 2: TRT01A Xanomeline High Dose TOTAL_
## 3: TRT01A Xanomeline High Dose AGEGR2
## 4: TRT01A Xanomeline High Dose AGEGR2
## 5: TRT01A Xanomeline High Dose AGEGR2
## 6: TRT01A Xanomeline High Dose AGEGR2
## stat_filter stat_result_label
## <char> <char>
## 1: TOTAL_ == "total" & TRT01A == "Placebo" n
## 2: TOTAL_ == "total" & TRT01A == "Xanomeline High Dose" n
## 3: AGEGR2 == "AGE < 70" & TRT01A == "Placebo" n
## 4: AGEGR2 == "AGE >= 70" & TRT01A == "Placebo" n
## 5: AGEGR2 == "AGE < 70" & TRT01A == "Xanomeline High Dose" n
## 6: AGEGR2 == "AGE >= 70" & TRT01A == "Xanomeline High Dose" n
## stat_result_description stat_result_qualifiers stat_result_value
## <char> <char> <num>
## 1: Number of subjects with events <NA> 86
## 2: Number of subjects with events <NA> 72
## 3: Number of subjects with events <NA> 22
## 4: Number of subjects with events <NA> 64
## 5: Number of subjects with events <NA> 18
## 6: Number of subjects with events <NA> 54