To add new functions to chefStats, you follow four general steps:
- Consider what type of function you need:
stat_by_strata_by_trt
,stat_by_strata_across_trt
orstat_across_strata_across_trt
(see article Function types. - Decide what information the function needs in order to compute the desired results. See section Interface with chef
- Write the function definition (use
use_chefStats()
to help create the template) - Write the appropriate unit-tests for the function (using testthat framework)
Walk through example
Here we show how we would add a function which counts the number of
subjects experiencing the event by stratification level and treatment
level, if it didn’t already exist in chefStats
(n_subj_event()
).
This is the workflow for writing such a function from scratch:
- Decide which function type it is (see article Function
types. Since the function produces a number by stratification level
and by treatment level, this will be a
stat_by_strata_by_trt
. - Call
use_chefStats(fn_name = "num_subj_events", fn_type = "stat_by_strata_by_trt")
. This will produce a template R function that has the correct arguments for what is passed by the chef pipeline tostat_by_strata_by_trt
-type functions (see section Interface with chef for more details. The skeleton function will look like this:
num_subj_events <- function(dat,
event_index,
cell_index,
strata_var,
strata_val,
treatment_var,
treatment_val,
subject_id,
...) {
# Function body here:
# The final object retuned needs to be a data.table with the following format:
return(
data.table::data.table(
label = NA_character_,
value = NA_real_,
description = NA_character_
)
)
}
- Remove unneeded arguments. For our specific function, we only need
the following variables:
-
cell_index
Who was eligible to have the event -
event_index
Who has the event -
subject_id
Allows us to count unique people -
...
The elipses always have to be included in any chefStat function So we can delete the others arguments. The additional arguments will still be passed to the function by the chef pipeline, but they will be collected by...
and not used.
-
- Modify the function definition for our needs. The final function definition might look like this:
num_subj_events <-
function(dat,
event_index,
cell_index,
subjectid_var,
...) {
# Please see the "Interface with chef" section for details on what
# `event_index` and `cell_index`
# `intersect()` provides us with a vector of rows in `dat` that match both
# `event_index` and `cell_index` - aka records that were BOTH eligible to
# have the event (`cell_index`) AND had the event (`event_index`)
index <- intersect(event_index, cell_index)
# Return all matching rows in `dat` where `INDEX_`
# matches `index`.
event_rows <- dat[INDEX_ %in% index]
# `dat` contains event data, meaning subjects can appear more than once if
# they have >1 event, so we need to remove these extra rows to get a proper
# count
event_rows_unique_by_subject <- unique(event_rows, by = subjectid_var)
stat <- NROW(event_rows_unique_by_subject)
# The return object has to be a data.table object with the following 3
# columns. The `value` column always has to be a double (not an integer)
return(
data.table(
description = "Number of subjects with events",
label = "n_subj_events",
value = as.double(stat)
)
)
}
Interface with chef
The statistical functions from chefStats will be called within the context of a chef pipeline. The category of the function determines what arguments chef makes available to the function.
The following table describes arguments that are passed to all chefStats function:
Argument Names | Description |
---|---|
dat
|
A data.table containing the analysis data set produced by
the prepare_data function. To allow flexability for
creative use of chefStats functions, this dataset is
not filtered to the exact records needed for each
analysis when passed to chefStats. Instead this filtering is done inside
the chefStats functions. This is done using the INDEX_
column from dat that serves as a row ID, and is used for
filtering with, for example, cell_index or
event_index
|
event_index
|
A vector of indicies indicating which rows (as specified in
INDEX_ column of dat ) are considered to be
events for the endpoint specification under evaluation
|
strata_var
|
A character indicating which stratatification is being used
(e.g. SEX, AGE, etc)
|
treatment_var
|
A character indicating the name of the column in
dat containing the treatment information used for the
endpoint
|
subjectid_var
|
A character specifying the name of the column in
dat containing the subject ID. Defaults to “USUBJID”
|
The tables below describe the set of arguments chef passes that vary depending on the function type:
Argument Names | Description |
---|---|
cell_index
|
A vector of indicies specifying which rows in
dat are considered to be part of the analysis for the given
strata level and treatment level under evaluation. For example, if the
current instance of the function was analysis “Number of Events” for
SEX==“M” and TRT01A == “Placebo”, then cell_index would be
a vector of records in dat$INDEX_ that match those
parameters. You can thus obtain the analysis set by filtering
dat via: dat[cell_index %in% INDEX_]
|
strata_val
|
A character specifying the stratification level under
evaluation. For example if strat_var =="SEX" , then
strat_val could be either "M" or
"F"
|
treatment_val
|
A character specifying the treatment level (or treatment
arm). *Not to be confused with the treatment_refval that
specifies the reference treatment value.
|
Argument Names | Description |
---|---|
cell_index
|
A vector of indicies specifying which rows in
dat are considered to be part of the analysis for the given
strata level and treatment level under evaluation. For example, if the
current instance of the function was analysis “Number of Events” for
SEX==“M” and TRT01A == “Placebo”, then cell_index would be
a vector of records in dat$INDEX_ that match those
parameters. You can thus obtain the analysis set by filtering
dat via: dat[cell_index %in% INDEX_]
|
strata_val
|
A character specifying the stratification level under
evaluation. For example if strat_var =="SEX" , then
strat_val could be either "M" or
"F"
|
treatment_refval
|
A character specifying the treatment reference level. *Not
to be confused with the treatment_val that specifies the
treatment value for by_strata_by_trt functions
|
Argument Names | Description |
---|---|
strata_val
|
A character specifying the stratification level under
evaluation. For example if strat_var =="SEX" , then
strat_val could be either "M" or
"F"
|
treatment_refval
|
A character specifying the treatment reference level. *Not
to be confused with the treatment_val that specifies the
treatment value for by_strata_by_trt functions
|
Using building-blocks
When possible, utilize building-block functions when making new
statistical functions. For example, if the new functions requires a 2x2
table, use the make_two_by_two_()
function instead of
writing a new one.
This also allows you to easily write functions that collapse several
chefStats functions into one function call. For example, on call to
count_set()
is the same as one call each to
n_subj()
, n_event()
,
n_subj_event()
and p_subj_event()
. The only
rational for combining functions like this is to save compute time, due
to the way chef pipelines are constructed.
Building block function names are always suffixed with an underscore
_
to indicate they cannot be called from inside a chef
pipeline. For example, n_event_()
is a building block that
is used to make n_event()
, but n_event_()
can
also be use to build other functions, such as count_set()
,
because it does not format it’s output for a chef pipeline. Conversely,
n_event()
does format the output, so it can be use in a
chef pipeline, but not as a building block.