Adding new functions • chefStats

To add new functions to chefStats, you follow four general steps:

Consider what type of function you need: stat_by_strata_by_trt, stat_by_strata_across_trt or stat_across_strata_across_trt (see article Function types).
Decide what information the function needs in order to compute the desired results. See section Interface with chef.
Write the function definition (use use_chefStats() to help create the template).
Write the appropriate unit tests for the function, using the testthat framework.

Walk-through example

Here we show how we would add a function that counts the number of subjects experiencing the event, by stratification level and treatment level, as if it did not already exist in chefStats (where it is available as n_subj_event()).

This is the workflow for writing such a function from scratch:

Decide which function type it is (see article Function types). Since the function produces a number by stratification level and by treatment level, this will be a stat_by_strata_by_trt function.
Call use_chefStats(fn_name = "num_subj_events", fn_type = "stat_by_strata_by_trt"). This produces a template R function with the arguments that the chef pipeline passes to stat_by_strata_by_trt-type functions (see section Interface with chef for more details). The skeleton function will look like this:

num_subj_events <- function(dat,
                            event_index,
                            cell_index,
                            strata_var,
                            strata_val,
                            treatment_var,
                            treatment_val,
                            subjectid_var,
                            ...) {
  # Function body here:

  # The final object returned needs to be a data.table with the following format:
  return(
    data.table::data.table(
      label = NA_character_,
      value = NA_real_,
      description = NA_character_
    )
  )
}

Remove unneeded arguments. For our specific function, we only need the following arguments:
- dat The analysis data, from which the relevant rows are selected
- cell_index Who was eligible to have the event
- event_index Who has the event
- subjectid_var The name of the subject ID column, which allows us to count unique people
- ... The ellipsis always has to be included in any chefStats function
So we can delete the other arguments. They will still be passed to the function by the chef pipeline, but they will be collected by ... and not used.
Modify the function definition for our needs. The final function definition might look like this:

num_subj_events <-
  function(dat,
           event_index,
           cell_index,
           subjectid_var,
           ...) {
    # Please see the "Interface with chef" section for details on what
    # `event_index` and `cell_index` contain

    # `intersect()` provides us with a vector of rows in `dat` that match both
    # `event_index` and `cell_index` - aka records that were BOTH eligible to
    # have the event (`cell_index`) AND had the event (`event_index`)
    index <- intersect(event_index, cell_index)

    # Return all matching rows in `dat` where `INDEX_`
    # matches `index`.
    event_rows <- dat[INDEX_ %in% index]

    # `dat` contains event data, meaning subjects can appear more than once if
    # they have >1 event, so we need to remove these extra rows to get a proper
    # count
    event_rows_unique_by_subject <- unique(event_rows, by = subjectid_var)

    stat <- NROW(event_rows_unique_by_subject)

    # The return object has to be a data.table object with the following 3
    # columns. The `value` column always has to be a double (not an integer)
    return(
      data.table::data.table(
        description = "Number of subjects with events",
        label = "n_subj_events",
        value = as.double(stat)
      )
    )
  }
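
To check the function, it can be run outside a chef pipeline on a small hand-made data set. The sketch below restates the definition from above so that it runs on its own. The toy `dat`, its values, and the index vectors are invented for illustration and only mimic the `INDEX_` layout that chef provides. It also shows one possible testthat test, corresponding to the fourth step.

```r
library(data.table)
library(testthat)

# Restated from the walk-through so this snippet is self-contained
num_subj_events <- function(dat, event_index, cell_index, subjectid_var, ...) {
  index <- intersect(event_index, cell_index)
  event_rows <- dat[INDEX_ %in% index]
  stat <- NROW(unique(event_rows, by = subjectid_var))
  data.table(
    description = "Number of subjects with events",
    label = "n_subj_events",
    value = as.double(stat)
  )
}

# Toy data (hypothetical): five records belonging to three subjects
dat <- data.table(
  INDEX_  = 1:5,
  USUBJID = c("S1", "S1", "S2", "S3", "S3"),
  TRT01A  = c("Placebo", "Placebo", "Placebo", "Active", "Active")
)

event_index <- c(1L, 2L, 4L) # rows that count as events
cell_index  <- c(1L, 2L, 3L) # rows belonging to the Placebo cell

res <- num_subj_events(
  dat = dat,
  event_index = event_index,
  cell_index = cell_index,
  subjectid_var = "USUBJID",
  strata_var = "SEX" # an unused extra argument, absorbed by `...`
)

test_that("num_subj_events counts each subject once", {
  # Rows 1 and 2 are both events in the cell, but both belong to S1
  expect_equal(res$value, 1)
  expect_type(res$value, "double")
  expect_s3_class(res, "data.table")
})
```

Passing an extra argument such as `strata_var` mirrors what happens in a pipeline, where chef supplies all of its arguments and the unused ones fall into `...`.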

Interface with chef

The statistical functions from chefStats will be called within the context of a chef pipeline. The category of the function determines what arguments chef makes available to the function.

The following table describes arguments that are passed to all chefStats functions:

Arguments always passed to chefStats functions
Argument Names	Description
`dat`	A `data.table` containing the analysis data set produced by the `prepare_data` function. To allow flexibility for creative use of chefStats functions, this data set is not filtered to the exact records needed for each analysis when it is passed to chefStats. Instead, this filtering is done inside the chefStats functions, using the `INDEX_` column of `dat`. This column serves as a row ID and is matched against, for example, `cell_index` or `event_index`
`event_index`	A `vector` of indices indicating which rows (as specified in the `INDEX_` column of `dat`) are considered to be events for the endpoint specification under evaluation
`strata_var`	A `character` indicating which stratification is being used (e.g. SEX, AGE, etc)
`treatment_var`	A `character` indicating the name of the column in `dat` containing the treatment information used for the endpoint
`subjectid_var`	A `character` specifying the name of the column in `dat` containing the subject ID. Defaults to `"USUBJID"`
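
As a minimal sketch of how these common arguments are typically used together, the example below filters a toy `dat` on `INDEX_` and then uses the `*_var` arguments as column names. The data and the argument values are made up for illustration. In a real pipeline they are supplied by chef.

```r
library(data.table)

# Toy stand-in for the data set that chef would pass (hypothetical values)
dat <- data.table(
  INDEX_  = 1:6,
  USUBJID = c("S1", "S1", "S2", "S3", "S4", "S4"),
  TRT01A  = c("Placebo", "Placebo", "Placebo", "Active", "Active", "Active")
)

# Values that would normally arrive as function arguments
event_index   <- c(2L, 3L, 5L, 6L)
subjectid_var <- "USUBJID"
treatment_var <- "TRT01A"

# `event_index` holds values of `INDEX_`, so filter on that column rather
# than on row positions
event_rows <- dat[INDEX_ %in% event_index]

# The `*_var` arguments hold column names as strings, so they are used
# programmatically instead of hard-coding the column
n_subjects_with_event <- uniqueN(event_rows[[subjectid_var]])
events_per_arm <- event_rows[, .N, by = treatment_var]
```

Filtering on `INDEX_` keeps the function correct even if `dat` has been reordered or subset before it reaches the function.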

The tables below describe the set of arguments chef passes that vary depending on the function type:

Additional arguments passed to by_strata_by_trt functions
Argument Names	Description
`cell_index`	A `vector` of indices specifying which rows in `dat` are considered to be part of the analysis for the given strata level and treatment level under evaluation. For example, if the current instance of the function was the analysis “Number of Events” for `SEX == "M"` and `TRT01A == "Placebo"`, then `cell_index` would be a vector of the values in `dat$INDEX_` that match those parameters. You can thus obtain the analysis set by filtering `dat` via: `dat[INDEX_ %in% cell_index]`
`strata_val`	A `character` specifying the stratification level under evaluation. For example, if `strata_var == "SEX"`, then `strata_val` could be either `"M"` or `"F"`
`treatment_val`	A `character` specifying the treatment level (or treatment arm) under evaluation. Not to be confused with `treatment_refval`, which specifies the reference treatment value.
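
To make the relationship between `cell_index` and the `*_val` arguments concrete, here is a small sketch on invented data. chef computes `cell_index` for you. The derivation shown here is only an illustration of what the vector represents, not a description of how chef builds it internally.

```r
library(data.table)

# Toy data (hypothetical values)
dat <- data.table(
  INDEX_  = 1:6,
  USUBJID = c("S1", "S1", "S2", "S3", "S4", "S4"),
  SEX     = c("M", "M", "M", "F", "M", "M"),
  TRT01A  = c("Placebo", "Placebo", "Placebo", "Placebo", "Active", "Active")
)

strata_var    <- "SEX"
strata_val    <- "M"
treatment_var <- "TRT01A"
treatment_val <- "Placebo"

# Illustration only: the `INDEX_` values of rows in the SEX == "M",
# TRT01A == "Placebo" cell
in_cell <- dat[[strata_var]] == strata_val &
  dat[[treatment_var]] == treatment_val
cell_index <- dat$INDEX_[in_cell]

# Inside a chefStats function, the analysis set for the cell is obtained
# by matching `INDEX_` against `cell_index`
analysis_set <- dat[INDEX_ %in% cell_index]
```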

Additional arguments passed to by_strata_across_trt functions
Argument Names	Description
`cell_index`	A `vector` of indices specifying which rows in `dat` are considered to be part of the analysis for the given strata level and treatment level under evaluation. For example, if the current instance of the function was the analysis “Number of Events” for `SEX == "M"` and `TRT01A == "Placebo"`, then `cell_index` would be a vector of the values in `dat$INDEX_` that match those parameters. You can thus obtain the analysis set by filtering `dat` via: `dat[INDEX_ %in% cell_index]`
`strata_val`	A `character` specifying the stratification level under evaluation. For example, if `strata_var == "SEX"`, then `strata_val` could be either `"M"` or `"F"`
`treatment_refval`	A `character` specifying the treatment reference level. Not to be confused with `treatment_val`, which specifies the treatment value for `by_strata_by_trt` functions.
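
The sketch below shows how a function of this type might use `treatment_var` and `treatment_refval` to compare arms within one stratum. The function name `risk_diff_sketch()` is hypothetical and not part of chefStats. The sketch assumes that there are exactly two treatment arms, that `cell_index` covers the rows of both arms in the stratum, and that every subject in the stratum has at least one row in `dat`.

```r
library(data.table)

# Hypothetical example: difference in the proportion of subjects with an
# event between the non-reference arm and the reference arm
risk_diff_sketch <- function(dat,
                             event_index,
                             cell_index,
                             treatment_var,
                             treatment_refval,
                             subjectid_var,
                             ...) {
  cell <- dat[INDEX_ %in% cell_index]

  is_ref    <- cell[[treatment_var]] == treatment_refval
  has_event <- cell$INDEX_ %in% event_index
  subj      <- cell[[subjectid_var]]

  # Proportion of unique subjects in a group with at least one event
  prop_with_event <- function(keep) {
    length(unique(subj[keep & has_event])) / length(unique(subj[keep]))
  }

  value <- prop_with_event(!is_ref) - prop_with_event(is_ref)

  data.table(
    label = "risk_diff",
    value = as.double(value),
    description = "Difference in proportion of subjects with events"
  )
}

# Toy data (hypothetical): two Placebo subjects, two Active subjects
dat <- data.table(
  INDEX_  = 1:5,
  USUBJID = c("S1", "S2", "S3", "S3", "S4"),
  TRT01A  = c("Placebo", "Placebo", "Active", "Active", "Active")
)

res <- risk_diff_sketch(
  dat = dat,
  event_index = c(1L, 3L, 4L, 5L), # S1, S3 (twice) and S4 have events
  cell_index = 1:5,
  treatment_var = "TRT01A",
  treatment_refval = "Placebo",
  subjectid_var = "USUBJID"
)
```

In this toy data both Active subjects and one of the two Placebo subjects have an event, so the difference is 1 - 0.5.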

Additional arguments passed to across_strata_across_trt functions
Argument Names	Description
`strata_val`	A `character` specifying the stratification level under evaluation. For example, if `strata_var == "SEX"`, then `strata_val` could be either `"M"` or `"F"`
`treatment_refval`	A `character` specifying the treatment reference level. Not to be confused with `treatment_val`, which specifies the treatment value for `by_strata_by_trt` functions.

Using building-blocks

When possible, use building-block functions when writing new statistical functions. For example, if the new function requires a 2x2 table, use the make_two_by_two_() function instead of writing a new one.

This also allows you to easily write functions that collapse several chefStats functions into one function call. For example, one call to count_set() is the same as one call each to n_subj(), n_event(), n_subj_event() and p_subj_event(). The only rationale for combining functions like this is to save compute time, due to the way chef pipelines are constructed.

Building-block function names are always suffixed with an underscore (_) to indicate that they cannot be called from inside a chef pipeline. For example, n_event_() is a building block that is used to make n_event(), but n_event_() can also be used to build other functions, such as count_set(), because it does not format its output for a chef pipeline. Conversely, n_event() does format the output, so it can be used in a chef pipeline, but not as a building block.
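
The pattern can be sketched as follows. All `demo_*` functions are hypothetical stand-ins written for this illustration. They are not the chefStats implementations of n_event_(), n_event() or count_set(), and the labels are invented. The sketch also assumes that a combined function returns one row per statistic.

```r
library(data.table)

# Building blocks (hypothetical): return bare numbers, no pipeline formatting
demo_n_event_ <- function(dat, event_index, cell_index, ...) {
  sum(dat$INDEX_ %in% intersect(event_index, cell_index))
}

demo_n_subj_event_ <- function(dat, event_index, cell_index,
                               subjectid_var, ...) {
  rows <- dat[INDEX_ %in% intersect(event_index, cell_index)]
  uniqueN(rows[[subjectid_var]])
}

# Pipeline-facing wrapper: formats the building block's result
demo_n_event <- function(dat, event_index, cell_index, ...) {
  data.table(
    label = "E",
    value = as.double(demo_n_event_(dat, event_index, cell_index)),
    description = "Number of events"
  )
}

# Combined function: reuses both building blocks in a single call
demo_count_set <- function(dat, event_index, cell_index,
                           subjectid_var, ...) {
  n_events <- demo_n_event_(dat, event_index, cell_index)
  n_subj   <- demo_n_subj_event_(dat, event_index, cell_index, subjectid_var)
  data.table(
    label = c("E", "n"),
    value = as.double(c(n_events, n_subj)),
    description = c("Number of events", "Number of subjects with events")
  )
}

# Toy data (hypothetical)
dat <- data.table(
  INDEX_  = 1:5,
  USUBJID = c("S1", "S1", "S2", "S3", "S3")
)

res <- demo_count_set(
  dat = dat,
  event_index = c(1L, 2L, 4L),
  cell_index = c(1L, 2L, 3L),
  subjectid_var = "USUBJID"
)
```

Because the building blocks return plain numbers, the combined function can assemble its own output table without unpacking the formatted result of a pipeline-facing function.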