Skip to contents

To add new functions to chefStats, you follow four general steps:

  1. Consider what type of function you need: stat_by_strata_by_trt, stat_by_strata_across_trt or stat_across_strata_across_trt (see article Function types.
  2. Decide what information the function needs in order to compute the desired results. See section Interface with chef
  3. Write the function definition (use use_chefStats() to help create the template)
  4. Write the appropriate unit-tests for the function (using testthat framework)

Walk through example

Here we show how we would add a function which counts the number of subjects experiencing the event by stratification level and treatment level, if it didn’t already exist in chefStats (n_subj_event()).

This is the workflow for writing such a function from scratch:

  1. Decide which function type it is (see article Function types. Since the function produces a number by stratification level and by treatment level, this will be a stat_by_strata_by_trt.
  2. Call use_chefStats(fn_name = "num_subj_events", fn_type = "stat_by_strata_by_trt"). This will produce a template R function that has the correct arguments for what is passed by the chef pipeline to stat_by_strata_by_trt-type functions (see section Interface with chef for more details. The skeleton function will look like this:
num_subj_events <- function(dat,
                            event_index,
                            cell_index,
                            strata_var,
                            strata_val,
                            treatment_var,
                            treatment_val,
                            subject_id,
                            ...) {
  # Function body here:

  # The final object retuned needs to be a data.table with the following format:
  return(
    data.table::data.table(
      label = NA_character_,
      value = NA_real_,
      description = NA_character_
    )
  )
}
  1. Remove unneeded arguments. For our specific function, we only need the following variables:
    • cell_index Who was eligible to have the event
    • event_index Who has the event
    • subject_id Allows us to count unique people
    • ... The elipses always have to be included in any chefStat function So we can delete the others arguments. The additional arguments will still be passed to the function by the chef pipeline, but they will be collected by ... and not used.
  2. Modify the function definition for our needs. The final function definition might look like this:
num_subj_events <-
  function(dat,
           event_index,
           cell_index,
           subjectid_var,
           ...) {
    # Please see the "Interface with chef" section for details on what
    # `event_index` and `cell_index`

    # `intersect()` provides us with a vector of rows in `dat` that match both
    # `event_index` and `cell_index` - aka records that were BOTH eligible to
    # have the event (`cell_index`) AND had the event (`event_index`)
    index <- intersect(event_index, cell_index)

    # Return all matching rows in `dat` where `INDEX_`
    # matches `index`.
    event_rows <- dat[INDEX_ %in% index]

    # `dat` contains event data, meaning subjects can appear more than once if
    # they have >1 event, so we need to remove these extra rows to get a proper
    # count
    event_rows_unique_by_subject <- unique(event_rows, by = subjectid_var)

    stat <- NROW(event_rows_unique_by_subject)

    # The return object has to be a data.table object with the following 3
    # columns. The `value` column always has to be a double (not an integer)
    return(
      data.table(
        description = "Number of subjects with events",
        label = "n_subj_events",
        value = as.double(stat)
      )
    )
  }


Interface with chef

The statistical functions from chefStats will be called within the context of a chef pipeline. The category of the function determines what arguments chef makes available to the function.

The following table describes arguments that are passed to all chefStats function:

Arguments always passed to chefStats functions
Argument Names Description
dat A data.table containing the analysis data set produced by the prepare_data function. To allow flexability for creative use of chefStats functions, this dataset is not filtered to the exact records needed for each analysis when passed to chefStats. Instead this filtering is done inside the chefStats functions. This is done using the INDEX_ column from dat that serves as a row ID, and is used for filtering with, for example, cell_index or event_index
event_index A vector of indicies indicating which rows (as specified in INDEX_ column of dat) are considered to be events for the endpoint specification under evaluation
strata_var A character indicating which stratatification is being used (e.g. SEX, AGE, etc)
treatment_var A character indicating the name of the column in dat containing the treatment information used for the endpoint
subjectid_var A character specifying the name of the column in dat containing the subject ID. Defaults to “USUBJID”



The tables below describe the set of arguments chef passes that vary depending on the function type:

Additional arguments passed to by_strata_by_trt functions
Argument Names Description
cell_index A vector of indicies specifying which rows in dat are considered to be part of the analysis for the given strata level and treatment level under evaluation. For example, if the current instance of the function was analysis “Number of Events” for SEX==“M” and TRT01A == “Placebo”, then cell_index would be a vector of records in dat$INDEX_ that match those parameters. You can thus obtain the analysis set by filtering dat via: dat[cell_index %in% INDEX_]
strata_val A character specifying the stratification level under evaluation. For example if strat_var =="SEX", then strat_val could be either "M" or "F"
treatment_val A character specifying the treatment level (or treatment arm). *Not to be confused with the treatment_refval that specifies the reference treatment value.
Additional arguments passed to by_strata_across_trt functions
Argument Names Description
cell_index A vector of indicies specifying which rows in dat are considered to be part of the analysis for the given strata level and treatment level under evaluation. For example, if the current instance of the function was analysis “Number of Events” for SEX==“M” and TRT01A == “Placebo”, then cell_index would be a vector of records in dat$INDEX_ that match those parameters. You can thus obtain the analysis set by filtering dat via: dat[cell_index %in% INDEX_]
strata_val A character specifying the stratification level under evaluation. For example if strat_var =="SEX", then strat_val could be either "M" or "F"
treatment_refval A character specifying the treatment reference level. *Not to be confused with the treatment_val that specifies the treatment value for by_strata_by_trt functions
Additional arguments passed to across_strata_across_trt functions
Argument Names Description
strata_val A character specifying the stratification level under evaluation. For example if strat_var =="SEX", then strat_val could be either "M" or "F"
treatment_refval A character specifying the treatment reference level. *Not to be confused with the treatment_val that specifies the treatment value for by_strata_by_trt functions


Using building-blocks

When possible, utilize building-block functions when making new statistical functions. For example, if the new functions requires a 2x2 table, use the make_two_by_two_() function instead of writing a new one.

This also allows you to easily write functions that collapse several chefStats functions into one function call. For example, on call to count_set() is the same as one call each to n_subj(), n_event(), n_subj_event() and p_subj_event(). The only rational for combining functions like this is to save compute time, due to the way chef pipelines are constructed.

Building block function names are always suffixed with an underscore _ to indicate they cannot be called from inside a chef pipeline. For example, n_event_() is a building block that is used to make n_event(), but n_event_() can also be use to build other functions, such as count_set(), because it does not format it’s output for a chef pipeline. Conversely, n_event() does format the output, so it can be use in a chef pipeline, but not as a building block.