This function maps ICD occurrences to phecode occurrences, calculates a weight for each phecode, and calculates raw and (optionally) residual phenotype risk scores.

## Usage

```r
phers(
  demos,
  icdOccurrences,
  diseasePhecodeMap,
  icdPhecodeMap = phers::icdPhecodeMap,
  dxIcd = phers::diseaseDxIcdMap,
  weights = NULL,
  method = c("prevalence", "logistic", "cox", "loglinear"),
  methodFormula = NULL,
  dopar = FALSE,
  residScoreFormula = NULL
)
```

## Arguments

demos

A data.table with one row per person in the cohort. Must have a column person_id. The cox method also requires first_age and last_age columns, giving each person's age (in years) at their first and last visit.
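For illustration, a minimal demos table might look like the sketch below. All values are made up; the sex column is included only so that formulas such as ~ sex have a column to refer to, and first_age and last_age are needed only for the cox method. The checks at the end are sensible sanity checks, not requirements stated on this page.

```r
library(data.table)

# Hypothetical toy cohort: one row per person (values are made up)
demos = data.table(
  person_id = 1:4,
  sex = c('F', 'M', 'F', 'M'),            # only needed if a formula uses it
  first_age = c(10.5, 22.0, 35.2, 5.0),   # age (years) at first visit; cox only
  last_age = c(40.1, 60.3, 70.0, 18.9)    # age (years) at last visit; cox only
)

# Sanity checks: person_id is present and unique, and ages are ordered
stopifnot(
  'person_id' %in% names(demos),
  anyDuplicated(demos$person_id) == 0,
  all(demos$first_age <= demos$last_age)
)
```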

icdOccurrences

A data.table of occurrences of ICD codes for each person in the cohort. Must have columns person_id, icd, and flag. The cox and loglinear methods require an additional occurrence_age column giving the age (in years) at which a person received an ICD code.
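A sketch of an icdOccurrences table follows, using made-up rows. It assumes that flag records the ICD version (9 for ICD-9-CM, 10 for ICD-10-CM); that interpretation is an assumption here, not something this page states. Each row is one occurrence of one code, so a person can appear many times.

```r
library(data.table)

# Hypothetical ICD occurrences (values are made up). flag is ASSUMED here to
# record the ICD version: 9 for ICD-9-CM, 10 for ICD-10-CM.
icdOccurrences = data.table(
  person_id = c(1L, 1L, 2L, 3L),
  icd = c('401.1', 'I10', '250.00', 'E11.9'),  # character, so codes stay intact
  flag = c(9L, 10L, 9L, 10L),
  occurrence_age = c(30.2, 38.0, 45.5, 50.1)   # needed only for cox and loglinear
)

# Count occurrences per person: person 1 has two rows
occurrencesPerPerson = icdOccurrences[, .N, by = person_id]
print(occurrencesPerPerson)
```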

diseasePhecodeMap

A data.table of the mapping between diseases and phecodes. Must have columns disease_id and phecode.

icdPhecodeMap

A data.table of the mapping between ICD codes and phecodes. Must have columns icd, phecode, and flag. Default is the map included in this package.
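The first step this function performs, mapping ICD occurrences to phecode occurrences, can be pictured as a join on icd and flag. The sketch below uses a tiny hypothetical excerpt of a map and collapses the result to one row per person and phecode for illustration; it shows the idea, not the package's implementation, which may keep additional columns such as occurrence_age.

```r
library(data.table)

# Hypothetical excerpt of an ICD-to-phecode map
icdPhecodeMap = data.table(
  icd = c('401.1', 'I10', '250.00'),
  phecode = c('401.1', '401.1', '250.2'),
  flag = c(9L, 10L, 9L))

# Hypothetical ICD occurrences
icdOccurrences = data.table(
  person_id = c(1L, 1L, 2L),
  icd = c('401.1', 'I10', '250.00'),
  flag = c(9L, 10L, 9L))

# Join on both icd and flag, since the same string can appear in more than
# one ICD version
mapped = merge(icdOccurrences, icdPhecodeMap, by = c('icd', 'flag'))

# Person 1 has two codes that map to the same phecode; keep one row
phecodeOccurrences = unique(mapped[, .(person_id, phecode)])
print(phecodeOccurrences)
```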

dxIcd

A data.table of ICD codes to exclude from mapping to phecodes. Must have columns icd and flag. Default is the table of Mendelian diseases and the corresponding ICD codes that indicate a genetic diagnosis. If NULL, no ICD codes will be excluded.
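The effect of dxIcd can be illustrated with an anti-join: occurrences whose (icd, flag) pair appears in the exclusion table are dropped before mapping. The exclusion table below is a hypothetical custom one (two cystic fibrosis codes), and the anti-join is a sketch of the effect, not necessarily how the package performs it.

```r
library(data.table)

# Hypothetical ICD occurrences: two cystic fibrosis codes and one hypertension code
icdOccurrences = data.table(
  person_id = c(1L, 1L, 2L),
  icd = c('277.00', 'I10', 'E84.0'),
  flag = c(9L, 10L, 10L))

# Hypothetical custom exclusion table with the same icd and flag columns as dxIcd
dxIcd = data.table(icd = c('277.00', 'E84.0'), flag = c(9L, 10L))

# Anti-join: keep only occurrences whose (icd, flag) pair is not in dxIcd
kept = icdOccurrences[!dxIcd, on = c('icd', 'flag')]
print(kept)
```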

weights

A data.table of phecodes and their corresponding weights. Must have columns phecode and w. If NULL (the default), weights are calculated from the data for the cohort provided. If the cohort is small, or its phecode prevalences do not reflect those in the population of interest, consider using preCalcWeights, the pre-calculated weights included in this package.
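To see why cohort size matters, the sketch below computes a prevalence-based weight on a toy cohort of 10 people. It ASSUMES an inverse log-prevalence weight, w = -log10(prevalence), so that rarer phecodes weigh more; this page does not state the exact formula used by the prevalence method. In a cohort of 10, a single extra diagnosis shifts a prevalence by 0.1, which can move the weights noticeably.

```r
library(data.table)

# Hypothetical phecode occurrences in a toy cohort of 10 people
nPeople = 10
phecodeOccurrences = data.table(
  person_id = c(1L, 2L, 3L, 4L, 5L, 1L, 6L),
  phecode = c('401.1', '401.1', '401.1', '401.1', '401.1', '250.2', '250.2'))

# Prevalence: fraction of the cohort with at least one occurrence of each phecode
weights = phecodeOccurrences[, .(prev = uniqueN(person_id) / nPeople), by = phecode]

# ASSUMED weighting: w = -log10(prevalence), so rarer phecodes get larger weights
weights[, w := -log10(prev)]
print(weights)
```

The resulting table has the phecode and w columns this argument expects (plus a prev column used along the way).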

method

One of "prevalence", "logistic", "cox", or "loglinear", indicating the statistical model used to calculate weights. Defaults to "prevalence".

methodFormula

A one-sided formula giving the right-hand side of the model used by method (e.g., ~ sex). All terms in the formula must correspond to columns in demos. A method formula is not required for the prevalence method.
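Because every term in methodFormula (and in residScoreFormula) must be a column of demos, it can help to check a formula before calling phers(). The helper below, missingTerms(), is hypothetical and not part of the package; it simply compares all.vars() of the formula with the column names of a made-up demos table.

```r
library(data.table)

# Hypothetical demographics table
demos = data.table(
  person_id = 1:3,
  sex = c('F', 'M', 'F'),
  last_age = c(40, 61, 80))

# Hypothetical helper (not part of phers): formula terms missing from demos
missingTerms = function(formula, demos) {
  setdiff(all.vars(formula), names(demos))
}

okTerms = missingTerms(~ sex + last_age, demos)  # character(0): all present
badTerms = missingTerms(~ sex + race, demos)     # 'race' is not a column
```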

dopar

Logical indicating whether to calculate the weights in parallel, using a parallel backend that has already been registered (e.g., with doParallel::registerDoParallel()). Recommended to reduce runtime.
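A sketch of registering a backend before calling phers() with dopar = TRUE follows. It assumes the phers and doParallel packages are installed and is skipped otherwise; the choice of 2 workers and of the logistic method with ~ sex is arbitrary and only for illustration.

```r
# Requires the phers and doParallel packages; skipped if either is missing
if (requireNamespace('phers', quietly = TRUE) &&
    requireNamespace('doParallel', quietly = TRUE)) {
  library('data.table')
  library('phers')

  # Register a backend with 2 workers (an arbitrary choice)
  doParallel::registerDoParallel(cores = 2)

  diseasePhecodeMap = mapDiseaseToPhecode()
  diseasePhecodeMap = diseasePhecodeMap[disease_id == 154700]

  # dopar = TRUE asks phers() to use the registered backend for the weights
  result = phers(
    demoSample, icdSample, diseasePhecodeMap,
    method = 'logistic', methodFormula = ~ sex, dopar = TRUE)

  # Release the workers created by registerDoParallel()
  doParallel::stopImplicitCluster()
}
```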

residScoreFormula

A formula representing the linear model to use for calculating residual scores. All terms in the formula must correspond to columns in demos. If NULL, no residual scores will be calculated.
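The idea behind a residual score can be sketched with base R: regress the raw score on the terms of residScoreFormula and keep the part the covariates do not explain. This sketch assumes an ordinary least-squares fit with plain residuals for a single disease, using made-up scores; the package may differ in details, for example by standardizing the residuals or fitting one model per disease.

```r
library(data.table)

# Hypothetical raw scores for one disease, already joined to demographics
d = data.table(
  person_id = 1:6,
  sex = c('F', 'M', 'F', 'M', 'F', 'M'),
  score = c(1.2, 0.4, 2.0, 0.9, 1.5, 0.2))

# Turn the one-sided residScoreFormula into a full model: score ~ sex
residScoreFormula = ~ sex
lmFormula = update(residScoreFormula, score ~ .)

# Residual: the part of the raw score not explained by the covariates
fit = lm(lmFormula, data = d)
d[, resid_score := unname(residuals(fit))]
print(d)
```

With ~ sex as the only term, each residual is the person's raw score minus the mean raw score for their sex.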

## Value

A list with elements:

• phecodeOccurences: A data.table of phecode occurrences for each person in the cohort.

• weights: A data.table of phecodes and their corresponding weights.

• scores: A data.table of raw and possibly residual phenotype risk scores for each person and each disease.
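Each element of the returned list is a data.table that can be inspected directly. The sketch below assumes the phers package is installed (and is skipped otherwise) and uses the sample data from the examples; it looks only at the weights and scores elements.

```r
if (requireNamespace('phers', quietly = TRUE)) {
  library('data.table')
  library('phers')

  diseasePhecodeMap = mapDiseaseToPhecode()
  diseasePhecodeMap = diseasePhecodeMap[disease_id == 154700]
  result = phers(
    demoSample, icdSample, diseasePhecodeMap, residScoreFormula = ~ sex)

  # Top-level structure of the returned list
  str(result, max.level = 1)

  # Phecodes with the largest weights in this cohort
  print(head(result$weights[order(-w)], 5))

  # Raw and residual scores per person and disease
  print(head(result$scores))
}
```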

## See also

getPhecodeOccurrences(), getWeights(), getScores(), getResidualScores(), mapDiseaseToPhecode(), icdPhecodeMap, diseaseDxIcdMap, preCalcWeights, getDxStatus()

## Examples

```r
library('data.table')
library('phers')

# OMIM disease ID for which to calculate phenotype risk scores (Marfan syndrome)
diseaseId = 154700

# map diseases to phecodes, keeping only the disease of interest
diseasePhecodeMap = mapDiseaseToPhecode()
diseasePhecodeMap = diseasePhecodeMap[disease_id == diseaseId]

# calculate raw and residual scores using weights based on the sample cohort
result = phers(
  demoSample, icdSample, diseasePhecodeMap, residScoreFormula = ~ sex)

# calculate scores using pre-calculated weights
result = phers(
  demoSample, icdSample, diseasePhecodeMap,
  weights = phers::preCalcWeights, residScoreFormula = ~ sex)
```