HealthTable: Preprocessing Functions
This page documents the preprocessing and transformation functions available for HealthTable objects when working with OMOP CDM data. These functions are provided by the OMOP CDM extension and enable data preparation workflows for machine learning and analysis.
One-Hot Encoding
Transform categorical variables into binary indicator columns suitable for machine learning algorithms.
HealthBase.one_hot_encode — Function
one_hot_encode(ht::HealthTable; cols, drop_original=true, return_features_only=false)
One-hot encode the categorical columns in ht using FeatureTransforms.jl.
For every requested column the function appends Boolean indicator columns — one per unique (non-missing) level. New columns are named col_value, e.g. gender_concept_id_8507.
Boolean source columns are detected and skipped automatically with a warning.
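The naming scheme described above can be illustrated with a minimal plain-Julia sketch. It does not use HealthBase or FeatureTransforms.jl; `one_hot_sketch` is a hypothetical helper, and encoding rows with a missing value as `false` in every indicator is an assumption of this sketch, not documented behavior of `one_hot_encode`.

```julia
# Illustrative only: `one_hot_sketch` is a hypothetical helper, not part of HealthBase.
function one_hot_sketch(col::Symbol, values::AbstractVector)
    # One indicator column per unique non-missing level.
    levels = sort(unique(skipmissing(values)))
    indicators = Dict{Symbol,Vector{Bool}}()
    for level in levels
        name = Symbol(col, "_", level)   # e.g. :gender_concept_id_8507
        # Assumption: rows with a missing value get `false` in every indicator.
        indicators[name] = [isequal(v, level) for v in values]
    end
    return indicators
end

gender = [8507, 8532, missing, 8507]
encoded = one_hot_sketch(:gender_concept_id, gender)
```

Here `encoded` holds two Boolean vectors, keyed `:gender_concept_id_8507` and `:gender_concept_id_8532`.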
Arguments
ht::HealthTable: Table to transform (schema-aware).
Keyword Arguments
cols::Vector{Symbol}: Categorical columns to encode.
drop_original::Bool=true: Drop the source columns after encoding.
return_features_only::Bool=false: If true, return a DataFrame containing only the encoded data; if false, wrap the result in a HealthTable with disable_type_enforcement=true (because the output is no longer standard OMOP CDM).
Returns
DataFrame or HealthTable, depending on return_features_only.
Example
ht_ohe = one_hot_encode(ht; cols = [:gender_concept_id, :race_concept_id])
X = one_hot_encode(ht; cols = [:gender_concept_id], return_features_only = true) # ML features
Vocabulary Compression
Reduce the dimensionality of categorical variables by grouping infrequent levels under a common label.
HealthBase.apply_vocabulary_compression — Function
apply_vocabulary_compression(ht::HealthTable; cols, min_freq=10, other_label="Other")
Group infrequent categorical levels under a single other label.
Arguments
ht::HealthTable: Input data table.
Keyword Arguments
cols::Vector{Symbol}: Columns to compress.
min_freq::Int=10: Minimum frequency for a value to remain unchanged.
other_label::String="Other": Label used to replace infrequent values.
drop_original::Bool=false: Whether to drop original columns after compression.
Returns
HealthTable: Table with compressed categorical levels.
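The grouping rule can be sketched in plain Julia on a single vector of strings. `compress_levels` is a hypothetical helper, not the HealthBase implementation, and it assumes a level is kept when its count is at least `min_freq`.

```julia
# Illustrative only: `compress_levels` is a hypothetical helper, not part of HealthBase.
function compress_levels(values::AbstractVector{String};
                         min_freq::Int=10, other_label::String="Other")
    # Count how often each level occurs.
    counts = Dict{String,Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1
    end
    # Assumption: levels seen at least `min_freq` times are kept; the rest are relabeled.
    return [counts[v] >= min_freq ? v : other_label for v in values]
end

codes = ["I10", "I10", "I10", "E11", "E11", "J45"]
compressed = compress_levels(codes; min_freq=2)
```

With `min_freq=2`, only the single `"J45"` entry is replaced by `"Other"`; the three distinct levels shrink to two kept levels plus the catch-all label.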
Examples
ht_small = apply_vocabulary_compression(ht; cols=[:condition_source_value], min_freq=5)
Concept Translation
Concept Mapping (Immutable)
Map OMOP concept IDs to human-readable concept names using the OMOP vocabulary tables, returning a new HealthTable.
HealthBase.map_concepts — Function
map_concepts(ht::HealthTable, col::Symbol, new_col::String, conn::DuckDB.DB; drop_original::Bool = false, concept_table::String = "concept", schema::String = "main")
Map concept IDs in a column to their corresponding concept names using the OMOP concept table. Only direct mappings using concept IDs are supported.
Arguments
ht::HealthTable: Input OMOP data table.
cols::Union{Symbol, Vector{Symbol}}: Column(s) containing concept IDs.
conn::DuckDB.DB: Database connection for concept lookup.
Keyword Arguments
new_cols: Name(s) for output columns. If not provided, uses col * suffix.
suffix::String="_mapped": Suffix for default new column names.
drop_original::Bool=false: Drop source column(s) after mapping.
concept_table::String="concept": Table name for concepts.
schema::String="main": Schema containing the concept table.
Returns
- A new HealthTable with the concept names added in new_col.
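Conceptually, a direct mapping is a lookup from concept_id to concept_name. The sketch below uses a Dict as a stand-in for rows of the OMOP concept table rather than a DuckDB query. `lookup_names` is a hypothetical helper, and returning `missing` for unmatched or missing IDs is an assumption of this sketch, not documented behavior of `map_concepts`.

```julia
# Stand-in for rows of the OMOP `concept` table (concept_id => concept_name).
concept_names = Dict(8507 => "MALE", 8532 => "FEMALE")

# Hypothetical helper, not part of HealthBase.
# Assumption: IDs with no match (and missing IDs) map to `missing`.
lookup_names(ids, lookup) =
    [ismissing(id) ? missing : get(lookup, id, missing) for id in ids]

gender_concept_id = [8507, 8532, 0, missing]
gender_name = lookup_names(gender_concept_id, concept_names)
```

The first two entries of `gender_name` are `"MALE"` and `"FEMALE"`; the last two are `missing`.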
Example
conn = DBInterface.connect(DuckDB.DB, "path/to/db/.duckdb")
# Map gender_concept_id to concept_name
ht_mapped = map_concepts(ht, :gender_concept_id, "gender_name", conn; schema = "dbt_synthea_dev")
Concept Mapping (In-Place)
In-place version of concept mapping that modifies the original HealthTable directly for memory efficiency.
HealthBase.map_concepts! — Function
map_concepts!(ht::HealthTable, cols, conn; ...)
In-place version of map_concepts. Maps concept IDs to human-readable names using the OMOP concept table.
Arguments
ht::HealthTable: The table to update.
cols: Single column or list of columns with concept IDs.
conn::DuckDB.DB: Connection to the OMOP database.
Keyword Arguments
new_cols: Optional new column names. Defaults to col * "_mapped".
suffix: Suffix used when new_cols is not provided.
drop_original: Whether to drop the original columns.
concept_table, schema: Source table and schema.
Returns
- The mutated HealthTable.
Example
conn = DBInterface.connect(DuckDB.DB, "path/to/db/.duckdb")
# Map gender_concept_id to concept_name in-place
map_concepts!(ht, :gender_concept_id, conn; new_cols="gender_name", schema="dbt_synthea_dev")
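The difference between the two variants follows Julia's convention that a trailing `!` marks a function that mutates its argument. The sketch below shows the contrast on a plain Dict of columns instead of a HealthTable; `add_names` and `add_names!` are hypothetical stand-ins, not HealthBase functions.

```julia
# Hypothetical stand-ins operating on a Dict of columns, not on a HealthTable.
function add_names(cols::Dict{Symbol,Vector}, src::Symbol, dest::Symbol, lookup::Dict)
    out = copy(cols)   # new container; the caller's Dict is left untouched
    out[dest] = [get(lookup, id, missing) for id in cols[src]]
    return out
end

function add_names!(cols::Dict{Symbol,Vector}, src::Symbol, dest::Symbol, lookup::Dict)
    cols[dest] = [get(lookup, id, missing) for id in cols[src]]   # mutates `cols`
    return cols
end

concept_lookup = Dict(8507 => "MALE", 8532 => "FEMALE")
table = Dict{Symbol,Vector}(:gender_concept_id => [8507, 8532])

# Non-mutating call: `table` keeps its single column.
new_table = add_names(table, :gender_concept_id, :gender_name, concept_lookup)
had_name_before = haskey(table, :gender_name)

# Mutating call: the new column is added to `table` itself.
add_names!(table, :gender_concept_id, :gender_name, concept_lookup)
```

The in-place form avoids allocating a second container, at the cost of changing the caller's data.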