OMOP CDM Support for HealthTable
Core Goals & Features
The HealthTable interface in HealthBase.jl is designed to make working with OMOP CDM data in Julia easy, robust, and compatible with the Tables.jl ecosystem. The key features include:
Schema-Aware Validation: Instead of just wrapping your data,
HealthTableactively validates it against the official OMOP CDM specification usingOMOPCommonDataModel.jl. This includes:- Column Type Enforcement: Verifies that column types in the input
DataFramematch the official OMOP schema (e.g.,person_idisInt64,condition_start_dateisDate). - Clear Error Reporting: If mismatches exist, the constructor returns detailed messages about all invalid columns or can emit warnings if type enforcement is disabled.
- Metadata Attachment: Attaches OMOP metadata (like
cdmDatatype,standardConcept, etc.) directly to each validated column.
- Column Type Enforcement: Verifies that column types in the input
Preprocessing Utilities: Built-in tools for data preparation include:
one_hot_encode: One-hot encodes categorical variables usingFeatureTransforms.jl.apply_vocabulary_compression: Groups rare categorical values under a shared"Other"label.map_concepts: Maps concept IDs to human-readable concept names using a DuckDB-backedconcepttable.map_concepts!: An in-place variant of concept mapping that modifies the existing table.
Tables.jl Compatibility: The
HealthTabletype implements the fullTables.jlinterface so it can be used with any downstream package in the Julia data ecosystem.JuliaHealth Integration: Designed to interoperate seamlessly with current and future JuliaHealth tools and projects.
Extensible Foundation: The core architecture is extensible future support could include streaming, direct DuckDB views, or remote OMOP datasets.
Tables.jl Interface Sketch
The HealthTable type is the main interface for working with OMOP CDM tables. You construct it by passing in a DataFrame and optionally specifying a CDM version. The constructor will validate the schema and attach metadata. The resulting object:
- Is a wrapper over the validated DataFrame (
ht.source), - Provides schema-aware access to data,
- Can be used anywhere a
Tables.jl-compatible table is expected.
This eliminates the need for a separate wrapping step the constructor itself ensures conformance and returns a ready-to-use tabular object.
In future extensions, similar wrappers could be created for other data sources, such as database queries or streaming sources. These types would implement the same Tables.jl interface to support composable workflows.