OMOPCDMFeasibility
Documentation for OMOPCDMFeasibility.
OMOPCDMFeasibility._concept_col
OMOPCDMFeasibility._counter_reducer
OMOPCDMFeasibility._create_cartesian_profile_table
OMOPCDMFeasibility._create_individual_profile_table
OMOPCDMFeasibility._domain_id_to_table
OMOPCDMFeasibility._format_number
OMOPCDMFeasibility._funsql
OMOPCDMFeasibility._get_category_name
OMOPCDMFeasibility._get_cohort_person_ids
OMOPCDMFeasibility._get_concept_name
OMOPCDMFeasibility._get_concepts_by_domain
OMOPCDMFeasibility._get_database_total_patients
OMOPCDMFeasibility._get_person_ids_from_cohort_table
OMOPCDMFeasibility._get_person_ids_from_dataframe
OMOPCDMFeasibility._resolve_table
OMOPCDMFeasibility._setup_domain_query
OMOPCDMFeasibility.analyze_concept_distribution
OMOPCDMFeasibility.create_cartesian_profiles
OMOPCDMFeasibility.create_individual_profiles
OMOPCDMFeasibility.generate_domain_breakdown
OMOPCDMFeasibility.generate_summary
OMOPCDMFeasibility._concept_col Method
_concept_col(tblsym::Symbol) -> Symbol
Generates the concept column name for a given table symbol.
This is an internal helper function that constructs the appropriate concept column name based on table naming conventions. Special handling is provided for the person table which uses gender_concept_id.
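The naming convention described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `concept_col_sketch` is a hypothetical name, and the rule of taking the part of the table name before the first underscore is an assumption that happens to match the standard OMOP clinical tables.

```julia
# Hypothetical sketch of the naming convention; not the package's actual code.
function concept_col_sketch(tblsym::Symbol)::Symbol
    # The person table is special-cased, as the documentation describes.
    tblsym === :person && return :gender_concept_id
    # Assumption: the column prefix is the table name up to the first underscore.
    prefix = first(split(String(tblsym), '_'))
    return Symbol(prefix, "_concept_id")
end

concept_col_sketch(:condition_occurrence)  # :condition_concept_id
concept_col_sketch(:drug_exposure)         # :drug_concept_id
concept_col_sketch(:person)                # :gender_concept_id
```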
Arguments
tblsym - The table symbol
Returns
Symbol - The concept column name for that table
Examples
col = _concept_col(:condition_occurrence)
# Returns: :condition_concept_id
col = _concept_col(:person)
# Returns: :gender_concept_id
OMOPCDMFeasibility._counter_reducer Method
_counter_reducer(sub, funcs) -> Any
Applies a sequence of functions to a subject, reducing through function composition.
This internal helper function sequentially applies each function in the funcs vector to the result of the previous function, starting with sub.
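Sequential application of this kind is a left fold over the function list. The sketch below shows the idea with Base Julia only; `reduce_through` is a hypothetical name and the package may implement the helper differently.

```julia
# Hypothetical sketch: thread a value through a list of functions, left to right.
reduce_through(sub, funcs) = foldl((acc, f) -> f(acc), funcs; init=sub)

reduce_through([1, 2, 3], [x -> x .* 2, sum])  # 12
reduce_through(5, Function[])                   # 5 (no functions, input returned unchanged)
```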
Arguments
sub - Initial subject/input to transform
funcs - Vector of functions to apply sequentially
Returns
Any - Result after applying all functions
Examples
result = _counter_reducer([1,2,3], [x -> x .* 2, sum])
# Equivalent to: sum([1,2,3] .* 2) = sum([2,4,6]) = 12
OMOPCDMFeasibility._create_cartesian_profile_table Method
_create_cartesian_profile_table(df, cols, cohort_size, database_size, conn; schema="dbt_synthea_dev", dialect=:postgresql)
Create a Cartesian product profile table with all covariate combinations.
Arguments
df - DataFrame with demographic data
cols - Vector of column names to include in combinations
cohort_size - Total cohort size
database_size - Total database population size
conn - Database connection object
schema - Database schema name (default: "dbt_synthea_dev")
dialect - SQL dialect (default: :postgresql)
Returns
DataFrame - Table with all covariate combinations and statistics
OMOPCDMFeasibility._create_individual_profile_table Method
_create_individual_profile_table(df, col, cohort_size, database_size, conn; schema="dbt_synthea_dev", dialect=:postgresql)
Create an individual profile table for a single covariate column.
Arguments
df - DataFrame with demographic data
col - Column name to profile
cohort_size - Total cohort size
database_size - Total database population size
conn - Database connection object
schema - Database schema name (default: "dbt_synthea_dev")
dialect - SQL dialect (default: :postgresql)
Returns
DataFrame - Profile table with covariate categories and statistics
OMOPCDMFeasibility._domain_id_to_table Method
_domain_id_to_table(domain_id::String) -> Symbol
Maps OMOP domain_id strings to their corresponding database table symbols.
This function provides the mapping between OMOP domain classifications and the actual database tables where those concepts are stored. It includes special handling for person-related domains and falls back to a naming convention for unknown domains.
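A lookup table with a convention-based fallback is one way to picture this mapping. The sketch below is illustrative only: `DOMAIN_TABLE_SKETCH` and `domain_to_table_sketch` are hypothetical names, and the table lists only the mappings named in this documentation, not the package's full set.

```julia
# Hypothetical sketch; only the mappings named in this documentation are listed.
const DOMAIN_TABLE_SKETCH = Dict(
    "Condition" => :condition_occurrence,
    "Drug"      => :drug_exposure,
    "Gender"    => :person,   # person-related domain
)

function domain_to_table_sketch(domain_id::String)::Symbol
    # Unknown domains fall back to the "<lowercased domain>_occurrence" convention.
    get(DOMAIN_TABLE_SKETCH, domain_id) do
        Symbol(lowercase(domain_id), "_occurrence")
    end
end

domain_to_table_sketch("Drug")          # :drug_exposure
domain_to_table_sketch("CustomDomain")  # :customdomain_occurrence
```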
Arguments
domain_id - OMOP domain identifier string (e.g., "Condition", "Drug")
Returns
Symbol - Database table symbol (e.g., :condition_occurrence, :drug_exposure)
Examples
table = _domain_id_to_table("Condition")
# Returns: :condition_occurrence
table = _domain_id_to_table("Gender")
# Returns: :person
table = _domain_id_to_table("CustomDomain")
# Returns: :customdomain_occurrence
OMOPCDMFeasibility._format_number Method
_format_number(n) -> String
Formats a number into a human-readable string with appropriate scaling.
This utility function formats numbers using common abbreviations:
Numbers ≥ 1,000,000 are formatted as "X.XM" (millions)
Numbers ≥ 1,000 are formatted as "X.XK" (thousands)
Numbers < 1,000 are formatted as integers with ties rounded up
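The three rules above can be sketched in a few lines. `format_number_sketch` is a hypothetical name, and the rounding calls used here are assumptions chosen to match the documented behaviour; the package's own code may differ in edge cases.

```julia
# Hypothetical sketch of the scaling rules; the package's rounding details may differ.
function format_number_sketch(n::Real)::String
    if n >= 1_000_000
        return string(round(n / 1_000_000; digits=1), "M")
    elseif n >= 1_000
        return string(round(n / 1_000; digits=1), "K")
    else
        # floor(n + 0.5) rounds to the nearest integer with ties going up.
        return string(floor(Int, n + 0.5))
    end
end

format_number_sketch(1234567)  # "1.2M"
format_number_sketch(5432)     # "5.4K"
format_number_sketch(0.5)      # "1"
```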
Arguments
n - Number to format
Returns
String - Formatted number string
Examples
_format_number(1234567) # Returns: "1.2M"
_format_number(5432) # Returns: "5.4K"
_format_number(123) # Returns: "123"
_format_number(0.5) # Returns: "1"
OMOPCDMFeasibility._funsql Method
_funsql(conn; schema::String="main", dialect::Symbol=:postgresql) -> SQLConnection
Creates a FunSQL connection with database schema reflection.
This internal function sets up a FunSQL SQLConnection with the appropriate database dialect and schema reflection for query building. Use :postgresql for DuckDB and :sqlite for SQLite.
Arguments
conn - Raw database connection
Keyword Arguments
schema - Database schema name. Default: "main"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
Returns
SQLConnection - FunSQL connection object with reflected schema
OMOPCDMFeasibility._get_category_name Method
_get_category_name(value, col, conn; schema="dbt_synthea_dev", dialect=:postgresql)
Get the human-readable category name for a covariate value.
Arguments
value - The value to convert (concept ID or string)
col - The column name
conn - Database connection object
schema - Database schema name (default: "dbt_synthea_dev")
dialect - SQL dialect (default: :postgresql)
Returns
String - Human-readable category name
OMOPCDMFeasibility._get_cohort_person_ids Method
_get_cohort_person_ids(cohort_definition_id, cohort_df, conn; schema="dbt_synthea_dev")
Extract person IDs from either a cohort definition ID or a cohort DataFrame.
Arguments
cohort_definition_id - ID of the cohort definition in the cohort table (or nothing)
cohort_df - DataFrame containing cohort with person_id column (or nothing)
conn - Database connection object
schema - Database schema name (default: "dbt_synthea_dev")
Returns
Vector - Vector of unique person IDs
Notes
You must provide exactly one of cohort_definition_id or cohort_df (not both). If both are provided, an error is thrown.
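The "exactly one of" rule can be checked before any database work happens. The sketch below shows one way to do that; `check_cohort_source` is a hypothetical name, and both the choice of ArgumentError and the error raised when neither input is given are assumptions, since the documentation only states that an error is thrown when both are provided.

```julia
# Hypothetical sketch of the "exactly one of" check; the error type is an assumption.
function check_cohort_source(cohort_definition_id, cohort_df)
    has_id = !isnothing(cohort_definition_id)
    has_df = !isnothing(cohort_df)
    if has_id && has_df
        throw(ArgumentError("provide cohort_definition_id or cohort_df, not both"))
    elseif !has_id && !has_df
        throw(ArgumentError("provide one of cohort_definition_id or cohort_df"))
    end
    # Report which source the caller should read person IDs from.
    return has_id ? :cohort_table : :dataframe
end

check_cohort_source(1, nothing)  # :cohort_table
```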
OMOPCDMFeasibility._get_concept_name Method
_get_concept_name(concept_id, conn; schema="main", dialect=:postgresql) -> String
Retrieves the human-readable name for a given OMOP concept ID.
Arguments
concept_id - OMOP CDM concept ID to look up
conn - Database connection using DBInterface
Keyword Arguments
schema - Database schema name. Default: "main"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
Returns
String - The concept name, or "Unknown" if the concept ID is not found
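The fallback behaviour is a lookup with a default value. In the sketch below a Dict stands in for the concept table so the example runs without a database; `concept_names` and `concept_name_sketch` are hypothetical names.

```julia
# Hypothetical in-memory stand-in for the concept table (concept_id => concept_name).
concept_names = Dict(8507 => "Male")

# Return the name if present, otherwise the documented "Unknown" fallback.
concept_name_sketch(concept_id, names) = get(names, concept_id, "Unknown")

concept_name_sketch(8507, concept_names)    # "Male"
concept_name_sketch(999999, concept_names)  # "Unknown"
```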
Examples
name = _get_concept_name(8507, conn)
# Returns: "Male"
name = _get_concept_name(999999, conn)
# Returns: "Unknown"
OMOPCDMFeasibility._get_concepts_by_domain Method
_get_concepts_by_domain(concept_ids::Vector{<:Integer}, conn; schema="main", dialect=:postgresql) -> Dict{String, Vector{Int}}
Groups a list of OMOP concept IDs by their domain classification.
This function queries the concept table to determine which domain each concept belongs to (e.g., "Condition", "Drug", "Procedure") and returns them grouped by domain.
Arguments
concept_ids - Vector of OMOP concept IDs to classify
conn - Database connection using DBInterface
Keyword Arguments
schema - Database schema name. Default: "main"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
Returns
Dict{String, Vector{Int}} - Dictionary mapping domain names to vectors of concept IDs
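The grouping step itself can be illustrated without a database. In this sketch a Dict stands in for the concept table; `concept_domains` and `group_by_domain_sketch` are hypothetical names, and skipping IDs that are missing from the table is an assumption rather than documented behaviour.

```julia
# Hypothetical stand-in for the concept table: concept_id => domain_id.
concept_domains = Dict(201820 => "Condition", 192671 => "Condition", 1503297 => "Drug")

function group_by_domain_sketch(concept_ids::Vector{<:Integer}, domains::AbstractDict)
    grouped = Dict{String, Vector{Int}}()
    for id in concept_ids
        haskey(domains, id) || continue   # assumption: IDs absent from the table are skipped
        # Create the domain's vector on first use, then append the ID.
        push!(get!(grouped, domains[id], Int[]), id)
    end
    return grouped
end

group_by_domain_sketch([201820, 192671, 1503297], concept_domains)
```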
Examples
concepts = [201820, 192671, 1503297]
domains = _get_concepts_by_domain(concepts, conn)
# Returns: Dict("Condition" => [201820, 192671], "Drug" => [1503297])
OMOPCDMFeasibility._get_database_total_patients Method
_get_database_total_patients(conn; schema="dbt_synthea_dev")
Get the total number of patients in the database.
Arguments
conn - Database connection object
schema - Database schema name (default: "dbt_synthea_dev")
Returns
Int - Total count of people in the person table
OMOPCDMFeasibility._get_person_ids_from_cohort_table Method
_get_person_ids_from_cohort_table(cohort_definition_id, conn; schema="dbt_synthea_dev")
Extract person IDs from the cohort table using a cohort definition ID.
Arguments
cohort_definition_id - ID of the cohort definition
conn - Database connection object
schema - Database schema name (default: "dbt_synthea_dev")
Returns
Vector - Vector of unique person IDs (subject_id from cohort table)
OMOPCDMFeasibility._get_person_ids_from_dataframe Method
_get_person_ids_from_dataframe(cohort_df)
Extract person IDs from a cohort DataFrame.
Arguments
cohort_df - DataFrame containing cohort with person_id column
Returns
Vector - Vector of unique person IDs from the DataFrame
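Extracting unique IDs amounts to reading the person_id column and dropping duplicates. To keep the sketch free of package dependencies, a NamedTuple of columns stands in for a DataFrame (both support `cohort.person_id`); `person_ids_sketch` and the sample data are hypothetical.

```julia
# A NamedTuple of columns stands in for a DataFrame here (assumption for self-containment).
cohort = (person_id = [3, 1, 3, 2, 1], cohort_start_date = fill("2020-01-01", 5))

# Hypothetical sketch: pull the person_id column and drop duplicates.
person_ids_sketch(cohort) = unique(cohort.person_id)

person_ids_sketch(cohort)  # [3, 1, 2] (first-seen order)
```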
OMOPCDMFeasibility._resolve_table Method
_resolve_table(fconn::SQLConnection, tblsym::Symbol) -> Table
Resolves a table symbol to its corresponding FunSQL table object.
This internal function looks up a table by name in the FunSQL catalog, performing case-insensitive matching.
Arguments
fconn - FunSQL SQLConnection object
tblsym - Table symbol to resolve
Returns
Table - FunSQL table object
Throws
ErrorException - If the table is not found in the catalog
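Case-insensitive resolution can be pictured as comparing lowercased names while scanning the catalog. The sketch below uses a plain Dict in place of a FunSQL catalog so it runs on its own; `catalog` and `resolve_table_sketch` are hypothetical names and the real helper works on FunSQL objects.

```julia
# Hypothetical stand-in for a FunSQL catalog: table name => table object.
catalog = Dict(:PERSON => "person table", :Condition_Occurrence => "condition table")

function resolve_table_sketch(catalog::AbstractDict, tblsym::Symbol)
    wanted = lowercase(String(tblsym))
    for (name, tbl) in catalog
        # Compare names without regard to case.
        lowercase(String(name)) == wanted && return tbl
    end
    error("Table $(tblsym) not found in catalog")  # raises ErrorException
end

resolve_table_sketch(catalog, :person)  # "person table"
```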
OMOPCDMFeasibility._setup_domain_query Method
_setup_domain_query(conn; domain::Symbol, schema::String="main", dialect::Symbol=:postgresql) -> NamedTuple
Sets up the necessary components for querying a specific domain table.
This internal function prepares all the components needed to query a domain-specific table including the FunSQL connection, resolved table objects, and appropriate concept column name.
Arguments
conn - Database connection
Keyword Arguments
domain - Domain table symbol (e.g., :condition_occurrence)
schema - Database schema name. Default: "main"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
Returns
NamedTuple - Contains fconn, tbl, concept_table, and concept_col components
Examples
setup = _setup_domain_query(conn; domain=:condition_occurrence)
# Returns: (fconn=..., tbl=..., concept_table=..., concept_col=:condition_concept_id)
OMOPCDMFeasibility.analyze_concept_distribution Method
analyze_concept_distribution(
    conn;
    concept_set::Vector{<:Integer},
    covariate_funcs::AbstractVector{<:Function} = Function[],
    schema::String = "main",
    dialect::Symbol = :postgresql
)
Analyzes the distribution of medical concepts across patient demographics by automatically detecting domains.
Arguments
conn - Database connection using DBInterface
concept_set - Vector of OMOP concept IDs to analyze; the element type must be a subtype of Integer
Keyword Arguments
covariate_funcs - Vector of OMOPCDMCohortCreator functions for demographic stratification. Default: Function[]
schema - Database schema name. Default: "main"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
Returns
DataFrame - Summary statistics with columns for concept information, domain, covariate values, and patient counts (count)
Examples
# Basic concept summary with automatic domain detection
df = analyze_concept_distribution(conn; concept_set=[31967, 4059650])
# With demographic breakdown
df = analyze_concept_distribution(
    conn;
    concept_set=[31967, 4059650],
    covariate_funcs=[GetPatientGender, GetPatientAgeGroup]
)
OMOPCDMFeasibility.create_cartesian_profiles Method
create_cartesian_profiles(;
    cohort_definition_id::Union{Int, Nothing} = nothing,
    cohort_df::Union{DataFrame, Nothing} = nothing,
    conn,
    covariate_funcs::AbstractVector{<:Function},
    schema::String = "dbt_synthea_dev",
    dialect::Symbol = :postgresql
)
Creates Cartesian product demographic profiles for a cohort by analyzing all combinations of covariates.
This function generates a single DataFrame containing all possible combinations of demographic covariates (e.g., gender × race × age_group), providing comprehensive cross-tabulated statistics for detailed post-cohort feasibility analysis. Column order matches the input covariate_funcs order, and results are sorted by covariate values for interpretable output.
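The cross-tabulation idea can be shown with plain Julia collections. This is a sketch under stated assumptions: the `people` rows, the column names, and the choice to list combinations with a zero count are all hypothetical, and the package's actual output columns are not reproduced here.

```julia
# Hypothetical per-person covariate rows; in the package these come from covariate_funcs.
people = [
    (gender = "Female", age_group = "20-29"),
    (gender = "Female", age_group = "30-39"),
    (gender = "Male",   age_group = "20-29"),
    (gender = "Female", age_group = "20-29"),
]

genders    = sort(unique(p.gender for p in people))
age_groups = sort(unique(p.age_group for p in people))

# One row per gender × age_group combination, sorted by covariate values.
# Assumption: combinations nobody falls into are kept with a count of 0.
rows = [
    (gender = g, age_group = a,
     count = count(p -> p.gender == g && p.age_group == a, people))
    for g in genders for a in age_groups
]
```

Here `rows` holds four entries, one for each of the 2 × 2 combinations, in the same column order as the covariates were listed.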
Arguments
conn - Database connection using DBInterface
covariate_funcs - Vector of covariate functions from OMOPCDMCohortCreator (must contain at least 2 functions)
Keyword Arguments
cohort_definition_id - ID of the cohort definition in the cohort table (or nothing). Either this or cohort_df must be provided
cohort_df - DataFrame containing cohort with person_id column (or nothing). Either this or cohort_definition_id must be provided
schema - Database schema name. Default: "dbt_synthea_dev"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
Returns
DataFrame - Cross-tabulated profile table with all covariate combinations and statistics
Examples
using OMOPCDMCohortCreator: GetPatientAgeGroup, GetPatientGender, GetPatientRace
cartesian_profiles = create_cartesian_profiles(
    cohort_df = my_cohort_df,
    conn = conn,
    covariate_funcs = [GetPatientAgeGroup, GetPatientGender, GetPatientRace]
)
OMOPCDMFeasibility.create_individual_profiles Method
create_individual_profiles(;
    cohort_definition_id::Union{Int, Nothing} = nothing,
    cohort_df::Union{DataFrame, Nothing} = nothing,
    conn,
    covariate_funcs::AbstractVector{<:Function},
    schema::String = "dbt_synthea_dev",
    dialect::Symbol = :postgresql
)
Creates individual demographic profile tables for a cohort by analyzing each covariate separately.
This function generates separate DataFrames for each demographic covariate (e.g., gender, race, age group), providing detailed statistics including cohort and database-level percentages for post-cohort feasibility analysis. Results are sorted alphabetically by covariate values for consistent, readable output.
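The cohort-level and database-level percentages for a single covariate can be worked through by hand. In the sketch below the input values, `database_size`, and the field names `cohort_pct` and `database_pct` are hypothetical; they illustrate the arithmetic, not the package's column names.

```julia
# Hypothetical inputs: one gender value per cohort member, plus the database size.
cohort_genders = ["Female", "Male", "Female", "Female"]
database_size  = 200

function profile_sketch(values, database_size)
    cohort_size = length(values)
    # One row per category, in alphabetical order.
    return map(sort(unique(values))) do v
        n = count(==(v), values)
        (category = v, count = n,
         cohort_pct = 100 * n / cohort_size,
         database_pct = 100 * n / database_size)
    end
end

profile_sketch(cohort_genders, database_size)
```

For this input, "Female" accounts for 3 of 4 cohort members (75.0% of the cohort, 1.5% of the database) and "Male" for 1 of 4 (25.0% and 0.5%).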
Arguments
conn - Database connection using DBInterface
covariate_funcs - Vector of covariate functions from OMOPCDMCohortCreator (e.g., GetPatientGender, GetPatientRace)
Keyword Arguments
cohort_definition_id - ID of the cohort definition in the cohort table (or nothing). Either this or cohort_df must be provided
cohort_df - DataFrame containing cohort with person_id column (or nothing). Either this or cohort_definition_id must be provided
schema - Database schema name. Default: "dbt_synthea_dev"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
Returns
NamedTuple - Named tuple with keys corresponding to covariate names, each containing a DataFrame with covariate categories and statistics
Examples
using OMOPCDMCohortCreator: GetPatientGender, GetPatientRace, GetPatientAgeGroup
individual_profiles = create_individual_profiles(
    cohort_df = my_cohort_df,
    conn = conn,
    covariate_funcs = [GetPatientGender, GetPatientRace, GetPatientAgeGroup]
)
OMOPCDMFeasibility.generate_domain_breakdown Method
generate_domain_breakdown(
    conn;
    concept_set::Vector{<:Integer},
    covariate_funcs::AbstractVector{<:Function} = Function[],
    schema::String = "main",
    dialect::Symbol = :postgresql,
    raw_values::Bool = false
)
Generates a detailed breakdown of feasibility metrics by medical domain.
This function provides domain-specific statistics showing concepts, patients, records, and coverage for each medical domain in the concept set. This is useful for understanding which domains contribute most to study feasibility.
Arguments
conn - Database connection using DBInterface
concept_set - Vector of OMOP concept IDs to analyze; the element type must be a subtype of Integer
Keyword Arguments
covariate_funcs - Vector of OMOPCDMCohortCreator functions for demographic analysis. Default: Function[]
schema - Database schema name. Default: "main"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
raw_values - If true, returns raw numerical values; if false, returns formatted strings. Default: false
Returns
DataFrame - Domain-specific metrics with columns: metric, value, interpretation, and domain
Examples
# Get formatted breakdown (default)
breakdown = generate_domain_breakdown(conn; concept_set=[31967, 4059650])
# Get raw numerical values for calculations
breakdown_raw = generate_domain_breakdown(conn; concept_set=[31967, 4059650], raw_values=true)
OMOPCDMFeasibility.generate_summary Method
generate_summary(
    conn;
    concept_set::Vector{<:Integer},
    covariate_funcs::AbstractVector{<:Function} = Function[],
    schema::String = "main",
    dialect::Symbol = :postgresql,
    raw_values::Bool = false
)
Generates a summary of feasibility metrics for the given concept set.
This function provides high-level summary statistics including total patients, eligible patients, total records, and population coverage metrics. This is useful for getting a quick overview of study feasibility without detailed domain breakdowns.
Arguments
conn - Database connection using DBInterface
concept_set - Vector of OMOP concept IDs to analyze; the element type must be a subtype of Integer
Keyword Arguments
covariate_funcs - Vector of OMOPCDMCohortCreator functions for demographic analysis. Default: Function[]
schema - Database schema name. Default: "main"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
raw_values - If true, returns raw numerical values; if false, returns formatted strings. Default: false
Returns
DataFrame - Summary metrics with columns: metric, value, interpretation, and domain
Examples
# Get formatted summary (default)
summary = generate_summary(conn; concept_set=[31967, 4059650])
# Get raw numerical values for calculations
summary_raw = generate_summary(conn; concept_set=[31967, 4059650], raw_values=true)