Pre-Cohort Analysis
What is Feasibility Analysis?
Feasibility analysis is the process of checking if your planned study or cohort is possible and meaningful with the data you have. It helps you answer questions like: Are there enough patients? Are the concepts I care about present? Is the data complete and reliable?
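The questions above can be turned into explicit checks. The following is a minimal sketch in plain Julia and is not part of OMOPCDMFeasibility: the helper name `feasibility_checks`, its inputs, and the threshold of 100 patients are hypothetical and chosen only for illustration.

```julia
# Hypothetical helper: none of these names come from OMOPCDMFeasibility.
# `wanted` is the concept set for the study; `observed` maps each concept ID
# found in the database to the number of distinct patients who have it.
function feasibility_checks(wanted::Vector{<:Integer},
                            observed::Dict{Int,Int};
                            min_patients::Integer = 100)
    # Concepts in the study definition that never appear in the data.
    missing_ids = setdiff(wanted, keys(observed))
    # Concepts that appear, but for fewer patients than the chosen minimum.
    sparse_ids = filter(id -> get(observed, id, min_patients) < min_patients, wanted)
    return (missing_ids = missing_ids,
            sparse_ids = sparse_ids,
            feasible = isempty(missing_ids) && isempty(sparse_ids))
end

# Illustrative counts only; real values would come from the database.
observed = Dict(31967 => 2450, 1127433 => 37)
result = feasibility_checks([31967, 1127433, 4059650], observed)
println(result)
```

In this made-up case one concept is absent and one falls below the threshold, so the study would be flagged for a closer look before any cohort is defined.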
What is Pre-Cohort Analysis?
Pre-cohort analysis is the first step in any observational health study. Before you define your study population (the "cohort"), you use pre-cohort tools to explore your OMOP CDM database. This helps you:
Understand what data is available
Check the frequency and quality of key concepts
Plan your study with confidence
Pre-cohort analysis is like scouting the terrain before starting a journey—it helps you avoid surprises and design better, more robust studies.
1. analyze_concept_distribution
OMOPCDMFeasibility.analyze_concept_distribution Function
analyze_concept_distribution(
    conn;
    concept_set::Vector{<:Integer},
    covariate_funcs::AbstractVector{<:Function} = Function[],
    schema::String = "main",
    dialect::Symbol = :postgresql
)
Analyzes the distribution of medical concepts across patient demographics, automatically detecting the domain of each concept.
Arguments
conn - Database connection using DBInterface
concept_set - Vector of OMOP concept IDs to analyze; must be a subtype of Integer
Keyword Arguments
covariate_funcs - Vector of OMOPCDMCohortCreator functions for demographic stratification. Default: Function[]
schema - Database schema name. Default: "main"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
Returns
DataFrame - Summary statistics with columns for concept information, domain, covariate values, and patient counts (count)
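A common next step with this result is to check whether any demographic stratum is too small to analyze. The sketch below illustrates that check on hand-written rows rather than on real output. Only the `count` column is named in the documentation above; the `concept_name` and `gender` fields, the values, and the `flag_small_strata` helper are assumptions made for illustration.

```julia
# Stand-in rows shaped loosely like the documented result.
# Field names other than `count` are hypothetical, and so are the numbers.
rows = [
    (concept_name = "Nausea", gender = "FEMALE", count = 1200),
    (concept_name = "Nausea", gender = "MALE", count = 950),
    (concept_name = "Acetaminophen", gender = "FEMALE", count = 8),
]

# Hypothetical helper: keep only the rows whose patient count is below `min_count`.
flag_small_strata(rows; min_count = 10) = filter(r -> r.count < min_count, rows)

small = flag_small_strata(rows)
for r in small
    println(r.concept_name, " / ", r.gender, ": ", r.count, " patients")
end
```

On an actual DataFrame, the same filter could likely be written with DataFrames.jl as `filter(:count => <(10), df)`, assuming the column is indeed named `count`.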
Examples
# Basic concept summary with automatic domain detection
df = analyze_concept_distribution(conn; concept_set=[31967, 4059650])
# With demographic breakdown
df = analyze_concept_distribution(
    conn;
    concept_set=[31967, 4059650],
    covariate_funcs=[GetPatientGender, GetPatientAgeGroup]
)
2. generate_summary
OMOPCDMFeasibility.generate_summary Function
generate_summary(
    conn;
    concept_set::Vector{<:Integer},
    covariate_funcs::AbstractVector{<:Function} = Function[],
    schema::String = "main",
    dialect::Symbol = :postgresql,
    raw_values::Bool = false
)
Generates a summary of feasibility metrics for the given concept set.
This function provides high-level summary statistics including total patients, eligible patients, total records, and population coverage metrics. This is useful for getting a quick overview of study feasibility without detailed domain breakdowns.
Arguments
conn - Database connection using DBInterface
concept_set - Vector of OMOP concept IDs to analyze; must be a subtype of Integer
Keyword Arguments
covariate_funcs - Vector of OMOPCDMCohortCreator functions for demographic analysis. Default: Function[]
schema - Database schema name. Default: "main"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
raw_values - If true, returns raw numerical values; if false, returns formatted strings. Default: false
Returns
DataFrame - Summary metrics with columns: metric, value, interpretation, and domain
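The `raw_values` flag matters as soon as a metric feeds into arithmetic: formatted strings have to be parsed back into numbers, while raw values can be used directly. The sketch below shows the difference on stand-in values. The metric names, the numbers, and the thousands-separator formatting are assumptions for illustration; the actual labels and strings returned by `generate_summary` may differ.

```julia
# Stand-in for raw output (raw_values=true): hypothetical metric => numeric value.
raw = Dict("Total Patients" => 1_000_000, "Eligible Patients" => 42_500)

# Stand-in for formatted output (raw_values=false): the same metrics as display strings.
# The comma-separated format is an assumption, not the package's documented format.
formatted = Dict("Total Patients" => "1,000,000", "Eligible Patients" => "42,500")

# Raw values can be used in calculations directly.
coverage = 100 * raw["Eligible Patients"] / raw["Total Patients"]
println("Coverage: ", round(coverage; digits = 2), "%")

# Formatted values need parsing first, and the parsing depends on the format used.
parsed_total = parse(Int, replace(formatted["Total Patients"], "," => ""))
println("Parsed total: ", parsed_total)
```

For anything beyond display, requesting raw values avoids tying downstream code to a particular string format.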
Examples
# Get formatted summary (default)
summary = generate_summary(conn; concept_set=[31967, 4059650])
# Get raw numerical values for calculations
summary_raw = generate_summary(conn; concept_set=[31967, 4059650], raw_values=true)
3. generate_domain_breakdown
OMOPCDMFeasibility.generate_domain_breakdown Function
generate_domain_breakdown(
    conn;
    concept_set::Vector{<:Integer},
    covariate_funcs::AbstractVector{<:Function} = Function[],
    schema::String = "main",
    dialect::Symbol = :postgresql,
    raw_values::Bool = false
)
Generates a detailed breakdown of feasibility metrics by medical domain.
This function provides domain-specific statistics showing concepts, patients, records, and coverage for each medical domain in the concept set. This is useful for understanding which domains contribute most to study feasibility.
Arguments
conn - Database connection using DBInterface
concept_set - Vector of OMOP concept IDs to analyze; must be a subtype of Integer
Keyword Arguments
covariate_funcs - Vector of OMOPCDMCohortCreator functions for demographic analysis. Default: Function[]
schema - Database schema name. Default: "main"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
raw_values - If true, returns raw numerical values; if false, returns formatted strings. Default: false
Returns
DataFrame - Domain-specific metrics with columns: metric, value, interpretation, and domain
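One way to use the breakdown is to see which domain contributes the most patients. The sketch below works on hand-written rows that carry the four documented columns (`metric`, `value`, `interpretation`, `domain`) in raw numeric form. The metric labels "Patients" and "Records", the domain names, and the numbers are assumptions for illustration and may not match the labels the function actually emits.

```julia
# Stand-in rows with the documented column names; labels and values are hypothetical.
rows = [
    (metric = "Patients", value = 2450, interpretation = "", domain = "Condition"),
    (metric = "Records", value = 3100, interpretation = "", domain = "Condition"),
    (metric = "Patients", value = 37, interpretation = "", domain = "Drug"),
    (metric = "Records", value = 52, interpretation = "", domain = "Drug"),
]

# Keep the per-domain patient counts and order them from largest to smallest.
patient_rows = filter(r -> r.metric == "Patients", rows)
ranked = sort(patient_rows; by = r -> r.value, rev = true)

for r in ranked
    println(r.domain, ": ", r.value, " patients")
end
```

Sorting on `value` only makes sense with `raw_values=true`, since formatted strings would sort lexically rather than numerically.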
Examples
# Get formatted breakdown (default)
breakdown = generate_domain_breakdown(conn; concept_set=[31967, 4059650])
# Get raw numerical values for calculations
breakdown_raw = generate_domain_breakdown(conn; concept_set=[31967, 4059650], raw_values=true)
Example: Pre-Cohort Analysis in Practice
using DataFrames, DuckDB, DBInterface
using OMOPCDMFeasibility
using OMOPCDMCohortCreator: GenerateDatabaseDetails,
    GenerateTables,
    GetPatientGender,
    GetPatientAgeGroup,
    GetPatientRace,
    GetPatientEthnicity,
    ConditionFilterPersonIDs
conn = DBInterface.connect(DuckDB.DB, "synthea_1M_3YR.duckdb")
GenerateDatabaseDetails(:postgresql, "dbt_synthea_dev")
GenerateTables(conn)
concept_ids = [
    31967,    # Condition: Nausea
    1127433,  # Drug: Acetaminophen
]
println("\n")
distribution = OMOPCDMFeasibility.analyze_concept_distribution(
    conn;
    concept_set=concept_ids,
    covariate_funcs=[GetPatientGender, GetPatientRace],
    schema="dbt_synthea_dev"
)
display(distribution)
println("\n")
summary = OMOPCDMFeasibility.generate_summary(
    conn;
    concept_set=concept_ids,
    covariate_funcs=[GetPatientAgeGroup, GetPatientRace],
    schema="dbt_synthea_dev"
)
display(summary)
println("\n")
domain_breakdown = OMOPCDMFeasibility.generate_domain_breakdown(
    conn;
    concept_set=concept_ids,
    schema="dbt_synthea_dev"
)
display(domain_breakdown)
println()
DBInterface.close!(conn)