Pre-Cohort Analysis

What is Feasibility Analysis?

Feasibility analysis is the process of checking whether your planned study or cohort is possible and meaningful with the data you have. It helps you answer questions such as: Are there enough patients? Are the concepts I care about present? Is the data complete and reliable?

What is Pre-Cohort Analysis?

Pre-cohort analysis is the first step in any observational health study. Before you define your study population (the "cohort"), you use pre-cohort tools to explore your OMOP CDM database. This helps you:

  • Understand what data is available

  • Check the frequency and quality of key concepts

  • Plan your study with confidence

Pre-cohort analysis is like scouting the terrain before starting a journey: it helps you avoid surprises and design more robust studies.

1. analyze_concept_distribution

OMOPCDMFeasibility.analyze_concept_distribution Function
julia
analyze_concept_distribution(
    conn;
    concept_set::Vector{<:Integer},
    covariate_funcs::AbstractVector{<:Function} = Function[],
    schema::String = "main",
    dialect::Symbol = :postgresql
)

Analyzes the distribution of medical concepts across patient demographics, automatically detecting the domain of each concept.

Arguments

  • conn - Database connection using DBInterface

Keyword Arguments

  • concept_set - Vector of OMOP concept IDs to analyze (required); the element type must be a subtype of Integer

  • covariate_funcs - Vector of OMOPCDMCohortCreator functions for demographic stratification. Default: Function[]

  • schema - Database schema name. Default: "main"

  • dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)

Returns

  • DataFrame - Summary statistics with columns for concept information, domain, covariate values, and patient counts (count)

Examples

julia
# Basic concept summary with automatic domain detection
df = analyze_concept_distribution(conn; concept_set=[31967, 4059650])

# With demographic breakdown
df = analyze_concept_distribution(
    conn;
    concept_set=[31967, 4059650], 
    covariate_funcs=[GetPatientGender, GetPatientAgeGroup]
)
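Once a distribution is returned, a common next step is to screen out strata that are too small to analyze. The sketch below uses plain named tuples as a stand-in for the result. The `concept_id` and `gender` field names, the threshold of 10, and all values are illustrative assumptions; only the `count` column is named in the docstring above.

```julia
# Hypothetical stand-in rows shaped like the documented result:
# concept information, one covariate value, and a patient count (`count`).
# The values are made up for illustration and are not real output.
rows = [
    (concept_id = 31967, gender = "FEMALE", count = 120),
    (concept_id = 31967, gender = "MALE", count = 95),
    (concept_id = 4059650, gender = "FEMALE", count = 7),
    (concept_id = 4059650, gender = "MALE", count = 3),
]

# Keep only strata with at least `min_count` patients.
strata_above(rows, min_count::Integer) = filter(r -> r.count >= min_count, rows)

# Sum the patient counts of all strata belonging to each concept.
function totals_by_concept(rows)
    totals = Dict{Int,Int}()
    for r in rows
        totals[r.concept_id] = get(totals, r.concept_id, 0) + r.count
    end
    return totals
end

usable = strata_above(rows, 10)
totals = totals_by_concept(rows)
```

On an actual DataFrame result, the equivalent filter would likely be `filter(:count => >=(10), df)`, assuming the count column is named `count` as described under Returns.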

2. generate_summary

OMOPCDMFeasibility.generate_summary Function
julia
generate_summary(
    conn;
    concept_set::Vector{<:Integer},
    covariate_funcs::AbstractVector{<:Function} = Function[],
    schema::String = "main",
    dialect::Symbol = :postgresql,
    raw_values::Bool = false
)

Generates a summary of feasibility metrics for the given concept set.

This function provides high-level summary statistics including total patients, eligible patients, total records, and population coverage metrics. This is useful for getting a quick overview of study feasibility without detailed domain breakdowns.

Arguments

  • conn - Database connection using DBInterface

Keyword Arguments

  • concept_set - Vector of OMOP concept IDs to analyze (required); the element type must be a subtype of Integer

  • covariate_funcs - Vector of OMOPCDMCohortCreator functions for demographic analysis. Default: Function[]

  • schema - Database schema name. Default: "main"

  • dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)

  • raw_values - If true, returns raw numerical values; if false, returns formatted strings. Default: false

Returns

  • DataFrame - Summary metrics with columns: metric, value, interpretation, and domain

Examples

julia
# Get formatted summary (default)
summary = generate_summary(conn; concept_set=[31967, 4059650])

# Get raw numerical values for calculations
summary_raw = generate_summary(conn; concept_set=[31967, 4059650], raw_values=true)
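The point of `raw_values=true` is that the numbers can feed further calculations. As a minimal sketch of that idea, the snippet below computes population coverage and a go/no-go check from raw values. The metric names (`total_patients`, `eligible_patients`), the numbers, and the minimum sample size are hypothetical; the actual contents of the `metric` column may differ.

```julia
# Hypothetical raw summary values keyed by metric name.
# Names and numbers are illustrative assumptions, not documented output.
raw_summary = Dict(
    "total_patients" => 1_000_000,
    "eligible_patients" => 42_500,
)

# Percentage of the database population with at least one concept of interest.
coverage_pct(s) = 100 * s["eligible_patients"] / s["total_patients"]

# Simple check against a study-specific minimum sample size.
is_feasible(s; min_patients::Integer) = s["eligible_patients"] >= min_patients

pct = coverage_pct(raw_summary)
ok = is_feasible(raw_summary; min_patients = 10_000)
```

Formatted strings (the default) are convenient for display, but arithmetic like this needs the raw numbers.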

3. generate_domain_breakdown

OMOPCDMFeasibility.generate_domain_breakdown Function
julia
generate_domain_breakdown(
    conn;
    concept_set::Vector{<:Integer},
    covariate_funcs::AbstractVector{<:Function} = Function[],
    schema::String = "main",
    dialect::Symbol = :postgresql,
    raw_values::Bool = false
)

Generates a detailed breakdown of feasibility metrics by medical domain.

This function provides domain-specific statistics showing concepts, patients, records, and coverage for each medical domain in the concept set. This is useful for understanding which domains contribute most to study feasibility.

Arguments

  • conn - Database connection using DBInterface

Keyword Arguments

  • concept_set - Vector of OMOP concept IDs to analyze (required); the element type must be a subtype of Integer

  • covariate_funcs - Vector of OMOPCDMCohortCreator functions for demographic analysis. Default: Function[]

  • schema - Database schema name. Default: "main"

  • dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)

  • raw_values - If true, returns raw numerical values; if false, returns formatted strings. Default: false

Returns

  • DataFrame - Domain-specific metrics with columns: metric, value, interpretation, and domain

Examples

julia
# Get formatted breakdown (default)
breakdown = generate_domain_breakdown(conn; concept_set=[31967, 4059650])

# Get raw numerical values for calculations
breakdown_raw = generate_domain_breakdown(conn; concept_set=[31967, 4059650], raw_values=true)
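To see which domains contribute most, the long-format rows (one row per domain and metric) can be regrouped by domain. The sketch below assumes raw numeric values and uses hypothetical metric names (`patients`, `records`) and made-up numbers; only the `metric`, `value`, and `domain` column names come from the docstring above.

```julia
# Hypothetical long-format rows mirroring the documented columns
# `domain`, `metric`, and `value`. Metric names and values are illustrative.
breakdown_rows = [
    (domain = "Condition", metric = "patients", value = 30_000),
    (domain = "Condition", metric = "records", value = 75_000),
    (domain = "Drug", metric = "patients", value = 18_000),
    (domain = "Drug", metric = "records", value = 140_000),
]

# Collect the value of one metric for every domain.
function metric_by_domain(rows, metric::AbstractString)
    return Dict(r.domain => r.value for r in rows if r.metric == metric)
end

patients = metric_by_domain(breakdown_rows, "patients")
records = metric_by_domain(breakdown_rows, "records")

# Domains ordered from most to fewest patients.
ranked = sort(collect(keys(patients)); by = d -> patients[d], rev = true)

# Average number of records per patient in each domain.
density = Dict(d => records[d] / patients[d] for d in ranked)
```

Ranking by patients rather than records avoids overweighting domains where a few patients have many records.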

Example: Pre-Cohort Analysis in Practice

julia
using DataFrames, DuckDB, DBInterface
using OMOPCDMFeasibility
using OMOPCDMCohortCreator:
    GenerateDatabaseDetails,
    GenerateTables,
    GetPatientGender,
    GetPatientAgeGroup,
    GetPatientRace,
    GetPatientEthnicity,
    ConditionFilterPersonIDs

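# Open the DuckDB database file
conn = DBInterface.connect(DuckDB.DB, "synthea_1M_3YR.duckdb")

# Configure OMOPCDMCohortCreator with the SQL dialect and schema, then set up its table definitions
GenerateDatabaseDetails(:postgresql, "dbt_synthea_dev")
GenerateTables(conn)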

concept_ids = [
    31967,    # Condition: Nausea
    1127433,  # Drug: Acetaminophen
]

println("\n")
distribution = OMOPCDMFeasibility.analyze_concept_distribution(
    conn;
    concept_set=concept_ids,
    covariate_funcs=[GetPatientGender, GetPatientRace],
    schema="dbt_synthea_dev"
)
display(distribution)

println("\n")
summary = OMOPCDMFeasibility.generate_summary(
    conn;
    concept_set=concept_ids,
    covariate_funcs=[GetPatientAgeGroup, GetPatientRace],
    schema="dbt_synthea_dev"
)
display(summary)

println("\n")
domain_breakdown = OMOPCDMFeasibility.generate_domain_breakdown(
    conn;
    concept_set=concept_ids,
    schema="dbt_synthea_dev"
)
display(domain_breakdown)
println()

DBInterface.close!(conn)
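In the script above, the connection is closed only if every call succeeds. One way to guarantee cleanup is a `try`/`finally` wrapper. The helper `with_resource` below is a hypothetical name introduced for illustration, not part of OMOPCDMFeasibility or DBInterface, and the demo uses a plain string as a stand-in for a connection.

```julia
# Open a resource, run `f` on it, and always close it afterwards,
# even if `f` throws. `with_resource` is a hypothetical helper.
function with_resource(f, open_fn, close_fn)
    resource = open_fn()
    try
        return f(resource)
    finally
        close_fn(resource)
    end
end

# Demo with a stand-in "connection" so the sketch runs without a database.
closed = Ref(false)
result = with_resource(() -> "conn", _ -> (closed[] = true)) do c
    uppercase(c)  # placeholder for the analysis calls
end
```

With the script above, the same pattern would read `with_resource(() -> DBInterface.connect(DuckDB.DB, "synthea_1M_3YR.duckdb"), DBInterface.close!) do conn ... end`, with the three analysis calls inside the `do` block.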