Post-Cohort Analysis

What is Post-Cohort Analysis?

Post-cohort analysis is the process of exploring and summarizing your study population after you have defined your cohort. It helps you answer questions like: Who is in my cohort? What are their characteristics? How do they compare to the rest of the database?

This step is essential for understanding your results, checking for biases, and making your study reproducible and transparent.

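OMOPCDMFeasibility.jl provides two functions for this step: create_individual_profiles, which summarizes each covariate on its own, and create_cartesian_profiles, which cross-tabulates combinations of covariates. Both accept either a cohort definition ID or a DataFrame of person IDs.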

1. create_individual_profiles

OMOPCDMFeasibility.create_individual_profiles Function
```julia
create_individual_profiles(;
    cohort_definition_id::Union{Int, Nothing} = nothing,
    cohort_df::Union{DataFrame, Nothing} = nothing,
    conn,
    covariate_funcs::AbstractVector{<:Function},
    schema::String = "dbt_synthea_dev",
    dialect::Symbol = :postgresql
)
```

Creates individual demographic profile tables for a cohort by analyzing each covariate separately.

This function generates separate DataFrames for each demographic covariate (e.g., gender, race, age group), providing detailed statistics including cohort and database-level percentages for post-cohort feasibility analysis. Results are sorted alphabetically by covariate values for consistent, readable output.
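To make the shape of one such profile concrete, here is a minimal sketch in plain Julia with no database involved. The helper `profile_covariate`, the sample data, and the column names are hypothetical, and the denominators are an assumption: the cohort percentage divides by cohort size and the database percentage divides by the total number of persons. The package's actual columns and definitions may differ.

```julia
# Hypothetical stand-in for a covariate query result: person_id => gender label.
database_gender = Dict(1 => "FEMALE", 2 => "MALE", 3 => "FEMALE", 4 => "MALE", 5 => "FEMALE")
cohort_ids = [1, 3, 4]

# Build one profile (a vector of rows) for a single covariate.
function profile_covariate(values_by_person::Dict{Int,String}, cohort_ids::Vector{Int})
    cohort_total = length(cohort_ids)
    database_total = length(values_by_person)
    rows = NamedTuple[]
    # Sort the categories alphabetically so the output order is stable.
    for category in sort(unique(collect(values(values_by_person))))
        # Number of cohort members that fall in this category.
        n = count(id -> get(values_by_person, id, nothing) == category, cohort_ids)
        push!(rows, (
            category = category,
            cohort_numerator = n,
            # Assumed definitions: share of the cohort, and share of the whole database.
            cohort_pct = round(100 * n / cohort_total; digits = 2),
            database_pct = round(100 * n / database_total; digits = 2),
        ))
    end
    return rows
end

gender_profile = profile_covariate(database_gender, cohort_ids)
foreach(println, gender_profile)
```

Running the same helper once per covariate and collecting the results under the covariate's name would give a structure similar in spirit to the one this function returns.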

Arguments

  • conn - Database connection opened through DBInterface. Required; passed by keyword, as shown in the signature

  • covariate_funcs - Vector of covariate functions from OMOPCDMCohortCreator (e.g., GetPatientGender, GetPatientRace). Required; passed by keyword

Keyword Arguments

  • cohort_definition_id - ID of the cohort definition in the cohort table (or nothing). Either this or cohort_df must be provided

  • cohort_df - DataFrame containing cohort with person_id column (or nothing). Either this or cohort_definition_id must be provided

  • schema - Database schema name. Default: "dbt_synthea_dev"

  • dialect - Database dialect. Default: :postgresql, which is also the dialect to use with a DuckDB connection
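The rule that one of cohort_definition_id or cohort_df must be supplied can be sketched as a small check. The helper `choose_cohort_source` is hypothetical and is not part of OMOPCDMFeasibility.jl. In particular, the precedence it applies when both inputs are given is an assumption, since the documentation above does not say which one wins.

```julia
# Hypothetical helper: decide which cohort source to use from the two optional inputs.
function choose_cohort_source(; cohort_definition_id::Union{Int,Nothing} = nothing,
                                cohort_df = nothing)
    if cohort_df !== nothing
        return :dataframe       # assumption: an explicit table of person IDs takes precedence
    elseif cohort_definition_id !== nothing
        return :cohort_table    # look the cohort up by ID in the cohort table
    else
        throw(ArgumentError("provide either cohort_definition_id or cohort_df"))
    end
end

println(choose_cohort_source(cohort_definition_id = 1))
println(choose_cohort_source(cohort_df = [(person_id = 1,)]))
```

Calling the helper with neither keyword raises an ArgumentError, which mirrors the requirement stated in the argument descriptions.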

Returns

  • NamedTuple - Named tuple with keys corresponding to covariate names, each containing a DataFrame with covariate categories and statistics

Examples

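```julia
using OMOPCDMFeasibility
using OMOPCDMCohortCreator: GetPatientGender, GetPatientRace, GetPatientAgeGroup

# my_cohort_df is a DataFrame with a person_id column;
# conn is an open DBInterface connection.
individual_profiles = OMOPCDMFeasibility.create_individual_profiles(
    cohort_df = my_cohort_df,
    conn = conn,
    covariate_funcs = [GetPatientGender, GetPatientRace, GetPatientAgeGroup]
)
```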

2. create_cartesian_profiles

OMOPCDMFeasibility.create_cartesian_profiles Function
```julia
create_cartesian_profiles(;
    cohort_definition_id::Union{Int, Nothing} = nothing,
    cohort_df::Union{DataFrame, Nothing} = nothing,
    conn,
    covariate_funcs::AbstractVector{<:Function},
    schema::String = "dbt_synthea_dev",
    dialect::Symbol = :postgresql
)
```

Creates Cartesian product demographic profiles for a cohort by analyzing all combinations of covariates.

This function generates a single DataFrame containing all possible combinations of demographic covariates (e.g., gender × race × age_group), providing comprehensive cross-tabulated statistics for detailed post-cohort feasibility analysis. Column order matches the input covariate_funcs order, and results are sorted by covariate values for interpretable output.
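The cross-tabulation idea can be sketched in plain Julia without a database. Everything below is illustrative: the sample records, the age group labels, and the column names are hypothetical. Keeping combinations with a zero count is an assumption based on the phrase "all possible combinations", and the package itself may report only observed combinations.

```julia
# Hypothetical per-person covariate values, standing in for database query results.
people = [
    (person_id = 1, age_group = "20-29", gender = "FEMALE"),
    (person_id = 2, age_group = "20-29", gender = "MALE"),
    (person_id = 3, age_group = "30-39", gender = "FEMALE"),
    (person_id = 4, age_group = "30-39", gender = "FEMALE"),
    (person_id = 5, age_group = "30-39", gender = "MALE"),
]
cohort_ids = Set([1, 2, 4])

# Distinct levels of each covariate, sorted for readable output.
age_groups = sort(unique(p.age_group for p in people))
genders = sort(unique(p.gender for p in people))

# Every (age_group, gender) pair; age_group varies slowest, so the row order
# follows the order in which the covariates are listed.
combinations = [(a, g) for a in age_groups for g in genders]

cartesian_rows = map(combinations) do combo
    a, g = combo
    # Count cohort members that match this exact combination.
    n = count(p -> p.person_id in cohort_ids && p.age_group == a && p.gender == g, people)
    (age_group = a, gender = g, cohort_numerator = n,
     cohort_pct = round(100 * n / length(cohort_ids); digits = 2))
end

foreach(println, cartesian_rows)
```

Because the number of rows is the product of the level counts, adding covariates grows the table quickly and spreads the cohort across more cells.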

Arguments

  • conn - Database connection opened through DBInterface. Required; passed by keyword, as shown in the signature

  • covariate_funcs - Vector of covariate functions from OMOPCDMCohortCreator (must contain at least 2 functions). Required; passed by keyword

Keyword Arguments

  • cohort_definition_id - ID of the cohort definition in the cohort table (or nothing). Either this or cohort_df must be provided

  • cohort_df - DataFrame containing cohort with person_id column (or nothing). Either this or cohort_definition_id must be provided

  • schema - Database schema name. Default: "dbt_synthea_dev"

  • dialect - Database dialect. Default: :postgresql, which is also the dialect to use with a DuckDB connection

Returns

  • DataFrame - Cross-tabulated profile table with all covariate combinations and statistics

Examples

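```julia
using OMOPCDMFeasibility
using OMOPCDMCohortCreator: GetPatientAgeGroup, GetPatientGender, GetPatientRace

# my_cohort_df is a DataFrame with a person_id column;
# conn is an open DBInterface connection.
cartesian_profiles = OMOPCDMFeasibility.create_cartesian_profiles(
    cohort_df = my_cohort_df,
    conn = conn,
    covariate_funcs = [GetPatientAgeGroup, GetPatientGender, GetPatientRace]
)
```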

Example: Post-Cohort Analysis in Practice

```julia
using DataFrames, DuckDB, DBInterface, Dates
using OMOPCDMFeasibility
using OMOPCDMCohortCreator:
    GenerateDatabaseDetails,
    GenerateTables,
    GetPatientGender,
    GetPatientAgeGroup,
    GetPatientRace,
    GetPatientEthnicity,
    ConditionFilterPersonIDs

# Open the DuckDB database file that holds the OMOP CDM data.
conn = DBInterface.connect(DuckDB.DB, "synthea_1M_3YR.duckdb")

# Tell OMOPCDMCohortCreator which SQL dialect and schema to use, then load the table definitions.
GenerateDatabaseDetails(:postgresql, "dbt_synthea_dev")
GenerateTables(conn)

# Define the cohort: every person with a condition record for the diabetes concept.
diabetes_concept_ids = [201826]
cohort_result = ConditionFilterPersonIDs(diabetes_concept_ids, conn)
cohort_ids = cohort_result.person_id

# The profile functions expect a DataFrame with a person_id column.
sample_cohort = DataFrame(
    person_id = cohort_ids
)

# One profile table per covariate.
println("Creating individual demographic profiles...")
individual_demographics = OMOPCDMFeasibility.create_individual_profiles(
    cohort_df=sample_cohort,
    conn=conn,
    covariate_funcs=[GetPatientGender, GetPatientRace, GetPatientAgeGroup]
)

# The result is a NamedTuple, so iterate over its (name, table) pairs.
println("Individual profiles:")
for (name, table) in pairs(individual_demographics)
    println("$name:")
    println(table)
    println()
end

# A single cross-tabulated table; columns follow the order of covariate_funcs.
println("Creating Cartesian demographic profiles...")
cartesian_demographics = OMOPCDMFeasibility.create_cartesian_profiles(
    cohort_df=sample_cohort,
    conn=conn,
    covariate_funcs=[GetPatientAgeGroup, GetPatientGender, GetPatientRace]
)

println("Cartesian profiles:")
println(cartesian_demographics)

# Release the database connection when finished.
DBInterface.close!(conn)
```