Post-Cohort Analysis

What is Post-Cohort Analysis?

Post-cohort analysis is the process of exploring and summarizing your study population after you have defined your cohort. It helps you answer questions like: Who is in my cohort? What are their characteristics? How do they compare to the rest of the database?

This step is essential for understanding your results, checking for biases, and making your study reproducible and transparent.

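OMOPCDMFeasibility.jl provides two functions for post-cohort analysis. `create_individual_profiles` summarizes each covariate separately, and `create_cartesian_profiles` cross-tabulates combinations of covariates.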

1. `create_individual_profiles`

OMOPCDMFeasibility.create_individual_profiles Function

julia

create_individual_profiles(;
    cohort_definition_id::Union{Int, Nothing} = nothing,
    cohort_df::Union{DataFrame, Nothing} = nothing,
    conn,
    covariate_funcs::AbstractVector{<:Function},
    schema::String = "dbt_synthea_dev",
    dialect::Symbol = :postgresql
)

Creates individual demographic profile tables for a cohort by analyzing each covariate separately.

This function generates one DataFrame per demographic covariate (e.g., gender, race, age group). Each table reports statistics for every category of that covariate, including the percentage of the cohort and the percentage of the whole database that falls into it. Rows are sorted alphabetically by covariate value for consistent, readable output.

Required Keyword Arguments

conn - Database connection opened through DBInterface (e.g., a DuckDB connection from DBInterface.connect)
covariate_funcs - Vector of covariate functions from OMOPCDMCohortCreator (e.g., GetPatientGender, GetPatientRace)

Other Keyword Arguments

cohort_definition_id - ID of the cohort definition in the cohort table, or nothing. Either this or cohort_df must be provided.
cohort_df - DataFrame containing the cohort, with a person_id column, or nothing. Either this or cohort_definition_id must be provided.
schema - Database schema name. Default: "dbt_synthea_dev"
dialect - SQL dialect used for the generated queries. Default: :postgresql (chosen for DuckDB compatibility)

Returns

NamedTuple - One entry per covariate, keyed by covariate name. Each value is a DataFrame listing that covariate's categories and their statistics.

Examples

julia

using OMOPCDMCohortCreator: GetPatientGender, GetPatientRace, GetPatientAgeGroup

individual_profiles = create_individual_profiles(
    cohort_df = my_cohort_df,
    conn = conn,
    covariate_funcs = [GetPatientGender, GetPatientRace, GetPatientAgeGroup]
)
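The cohort and database-level percentages can be hard to picture from prose alone. The minimal sketch below computes a table of that kind with DataFrames on made-up data. It is not the package's implementation. The following are all assumptions for illustration:

- the helper `category_counts`
- the column names (`gender`, `n_cohort`, `pct_cohort`, `n_database`, `pct_database`)
- the choice to keep only categories that occur in the cohort

```julia
using DataFrames

# Hypothetical toy data: one row per person in the "database".
# In the real workflow these values come from covariate functions such as
# GetPatientGender; here they are invented for illustration.
database = DataFrame(
    person_id = 1:10,
    gender = ["F", "M", "F", "F", "M", "M", "F", "M", "F", "M"],
)
cohort_ids = [1, 3, 4, 5]

# Hypothetical helper: count people per category and add a percentage column,
# using `label` to name the columns (e.g. n_cohort, pct_cohort).
function category_counts(df::DataFrame, covariate::Symbol, label::String)
    n_col = Symbol("n_", label)
    pct_col = Symbol("pct_", label)
    counts = combine(groupby(df, covariate), nrow => n_col)
    counts[!, pct_col] = round.(100 .* counts[!, n_col] ./ nrow(df); digits = 2)
    return counts
end

# Restrict the database to cohort members.
cohort = filter(:person_id => in(Set(cohort_ids)), database)

# Put cohort and database statistics side by side for each category.
# leftjoin keeps only categories present in the cohort (an assumption).
profile = leftjoin(
    category_counts(cohort, :gender, "cohort"),
    category_counts(database, :gender, "database");
    on = :gender,
)
sort!(profile, :gender)
println(profile)
```

In this toy example the cohort is 75% F, compared with 50% F in the database. A gap of this kind is what the database-level percentage helps surface.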


2. `create_cartesian_profiles`

OMOPCDMFeasibility.create_cartesian_profiles Function

julia

create_cartesian_profiles(;
    cohort_definition_id::Union{Int, Nothing} = nothing,
    cohort_df::Union{DataFrame, Nothing} = nothing,
    conn,
    covariate_funcs::AbstractVector{<:Function},
    schema::String = "dbt_synthea_dev",
    dialect::Symbol = :postgresql
)

Creates Cartesian product demographic profiles for a cohort by analyzing all combinations of covariates.

This function generates a single DataFrame containing all possible combinations of demographic covariates (e.g., gender × race × age_group), providing comprehensive cross-tabulated statistics for detailed post-cohort feasibility analysis. Column order matches the input covariate_funcs order, and results are sorted by covariate values for interpretable output.

Required Keyword Arguments

conn - Database connection opened through DBInterface (e.g., a DuckDB connection from DBInterface.connect)
covariate_funcs - Vector of at least two covariate functions from OMOPCDMCohortCreator (e.g., GetPatientAgeGroup, GetPatientGender)

Other Keyword Arguments

cohort_definition_id - ID of the cohort definition in the cohort table, or nothing. Either this or cohort_df must be provided.
cohort_df - DataFrame containing the cohort, with a person_id column, or nothing. Either this or cohort_definition_id must be provided.
schema - Database schema name. Default: "dbt_synthea_dev"
dialect - SQL dialect used for the generated queries. Default: :postgresql (chosen for DuckDB compatibility)

Returns

DataFrame - Cross-tabulated profile table with all covariate combinations and statistics

Examples

julia

using OMOPCDMCohortCreator: GetPatientAgeGroup, GetPatientGender, GetPatientRace

cartesian_profiles = create_cartesian_profiles(
    cohort_df = my_cohort_df,
    conn = conn,
    covariate_funcs = [GetPatientAgeGroup, GetPatientGender, GetPatientRace]
)
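A cross-tabulation like this one can be pictured as a single group-by over several covariate columns at once. The sketch below uses made-up data, hypothetical column names, and a hypothetical helper `cross_tab`. It counts each combination, keeps the columns in the order given, and sorts by covariate values. Note that `groupby` returns only combinations that actually occur in the data. This page does not say whether the package also emits zero-count combinations.

```julia
using DataFrames

# Hypothetical toy data; column names and values are invented for illustration.
people = DataFrame(
    person_id = 1:6,
    age_group = ["0-9", "10-19", "0-9", "10-19", "0-9", "0-9"],
    gender    = ["F", "F", "M", "F", "F", "M"],
)

# Hypothetical helper: group by all covariates together (in the caller's order),
# count each observed combination, and express it as a share of the population.
function cross_tab(df::DataFrame, covariates::Vector{Symbol})
    tab = combine(groupby(df, covariates), nrow => :n)
    tab.pct = round.(100 .* tab.n ./ nrow(df); digits = 2)
    return sort!(tab, covariates)
end

tab = cross_tab(people, [:age_group, :gender])
println(tab)
```

Passing `[:gender, :age_group]` instead would put `gender` first and sort by it first. This mirrors how the documented column order follows `covariate_funcs`.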


Example: Post-Cohort Analysis in Practice

julia

using DataFrames, DuckDB, DBInterface
using OMOPCDMFeasibility
using OMOPCDMCohortCreator:
    GenerateDatabaseDetails,
    GenerateTables,
    GetPatientGender,
    GetPatientAgeGroup,
    GetPatientRace,
    ConditionFilterPersonIDs

# Open the DuckDB file holding the Synthea OMOP CDM data
conn = DBInterface.connect(DuckDB.DB, "synthea_1M_3YR.duckdb")

# Tell OMOPCDMCohortCreator which SQL dialect and schema to use,
# then build its table references from the open connection
GenerateDatabaseDetails(:postgresql, "dbt_synthea_dev")
GenerateTables(conn)

# Cohort: persons with a condition record for type 2 diabetes mellitus
# (OMOP standard concept ID 201826)
diabetes_concept_ids = [201826]
cohort_result = ConditionFilterPersonIDs(diabetes_concept_ids, conn)

# The profile functions expect a DataFrame with a person_id column
sample_cohort = DataFrame(person_id = unique(cohort_result.person_id))

# One table per covariate: gender, race, and age group analyzed separately
println("Creating individual demographic profiles...")
individual_demographics = OMOPCDMFeasibility.create_individual_profiles(
    cohort_df = sample_cohort,
    conn = conn,
    covariate_funcs = [GetPatientGender, GetPatientRace, GetPatientAgeGroup]
)

# The result is a NamedTuple of DataFrames, keyed by covariate name
println("Individual profiles:")
for (name, table) in pairs(individual_demographics)
    println("$name:")
    println(table)
    println()
end

# One cross-tabulated table; columns follow the order of covariate_funcs
println("Creating Cartesian demographic profiles...")
cartesian_demographics = OMOPCDMFeasibility.create_cartesian_profiles(
    cohort_df = sample_cohort,
    conn = conn,
    covariate_funcs = [GetPatientAgeGroup, GetPatientGender, GetPatientRace]
)

println("Cartesian profiles:")
println(cartesian_demographics)

# Release the database connection
DBInterface.close!(conn)
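The two kinds of profile can be used to check each other. Assume every person falls into exactly one category per covariate; this holds for the toy data below, but you should confirm it for your own covariates. Under that assumption, summing a Cartesian-style table over all but one covariate should reproduce the per-covariate counts. A minimal sketch of that consistency check with DataFrames, using hypothetical column names and invented data:

```julia
using DataFrames

# Hypothetical toy data standing in for per-person covariate values.
people = DataFrame(
    person_id = 1:6,
    age_group = ["0-9", "10-19", "0-9", "10-19", "0-9", "0-9"],
    gender    = ["F", "F", "M", "F", "F", "M"],
)

# Cartesian-style counts: one row per observed (age_group, gender) pair.
cartesian = combine(groupby(people, [:age_group, :gender]), nrow => :n)

# Collapse over age_group to recover per-gender totals from the cross-tab.
marginal = sort!(combine(groupby(cartesian, :gender), :n => sum => :n), :gender)

# Individual-style counts computed directly from the person-level data.
individual = sort!(combine(groupby(people, :gender), nrow => :n), :gender)

# The two views agree when each person has exactly one gender value.
@assert marginal == individual
println(marginal)
```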
