Post-Cohort Analysis
What is Post-Cohort Analysis?
Post-cohort analysis is the process of exploring and summarizing your study population after you have defined your cohort. It helps you answer questions like: Who is in my cohort? What are their characteristics? How do they compare to the rest of the database?
This step is essential for understanding your results, checking for biases, and making your study reproducible and transparent.
Post-cohort analysis in OMOPCDMFeasibility.jl is designed to be simple and clear, even for beginners.
1. create_individual_profiles
OMOPCDMFeasibility.create_individual_profiles Function
create_individual_profiles(;
    cohort_definition_id::Union{Int, Nothing} = nothing,
    cohort_df::Union{DataFrame, Nothing} = nothing,
    conn,
    covariate_funcs::AbstractVector{<:Function},
    schema::String = "dbt_synthea_dev",
    dialect::Symbol = :postgresql
)
Creates individual demographic profile tables for a cohort by analyzing each covariate separately.
This function generates separate DataFrames for each demographic covariate (e.g., gender, race, age group), providing detailed statistics including cohort and database-level percentages for post-cohort feasibility analysis. Results are sorted alphabetically by covariate values for consistent, readable output.
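To illustrate what such a per-covariate table might contain, the following minimal sketch computes counts and percentages for a single covariate on in-memory data. It is not the package's implementation. The helper `profile_one_covariate`, the sample data, the column names `cohort_n`, `cohort_pct`, and `database_pct`, and the definition of the database-level percentage (cohort members in a category divided by all persons in the database) are assumptions made for this example.

```julia
using DataFrames

# Hypothetical stand-in data: one row per person with a gender label.
database = DataFrame(
    person_id = 1:8,
    gender = ["FEMALE", "MALE", "FEMALE", "MALE", "FEMALE", "FEMALE", "MALE", "MALE"],
)
cohort_ids = [1, 2, 3, 5]

# Hypothetical helper, not part of OMOPCDMFeasibility.jl.
function profile_one_covariate(database::DataFrame, cohort_ids, covariate::Symbol)
    # Keep only the persons that belong to the cohort.
    cohort = filter(:person_id => in(Set(cohort_ids)), database)
    # Count cohort members per category of the covariate.
    profile = combine(groupby(cohort, covariate), nrow => :cohort_n)
    # Share of the cohort that falls in each category.
    profile.cohort_pct = 100 .* profile.cohort_n ./ nrow(cohort)
    # Assumed definition: cohort members in the category relative to the whole database.
    profile.database_pct = 100 .* profile.cohort_n ./ nrow(database)
    # Sort alphabetically by covariate value for stable output.
    return sort!(profile, covariate)
end

gender_profile = profile_one_covariate(database, cohort_ids, :gender)
println(gender_profile)
```

Sorting by the covariate column gives the same row order on every run, which makes profiles from different cohorts easier to compare side by side.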
Arguments
conn - Database connection using DBInterface
covariate_funcs - Vector of covariate functions from OMOPCDMCohortCreator (e.g., GetPatientGender, GetPatientRace)
Keyword Arguments
cohort_definition_id - ID of the cohort definition in the cohort table (or nothing). Either this or cohort_df must be provided
cohort_df - DataFrame containing cohort with person_id column (or nothing). Either this or cohort_definition_id must be provided
schema - Database schema name. Default: "dbt_synthea_dev"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
Returns
NamedTuple- Named tuple with keys corresponding to covariate names, each containing a DataFrame with covariate categories and statistics
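Because the result is a NamedTuple of DataFrames, standard Julia NamedTuple operations apply to it. The sketch below uses a hand-built stand-in rather than a real return value. The keys `gender` and `race` and the column `cohort_n` are hypothetical, so inspect `keys(...)` on an actual result to see the names the package produces.

```julia
using DataFrames

# Hand-built stand-in for a returned NamedTuple; key and column names are hypothetical.
profiles = (
    gender = DataFrame(gender = ["FEMALE", "MALE"], cohort_n = [3, 1]),
    race = DataFrame(race = ["asian", "white"], cohort_n = [1, 3]),
)

# List the covariates that were profiled.
covariate_names = collect(keys(profiles))

# Two equivalent ways to pull out one table.
gender_table = profiles.gender
same_table = profiles[:gender]

# For each covariate, find the category with the largest cohort count
# (the category label is in the first column of each table).
largest = Dict(name => table[argmax(table.cohort_n), 1] for (name, table) in pairs(profiles))
println(largest)
```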
Examples
using OMOPCDMCohortCreator: GetPatientGender, GetPatientRace, GetPatientAgeGroup
individual_profiles = create_individual_profiles(
    cohort_df = my_cohort_df,
    conn = conn,
    covariate_funcs = [GetPatientGender, GetPatientRace, GetPatientAgeGroup]
)
2. create_cartesian_profiles
OMOPCDMFeasibility.create_cartesian_profiles Function
create_cartesian_profiles(;
    cohort_definition_id::Union{Int, Nothing} = nothing,
    cohort_df::Union{DataFrame, Nothing} = nothing,
    conn,
    covariate_funcs::AbstractVector{<:Function},
    schema::String = "dbt_synthea_dev",
    dialect::Symbol = :postgresql
)
Creates Cartesian product demographic profiles for a cohort by analyzing all combinations of covariates.
This function generates a single DataFrame containing all possible combinations of demographic covariates (e.g., gender × race × age_group), providing comprehensive cross-tabulated statistics for detailed post-cohort feasibility analysis. Column order matches the input covariate_funcs order, and results are sorted by covariate values for interpretable output.
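The cross-tabulation idea can be sketched on in-memory data by grouping on several covariate columns at once. This is an illustration under assumptions, not the package's implementation: the sample data and the column names `cohort_n` and `cohort_pct` are invented here, and the sketch lists only combinations that actually occur in the data, which may differ from how the package treats empty combinations.

```julia
using DataFrames

# Hypothetical stand-in cohort with two covariates already attached.
cohort = DataFrame(
    person_id = 1:6,
    age_group = ["20-29", "20-29", "30-39", "30-39", "30-39", "20-29"],
    gender = ["FEMALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE"],
)

# The grouping order determines the column order of the result.
covariates = [:age_group, :gender]

# Count persons in each observed age_group x gender combination.
cross_tab = combine(groupby(cohort, covariates), nrow => :cohort_n)

# Share of the cohort in each combination.
cross_tab.cohort_pct = 100 .* cross_tab.cohort_n ./ nrow(cohort)

# Sort by the covariate columns, first by age_group and then by gender.
sort!(cross_tab, covariates)
println(cross_tab)
```

Reordering `covariates` changes both the column order and the sort order, which mirrors the description above that column order follows the order of `covariate_funcs`.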
Arguments
conn - Database connection using DBInterface
covariate_funcs - Vector of covariate functions from OMOPCDMCohortCreator (must contain at least 2 functions)
Keyword Arguments
cohort_definition_id - ID of the cohort definition in the cohort table (or nothing). Either this or cohort_df must be provided
cohort_df - DataFrame containing cohort with person_id column (or nothing). Either this or cohort_definition_id must be provided
schema - Database schema name. Default: "dbt_synthea_dev"
dialect - Database dialect. Default: :postgresql (for DuckDB compatibility)
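The input requirements listed above (a cohort source must be given, `cohort_df` needs a `person_id` column, and at least two covariate functions are required) can be checked before calling the function. The helper below is a hypothetical pre-flight check written for this page. It is not part of OMOPCDMFeasibility.jl, and the package's own validation and error messages may differ.

```julia
using DataFrames

# Hypothetical helper, not part of OMOPCDMFeasibility.jl.
function check_cartesian_inputs(; cohort_definition_id = nothing, cohort_df = nothing, covariate_funcs)
    # At least one cohort source has to be supplied.
    if cohort_definition_id === nothing && cohort_df === nothing
        throw(ArgumentError("provide either cohort_definition_id or cohort_df"))
    end
    # A cohort DataFrame must carry the person_id column.
    if cohort_df !== nothing && !("person_id" in names(cohort_df))
        throw(ArgumentError("cohort_df must contain a person_id column"))
    end
    # A Cartesian profile needs at least two covariates to combine.
    if length(covariate_funcs) < 2
        throw(ArgumentError("covariate_funcs must contain at least 2 functions"))
    end
    return true
end

# Placeholder covariate functions used only to exercise the checks.
fake_gender(args...) = nothing
fake_race(args...) = nothing

ok = check_cartesian_inputs(
    cohort_df = DataFrame(person_id = [1, 2]),
    covariate_funcs = [fake_gender, fake_race],
)
println(ok)
```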
Returns
DataFrame- Cross-tabulated profile table with all covariate combinations and statistics
Examples
using OMOPCDMCohortCreator: GetPatientAgeGroup, GetPatientGender, GetPatientRace
cartesian_profiles = create_cartesian_profiles(
    cohort_df = my_cohort_df,
    conn = conn,
    covariate_funcs = [GetPatientAgeGroup, GetPatientGender, GetPatientRace]
)
Example: Post-Cohort Analysis in Practice
using DataFrames, DuckDB, DBInterface, Dates
using OMOPCDMFeasibility
using OMOPCDMCohortCreator:
    GenerateDatabaseDetails,
    GenerateTables,
    GetPatientGender,
    GetPatientAgeGroup,
    GetPatientRace,
    GetPatientEthnicity,
    ConditionFilterPersonIDs
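# Connect to a local DuckDB database file.
conn = DBInterface.connect(DuckDB.DB, "synthea_1M_3YR.duckdb")
# Tell OMOPCDMCohortCreator which SQL dialect and schema to use.
GenerateDatabaseDetails(:postgresql, "dbt_synthea_dev")
GenerateTables(conn)
# Concept ID 201826 is type 2 diabetes mellitus in the OMOP vocabulary.
diabetes_concept_ids = [201826]
# Find every person with a matching condition record.
cohort_result = ConditionFilterPersonIDs(diabetes_concept_ids, conn)
cohort_ids = cohort_result.person_id
# Build the cohort DataFrame with the person_id column the profile functions expect.
sample_cohort = DataFrame(
    person_id = cohort_ids
)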
println("Creating individual demographic profiles...")
individual_demographics = OMOPCDMFeasibility.create_individual_profiles(
    cohort_df=sample_cohort,
    conn=conn,
    covariate_funcs=[GetPatientGender, GetPatientRace, GetPatientAgeGroup]
)
println("Individual profiles:")
for (name, table) in pairs(individual_demographics)
    println("$name:")
    println(table)
    println()
end
println("Creating Cartesian demographic profiles...")
cartesian_demographics = OMOPCDMFeasibility.create_cartesian_profiles(
    cohort_df=sample_cohort,
    conn=conn,
    covariate_funcs=[GetPatientAgeGroup, GetPatientGender, GetPatientRace]
)
println("Cartesian profiles:")
println(cartesian_demographics)
DBInterface.close!(conn)