Setup and Cohort Building
This page walks through the actual code from run.jl, 01_data_loader.jl, and 02_cohort_definition.jl - loading configuration, connecting to the database, and building cohorts.
Configuration File
The workflow is driven by a single config.toml file. Copy config.toml.example to config.toml and set your paths.
[database]
path = "/path/to/your/omop_cdm.duckdb"
[schema]
name = "dbt_synthea_dev"
[cohorts]
target_json = "data/definitions/Hypertension.json"
outcome_json = "data/definitions/Pneumonia.json"
target_cohort_id = 1
outcome_cohort_id = 2
target_label = "Hypertension (target)"
outcome_label = "Pneumonia (outcome)"

Loading Configuration
The pipeline starts by reading config.toml and validating that all paths exist.
using TOML
config_file = joinpath(@__DIR__, "config.toml")
if !isfile(config_file)
    error("""
    config.toml not found.
    Copy config.toml.example -> config.toml and set your paths.
    """)
end
config = TOML.parsefile(config_file)
const DB_PATH = config["database"]["path"]
const SCHEMA = config["schema"]["name"]
if !isfile(DB_PATH)
    error("DuckDB file not found at: $DB_PATH")
end

From 01_data_loader.jl, the cohort JSON paths are validated:
target_json = joinpath(@__DIR__, "..", config["cohorts"]["target_json"])
outcome_json = joinpath(@__DIR__, "..", config["cohorts"]["outcome_json"])
if !isfile(target_json)
    error("Target JSON not found: $target_json")
end
if !isfile(outcome_json)
    error("Outcome JSON not found: $outcome_json")
end
println("Target: $(basename(target_json))")
println("Outcome: $(basename(outcome_json))")

Connecting to DuckDB
DuckDB is an embedded analytical database - no server needed, just a single file.
using DuckDB
using DBInterface: DBInterface
const conn = DBInterface.connect(DuckDB.DB, DB_PATH)
DBInterface.execute(conn, "PRAGMA max_temp_directory_size='50GB'")

Translating Cohort JSON to SQL
OHDSICohortExpressions.jl converts ATLAS cohort JSON into SQL via FunSQL.jl.
From 02_cohort_definition.jl:
import DBInterface: execute
import FunSQL: reflect, render
import OHDSICohortExpressions: translate
using DataFrames
target_def = read(TARGET_JSON, String)
outcome_def = read(OUTCOME_JSON, String)
function build_cohort(definition, cohort_id, conn)
    catalog = reflect(conn; schema=SCHEMA, dialect=:duckdb)
    sql = render(catalog, translate(definition; cohort_definition_id=cohort_id))
    execute(
        conn,
        """
        INSERT INTO $SCHEMA.cohort
        SELECT * FROM ($sql) AS foo;
        """
    )
end

What this does:
reflect - reads the live database schema
translate - converts ATLAS JSON -> FunSQL expression
render - turns the FunSQL expression into valid DuckDB SQL
execute - inserts the cohort rows into the OMOP cohort table
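
To inspect the generated SQL before inserting anything, the first three steps can be run on their own. The following is a minimal sketch, not part of the original scripts: preview_cohort_sql is a hypothetical helper name, and it assumes the same connection, cohort definition string, and schema name that build_cohort uses. It only reuses the reflect, translate, and render calls shown above.

```julia
import FunSQL: reflect, render
import OHDSICohortExpressions: translate

# Hypothetical helper (not in run.jl or 02_cohort_definition.jl):
# build the cohort SQL exactly as build_cohort does, but return it as text
# instead of executing the INSERT.
function preview_cohort_sql(definition, cohort_id, conn; schema)
    catalog = reflect(conn; schema=schema, dialect=:duckdb)
    query = translate(definition; cohort_definition_id=cohort_id)
    return string(render(catalog, query))
end
```

Calling println(preview_cohort_sql(target_def, TARGET_COHORT_ID, conn; schema=SCHEMA)) would print the query, which can help when a cohort build fails or returns an unexpected row count.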
Building Both Cohorts
The workflow clears any existing cohorts with the same IDs, then builds fresh ones:
execute(
    conn,
    "DELETE FROM $SCHEMA.cohort WHERE cohort_definition_id IN ($TARGET_COHORT_ID, $OUTCOME_COHORT_ID)"
)
for (defn, id, label) in [
    (target_def, TARGET_COHORT_ID, TARGET_LABEL),
    (outcome_def, OUTCOME_COHORT_ID, OUTCOME_LABEL),
]
    println("Building: $label ...")
    try
        build_cohort(defn, id, conn)
        n = DataFrame(
            execute(
                conn,
                "SELECT COUNT(*) AS n FROM $SCHEMA.cohort WHERE cohort_definition_id = $id"
            )
        )[1, :n]
        println("Done - $n rows")
    catch e
        msg = sprint(showerror, e)
        error("""
        Cohort build failed for: $label (id=$id)
        Error: $msg
        This usually means the cohort JSON downloaded from ATLAS is missing a field
        that OHDSICohortExpressions.jl expects (e.g. CollapseSettings).
        Possible fixes:
        1. Open the cohort in ATLAS, ensure it is fully configured, then re-export / re-run.
        2. Check the JSON at: $(id == TARGET_COHORT_ID ? TARGET_JSON : OUTCOME_JSON)
        3. Browse valid cohort definitions at: https://atlas-demo.ohdsi.org/#/cohortdefinitions
        """)
    end
end

After this step, the OMOP cohort table contains:
| cohort_definition_id | Cohort | Description |
|---|---|---|
| 1 | Target | Hypertensive patients - each with an index date |
| 2 | Outcome | Patients who subsequently developed pneumonia |
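
One way to confirm what the cohort table holds is to summarise it per cohort_definition_id. The sketch below is an illustration rather than part of the workflow: cohort_summary and the demo_schema name are hypothetical, the demo rows are made up, and it assumes the standard OMOP cohort columns (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date). It runs against a throwaway in-memory DuckDB so it does not touch your CDM file.

```julia
using DuckDB
using DBInterface: DBInterface
using DataFrames

# Hypothetical helper: row and subject counts plus the index-date range per cohort.
function cohort_summary(conn, schema)
    DataFrame(DBInterface.execute(conn, """
        SELECT cohort_definition_id,
               COUNT(*) AS n_rows,
               COUNT(DISTINCT subject_id) AS n_subjects,
               MIN(cohort_start_date) AS first_start,
               MAX(cohort_start_date) AS last_start
        FROM $schema.cohort
        GROUP BY cohort_definition_id
        ORDER BY cohort_definition_id
        """))
end

# Demo on an in-memory database with made-up rows (not real data).
demo = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.execute(demo, "CREATE SCHEMA demo_schema")
DBInterface.execute(demo, """
    CREATE TABLE demo_schema.cohort (
        cohort_definition_id INTEGER,
        subject_id BIGINT,
        cohort_start_date DATE,
        cohort_end_date DATE
    )
    """)
DBInterface.execute(demo, """
    INSERT INTO demo_schema.cohort VALUES
        (1, 101, DATE '2020-01-01', DATE '2020-06-01'),
        (1, 102, DATE '2020-02-01', DATE '2020-07-01'),
        (2, 101, DATE '2020-03-15', DATE '2020-03-20')
    """)
counts = cohort_summary(demo, "demo_schema")
println(counts)
```

In the real workflow the same function would be called as cohort_summary(conn, SCHEMA) after both cohorts are built.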
Final Check: Cohort Build Output
When both cohorts are built successfully, you should see output like this in your terminal:
── Defining cohorts ──────────────────────────────────────────
Building: Hypertension (target) ...
Done - 269607 rows
Building: Pneumonia (outcome) ...
Done - 13461 rows

If either count is 0, check that:
Your OMOP CDM contains records for the concepts in each cohort's concept sets
The cohort JSON files exist at the paths in config.toml
The cohort table exists in $SCHEMA (created by run.jl on first run)
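
The last two checks can be scripted. This is a rough sketch under stated assumptions: cohort_table_exists and missing_files are hypothetical helper names, the schema name is interpolated directly into the query (as the scripts above do with $SCHEMA), and the lookup relies on DuckDB's information_schema.tables view. The demo uses an in-memory database and a temporary file rather than your real paths.

```julia
using DuckDB
using DBInterface: DBInterface
using DataFrames

# Hypothetical helper: true if a table named `cohort` is visible in `schema`.
function cohort_table_exists(conn, schema)
    df = DataFrame(DBInterface.execute(conn, """
        SELECT COUNT(*) AS n
        FROM information_schema.tables
        WHERE table_schema = '$schema' AND table_name = 'cohort'
        """))
    return df[1, :n] > 0
end

# Hypothetical helper: return the paths that do not exist on disk.
missing_files(paths) = [p for p in paths if !isfile(p)]

# Demo: the table is absent until it is created.
demo = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.execute(demo, "CREATE SCHEMA demo_schema")
exists_before = cohort_table_exists(demo, "demo_schema")
DBInterface.execute(demo, """
    CREATE TABLE demo_schema.cohort (
        cohort_definition_id INTEGER,
        subject_id BIGINT,
        cohort_start_date DATE,
        cohort_end_date DATE
    )
    """)
exists_after = cohort_table_exists(demo, "demo_schema")

# Demo: one file that exists (a temp file) and one placeholder path that does not.
existing = tempname()
touch(existing)
absent = missing_files([existing, "no/such/cohort.json"])
println((exists_before, exists_after, absent))
```

In the pipeline itself these would be called as cohort_table_exists(conn, SCHEMA) and missing_files([target_json, outcome_json]).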