Using OMOPCDMCohortCreator with R ๐ดโโ ๏ธ
This tutorial builds on the Beginner Tutorial in creating a characterization study but instead of using Julia, we will use R! This assumes the user has familiarity with R but familiarity with Julia is not required. By the end of this tutorial, you will learn how to use OMOPCDMCohortCreator directly within R without having to ever touch Julia.
Analysis Set-up ๐
R and Julia Installation
You will need to have R and Julia installed onto your computer. Here are the minimum required versions needed:
- R version must be greater than or equal to version $3.2$.
- Julia version must be greater than or equal to version $1.7$.
Furthermore, the Julia executable must be available from the system PATH
or you can set the JULIA_BINDIR
R environment variable to where the Julia bin
directory is on your computer like this:
# For Windows
Sys.setenv(JULIA_BINDIR = "C:/Users/user/AppData/Local/Programs/Julia-1.7.1/bin")
# For Linux or OSX
Sys.setenv(JULIA_BINDIR = "~/path/to/Julia-1.7.1/bin")
Packages
R Packages
You will need the following packages for this tutorial which you can install with install.packages
:
dplyr
JuliaConnectoR
tibble
To read more on these packages and what they do, see the appendix for details.
Julia Packages
You will need the following packages for this tutorial:
HealthSampleData
OMOPCDMCohortCreator
SQLite
We will have to install them as follows from within R:
library(JuliaConnectoR)
pkg <- juliaImport("Pkg")
pkg$activate("TUTORIAL", shared = TRUE)
# NOTE: You could specify the path to where your project is or any other path you want; set `shared` to `FALSE` if you do
pkg$add(c("HealthSampleData", "OMOPCDMCohortCreator", "SQLite"))
To read more on these packages and what they do, see the appendix for details.
Activating Analysis Environment
Now within R, anytime you want to use these installed packages, do the following:
library(JuliaConnectoR)
pkg <- juliaImport("Pkg")
pkg$activate("TUTORIAL", shared = TRUE)
Data
For this tutorial, we will work with data from Eunomia that is stored in a SQLite format.
hsd <- juliaImport("HealthSampleData")
eunomia <- hsd$Eunomia()
NOTE: An internet connection will be needed to download this data. After this data is downloaded, internet is no longer required for this tutorial.
Create Database Connection to Eunomia ๐พ
After you have finished your set up in R, we need to establish a connection to the Eunomia SQLite database that we will use for the rest of the tutorial:
slt <- juliaImport("SQLite")
conn <- slt$DB(eunomia)
With Eunomia, the database's schema is simply called "main". We will use this to generate database connection details that OMOPCDMCohortCreator
will use internally. For this step, we will use OMOPCDMCohortCreator
:
occ <- juliaImport("OMOPCDMCohortCreator")
occ$GenerateDatabaseDetails(juliaEval(":sqlite"), "main")
Finally, we will generate internal representations of each table found within Eunomia for OMOPCDMCohortCreator to use:
occ$GenerateTables(conn)
As a check to make sure everything was correctly installed and works properly, the following block should work and return a list of all person ids in this data:
occ$GetDatabasePersonIDs(conn)
Characterization Analysis ๐ค
Background for Analysis
As all the tools are working properly, let's do what is called a characterization study - a study that characterizes a group of patients with a certain condition (or conditions) across various attributes like race, age, and combinations thereof. We are going to do miniature version of such a study looking at patients with strep throat. For this, we will use the condition_concept_id
, $28060$.
Task: Find All Patients with Strep Throat
strep_patients <- occ$ConditionFilterPersonIDs(28060, conn)
strep_patients <- strep_patients$person_id
Task: Find the Race of Patients with Strep Throat
strep_patients_race <- occ$GetPatientRace(strep_patients, conn)
Task: Find the Gender of Patients with Strep Throat
strep_patients_gender <- occ$GetPatientGender(strep_patients, conn)
Task: Create Age Groupings of Patients with Strep Throat
For this task, for every single person who has a strep throat diagnosis, we need to assign them an age group. For this demo, age groupings will be made along $5$ year intervals when assigned to a person up to $100$ years of age (e.g. $[0, 4], [5, 9], ... [95, 100]$).
age_groups <- list(
list(0, 4),
list(5, 9),
list(10, 14),
list(15, 19),
list(20, 24),
list(25, 29),
list(30, 34),
list(35, 39),
list(40, 44),
list(45, 49),
list(50, 54),
list(55, 59),
list(60, 64),
list(65, 69),
list(70, 74),
list(75, 79),
list(80, 84),
list(85, 89),
list(90, 94),
list(95, 99))
strep_patients_age_group <- occ$GetPatientAgeGroup(strep_patients, conn, age_groupings = age_groups)
Task: Characterize Each Person by Gender, Race, and Age Group
With the previous tasks, we now know patients' gender, race, and age group. Using this information, we can combine these features to create a final table showing each patient's person_id
, gender, race, and age group per row. To do this combining, we will use the dplyr
and tibble
packages:
library(dplyr)
library(tibble)
strep_patients <- tibble(data.frame(person_id=strep_patients))
strep_patients_race <- tibble(data.frame(strep_patients_race))
strep_patients_gender <- tibble(data.frame(strep_patients_gender))
strep_patients_age_group <- tibble(data.frame(strep_patients_age_group))
Task: Create Patient Groupings
Often with characterization style studies, it is extremely important to aggregate patient populations. This is to protect the anonymity of patients with perhaps severely sensitive conditions (e.g. mental illnesses, sexually transmitted diseases, etc.) from possible repercussions from accidental disclosure of this patient information.
final_df <- full_join(strep_patients, strep_patients_race, by = c("person_id" = "person_id")) %>%
full_join(strep_patients_age_group, by = c("person_id" = "person_id")) %>%
full_join(strep_patients_gender, by = c("person_id" = "person_id")) %>%
select(-person_id) %>%
count(race_concept_id, age_group, gender_concept_id) %>%
rename(count = "n") %>%
filter(count > 10)
Conclusion ๐
This mini characterization study that we just conducted on this dataset opens up a whole new avenue for a researcher to pursue. For example, we could now calculate prevalence rates across different patient characteristics or compare and contrast multiple conditions at once. It should also be apparent that the API is set up in a very particular way: it is functional meaning that each function does one thing only. This gives a lot of flexibility to a user to build together a study incrementally using OMOPCDMCohortCreator
.
Appendix ๐ต๏ธ
Packages Used in Analysis
R Packages Used:
dplyr
- grammar of data manipulationtibble
- modern reimagining of the data.frameJuliaConnectoR
- a functionally oriented interface for calling Julia from R
Julia Packages Used:
OMOPCDMCohortCreator
- Create cohorts from databases utilizing the OMOP CDMHealthSampleData
- Sample health data for a variety of health formats and use casesSQLite
- A Julia interface to the SQLite library