How-to-Run Study

1 Setup Study

1.1 Download Zip

To avoid using git to leverage this study, the easiest way to access the study code is via a zip file. Instructions for downloading the zip file are below:

  1. Navigate to github repo url.
  2. Select the green code button revealing a dropdown menu.
  3. Select Download Zip.
  4. Unzip the folder on your local computer that is easily accessible within R Studio.
  5. Open the unzipped folder and select ehden_hmb.Rproj.

1.2 Setup R Environment

1.2.1 Using renv

This study uses renv to reproduce the R environment to execute this study. The study code maintains an renv.lock file in the main branch of the repository. To activate the R dependencies through renv use the following code:

renv::restore()

1.2.2 Troubleshooting renv

Sometimes there are errors in the package installation via renv. If you encounter an error try removing the problematic package from the renv.lock file and restore again. To remove a package from the lock file find the header of the package and delete all corresponding lines. Once you are able to get the remaining packages to install, manually install the problem package using one of the strategies below:


# Installing an R package from CRAN ------------

## Installing latest version of R package on CRAN
install.packages("ggplot2")

## Installing archived version of R package on CRAN
packageurl <- "http://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.9.1.tar.gz"
install.packages(packageurl, repos=NULL, type="source")

# Installing an R package from github -----------------

# Installing current version of R package from github
install.packages("remotes") # note you may also work devtools
remotes::install_github("ohdsi/FeatureExtraction")

# Installing develop version of R package from github
remotes::install_github("ohdsi/Ulysses", ref = "develop")

# Installing old version of R package from github
remotes::install_github("ohdsi/CohortGenerator", ref = "v0.7.0")

Any additional issues with the renv lock file please file an issue in the study repository

1.2.3 Conflicts

Some organizational IT setups pose conflicts with renv. One example is if your organization uses the Broadsea Docker. If you are aware that your OHDSI environment will conflict with renv, deactivate it by running renv::deactivate() in the active ehden .RProj. Also delete the renv folder in your project directory. Review the list of R packages required to run the study in the Technical Requirements tab of the study hub and manually install.

It is highly recommended you stick with the renv snapshot as this is the easiest way to reproduce the study execution environment. Please only deactivate the lock file if it is a last resort.

1.3 Load Execution Credentials

This study uses keyring and config to mask and query database credentials needed for execution. The study will help load these credentials using the file extras/KeyringSetup.R.

1.3.1 Required Credentials

Data nodes executing this study require the following credentials:

  1. dbms - the name of the dbms you are using (redshift, postgresql, snowflake, etc)
  2. user - the username credential used to connect to the OMOP database
  3. password - the password credential used to connect to the OMOP database
  4. connectionString - a composed string that establishes the connection to the OMOP database. An example of a connection string would be jdbc:dbms://host-url.com:port/database_name.
  5. cdmDatabaseSchema - the database and schema used to access the cdm. Note that this credential may be separated by a dot to indicate the database and schema, which tends to be the case in sql server. Example: our_cdm.cdm.
  6. vocabDatabaseSchema - the database and schema used to access the vocabulary tables. Note this is typically the same as the cdmDatabaseSchema.
  7. workDatabaseSchema - a section of the database where the user has read/write access. This schema is where we write the cohortTable used to enumerate the cohort definitions.

It is recommended that you write these credentials down in a private txt file to make it easier to load into the credential manager for the study.

1.3.2 Loading Credentials

  1. Open the file in the study names extras/KeyringSetup.R.
  2. On L15:16 place a name for your config block and the database. The configBlock name can be an abbreviation, for example:
configBlock <- "synpuf"
database <- "synpuf_110k"
  1. One at a time run each line in the script and follow any prompts
    • L27 asks you to build a new config.yml file which can be done as so Ulysses::makeConfig(block = configBlock, database = database). Running this function will open a new file. Make sure any keyring variable is labelled as ehden_hmb.
    • L32 will setup a new keyring for the study. The name of the keyring is ehden_hmb and the password for the keyring is ohdsi. If R prompts you to place a keyring password it will be ohdsi unless changed by the user.
    • L36 will setup all the credentials for the study. A prompt will appear asking to input credentials, using your txt file input your credentials into the prompt. Review the credentials once you are done

1.3.2.1 Troubleshooting

If you have a problem with the keyring, please post an issue in the study repository. Otherwise you can avoid the keyring api by hard-coding your credentials to the config.yml file as shown in the example:

db: # replace with an acronym for your database no underscore
  databaseName: db_ehr # replace with the name of your database
  dbms: <your_dbms>
  user:  <your_user>
  password:  <your_pass>
  connectionString: <your_connectionString>
  cdmDatabaseSchema:  <your_ cdmDatabaseSchema>
  vocabDatabaseSchema:  <your_ vocabDatabaseSchema>
  workDatabaseSchema: <your_ workDatabaseSchema>
  cohortTable: ehden_hmb_<databaseName>

1.3.3 Multiple Databases (Optional)

Some study participants will be running this study on multiple databases. To add additional credentials rerun the extras/KeyringSetup.R file replacing the configBlock and database with a second database you need to use. on L28 you can add the following function to add a new config block to the config.yml file.

Ulysses::addConfig(block = "[next_block]", database = "[next_database]")

2 Run Study

2.1 Execution Script

Running all tasks sequentially can be done using the executeStudy.R file. Replace L17 with the configBlock of choice and run the script.

2.2 Study Tasks

Once your study is setup you are ready to run the EHDEN HMB study. The EHDEN HMB study contains six analytical tasks:

  1. Build Cohorts - initialize a cohort table and generate cohort counts from the circe json in cohortsToCreate
  2. Cohort Diagnostics - review of the hmb cohort definition
  3. Incidence Analysis - calculation of the incidence of hmb in a population of females between ages 11 to 55
  4. Baseline Characteristics - generation of prevalence of demographics and comorbidities at baseline (-365d prior to index)
  5. Treatment Patterns - calculation of drug prevalence post-index, sequences and time to treatment discontinuation
  6. Procedure Analysis - calculation of post-index prevalence of procedures and time to initial treatment

Each task will output to the results folder. Note that the first task buildCohorts is required to run any additional analytical task (base dependency). If a cohort definition json has changed it requires rerunning the buildCohorts task.

2.3 Review Cohort Diagnostics

If you would like to review the results of the Cohort Diagnostics prior to sending you can do so by the instructions below:

  1. Open the file extras/ExploreDiagnostics.R
  2. Run the R file line by line
    • Replace L20 with the name of your database
    • L28 will create a new folder in the directory called scratchDiagnostics where it will save the sqlite object
    • L36 prepares the diagnostics results per database
    • L41 launches the shiny app
  3. Ignore any warning messages about reserved keywords and navigation containers

2.3.1 Troubleshooting

Multiple Databases

Some data partners may be running this study on multiple databases. If that is the case then you to identify a folder that contains all the zip files produced from the CohortDiagnostics routine. To check if this file was produced, go into the cohortDiagnostics folder of each database and identify the .zip file amongst the csv. It should be labelled something like Results_database.zip. You can collect all of these zip files in a single folder and make the dataFolder target as the folder with the multiple zip files per database.

Shiny App

The shiny app rendered from CohortDiagnostics relies on the OHDSI package OhdsiShinyModules. This package is installed within the renv snapshot loaded with the package. If you are having trouble launching the app from within the ehden_hmb.RProj, exit the R project and try launching the app outside the study session. If you are outside the ehden_hmb.RProj remember you have to provide full file paths because you are no longer zoomed in on the study directory.

2.4 Results Folder

Following successful execution of the study, each database will have its own sub-folder within results. There will be a further 14 subfolders containing results from the execution underneath the database. Within each of these folders there will be a combination of csv, rds and parquet (only for treatmentHistory) that contain the results. You may review these files individually if you wish. Instructions on how to share the results are maintained in the contribution tab of the website.

2.5 Review Results

In version 0.2 of the study code, we have added code to launch a shiny app to locally review results prior to distributing them to the study host. In order to run the shiny app the user needs to prep the data for the app. Navigate to extras/PreviewResults.R and run the first step. This will take all databases run in the results section and build a data folder for the shiny app. This data folder is ignored to prevent accidental commits back to github. Next launch the shiny app.