Intro to Picard • picard

Introduction

Running an OHDSI study is exciting effort that can lead to meaningful new evidence on clinical questions across a network of study participants who have data mapped to the OMOP CDM. However, there are a lot of tasks to keep track to make an OHDSI study successful. We introduce the picard R package, as a tool to assist in the development and organization of an OHDSI study. The idea of the picard package is inspired by the usethis package that is used to assist in the development workflow of novel R packages and projects alike. Similar to the development of an R package, there are several steps and pieces of documentation needed in an OHDSI study to effectively run the study across the OHDSI network. By providing functions that automate tasks and provide consistent structure to OHDSI studies, picard attempts to help users develop and communicate new OHDSI studies.

Directory Structure

The first step towards assisting OHDSI studies is to introduce a consistent directory structure that contains also necessary components towards executing a study and is easy to follow. Below is a proposed directory structure, offered by the picard package for OHDSI studies.

 [01;34mC:/Users/mdlav/AppData/Local/Temp/Rtmpc9Llol/temp_libpath4a84762036c4/picard/vignette/picardDirectory [0m
+--  [01;34manalysis [0m
|   +--  [01;34mprivate [0m
|   +--  [01;34msettings [0m
|   \--  [01;34mstudyTasks [0m
+--  [01;34mcohortsToCreate [0m
|   \--  [01;34m01_target [0m
+--  [01;34mdocumentation [0m
+--  [01;34mextras [0m
+--  [01;34mlogs [0m
+--  [01;34mresults [0m
\-- _study.yml

Analysis Folder

The analysis folder contains files that are required for running an OHDSI study. There are three sub-folders: private, settings and studyTasks. The studyTasks folder contains files needed to run the OHDSI study. These could be several files in a pipeline (i.e. 01_buildCohorts.R, 02_buildStrata.R, 03_baselineCharacteristics.R) or a single strategus json file that contains details of the modules to run. Next, the settings folder contains any files that provide details of the analysis settings. For example, this folder could contain scripts that specify the settings of an incidence analysis to run in the study. Likewise, this folder could contain the createStrategusAnalysisSpecification.R which creates the strategus json to run the analysis. Finally, the private folder contains any internal files needed to run the analysis. For example this could include internal functions to run a study script. The picard package offers functions to help develop components of the analysis such as:

makeAnalysisScript: initializes an organized .R file, pre-rendered with details about the analysis.
makeInternals: creates a .R file used for developing internal functions.

Cohorts to Create Folder

OHDSI studies revolve around generated cohort definitions used to enumerate persons with a particular clinical occurrence (i.e. persons prescribed ACE Inhibitors for first time). Keeping track of these cohort definitions, is very important for a successful OHDSI study. Clinical phenotypes often change during the development of studies, so it is very important to keep the latest cohort definition json files organized. The cohortsToCreate folder stores all the json files of cohort definitions used in the study. They are organized in numbered folders that are listed at the developers description. By default, picard creates a starting 01_target folder to store the target cohort definitions of the study. picard offers functions that support the organization of this folder, such as:

makeCohortFolder: initializes a new folder to store cohort definitions, i.e. a new folder for comparator cohorts
makeCohortDetails: a markdown file that provides “plain english” descriptions of the cohort definitions and tracks updates.

Documentation Folder

An OHDSI study consists of lots of documentation that effectively communicate what the study is, how to run it and how to participate. There are three key files stored in this folder:

Study Protocol: the protocol offers guidance on the scientific methods used to conduct the study and justification for why scientific decisions were made. picard auto-generates a skeleton file via makeStudyProtocol.
How To Run: a file is needed that provides all technical specifications and instructions needed to run the study at a site. This file provides information on how to download the OHDSI study, prepare the environment for running the study, executing the study and how to dissiminate the results to the study host. picard auto-generates a skeleton file via makeHowToRun
Contribution Guidelines: a file is required to communicate how others can contribute to the study. Contributions can range from simply running the study to providing significant development. OHDSI studies require transparency to nodes on what are the rules and expectations for meaningful contribution to the study. picard auto-generates a skeleton file via makeContribution

There exist scenarios when a full-fledged protocol is not required for a study. While a study protocol is not required by the institution running the study, it is still good practice to provide guidance on the scientific decision making for the study. picard offers a skeleton file called the Study Synopsis, implemented via makeStudySynopsis, that gives structure to the methods and rationale for the study while not being as formal as a study protocol.

The documentation folder may also contain other files that are essential for communicating important aspects of the study.

Results Folder

When an OHDSI study is executed, we require a location to store the results in an organized fashion. These results can be easily zipped and sent to the study host. picard initializes a results folder that can be used as a target for the output. The results folder is automatically added to the .gitignore so that results are not accidentally committed to github repository of the OHDSI study. We intend to add functions to compliment the results folder in the future.

Logs Folder

Running an OHDSI study is like executing a pipeline of tasks. It is vital that we know what is going on in the pipeline, whether an error has occurred or when an execution has taken place. Loggers are an important part of a pipeline and likewise an OHDSI study. picard offers a folder to save logs in a single location. The log folder is is automatically added to the .gitignore so that results are not accidentally committed to github repository of the OHDSI study.

Extras Folder

OHDSI studies sometimes contain files that are important to a study but do not have a natural save location; the extras folder hosts these files. A prime use for the extras folder is for scripts or files that are ancillary to the main study. For example scripts such as KeyringSetup.R are helpful for running the study but not core to the study itself. picard offers functions that support the extras folder.

_study.yml

The final file initiated by picard is the _study.yml file. This is a meta file that provides an overview of the study. It contains information about who is the study lead, the date the study started and the full name of the study. We plan to expand upon this meta file as we feel quality meta data for a study is useful for 1) automating start-up tasks and 2) providing records to users.

Initializing an OHDSI study

Once you have downloaded the picard package, you can begin creating an OHDSI study. To do this, you can run code as shown in the block below:

picard::newOhdsiStudy(
  projectName = "my_ohdsi_study",
  author = "Jean-Luc Picard",
  type = "Characterization",
  directory = "my_directory",
  open = TRUE
)

By running this function, picard will initiate the described directory structure and open an R project in a new session. Notice that the details provided in the function will be rendered in the _study.yml file. With the study initiated we can begin developing our OHDSI study.

The README file

For those familiar with code development, the README file is a standard file used to introduce a describe the the project. Think of the README as your cover page. It is the first file users see when they navigate to the github page. All OHDSI projects require a README file, thus picard provides a function (makeReadMe) that initializes it.

picard::makeReadMe()

The readme file is initailized using information in the _study.yml file. A typical OHDSI study readme has a section of meta information as seen below:

Study Lead: Jean-Luc Picard
Analytics Use Case: Characterization
Study Start Date: 2023-05-
Study End Date: 12-31-2099
Study Tags: None
Protocol: Unavailable
Publications: Unavailable
Results Explorer: Unavailable

This provides all the basic information about the OHDSI study. A future feature will be to provide linkage to the OHDSI Forums post introducing the study to the community.

Next, the user will be asked to provide a description of the study and databases used in the study. Finally, the README provides links to important study information including the protocol, how to run and contributions files. Users can add as much as they please to the README file, picard offers suggested guidance for starting it.

Another feature offered by picard is support for svg badges. These simple badges appear at the top of the README and provide useful information about the study. Badges seen in OHDSI studies include a study status badge and badges versioning the CDM and OMOP vocabulary. picard will provide more support and documentation for badges in the future.

The NEWS file

Another common file in software repositories is the NEWS file. The purpose of the NEWS file is to track changes to the software over time. Of course, version control software such as github keeps a record of the iterations of the study as it is developed, however it is helpful to have a “plain english” file that explains what has happened as the study has developed over time. NEWS files are often maintained via a semantic versioning system. OHDSI studies are not entirely software, however it is important to maintain order over the development of the technical pieces of the study. picard offers a simple function that initiates this file, makeNews.

The config.yml file

One of the most important inputs for an OHDSI study is the connection details to the Database Management System (DBMS) that hosts the OMOP data. These credentials need to be supplied in order to access the data within the study node, to build cohorts and run the study on the CDM. Keeping credentials secure and accessible is a difficult challenge. From experience, we have found success maintaining credentials via the config package. This package stores configurations to an analysis in a yml where each set of credentials is differentiated with a block name. This structure is particularly helpful in institutions that have access to multiple OMOP assets and wish to iterate between connections efficiently. Below is an example of a config.yml file:

# Config File for example

default:
  projectName: example

# Config block for example

example:
  databaseName: synpuf_110k
  dbms: !expr keyring::key_get('example_dbms')
  user: !expr keyring::key_get('example_user')
  password: !expr keyring::key_get('example_password')
  connectionString: !expr keyring::key_get('example_connectionString')
  cdmDatabase: !expr keyring::key_get('example_cdmDatabase')
  cdmSchema: !expr keyring::key_get('example_cdmSchema')
  vocabSchema: !expr keyring::key_get('example_vocabSchema')
  workDatabase: !expr keyring::key_get('example_workDatabase')
  workSchema: !expr keyring::key_get('example_workSchema')
  role: !expr keyring::key_get('example_role')
  cohortTable: example_synpuf_110k

We can efficiently access credentials that are hidden behind the keyring API. The config.yml file works very similarly to the .Renviron file. The advantage of using the config.yml file instead of the .Renviron is that can handle multiple credentials and is more portable than the .Renviron file which is set to an environment. With picard we provide a function that initializes this file:

picard::makeConfig(block = "example", database = "synpuf_110k")

When a user creates a new config.yml file it is added to .gitignore to ensure it is not committed to a github repository.