picard_intro.Rmd
Running an OHDSI study is exciting effort that can lead to meaningful
new evidence on clinical questions across a network of study
participants who have data mapped to the OMOP CDM. However, there are a
lot of tasks to keep track to make an OHDSI study successful. We
introduce the picard
R package, as a tool to assist in the
development and organization of an OHDSI study. The idea of the
picard
package is inspired by the usethis
package that
is used to assist in the development workflow of novel R packages and
projects alike. Similar to the development of an R package, there are
several steps and pieces of documentation needed in an OHDSI study to
effectively run the study across the OHDSI network. By providing
functions that automate tasks and provide consistent structure to OHDSI
studies, picard
attempts to help users develop and
communicate new OHDSI studies.
The first step towards assisting OHDSI studies is to introduce a
consistent directory structure that contains also necessary components
towards executing a study and is easy to follow. Below is a proposed
directory structure, offered by the picard
package for
OHDSI studies.
[01;34mC:/Users/mdlav/AppData/Local/Temp/Rtmpc9Llol/temp_libpath4a84762036c4/picard/vignette/picardDirectory
[0m
+--
[01;34manalysis
[0m
| +--
[01;34mprivate
[0m
| +--
[01;34msettings
[0m
| \--
[01;34mstudyTasks
[0m
+--
[01;34mcohortsToCreate
[0m
| \--
[01;34m01_target
[0m
+--
[01;34mdocumentation
[0m
+--
[01;34mextras
[0m
+--
[01;34mlogs
[0m
+--
[01;34mresults
[0m
\-- _study.yml
The analysis folder contains files that are required for running an
OHDSI study. There are three sub-folders: private,
settings and studyTasks. The studyTasks
folder contains files needed to run the OHDSI study. These could be
several files in a pipeline (i.e. 01_buildCohorts.R
,
02_buildStrata.R
,
03_baselineCharacteristics.R
) or a single strategus json
file that contains details of the modules to run. Next, the
settings folder contains any files that provide details of the
analysis settings. For example, this folder could contain scripts that
specify the settings of an incidence analysis to run in the study.
Likewise, this folder could contain the
createStrategusAnalysisSpecification.R
which creates the
strategus json to run the analysis. Finally, the private folder
contains any internal files needed to run the analysis. For example this
could include internal functions to run a study script. The
picard
package offers functions to help develop components
of the analysis such as:
makeAnalysisScript
: initializes an organized .R file,
pre-rendered with details about the analysis.makeInternals
: creates a .R file used for developing
internal functions.OHDSI studies revolve around generated cohort definitions used to
enumerate persons with a particular clinical occurrence (i.e. persons
prescribed ACE Inhibitors for first time). Keeping track of these cohort
definitions, is very important for a successful OHDSI study. Clinical
phenotypes often change during the development of studies, so it is very
important to keep the latest cohort definition json files organized. The
cohortsToCreate folder stores all the json files of cohort
definitions used in the study. They are organized in numbered folders
that are listed at the developers description. By default,
picard
creates a starting 01_target
folder to
store the target cohort definitions of the study. picard
offers functions that support the organization of this folder, such
as:
makeCohortFolder
: initializes a new folder to store
cohort definitions, i.e. a new folder for comparator cohortsmakeCohortDetails
: a markdown file that provides “plain
english” descriptions of the cohort definitions and tracks updates.An OHDSI study consists of lots of documentation that effectively communicate what the study is, how to run it and how to participate. There are three key files stored in this folder:
picard
auto-generates a
skeleton file via makeStudyProtocol
.picard
auto-generates a skeleton file via makeHowToRun
picard
auto-generates a skeleton file via
makeContribution
There exist scenarios when a full-fledged protocol is not required
for a study. While a study protocol is not required by the institution
running the study, it is still good practice to provide guidance on the
scientific decision making for the study. picard
offers a
skeleton file called the Study Synopsis, implemented via
makeStudySynopsis
, that gives structure to the methods and
rationale for the study while not being as formal as a study
protocol.
The documentation folder may also contain other files that are essential for communicating important aspects of the study.
When an OHDSI study is executed, we require a location to store the
results in an organized fashion. These results can be easily zipped and
sent to the study host. picard
initializes a
results folder that can be used as a target for the output. The
results folder is automatically added to the .gitignore
so
that results are not accidentally committed to github repository of the
OHDSI study. We intend to add functions to compliment the results folder
in the future.
Running an OHDSI study is like executing a pipeline of tasks. It is
vital that we know what is going on in the pipeline, whether an error
has occurred or when an execution has taken place. Loggers are an
important part of a pipeline and likewise an OHDSI study.
picard
offers a folder to save logs in a single location.
The log folder is is automatically added to the .gitignore
so that results are not accidentally committed to github repository of
the OHDSI study.
OHDSI studies sometimes contain files that are important to a study
but do not have a natural save location; the extras folder hosts these
files. A prime use for the extras folder is for scripts or files that
are ancillary to the main study. For example scripts such as
KeyringSetup.R
are helpful for running the study but not
core to the study itself. picard
offers functions that
support the extras folder.
The final file initiated by picard
is the
_study.yml
file. This is a meta file that provides an
overview of the study. It contains information about who is the study
lead, the date the study started and the full name of the study. We plan
to expand upon this meta file as we feel quality meta data for a study
is useful for 1) automating start-up tasks and 2) providing records to
users.
Once you have downloaded the picard
package, you can
begin creating an OHDSI study. To do this, you can run code as shown in
the block below:
picard::newOhdsiStudy(
projectName = "my_ohdsi_study",
author = "Jean-Luc Picard",
type = "Characterization",
directory = "my_directory",
open = TRUE
)
By running this function, picard
will initiate the
described directory structure and open an R project in a new session.
Notice that the details provided in the function will be rendered in the
_study.yml
file. With the study initiated we can begin
developing our OHDSI study.
For those familiar with code development, the README file is a
standard file used to introduce a describe the the project. Think of the
README as your cover page. It is the first file users see when they
navigate to the github page. All OHDSI projects require a README file,
thus picard
provides a function (makeReadMe
)
that initializes it.
picard::makeReadMe()
The readme file is initailized using information in the
_study.yml
file. A typical OHDSI study readme has a section
of meta information as seen below:
This provides all the basic information about the OHDSI study. A future feature will be to provide linkage to the OHDSI Forums post introducing the study to the community.
Next, the user will be asked to provide a description of the study
and databases used in the study. Finally, the README provides links to
important study information including the protocol, how to run and
contributions files. Users can add as much as they please to the README
file, picard
offers suggested guidance for starting it.
Another feature offered by picard
is support for svg
badges. These simple badges appear at the top of the README and provide
useful information about the study. Badges seen in OHDSI studies include
a study status badge and badges versioning the CDM and OMOP vocabulary.
picard
will provide more support and documentation for
badges in the future.
Another common file in software repositories is the NEWS file. The
purpose of the NEWS file is to track changes to the software over time.
Of course, version control software such as github keeps a record of the
iterations of the study as it is developed, however it is helpful to
have a “plain english” file that explains what has happened as the study
has developed over time. NEWS files are often maintained via a semantic versioning system. OHDSI studies
are not entirely software, however it is important to maintain order
over the development of the technical pieces of the study.
picard
offers a simple function that initiates this file,
makeNews
.
One of the most important inputs for an OHDSI study is the connection
details to the Database Management System (DBMS) that hosts the OMOP
data. These credentials need to be supplied in order to access the data
within the study node, to build cohorts and run the study on the CDM.
Keeping credentials secure and accessible is a difficult challenge. From
experience, we have found success maintaining credentials via the config
package. This package stores configurations to an analysis in a yml
where each set of credentials is differentiated with a block name. This
structure is particularly helpful in institutions that have access to
multiple OMOP assets and wish to iterate between connections
efficiently. Below is an example of a config.yml
file:
# Config File for example
default:
projectName: example
# Config block for example
example:
databaseName: synpuf_110k
dbms: !expr keyring::key_get('example_dbms')
user: !expr keyring::key_get('example_user')
password: !expr keyring::key_get('example_password')
connectionString: !expr keyring::key_get('example_connectionString')
cdmDatabase: !expr keyring::key_get('example_cdmDatabase')
cdmSchema: !expr keyring::key_get('example_cdmSchema')
vocabSchema: !expr keyring::key_get('example_vocabSchema')
workDatabase: !expr keyring::key_get('example_workDatabase')
workSchema: !expr keyring::key_get('example_workSchema')
role: !expr keyring::key_get('example_role')
cohortTable: example_synpuf_110k
We can efficiently access credentials that are hidden behind the keyring
API.
The config.yml
file works very similarly to the
.Renviron
file. The advantage of using the
config.yml
file instead of the .Renviron
is
that can handle multiple credentials and is more portable than the
.Renviron
file which is set to an environment. With
picard
we provide a function that initializes this
file:
picard::makeConfig(block = "example", database = "synpuf_110k")
When a user creates a new config.yml
file it is added to
.gitignore
to ensure it is not committed to a github
repository.