Maryam Garza (a), Guilherme Del Fiol (b), Jessica Tenenbaum (c), Anita Walden (a), Meredith Zozus (c) (2016). Evaluating common data models for use with a longitudinal community registry. Journal of Biomedical Informatics.
Duke Translational Medicine Institute, Duke University, 2424 Erwin Road, Hock Plaza Box 3850, Durham, NC 27705, USA
Department of Biomedical Informatics, University of Utah School of Medicine, 421 Wakara Way, Room: Suite 140, Salt Lake City, UT 84108, USA
Department of Biostatistics and Bioinformatics, Duke University, 2424 Erwin Road, Suite 1102 Hock Plaza Box 2721, Durham, NC 27705, USA
Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, 501 Jack Stephens Drive, Mail Slot # 782, Little Rock, AR 72205, USA
Objective
To evaluate common data models (CDMs) to determine which is best suited for sharing data from a large, longitudinal, electronic health record (EHR)-based community registry.
Materials and Methods
Four CDMs were chosen from models in use for clinical research data: Sentinel v5.0 (referred to as the Mini-Sentinel CDM in previous versions), PCORnet v3.0 (an extension of the Mini-Sentinel CDM), OMOP v5.0, and CDISC SDTM v1.4. Each model was evaluated against 11 criteria adapted from previous research. The criteria fell into six categories: content coverage, integrity, flexibility, ease of querying, standards compatibility, and ease and extent of implementation.
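As a rough illustration of the content coverage criterion, the sketch below tallies how a set of registry data elements maps to one CDM and reports each outcome as a percentage. The element names, the three outcome labels, and the `coverage` helper are hypothetical and are not taken from the study; the study's own mapping procedure may differ.

```python
from collections import Counter

# Hypothetical outcome labels for mapping one registry data element to a CDM:
#   "match"     - the element has a home in the model's standard tables
#   "extension" - the element fits only through an extension mechanism
#   "none"      - the element cannot be accommodated
OUTCOMES = ("match", "extension", "none")


def coverage(mapping):
    """Return the percentage of data elements falling under each outcome."""
    counts = Counter(mapping.values())
    total = len(mapping)
    return {outcome: round(100 * counts[outcome] / total, 1) for outcome in OUTCOMES}


# Illustrative data only: four made-up registry elements and their mapping outcome.
example_mapping = {
    "systolic_bp": "match",
    "smoking_status": "match",
    "neighborhood_id": "extension",
    "household_income": "none",
}

summary = coverage(example_mapping)
print(summary)
```

Running the same tally once per candidate model would give per-model percentages that can be compared side by side.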
Results
The OMOP CDM accommodated the highest percentage of our data elements (76%), fared well on other requirements, and had broader terminology coverage than the other models. Sentinel and PCORnet fell short in content coverage, with 37% and 48% matches, respectively. Although SDTM accommodated a substantial percentage of data elements (55% true matches), the remaining 45% mapped to SDTM’s extension mechanism, known as Supplemental Qualifiers, which increases the number of joins required to query the data.
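The join overhead of Supplemental Qualifiers can be sketched with a small in-memory SQLite database. The tables below are heavily simplified versions of an SDTM vital signs domain and its supplemental dataset; the `CUFFSIZE` qualifier and all data values are invented for illustration, and real SDTM datasets carry more columns than shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified parent domain: one row per vital signs observation.
    CREATE TABLE vs (usubjid TEXT, vsseq INTEGER, vstestcd TEXT, vsorres TEXT);

    -- Simplified supplemental dataset: one row per extra attribute,
    -- linked back to the parent row through IDVAR / IDVARVAL.
    CREATE TABLE suppvs (
        usubjid TEXT, rdomain TEXT, idvar TEXT, idvarval TEXT,
        qnam TEXT, qval TEXT
    );

    INSERT INTO vs VALUES ('SUBJ-001', 1, 'SYSBP', '128');
    INSERT INTO suppvs VALUES ('SUBJ-001', 'VS', 'VSSEQ', '1', 'CUFFSIZE', 'LARGE');
    """
)

# Retrieving a single non-standard attribute alongside the observation
# already requires a join keyed on subject, sequence number, and qualifier name.
rows = conn.execute(
    """
    SELECT vs.usubjid, vs.vstestcd, vs.vsorres, s.qval AS cuff_size
    FROM vs
    LEFT JOIN suppvs AS s
      ON s.usubjid = vs.usubjid
     AND s.idvar = 'VSSEQ'
     AND s.idvarval = CAST(vs.vsseq AS TEXT)
     AND s.qnam = 'CUFFSIZE'
    """
).fetchall()

print(rows)
conn.close()
```

In this layout each additional qualifier needs either another join of the same shape or a pivot of the supplemental rows, so queries grow as more data elements land in the extension mechanism.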
Conclusions
The OMOP CDM best met the criteria for supporting data sharing from longitudinal EHR-based studies. Conclusions may differ for other uses and associated data element sets, but the methodology reported here is easily adaptable to common data model evaluation for other uses.