Published March 13, 2023 | Version v1
Journal article | Open access
Towards self-describing and FAIR bulk formats for biomedical data
Description
We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. The Portable Format for Biomedical data is based upon Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third party controlled vocabularies. In general, each data element in the data dictionary is associated with a third party controlled vocabulary to make it easier for applications to harmonize two or more PFB files. We also introduce an open source software development kit (SDK) called PyPFB for creating, exploring and modifying PFB files. We describe experimental studies showing the performance improvements when importing and exporting bulk biomedical data in the PFB format versus using JSON and SQL formats.
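The description says that each data element in the dictionary carries a pointer to a third-party controlled vocabulary, and that this is what lets applications line up two or more PFB files. The sketch below illustrates that idea in plain Python. It uses hypothetical classes and functions (`Property`, `Bundle`, `harmonize`, `rename_records`) and placeholder term IDs such as `VOCAB:0001`. It is not the PyPFB API, and it uses neither Avro nor the actual PFB schema. It only shows how shared vocabulary terms can drive a mapping between two self-describing files.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for a self-describing bundle; NOT the PFB/Avro schema
# or the PyPFB API. Term IDs like "VOCAB:0001" are placeholders.

@dataclass(frozen=True)
class Property:
    name: str   # local name of the data element in this file's dictionary
    type: str   # declared type, e.g. "int" or "string"
    term: str   # pointer to a controlled-vocabulary term (placeholder ID)

@dataclass
class Bundle:
    # Data dictionary: node type -> properties, shipped alongside the data.
    dictionary: dict[str, list[Property]]
    # Data records; each is {"node": <node type>, "values": {prop: value}}.
    records: list[dict] = field(default_factory=list)

    def term_of(self, node: str, prop: str) -> str:
        """Look up the vocabulary term attached to node.prop."""
        for p in self.dictionary[node]:
            if p.name == prop:
                return p.term
        raise KeyError(f"{node}.{prop} is not in the dictionary")

def harmonize(a: Bundle, b: Bundle, node: str) -> dict[str, str]:
    """Map b's property names onto a's for one node type via shared terms."""
    by_term = {p.term: p.name for p in a.dictionary.get(node, [])}
    return {p.name: by_term[p.term]
            for p in b.dictionary.get(node, [])
            if p.term in by_term}

def rename_records(b: Bundle, node: str, mapping: dict[str, str]) -> list[dict]:
    """Rewrite b's records of one node type using a's property names."""
    return [{mapping.get(k, k): v for k, v in r["values"].items()}
            for r in b.records if r["node"] == node]

# Two files that name the same concepts differently.
a = Bundle({"subject": [Property("age_at_enrollment", "int", "VOCAB:0001"),
                        Property("sex", "string", "VOCAB:0002")]})
b = Bundle({"subject": [Property("age", "int", "VOCAB:0001"),
                        Property("gender", "string", "VOCAB:0002"),
                        Property("site", "string", "VOCAB:0099")]},
           records=[{"node": "subject",
                     "values": {"age": 54, "gender": "F", "site": "A"}}])

mapping = harmonize(a, b, "subject")
print(mapping)  # {'age': 'age_at_enrollment', 'gender': 'sex'}
print(rename_records(b, "subject", mapping))
```

Properties whose terms have no counterpart in the other file (here `site`) pass through unchanged. A real harmonization step would also have to reconcile types and permissible values, and this sketch does not attempt that.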
Data availability
The PyPFB software can be obtained from https://github.com/uc-cdis/pypfb. The data from the experimental studies can be obtained from https://github.com/uc-cdis/pfb-paper-artifacts.
Files
- Towards-self-describing-and-FAIR-bulk-formats-for-biomedicaldata.pdf (875.7 kB; md5:e57c88cbe77d0070be9ed246d41c5e26)
Additional details
Identifiers
- DOI: 10.1371/journal.pcbi.1010944
- Other: oai:uchicago.tind.io:5670
Funding
- National Heart, Lung, and Blood Institute: U2CHL138346