Biomedical Informatics Research Network (BIRN) | Supplement V: FBIRN Informatics

Supplement V: FBIRN Informatics

This page includes supplemental materials for the paper, FBIRN Recommendations for Multi-Center fMRI Studies:

Coordinating environment. The FBIRN federated environment consists of ten geographically distributed sites, each hosting a HID database and maintaining a GridFTP⁹-accessible storage device. Access control, data movement, data security, and replication are managed using functionality from the Globus Toolkit¹⁰ and supported by the BIRN coordinating center (www.birncommunity.org/), which provides a number of capabilities in support of multisite studies¹¹. The FBIRN FIRE environment builds domain-specific informatics tools on top of the core data access and movement functionality supplied by the coordinating center. A supporting infrastructure of this kind is important for multi-site federated projects because it relieves scientists and informaticians of the burden of maintaining generic data access and movement services and lets them focus on domain-specific tool building.

The HID database is a web-accessible data management tool that provides an extensible database schema for the storage and retrieval of clinical assessments, imaging data, and derived data collected and produced during the course of an imaging experiment. The HID maintains links to the binary image files stored on a file system that is accessible to the web server. The system currently supports locally mounted filesystems (NFS), GridFTP-accessible filesystems, and the Storage Resource Broker (SRB) distributed filesystem for image and derived data storage. The FBIRN has chosen to use GridFTP-accessible storage devices because the technology provides single sign-on capabilities and parallel data transfer support while allowing sites the most flexibility in terms of hardware choices and access control.

The user interacts with the database through the web interface, which implements a shopping cart motif whereby users can select imaging and/or clinical datasets from multiple sites for download.
The system provides federated query functionality by registering other HID servers at a site installation so that queries can be sent to distributed installations and results joined locally, providing each site with access to data collected at the other sites. By using the core data access and movement functionality provided by the Globus Toolkit, the FBIRN sites can download image data stored on remote GridFTP resources using the same credentials they use to gain access to their local HID web interface.

The HID supports electronic data capture of clinical data using online data entry forms, or remote data capture using TabletPC-based software and HID XCEDE web services. The electronic data capture functionality provides support for double data entry and reconciliation, an important feature when transferring clinical data collected on paper to the data management system. When data are entered through the TabletPC-based system (or directly through the HID electronic data capture forms), clinical assessment data are collected via the electronic interface only, which obviates the need for double data entry. To represent and transfer clinical data collected by the TabletPC device or other data providers, an XCEDE-formatted XML document is passed to the HID web service for database storage. The use of a structured representation of the data is important for data consistency and checking against protocol specifications (see Data Validation and QC section). For more information on the core HID functionality, please see Ozyurt et al.¹².
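
The federated query pattern described above (send the same query to every registered server, then join the results locally) can be illustrated with a minimal sketch. Nothing here reflects the actual HID interfaces: the in-memory "sites", the `make_site` and `federated_query` helpers, and the row fields are all hypothetical stand-ins for what would be network calls to remote HID installations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for a remote HID server: a callable that takes a
# query (here just a subject-group string) and returns matching rows.
Row = Dict[str, str]
SiteQuery = Callable[[str], List[Row]]


def make_site(name: str, rows: List[Row]) -> SiteQuery:
    """Build an in-memory fake site; a real site would be a network call."""
    def run(group: str) -> List[Row]:
        # Tag each matching row with the site it came from.
        return [dict(r, site=name) for r in rows if r["group"] == group]
    return run


def federated_query(sites: Dict[str, SiteQuery], group: str) -> List[Row]:
    """Send the same query to every registered site and join results locally."""
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        futures = {name: pool.submit(q, group) for name, q in sites.items()}
    merged: List[Row] = []
    for name in sorted(futures):  # deterministic order by site name
        merged.extend(futures[name].result())
    return merged


registry = {
    "site_a": make_site("site_a", [{"subject": "001", "group": "control"},
                                   {"subject": "002", "group": "patient"}]),
    "site_b": make_site("site_b", [{"subject": "101", "group": "control"}]),
}
results = federated_query(registry, "control")
```

Each returned row carries the name of the site it came from, so the locally joined result can still be traced back to the installation holding the underlying files.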

Data validation. The FBIRN consortium has implemented several levels of protocol validation in the automated upload infrastructure, described here.

At the highest level, we take advantage of the fact that the upload process for each subject and visit is driven by an XML-formatted upload template file. This template file lists the project, subject, and visit identifiers and, for each listed image directory, the standard name and protocol identifiers associated with that image data. The upload XML file is passed through a project-specific validator written in the Schematron validation language (ISO/IEC 19757-3:2006, Document Schema Definition Languages (DSDL), Part 3: Rule-based validation, Schematron). This validator verifies that the protocol identifiers, standard names, and subject group types listed in the XML file match those previously decided upon for this project, and that the correct number of scans of each protocol type was collected.
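
To make the kind of rule being checked concrete, here is a minimal sketch in plain Python rather than Schematron. It is not the FBIRN validator: the template layout, the element and attribute names, and the rule tables (`ALLOWED_GROUPS`, `EXPECTED_SCANS`) are assumptions invented for illustration, since the real template schema is not shown here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical project rules: allowed subject groups and the number of scans
# expected for each protocol identifier.
ALLOWED_GROUPS = {"control", "patient"}
EXPECTED_SCANS = {"T1": 1, "rest": 2}

# Hypothetical template layout (not the real FBIRN upload template).
TEMPLATE = """
<upload project="demo" subject="001" visit="1" group="control">
  <series dir="s01" name="t1_structural" protocol="T1"/>
  <series dir="s02" name="rest_run1" protocol="rest"/>
</upload>
""".strip()


def validate_template(xml_text):
    """Return a list of human-readable rule violations (empty if valid)."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.get("group") not in ALLOWED_GROUPS:
        errors.append(f"unknown subject group: {root.get('group')}")
    # Count how many series were listed for each protocol identifier.
    counts = Counter(s.get("protocol", "") for s in root.iter("series"))
    for protocol in sorted(counts.keys() - EXPECTED_SCANS.keys()):
        errors.append(f"unexpected protocol: {protocol}")
    for protocol, expected in EXPECTED_SCANS.items():
        found = counts.get(protocol, 0)
        if found != expected:
            errors.append(f"{protocol}: expected {expected} scans, found {found}")
    return errors


errors = validate_template(TEMPLATE)
```

In this example the template lists only one of the two expected "rest" scans, so the validator reports a single violation for that protocol.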

The next stage of protocol validation occurs after the image metadata have been extracted from the raw image files into XCEDE¹³ wrapper files. Because the image metadata are now in a standard XML-based form, we again use a project-specific Schematron-based validator to examine acquisition parameters, including image orientation, spatial and temporal dimensions, voxel sizes, slice spacing and gaps, TR, TE, bandwidth, and flip angle.
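
A check of acquisition parameters against a protocol specification might look like the following sketch, again written in plain Python instead of Schematron. The parameter names, expected values, and tolerances in `PROTOCOL` are hypothetical, and the metadata are assumed to have already been parsed from the wrapper file into a dictionary.

```python
import math

# Hypothetical protocol specification: (expected value, absolute tolerance).
PROTOCOL = {
    "tr_ms": (2000.0, 1.0),
    "te_ms": (30.0, 0.5),
    "flip_angle_deg": (77.0, 0.0),
    "voxel_size_mm": ((3.4375, 3.4375, 4.0), 0.001),
}


def check_acquisition(params, protocol=PROTOCOL):
    """Compare extracted parameters with the protocol; return a list of problems."""
    problems = []
    for key, (expected, tol) in protocol.items():
        if key not in params:
            problems.append(f"{key}: missing")
            continue
        actual = params[key]
        # Treat scalars as one-element sequences so one comparison handles both.
        exp_seq = expected if isinstance(expected, tuple) else (expected,)
        act_seq = tuple(actual) if isinstance(actual, (tuple, list)) else (actual,)
        ok = len(exp_seq) == len(act_seq) and all(
            math.isclose(a, e, rel_tol=0.0, abs_tol=tol)
            for a, e in zip(act_seq, exp_seq)
        )
        if not ok:
            problems.append(f"{key}: expected {expected}, got {actual}")
    return problems


scan = {
    "tr_ms": 2000.4,            # within the 1 ms tolerance
    "te_ms": 35.0,              # outside the 0.5 ms tolerance
    "flip_angle_deg": 77.0,
    "voxel_size_mm": [3.4375, 3.4375, 4.0],
}
problems = check_acquisition(scan)
```

Per-parameter tolerances matter in practice because scanners may report values such as TR with small rounding differences that should not fail validation.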

The last level of protocol validation encompasses those checks that are not possible via the other methods. These are implemented as dynamically-loaded extensions to the data publication tool itself. At carefully chosen locations, or “hooks”, during the upload process, the upload tool may call out to project-specific code, if there exists a source file named with the appropriate project and hook identifiers. These have been used to verify the presence and validate the structure of expected behavioral data, and also to run project-specific tasks like defacing of high-resolution structural images.
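
The hook mechanism described above, where project-specific code is called only if a suitably named source file exists, can be sketched with Python's standard `importlib`. The file naming scheme (`<project>_<hook>.py`), the `run(context)` entry point, and the hook names are assumptions for this illustration and are not taken from the FBIRN upload tool.

```python
import importlib.util
import tempfile
from pathlib import Path


def run_hook(hook_dir, project, hook, context):
    """Call run(context) in <project>_<hook>.py if that file exists.

    The naming scheme and the run() entry point are assumptions made for
    this sketch. Returns None when no hook file is present.
    """
    path = Path(hook_dir) / f"{project}_{hook}.py"
    if not path.is_file():
        return None
    # Load the source file as a module without requiring it to be on sys.path.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.run(context)


# Demonstration with a throwaway hook that checks for behavioral data.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_post_extract.py").write_text(
        "def run(context):\n"
        "    return 'behavioral' in context['files']\n"
    )
    found = run_hook(tmp, "demo", "post_extract", {"files": ["behavioral", "t1"]})
    skipped = run_hook(tmp, "demo", "pre_upload", {"files": []})
```

Because a missing hook file simply means no project-specific step at that point, projects can add checks incrementally without any change to the upload tool itself.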

Data Monitoring/QC. The FBIRN has implemented tools for each aspect of monitoring and QC. Along with data validation (see 3. Data validation), FBIRN has implemented image quality control scripts which execute on each of the GridFTP servers after data publication (upload). Each night, each GridFTP server at participating sites executes scripts that identify which fMRI datasets on the local file system have not been run through the QC tool. The QC reports are then generated for those datasets and are thus available for data curators and the data monitoring tool (discussed below). FBIRN has further implemented versioning logic which forces re-computation of image QC reports if the version of the QC tool has changed. In longitudinal, distributed multi-site studies, versioning of the QC reports and analysis results is extremely important for consistency and data reuse.
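
The nightly selection logic, including the version-forced recomputation, reduces to a simple comparison. The sketch below assumes a hypothetical record of which QC tool version produced each existing report; the dataset identifiers and version strings are invented.

```python
def datasets_needing_qc(datasets, report_versions, tool_version):
    """Return dataset IDs that have no QC report, or whose report was
    produced by a different version of the QC tool.

    report_versions maps dataset ID -> version string of the tool that
    generated the existing report (a hypothetical bookkeeping format).
    """
    return [d for d in datasets if report_versions.get(d) != tool_version]


datasets = ["ds1", "ds2", "ds3"]
report_versions = {"ds1": "2.1", "ds2": "2.0"}  # ds3 has never been processed
pending = datasets_needing_qc(datasets, report_versions, tool_version="2.1")
```

Here `ds2` is reprocessed because its report predates the current tool version, and `ds3` because it has no report at all.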

The FBIRN dashboard is a dynamic, web-accessible, project-specific snapshot of what data have been collected by each participating site. The dashboard is created by scripts that inventory each of the distributed databases, each of the GridFTP distributed file systems, and the project-specific image curation wiki for a complete integrated view of the project. The dashboard scripts draw upon the project protocol to identify which imaging sequences and clinical assessments should have been collected. Data (or subsets of data) that are missing are coded red in the appropriate cell of the dashboard. Participating sites can click on the red cells and describe why the data are missing or unusable. These annotations update future snapshots of the dashboard to prevent repeated inquiries about the same missing data throughout the course of the project. The dashboard further plots site-specific data collection target trajectories and estimates which sites are behind their targeted enrollments. Finally, image QC reports are linked into the dashboard and available for review, along with an indicator of the curation status for each QC report. The dashboard has been an invaluable tool for integration and coordination among sites.
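
The core of the dashboard logic, comparing what the protocol expects with what was inventoried and suppressing cells that a site has already explained, can be sketched as follows. The item names in `EXPECTED`, the data structures, and the status labels are hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical list of items the project protocol requires for each visit.
EXPECTED = ["T1", "rest", "clinical_form"]


def build_dashboard(inventory, annotations):
    """Compute a status for every expected (site, subject, item) cell.

    inventory:   {(site, subject): set of items actually collected}
    annotations: {(site, subject, item): reason the item is missing}
    Returns {(site, subject, item): "ok" | "explained" | "missing"}.
    """
    cells = {}
    for (site, subject), collected in inventory.items():
        for item in EXPECTED:
            key = (site, subject, item)
            if item in collected:
                cells[key] = "ok"
            elif key in annotations:
                cells[key] = "explained"  # site has already described the gap
            else:
                cells[key] = "missing"    # would be coded red
    return cells


inventory = {
    ("site_a", "001"): {"T1", "rest", "clinical_form"},
    ("site_b", "101"): {"T1"},
}
annotations = {("site_b", "101", "rest"): "scanner fault"}
cells = build_dashboard(inventory, annotations)
red = sorted(k for k, v in cells.items() if v == "missing")
```

Only the unexplained gap remains flagged, which mirrors how annotations prevent repeated inquiries about the same missing data.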
