Data Upload Scripts

Status: In production

The upload scripts work in concert with the GridFTP servers, XCEDE, and the standardized data hierarchies to place data on the GridFTP servers and link it to the appropriate database.

Organizing and locating information, both clinical metadata and imaging data, in a distributed data environment can be challenging. Because each site in the collaboratory maintains its own linked data storage, standards are needed for how and where images are stored, for the metadata that describes them, and for how they can be found.

To address these requirements, the BIRN has implemented standardized data storage hierarchies on GridFTP servers. The hierarchy was designed to be amenable to automated processing scripts and robust enough to accommodate differences in storage requirements across BIRN consortia. In addition to the physical organization of files, it maintains XML-based descriptors of what the files contain, which aids data analysis.

The goals of the data upload scripts are:

  • getting imaging information into the standardized data storage hierarchies,
  • creating XML-based metadata about the data acquisition,
  • converting scanner-specific, proprietary data formats into standard “usable” formats, and
  • linking data locations within the distributed file system into a site’s Human Imaging Database (HID).
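
As a rough illustration of three of these goals (placing data in a hierarchy, describing the acquisition in XML, and forming a location to link into the HID), here is a minimal Python sketch. It is not the BIRN upload scripts themselves. The hierarchy levels (project, subject, visit, series), the XML element names, and the host gridftp.example.org are all assumptions made for this example; the XML is a simplified stand-in and does not follow the XCEDE schema. Format conversion is scanner-specific and is not shown.

```python
from pathlib import PurePosixPath
import xml.etree.ElementTree as ET


def hierarchy_path(project, subject, visit, series):
    """Build a storage path from hypothetical hierarchy levels."""
    return PurePosixPath("/") / project / subject / visit / series


def acquisition_metadata(subject, visit, series, params):
    """Return an XML string describing one acquisition.

    The element and attribute names are illustrative only, not XCEDE.
    """
    root = ET.Element("acquisition", subject=subject, visit=visit, series=series)
    for name, value in sorted(params.items()):
        ET.SubElement(root, "param", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def link_record(path, host="gridftp.example.org"):
    """Form the GridFTP URL that a database row could store (host is hypothetical)."""
    return f"gsiftp://{host}{path}"


path = hierarchy_path("demo_project", "subj001", "visit1", "t1_series")
metadata_xml = acquisition_metadata("subj001", "visit1", "t1_series", {"tr_ms": 2000})
location = link_record(path)
```

Keeping the path builder separate from the metadata writer means the same hierarchy rules can be reused by any script that later needs to locate the files.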

Once the data has been “uploaded” using these scripts, any site in the collaboratory can locate the datasets by querying its local HID for particular imaging parameters, protocol specifications, or clinical metadata, and then download the images for further analysis. With this decentralized approach, the BIRN keeps the sites in control of their data while maintaining shareability and access to the information.
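
The lookup step described above might look roughly like the following sketch. It uses an in-memory SQLite database purely as a stand-in for a site's HID; the table name `series`, its columns, and the sample rows are hypothetical and do not reflect the actual HID schema.

```python
import sqlite3


def find_datasets(conn, min_field_strength_t):
    """Return stored locations of series acquired at or above a field strength."""
    rows = conn.execute(
        "SELECT location FROM series "
        "WHERE field_strength_t >= ? ORDER BY location",
        (min_field_strength_t,),
    )
    return [row[0] for row in rows]


# Stand-in database with made-up rows (hypothetical schema and hosts).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (location TEXT, field_strength_t REAL)")
conn.executemany(
    "INSERT INTO series VALUES (?, ?)",
    [
        ("gsiftp://gridftp.example.org/demo_project/subj001/visit1/t1_series", 3.0),
        ("gsiftp://gridftp.example.org/demo_project/subj002/visit1/t1_series", 1.5),
    ],
)

matches = find_datasets(conn, 3.0)
```

Each returned location could then be passed to a GridFTP client to download the images.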

Status: New upload scripts that work with the HID and GridFTP are scheduled to be available by the end of 2010. The previous versions, which worked with the SRB, are available here for archival purposes.

BIRN is supported by NIH grants 1U24-RR025736, U24-RR021992, U24-RR021760 and by the Collaborative Tools Support Network Award 1U24-RR026057-01.