"Layout-aware text extraction from full-text PDF of scientific articles," a paper published in Source Code for Biology and Medicine by senior author and BIRN member Gully Burns, has been downloaded more than 4,500 times since August 2012. The paper describes an open source software tool, developed by Burns and other ISI/USC researchers, that extracts text from one or more PDFs using user-specified rules. These rules account for layout, extraneous elements and formatting, producing more accurate results than standard available tools.
You can view the paper here: http://www.scfbm.org/content/7/1/7