Welcome to the Bioinformatics Testing Consortium
Codebase peer-review to improve quality of bioinformatics pipelines
Over the last decade, biology has become one of the most data-rich sciences. Next-generation sequencing has been democratized by exponentially decreasing sequencing costs, and rapid improvements in mass-spectrometry throughput allow unprecedented analysis of proteomes and metaproteomes across all domains of life. To cope with this data, the number of tools for its computational analysis has expanded just as rapidly. Such tools are typically written by individuals or small teams and then shared with the rest of the bioinformatics community as open-source programs described in publications.
The great strength of this model is that rapid sharing of tools prevents individual groups from reinventing the same tool to solve similar problems, and new tools are often quickly adopted as the ‘industry-standard’ way of analyzing a particular kind of dataset. However, the review process for the publication describing a tool seldom involves a quality check of the tool itself, particularly if the tool is published as part of a broader biological study describing the results it generated. Documentation is often sparse, and code is often tested on only a single runtime environment before publication. As a result, first use of a new bioinformatics tool is often preceded by a period of troubleshooting that involves email conversations with the author, internet forums and so on. In the worst case, this increased activation energy can deter use and prevent broader adoption.
In the early 1990s, open-source software libraries in the IT industry typically suffered from the same issues seen in today’s bioinformatics pipelines. However, a paradigm shift in the late 1990s and early 2000s recognized the critical weakness of allowing software to be tested by its own developers. The intimate knowledge developers have of their own code leads to several problems:
When the industry started hiring professional testers, whose sole role was to run software systematically as a naïve user down every possible path, these problems soon came to light and were quickly corrected. It is now commonplace for developers to release their code to testers, who then report bugs in functionality and documentation for the developers to fix before general release.
While professional testing is undoubtedly beyond the budgets of most bioinformatics projects, there are significant parallels to be drawn with the peer-review process for manuscripts. The ‘Bioinformatics Testing Consortium’ was established to perform the role of testers for bioinformatics software.
The main aims of the consortium would be to verify that:
The benefits to developers of such a system would include fewer setup query emails, improved confidence in deployment, suggestions on how to improve the efficiency of the algorithms used, exposure of the software to a broader range of data (improving its applicability) and, ultimately, a stamp of approval that their codebase has been rigorously tested. End-users would benefit from better-quality software with a lower activation energy, opening bioinformatics up to those who are stymied by the first compilation error. Journals would also benefit, because the codebase presented in a manuscript would be more stable and thus less likely to diverge from the reported methods.
Once up and running, this site will serve as a consortium nexus to which bioinformatics software can be voluntarily submitted for testing. The BTC will then coordinate the recruitment of reviewers to each project and manage the process from initiation to completion. During this alpha-testing phase, reviewers would raise issues through the bug-tracking resources associated with the software’s repository. Once these issues were resolved, the codebase would be deemed suitable for beta-testing, at which point it could be marked as ready for general use. The transition of a codebase to beta-release would show journal editors that the code and test data associated with a manuscript were publicly available and well documented, and that the analysis described in the paper had been independently verified. It is my view that such a system would significantly improve the quality of bioinformatics research and expose new methodologies to a broader audience.
To keep up with our progress in establishing the BTC, follow us at @BioTestConsort.