Traditional peer review is not enough to ensure data quality amid the recent boom in scientific research findings, according to results of a 10-year collaboration between the National Institute of Standards and Technology (NIST) and five technical journals.
While production of research data is growing about 7% annually, about one-third of papers submitted to participating journals contained erroneous or incomplete chemical property data, according to a report by 32 authors from NIST and the collaborating journals. Poor data can lead to mistakes in equipment selection, over-design of industrial plant components, difficulty simulating and discovering new processes, and poor regulatory decisions, the report notes.
The traditional peer-review process is under too much time pressure to fully evaluate all new experimental data, the NIST-journal collaboration found. The authors' solution is a set of customized software tools and procedures for validating experimental data and eliminating errors after a paper is approved by peer review but before a journal formally accepts it.
The five journals are the Journal of Chemical and Engineering Data, Fluid Phase Equilibria, The Journal of Chemical Thermodynamics, the International Journal of Thermophysics and Thermochimica Acta. The collaboration focused on thermophysical property data used in chemical process technologies such as distillation, extraction and crystallization. Thermophysical properties include boiling and melting points, density, viscosity, solubility and many other physical values related to temperature, including those for mixtures. The study findings also may be of broad value to scientific data publishing in general.
"The solutions we offer, while centered on the field of thermodynamics, should be applicable in principle to other areas of science and engineering," says Michael Frenkel, a co-author of the new paper and leader of NIST's Thermodynamics Research Center.
Managing thermophysical property data is particularly challenging because some 100-year-old data remain useful today for engineering purposes. Efforts to establish data-reporting standards for this field began more than 50 years ago but could not succeed until recently, with the development of specialized computer hardware and software tools.
The collaboration cites a variety of factors contributing to poor data quality. Advances in measurement science have boosted data collection, but increased automation has resulted in the loss of personnel expertise and knowledge required to run manual systems. In addition, equipment manufacturers sometimes make invalid uncertainty claims. And office software functions, such as cut-and-paste in word processors and "fill down" in spreadsheets, have led to many published errors. The most common problem found in papers analyzed by the collaboration was missing or underestimated uncertainties for reported data.
After several tries, the collaboration developed a rapid, cost-effective process for identifying and eliminating errors. NIST developed a new online tool (NIST ThermoLit) that allows researchers to generate a literature report containing relevant references retrieved from a NIST database. Researchers can combine this capability with an older experiment planning system (NIST ThermoPlan) at both the conceptualization and publication stages of their work. If the submitted paper passes a journal's peer review, data are extracted from it and validated by NIST's expert software system for data evaluation (NIST ThermoData Engine). NIST then generates a report noting any inconsistencies between the new experimental data and critically evaluated data based on past research.
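The kind of consistency check described above can be illustrated with a minimal Python sketch. It is not NIST's software or the ThermoData Engine's method; the names `DataPoint` and `check_point`, the coverage factor of 2, and the numbers in the usage note are all hypothetical, chosen only to show how a missing uncertainty or a disagreement with a reference value might be flagged.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass
class DataPoint:
    """A reported property value (hypothetical structure, for illustration only)."""
    label: str
    value: float
    # Standard uncertainty in the same units as value; None if the paper omits it.
    uncertainty: Optional[float]


def check_point(new: DataPoint,
                reference_value: float,
                reference_uncertainty: float,
                coverage: float = 2.0) -> str:
    """Compare a new value with a reference value.

    Assumption: two values are treated as consistent when their difference is
    no larger than `coverage` times the combined standard uncertainty.
    """
    # Flag the most common problem first: no usable uncertainty was reported.
    if new.uncertainty is None or new.uncertainty <= 0:
        return "missing uncertainty"

    # Combine the two standard uncertainties in quadrature.
    combined = sqrt(new.uncertainty ** 2 + reference_uncertainty ** 2)

    if abs(new.value - reference_value) > coverage * combined:
        return "inconsistent"
    return "consistent"
```

With made-up numbers, `check_point(DataPoint("sample", 103.0, 0.5), 100.4, 0.3)` returns `"inconsistent"`, because the difference of 2.6 exceeds twice the combined uncertainty of about 0.58. A point reported with `uncertainty=None` is flagged as `"missing uncertainty"` before any comparison is made.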
The NIST-journal collaboration plans to continue its work by refining and expanding the modeling and prediction tools in the expert system software.