Figure 1: Comparing the one-and-done “dead” data lifecycle to a one-to-many “live” data lifecycle. Images: ACD/Labs

Within the pharmaceutical industry, the rapid identification, elucidation and characterization of synthetic, process impurities and degradation products is an intense and comprehensive undertaking. In the development of a formulated drug substance, the U.S. Food and Drug Administration (FDA) requires that all impurities introduced in the proposed process above 0.1% must be isolated and fully characterized.

Furthermore, in order to develop a robust drug product, degradation products must be characterized with the intent of minimizing their presence, thus preserving the shelf life of the formulated drug product. The emergence or presence of unqualified trace impurities or degradation products in a drug product can hinder development. In addition, regulatory responses requiring more information around specific impurities or degradation products can significantly delay the approval process. As a result, impurities must be unambiguously identified and characterized, and their fate and impact fully understood and communicated to regulatory authorities in a timely fashion. Unfortunately, depending on the complexity of the problem, the structure elucidation and characterization of an unknown can take weeks to resolve.

In an article for American Pharmaceutical Review entitled “An Integrated Approach to Impurity Identification in Pharmaceutical Development and Formulation,” a research group from GlaxoSmithKline shared that “perhaps the most difficult problem a spectroscopist can encounter is a request for structure elucidation of a complete unknown with no sample history.” The authors go on to explain the importance of the sample history. With all of this information in place, hours of analysis time can be saved. But how accessible is this sample history? How easily can the context of a sample’s history be related to higher-level project goals? More importantly, in a period where outsourcing of analysis and services is likely at an all-time high, how can sample history be captured and maintained as legacy information to assist in future decisions and responses?

Figure 2: A summary of key groups involved in the identification, elucidation and characterization of an impurity in pharmaceutical development.

The “dead” data problem
Since the 1980s, scientific R&D has applied new categories of software technology (LIMS, ELN, SDMS), yet a 2011 IDBS survey of R&D professionals reveals that 88% of R&D organizations lack adequate systems to automatically collect data for reporting, analysis and decision making. Furthermore, International Data Corp. (IDC) estimates an enterprise with 1,000 knowledge workers loses a minimum of $6 million a year in the time workers spend searching for, and not finding, needed information. As a result, many laboratory experiments represent expensive, time-consuming repeats of previous experiments, simply because the data cannot be found, or if found, cannot be reused.

The major source of these problems, which significantly increase the time and cost of chemical R&D, is the traditional one-and-done lifecycle (Figure 1), where knowledge is captured by a scientist but essentially frozen as “dead” data in unstructured formats. This data is difficult to search and retrieve and nearly impossible to manipulate and re-analyze. A 2010 IDC article states that 80% of new information growth is unstructured content, with 90% of that unmanaged.

The “live” data strategy
Unfortunately, developing a data and knowledge management strategy for pharmaceutical R&D is especially challenging due to the variety of different data types acquired and disseminated to solve challenging scientific problems in various disciplines. As a result, having a global Big Data strategy to manage, structure and leverage knowledge from all disparate data generated in a pharmaceutical R&D environment isn’t likely achievable.

Considering this, the most likely strategies for success would be those that address needs in a more focused application. One area where pharmaceutical companies are struggling to manage disparate and unstructured data is in the management of impurity knowledge in drug development. Impurity Resolution Management (IRM) is a specific application of Unified Laboratory Intelligence (ULI) to identify and characterize impurities for resolution and reporting during development and acceptance of a new drug substance. IRM enables cross-functional scientific collaboration among the process chemistry, analytical chemistry and structure elucidation groups through comprehensive and dynamic visualization of “live” data, optimal method development and quick and easy reporting.

Figure 3: ACD/Spectrus DB can capture “live” analytical data (right) and its related information and connect it with the chemical context of the development project. Within this view, the master synthetic route is viewable (top left), and clicking on individual steps will reveal a table of already identified impurities (bottom left) and their eventual synthetic fate (bottom middle).

An IRM-based strategy can help drug development organizations embrace a “one-to-many” (Figure 1) chemical data lifecycle that creates intelligence so researchers can gain insight to make better decisions, accelerate development and reduce costs. Furthermore, it allows for improved communication and collaboration between cross-functional groups, namely process chemistry, method development and structure elucidation groups (Figure 2).

For example, an IRM platform—such as ACD/Labs’ Spectrus Platform—can provide members of the cross-functional groups with immediate access and context to a sample’s history. Knowledge of the synthetic route and the experimental details of an impurity profile can prove invaluable in speeding the isolation, and subsequent identification, of an impurity (Figure 3). Collaboration can enhance scientific productivity when collaborators bring special expertise and knowledge crucial to the research outcome, and in situations where there is a joint use of specialized equipment. In order for each of these strategies to help deliver a drug to market in less time—and, thereby, prolong its effective market life—scientists must be able to obtain the highest quality data possible in the shortest amount of time.
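To make the idea of a sample history that stays attached to its chemical context more concrete, the following is a minimal sketch of how such records might be structured. It is not the data model or API of the Spectrus Platform or any other product; every class, field and identifier here is hypothetical and chosen only for illustration. The default 0.1% threshold reflects the characterization level mentioned at the start of this article.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ImpurityRecord:
    """One identified impurity kept together with its sample history.

    All field names are hypothetical; they do not mirror any vendor schema.
    """
    name: str
    route_step: int                     # 1-based step of the master synthetic route
    level_percent: float                # measured level, in percent
    fate: str                           # free-text synthetic fate
    spectra_ids: Tuple[str, ...] = ()   # links back to the raw analytical data


@dataclass
class Project:
    """A development project: a synthetic route plus the impurities seen along it."""
    route: List[str]
    impurities: List[ImpurityRecord] = field(default_factory=list)

    def impurities_at_step(self, step: int) -> List[ImpurityRecord]:
        """Return the impurities already identified at one route step."""
        return [r for r in self.impurities if r.route_step == step]

    def above_threshold(self, threshold_percent: float = 0.1) -> List[ImpurityRecord]:
        """Return impurities whose level exceeds the given threshold."""
        return [r for r in self.impurities if r.level_percent > threshold_percent]


# Illustrative data only; none of these values come from a real project.
project = Project(route=["coupling", "deprotection", "crystallization"])
project.impurities.append(
    ImpurityRecord("Impurity A", 1, 0.25, "purged at crystallization", ("lcms-001",))
)
project.impurities.append(
    ImpurityRecord("Impurity B", 2, 0.04, "carried forward")
)

step_one = [r.name for r in project.impurities_at_step(1)]
needs_characterization = [r.name for r in project.above_threshold()]
```

Because each record keeps its route step, fate and links to the underlying spectra, the same entries can answer both a chemist's question about a single step and a later regulatory question about every impurity above a given level.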

Finally, managing all this information and knowledge to capture the lifecycle of an active pharmaceutical ingredient, along with the fate of its related impurities and degradation products, can help organizations leverage and improve their process knowledge both on a specific project and on future projects. Capturing this knowledge and information in the context of “live” data is crucial in helping researchers understand the decisions made throughout the course of a decade-long development project.

Because the data is “live,” it can be mined in a scientific and comprehensive manner, and can be easily re-processed, re-analyzed, compared and re-purposed for future use. This is especially important when the time comes to compile necessary information for a New Drug Application (NDA) or a regulatory response that requires immediate access to data and findings related to impurities.