GlaxoSmithKline (GSK) recently implemented a new platform that integrates, simplifies, and unlocks all the data across its R&D division to drive new innovation and decision-making in the pharmaceutical space.

Their new platform promises to completely change the way drug discovery is done—narrowing down what usually takes a few years to 30 minutes. Drug Discovery & Development spoke with Mark Ramsey, Ph.D., chief data officer at GSK, in an exclusive interview about the company’s new analytics initiatives to manage R&D data and quickly bring new drugs to market.

Drug Discovery & Development: What planted the seed for this new platform?

Mark Ramsey: We looked at GSK and other pharmaceutical companies and found that GSK, like most pharma companies, was lagging behind other industries (financial, retail, telco) in data analytics, so we put together a plan to accelerate the focus on data strategy and analytics in R&D, from early discovery through development.

In the middle of last year, we implemented the first R&D platform to be the foundation for the work that we’re doing, and over the last year we very rapidly loaded that platform with the R&D data as well as external information and focused on how to drive value in R&D. In fact, an industry panel recognized us with its ‘Rookie of the Year’ award for what we accomplished in the last 12 months in delivering this information to our researchers and scientists.

What we’re really focused on is bringing together the large amount of data that GSK has inside the organization. Generally, life science companies generate a lot of data, but typically that information is captured and stored project by project. It’s very difficult to look across the collection of projects and clinical trials that have happened over the last two, three, five, ten years and use that information for exploratory purposes to help improve how things are done in the future. If I had to boil down what we’re doing, it’s harnessing the power of that internal and external information to transform the way we do drug discovery as well as clinical trials.

Drug Discovery & Development: What is the relationship with Trifacta?

Mark Ramsey: In order to create this platform we’ve integrated a number of technologies, and Trifacta is one that allows our business users to access the data. The way it works is that we acquire the data from all of the operational sources available in R&D, bring that together on the platform, and then use machine learning and other technologies to rationalize it into industry standards. For example, in the clinical trial space, we rationalize thousands of clinical trials into the CDISC standard so we can look at them consistently. That’s good because standards make the data consistent across all the clinical trials, but it’s not necessarily the way the business wants to look at it. Trifacta allows the business users to manipulate the data and make it business relevant so they can then visualize the information, query it, and make decisions from it.

Drug Discovery & Development: How does it help bring drugs to market?

Mark Ramsey: As a pharmaceutical company we conduct a lot of experiments, so there are a lot of assays that are run against compounds to try to determine their impact, toxicity, efficacy, and those kinds of things. Typically, each of those assays is done project by project, and there’s not an easy way to look across the collection of assays to understand a particular molecule across every assay that’s ever been conducted. We have rationalized the data so that our scientists can actually query a repository of all assays, of all compounds, to understand the entire portfolio of experiments and how it applies to something they’re working on. It gives them a deep understanding before they run a new assay or new experiment, and it helps accelerate work in some of the new discovery areas.

Standardizing the clinical trial data allows folks in the discovery area to analyze what has happened within clinical trials across all the therapeutic areas. They can then target the work that they’re doing with a new product, see how that might have an impact in the market, and make adjustments very early in the process, before it reaches the clinical trial stage. Again, it allows them to really harness the power of this collection of clinical trial data. That’s one of the benefits of GSK being a large pharmaceutical company: we have hundreds and hundreds, even thousands, of clinical trials. Those trials have typically been used to understand a particular asset, but there’s value well beyond that in harnessing the power of the whole collection. That can help improve future clinical trials so they become a lot more efficient and effective and take less time, and it also helps in the discovery area, where what we’ve seen in past clinical trials can influence what is going on in discovery.

Drug Discovery & Development: Is all of the data from just GSK?

Mark Ramsey: Well, the clinical data is all GSK’s data, but we also have partnerships with TransCelerate where we can access external clinical trial data and other external information. For example, we use the Open Targets data, which is all of the target information that has been amassed by the European Bioinformatics Institute.

Also, we recently signed a collaboration with the UK Biobank. The UK Biobank is one of the largest biobanks in the world and it has data for over 500,000 patients over ten years. As part of the collaboration, GSK is actually doing full exome sequencing on those 500,000 patients so that we have not only all of the medical information but also the full exome data, which allows us to do phenotypic and full exome analysis at a very large scale. We now have the ability to do very, very large-scale DNA sequencing to look for certain disease characteristics or phenotypic characteristics. We’re actually the first company to do that at that type of scale.

Drug Discovery & Development: How else does the platform benefit research?

Mark Ramsey: If you look at the clinical trial data, one of the things that’s extremely important is to make sure the diversity of our clinical trials matches the population diversity. By looking at the collection of clinical trials that we have, we can better understand how to design a trial to be effective and efficient and also match that diversity. We’re also looking at using the data to simulate the control arm of a study, so instead of having a placebo group within a clinical trial we can use the placebo and control data from previous clinical trials. Reusing that on new clinical trials means we don’t have to have patients who are going through the process but not receiving the active compound. It’s obviously much better for patients, because we don’t have hundreds or thousands of patients who are not receiving active compounds, and it also reduces the cost and shortens the length of clinical trials because you don’t have to recruit patients for the placebo arm.

Drug Discovery & Development: Have there been any studies that have fully utilized this platform?

Mark Ramsey: We’re just getting started. Next year we’ll have our first study that uses data from a previous trial for the control arm. We’re very close to getting final FDA approval on that study. That will be one of the first. To your point, it does take a bit of time to work with the clinical trials, though we are already seeing value as it relates to better understanding of clinical trial diversity, site selection, and efficiency of trial design. As far as executing a trial using the information, it’s a bit early for that. We’ll be doing that in the next 6 to 12 months.

Drug Discovery & Development: The main benefit is the shortening of the drug pipeline?

Mark Ramsey: Right. That is the goal. If you can shorten the clinical trial process, then that will certainly have an overall impact on the pipeline. Quite frankly, as you know, in the discovery area it typically takes between five and seven years from early discovery to first time in human. We’re really pushing to see how far we can advance the use of AI and computer simulation in the drug discovery process, with the goal being to take the process to maybe less than two years.

We’re trying to destroy the drug discovery process. Five to seven years extends not only the pipeline but also the costs and the time it takes to get a new medicine to a patient. Earlier this year we signed a collaboration with the Department of Energy in the U.S., as well as the National Cancer Institute and Lawrence Livermore, to use the top supercomputers in the world to help us develop the algorithms to simulate drug discovery. So we’re using the data we’ve collected as part of the data strategy, work that I’ve been leading over the last year or so, coupled with that work with the top supercomputers and algorithms, to come up with a way to really accelerate discovery.

Drug Discovery & Development: This is in addition to the old-school R&D process?

Mark Ramsey: We’re trying to drive shorter-term value in what you referred to as the old-school process, to help researchers do the best they can, but in parallel we want to use completely new approaches to disrupt the way that drug discovery is done in general. That can take us a couple of years before we would start seeing any benefit. Right now we have hundreds if not thousands of our chemists who are already using the data from the platform. Being able to analyze it and understand what has been done in the past prevents them from having to redo experiments. That is also a savings: it may save us 10, 15, or 20 percent of the time to do drug discovery. In addition to that, in parallel, we’re trying to do this artificial intelligence, in silico drug discovery, which would be an order of magnitude faster.

Drug Discovery & Development: What else would you like people to know about the technology?

Mark Ramsey: There is a lot of media coverage and publicity around using things like machine learning and artificial intelligence to go after the drug discovery area. That is an exploding area, and I think it will drive value. What I think is unique is that we’ve used those same technologies to help us organize and rationalize the data so that it’s ready for our scientists to use. I think that’s a step that a lot of folks miss. In order to use technologies such as machine learning and artificial intelligence, you have to have the data organized. You have to be able to feed that data into those technologies, and there’s not a lot of discussion around using machine learning and advanced technologies as a way to get the data ready.

I refer to it as “using big data in a big way.” We’re not doing the little proof of concept. We’re actually looking across the entire landscape of R&D, bringing in all of the data, and using advanced technologies to rationalize and organize that data to make it available to the scientists and the researchers so that they, in turn, can change the way they’re doing their jobs.