Researchers at Johnson &
Johnson Pharmaceutical Research and Development (J&J PRD)
faced a challenge. Over the years, they had built a
state-of-the-art platform to enable discovery of small-molecule
drugs, but the expanding role of biologics in pharmaceutical
research required a new set of tools to handle large-molecule
compounds.
Developing such functionality from scratch was a daunting
proposition. It would take time and resources while delaying
development of novel treatments for debilitating diseases and
disorders.
Researchers at Microsoft Research had a solution. Their new,
open-source library of bioinformatics functions, the Microsoft
Biology Foundation (MBF), part of the Microsoft
Biology Initiative, was designed to address just such a
challenge. When the J&J PRD researchers learned about this,
they immediately became intrigued.
This confluence of need and opportunity occurred in late
November 2009. Now, less than a year later, the benefit has become
manifestly apparent. Instead of spending costly time building a
foundation for the new biological infrastructure, J&J PRD was
able to focus on delivering value-added functionality needed to
facilitate development of innovative treatments that have the
potential to improve the health and quality of life of patients
around the world.
"By using MBF, we were able to provide our users with a greater
level of functionality in less time for our initial
development phase in the large-molecule space," says Jeremy Kolpak,
J&J PRD senior analyst, who will be discussing his team's MBF
deployment during the 2010
eScience Workshop, being held in Berkeley, Calif., from Oct.
11-13. "It allowed us to focus on value-added functionality for our
scientists and has helped us adapt to new requests quite
easily."
Such testimony brings a smile to the face of Simon
Mercer, director of
Health and Wellbeing for
External Research, a division of Microsoft Research.
"The principal advantage of MBF," Mercer says, "is that, because
it's free and open-source, as a programmer, you get a certain amount
of prewritten functionality that you can just build on top of. It
gives you more time to do the real science, because we've already
supplied the basics."
It didn't take long for J&J PRD to grasp the implications of
MBF.
"We were in the process of developing our own infrastructure to
work with sequences," Kolpak explains. "This was part of a larger
move in our organization to improve how R&D with large
molecules was performed and integrate that process with an existing
and mature framework for working with small molecules.
"We have been using MBF from the day we heard of it."
That is precisely the focus of the Health and Wellbeing effort
within External Research: to collaborate openly with the
bioinformatics community by applying advanced computing
technologies to provide unprecedented insight into disease and
human healthcare.
MBF, built on the Microsoft .NET Framework
and aimed at making it easier to implement biological applications
on the Windows platform, was launched in Boston on July 9 during
the 11th annual Bioinformatics Open Source Conference. Since then,
thousands of bioinformaticians have
downloaded the tool kit.
[Figure: The Microsoft Research Biology Extension
for Excel, displaying the contents of a FASTA file containing an
Influenza A virus sequence.]
"There are a lot of biologists who start as post-docs but don't
end up going into biological research themselves," Mercer says. "They
end up managing the data and writing the scientific applications
that the biologists need to do research. They can be anywhere on
the continuum between full biologists with no computing background
to full computer scientists with little or no biological
background.
"They work alongside the biological scientists, but they won't
necessarily be those scientists. They'll write scripts and
write programs to help the lab run, and they'll also probably do
some data analysis."
Companies and academics that pursue such work, naturally, are
more concerned with the value they can derive from using software
tools than with building the tools themselves.
"I've heard it over and over again from executives of different
pharmaceutical companies," Mercer says. "Possibly 90 percent of their
software stack has been developed in house but offers them no
competitive advantage. The real crown jewels in bioinformatics are
relatively small compared with the huge bulk of software they have
to maintain.
"They're often in a situation where they want to exchange data
with other pharmaceutical companies on a pre-compete level, and
they find that hard, because their processing pipelines are
uniquely their own. A lot of commercial companies are looking for
things like MBF to adopt as a common platform, so they are using
the same tools, analyzing the data in the same way, and they are
able to share data sets and cut costs."
In other words, MBF helps make bioinformaticians' work a bit
simpler. That certainly appears to be the case at J&J PRD.
"We have integrated it into our data-analysis and -visualization
platform, Third Dimension Explorer, which has been developed in
house," Kolpak says. "This platform is used in a multitude of
different contexts."
With regard to J&J PRD's large-molecule exploration, he lists
the ability to achieve five distinct tasks:
- View sequences with their associated assay data to see how
variations across compounds impact targets.
- Align multiple sequences.
- View aligned sequences and their associated metadata, such as
complementarity-determining regions.
- Extract and translate regions of sequences.
- Work with sequences of different formats to provide a generic
platform for scientists to import and analyze them in one
place.
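To make the "extract and translate" and "different formats" tasks above concrete, here is a minimal Python sketch. It is not MBF code (MBF is a .NET library) and does not reflect how Third Dimension Explorer works; the function names `parse_fasta`, `extract_region`, and `translate` are hypothetical, the sample record is invented, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def parse_fasta(text):
    """Return {record name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first word after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def extract_region(seq, start, end):
    """Return the 1-based, inclusive region [start, end] of a sequence."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region is outside the sequence")
    return seq[start - 1:end]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Invented example record, for illustration only.
sample = ">demo hypothetical record\nGGGATGGCC\nAAATGA\n"
sequence = parse_fasta(sample)["demo"]
region = extract_region(sequence, 4, 15)
print(region, translate(region))  # ATGGCCAAATGA MAK*
```

A library such as MBF supplies parsers and sequence operations of this general kind ready-made, which is the time saving Kolpak describes.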
[Figure: The Third Dimension Explorer sequence
viewer extension enables users to view data in different forms and
correlate it directly to the sequence where the data originated.
The table contains the sequence data, and the top view shows the
aligned sequences, color-coded by hydrophobicity. The views on the
right are examples of visualizations of the assay data in the
table.]
"The goal," Kolpak says, "is to capture operations that are
performed routinely and make it extremely efficient to execute in
one place. But at the same time, we are not trying to replace
existing sequence-analysis tools for the more complex and less used
operations."
At Johnson & Johnson Pharmaceutical R&D, there are
hundreds of users of the Third Dimension Explorer tool. The
MBF-related development is still being completed and rolled out,
but 40 people already are using the enhanced data-analysis
platform and deriving significant benefits.
"It's hard to quantify the amount of time it has saved us," Kolpak
says, "due to the fact we work with an agile development methodology
and, for each iteration, we are finding new functionality in MBF
that we can utilize. I would say that, for our initial rollout,
which required a large amount of framework implementation, it saved
us around three months during a six-month initial development
cycle."
Biological work might not be the first thing that comes to mind
when people think about Microsoft, but the company supports such scientists
nevertheless.
[Figure: The Microsoft Research Sequence Assembler
presents the results of a Basic Local Alignment Search Tool query
for the current sequence in Silver Map, a visual control developed
by the Queensland University of Technology.]
"Inside Microsoft Research, we've done lots of biology," Mercer
says. "It's not what everybody would expect, but a lot of researchers
apply their computer-science research in the biological domain for
healthcare. How can you apply Microsoft technologies to scientific
research? We often do that through collaborations with academics,
where the academic brings the biology, in this case, and Microsoft
brings the computer science. Together, hopefully, we advance
further than either side would have done independently.
"Eventually, you have to ask yourself the question, 'Why don't we
just build a platform so that all of the common elements are
written once and don't need to be written again for every single
project?' And once that platform exists, and it's open-source and
free, why not give it away to the community so it can benefit?"
There are specific ways in which MBF can assist in the
biological domain, such as with modularity, extensibility, and code
maintenance.
"Those sorts of things that professional programmers think of
aren't necessarily the first things in the minds of those who are
writing scripts to support a lab," Mercer continues. "MBF sits in the
middle, with prewritten functionality in nice, digestible chunks,
very standardized."
There are quite a few other biological libraries akin to MBF
already in use, some of them for a decade or more. But over time,
they have grown unwieldy, making it hard to extend them. And they
tend to be written in script-based languages that have no type
checking. MBF, on the other hand, offers type checking and
guarantees, and it's built atop the common-language runtime,
providing the flexibility to handle any of the more than 70
languages that work with .NET, thereby making it easy for a
heterogeneous community to use without having to conform to a
single language.
"We've also wrapped the individual bits of MBF as workflow
activities for our
Trident workflow workbench," Mercer adds, "which is also free and
downloadable. You don't even have to be a programmer to use MBF. You
can just drag and drop and connect the building blocks together to
build workflow pipelines."
External Research attempts to understand the precise scientific
challenge encountered by its MBF partners, a methodology termed
scenario-based development that identifies areas where MBF can be
made more useful. That methodology will be a key component of the
next wave of the tool's enhancement.
[Figure: The Microsoft Research Sequence Assembler
displays a series of short DNA sequences assembled by the Parallel
De Novo Assembler algorithm into a contig.]
"We're approaching our partners in the academic community and the
commercial world to define those scenarios," Mercer says, "and that's
what's driving the direction in MBF v2. We encourage the wider
community (people who download the source code, understand it, and
start developing their own extensions to support their own
science) to participate, because the more of those we get, the more
broadly we can develop MBF. It will grow by the actions of the
community, to support the science that the community wants to
support."
That, in the example of J&J PRD, is exactly what is
happening.
"A lot of what is on our wish list we have been developing in
stride," Kolpak says, "mainly a visualization tool for viewing
sequences, in addition to some other sequence file-format supports
that contain more than just sequence data. These are all things we
plan to contribute back to the MBF development."
And the community at which MBF is focused expects to use
open-source code.
"If we want to run a project that would be recognizable and
familiar in form to the academic community," Mercer says, "then that
would be a software-development project that is open-source,
because open-source is a very common model there. We want to get
contributions from as broad a set of people as possible.
"We want scientists to get a value out of using Windows," he
concludes. "We want scientists to pick up different tools that we
have and understand that they can help them do their research more
effectively and reach insights more quickly than they would
otherwise manage to do. We've got a lot of value to offer in that
area."
The folks at Johnson & Johnson Pharmaceutical Research and
Development couldn't agree more.
"I am a software developer by trade," Kolpak says, "and by using
MBF, I have the confidence that what I am providing our users is
not just solid code, but also that the science behind it is
accurate."