FastBit Bitmap Index2008 R&D 100 Winner

Computing advances have allowed us to assemble monstrous repositories of information. Searching the latest terabyte-size databases is a daunting engineering challenge. The FastBit Bitmap Index, developed by Lawrence Berkeley National Laboratory, Berkeley, Calif., was originally designed to help particle physicists sort through billions of data records in databases like these. Previously, this job was tackled by tree-based indexing, which can’t find large datasets, or bitmap indexing, which fares poorly when asked to find a large value range.

FastBit is big improvement on bitmap indexing. Innovations include: vertical data organization, which filters out required variables to reduce disk reading; word-aligned hybrid (WAH) compression, a patented 10-fold speed improvement on existing routines; and multi-level encoding, which places bitmaps in coarser bins at high search levels and only those data values that fall within a boundary into finer bins at lower search levels. FastBit’s two-level encoding works with WAH compression to speed search by 30 to 50 times over other methods. Successfully applied in commercial settings, FastBit has greatly accelerated drug discovery software at the Univ. of Hamburg, improved content/ad matching by up to 100 times at Yahoo! Research, and enabled an award-winning grid-based high energy physics data analysis.

Bitmap indexing

Lawrence Berkeley National Laboratory