Under the hood with Partial Indexes

Partial indexes allow you to create an index that only includes documents in a collection that conform to a filter expression. These indexes can be much smaller, cutting down index overhead in storage space and update time, and by matching against the filter criteria, queries can use this slimmed-down index and run much faster. This is one of the new lightweight “schema where you need it” features we’re bringing to MongoDB in 3.2. The idea for this feature came from discussion with our users who are accustomed to it from relational databases like PostgreSQL, which introduced the feature in version 7.2. With partial indexes, you use your knowledge of your application to adjust the space/time tradeoff you make when creating indexes to fit your needs.

One great example of this is when you have a collection where documents go through an active phase, and then move into an archival state along with a state field update (like “billed” going from “false” to “true”), where they occupy the bulk of a collection’s footprint. Since you’re unlikely to want to access them from that state outside the context of looking up a single record by its primary key or an analytical collection scan, they would just clutter up your index, consume RAM, and make your other operations run slower.

So, here’s an architecture question… is this a storage engine change?

Well, that’s a trick(y) question. From a design standpoint it absolutely should not be. Storage engines are simple (conceptually) and need to be focused on one thing: storing and retrieving data efficiently. Indexing concerns belong to layers above a storage engine.

But in the pre-3.0 days, thiswould have had to be a storage engine change, because we had not yet created a nice separation of concerns. A ton of work had to be done behind the scenes as we built 2.4, 2.6, and 3.0 to make this possible, but now we’re seeing all that hard work pay off. Pluggable storage engines is a big part of the future of MongoDB, and a sane architecture separating these layers turned making partial indexes from a nightmare into some code that’s actually really pleasant to read. So pleasant, in fact, that I’m going to tour you through some of it, by tracing the path of an insert into a collection.

At a high level, an interaction with a MongoDB collection traverses several layers to get down to the storage engine. For this tour, we’ll skip the networking and user layers, and trace the path from the Collection object to the IndexManager to the StorageEngine.

(Note: all links here are to the 3.1.7 branch to make sure they are stable, so this code is already slightly out of date - see master for newer code. Line numbers will have changed, but the general flow will be the same. (For the next year at least!))

The entry point is Collection::insertDocument which hoists out error handling (including document validation, another one of our 3.2 features, but that’s for another post), and passes down to Collection::_insertDocument

This code contains a transition across areas of concern:

StatusWith<RecordId> loc = _recordStore->insertRecord(
    txn, docToInsert.objdata(), docToInsert.objsize(), _enforceQuota(enforceQuota));

_recordStore is an instance of our abstraction around storage engines (more detail can be found here), and you can see that we just hand the data for the document over to the _recordStore to handle.

The architecture detail of note is that this code doesn’t deal with indexes, nor is indexing buried below that called to insertRecord. Rather, after doing a little collection housekeeping _insertDocument just calls IndexCatalog::indexRecord.

which in turn calls _indexRecord for every index on the collection.

There, we simply do not index entries that do not match:

const MatchExpression* filter = index->getFilterExpression();
if (filter && !filter->matchesBSON(obj)) {
    return Status::OK();

For each index where the expression matches (or there is no filter), it calls IndexAccessMethod::insert, which generates the keys (0 to many, typically 1) and inserts each one. IndexAccessMethod is a superclass abstracting how indexes are used, since there are many types, such as geospatial, btree, and full text, and each will have their own implementation.

(Those of you following along in the code might notice the abstraction for the index itself is stored as the _newInterface member of the IndexAccessMethod class. At some point that will get a better name!)

So now the storage layer doesn’t know about partial indexes at all.

The reason that this works is that the storage engine layer is required to expose a transactional key/value api, through which all interactions pass, accompanied by a transaction descriptor. The layer above that treats both collections and their indexes as sets of key/value maps. So inserting a document into a collection with 2 indexes is 31 separate table insert calls to storage engine code from higher layers, with atomicity ensured by the storage engine’s transaction system.

  1. or more, in the case of multi-key indexing of arrays ↩︎