Eliot's Ramblings

AWS Pop-up Loft Talk

On August 25th I will be delivering a talk at the AWS Pop-Up Loft in NYC. The talk is entitled: “Behind the Scenes with MongoDB: Lessons from the CTO and Cofounder on Deploying MongoDB with AWS.” The AWS lofts combine hack days, talk series, bootcamps, and “ask an architect” opportunities, and mainly target engineers working on startup projects that are built on AWS, although other people do attend the talks.

Since this is a technical crowd, the talk will be highly technical, and since it’s an AWS event, I’ll be emphasizing MongoDB’s uses in the AWS environment. Here’s the abstract:

Meet Eliot Horowitz, CTO and Co-Founder of MongoDB, the next gen database built for the cloud. Eliot will share his experience founding and scaling a successful startup, discuss the value of community, and urge you to throw away code as fast as you can.

Then he’ll get into specifics regarding how to deploy MongoDB in an AWS context. To focus the discussion, he will use the example of a MongoDB-backed, multiplayer mobile game hosted on AWS, and follow it from inception as a prototype to a global infrastructure spread across multiple regions and availability zones. You will learn specific methods enabling you to start lean while being prepared to scale massively, such as tag-aware sharding for geo-aware data residence, and using multiple storage engines to optimize for particular use cases.
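As a rough illustration of the tag-aware sharding idea mentioned in the abstract: ranges of the shard key are associated with a tag, and only shards carrying that tag hold the matching documents. The sketch below models just that range-to-tag lookup in plain Python. The region codes, shard names, and the (region, player_id) shard key are hypothetical, and this is neither the talk's code nor the MongoDB API.

```python
from typing import Optional

# Hypothetical tag ranges on a compound shard key (region, player_id).
# Each range is [lower, upper): lower bound inclusive, upper bound exclusive.
TAG_RANGES = [
    (("eu", ""), ("eu", "\uffff"), "EU"),
    (("us", ""), ("us", "\uffff"), "US"),
]

# Hypothetical assignment of tags to shards.
SHARD_TAGS = {
    "shard-frankfurt": {"EU"},
    "shard-virginia": {"US"},
    "shard-oregon": {"US"},
}


def tag_for_key(shard_key: tuple) -> Optional[str]:
    """Return the tag whose range contains shard_key, or None if untagged."""
    for lower, upper, tag in TAG_RANGES:
        if lower <= shard_key < upper:
            return tag
    return None


def eligible_shards(shard_key: tuple) -> list:
    """Return the shards allowed to hold a document with this shard key."""
    tag = tag_for_key(shard_key)
    if tag is None:
        # In this sketch, keys outside every tagged range may live anywhere.
        return sorted(SHARD_TAGS)
    return sorted(name for name, tags in SHARD_TAGS.items() if tag in tags)
```

With these made-up ranges, a player whose key starts with "eu" can only land on the Frankfurt shard, which is the data-residence property the abstract alludes to.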

I’m looking forward to it, and if you’re going to be there, let me know.

Extending the Aggregation Framework

The aggregation framework is one of my favorite tools in MongoDB. It’s a clean way to take a set of data and run it through a pipeline of steps that modify, analyze, and process it.

At MongoDB World, one of the features we talked about that is coming in MongoDB 3.2 is $lookup. $lookup is an aggregation stage that lets you run a query on a different collection and put the results into a document in your pipeline. This is a pretty powerful feature that we’ll talk more about in a later post.
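To make the idea concrete, here is a small sketch. The stage document uses the `from` / `localField` / `foreignField` / `as` shape of `$lookup` in MongoDB 3.2, written as a Python dict. The collection and field names are made up, and `simulate_lookup` is a plain-Python stand-in for the join, not how the server implements it (it ignores edge cases such as missing fields).

```python
# A $lookup stage written as a Python dict, the shape a driver would send.
# "customers" and the field names are hypothetical.
lookup_stage = {
    "$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }
}


def simulate_lookup(docs, foreign_docs, stage):
    """Stand-in for the join: attach matching foreign docs as an array."""
    spec = stage["$lookup"]
    result = []
    for doc in docs:
        key = doc.get(spec["localField"])
        matches = [f for f in foreign_docs if f.get(spec["foreignField"]) == key]
        # Copy the input document and add the matches under the "as" field.
        result.append({**doc, spec["as"]: matches})
    return result


orders = [{"_id": 1, "customer_id": "a"}, {"_id": 2, "customer_id": "z"}]
customers = [{"_id": "a", "name": "Ada"}, {"_id": "b", "name": "Bob"}]
joined = simulate_lookup(orders, customers, lookup_stage)
```

Each order comes out with a `customer` array: one element when a customer matches, an empty array when none does.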

In order to make writing $lookup a bit cleaner, we’ve done some work to make adding aggregation stages easier. While this is largely for MongoDB developers, it could also be used by anyone to add a custom stage to do some cool processing on documents inside of MongoDB. Now, given that this requires compiling your own version of mongod, and writing C++ that could corrupt data, this is not for the faint of heart, but it is quite fun :)

For example, if you wanted to write an aggregation stage that injected a new field into every document that came through the pipe, you could do it like this:

https://gist.github.com/erh/957e3193b9bd79b16cb1

Now, you could use $project for this, but my new stage makes all the values into my birthday. So, that’s better.
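If you don’t want to open the gist, the shape of such a stage is roughly a step that pulls a document from the previous step, changes it, and hands it on. The Python sketch below mimics only that idea with generators. The function names and the date are placeholders, and it is not the C++ interface used inside mongod.

```python
from datetime import date

# Placeholder value; the real date is not given in the post.
BIRTHDAY = date(2000, 1, 1)


def inject_field(source, field_name, value):
    """Yield a copy of each document from `source` with `field_name` set to `value`."""
    for doc in source:
        out = dict(doc)  # leave the input document untouched
        out[field_name] = value
        yield out


def run_pipeline(docs, stages):
    """Chain stages lazily: each stage pulls documents from the one before it."""
    stream = iter(docs)
    for stage in stages:
        stream = stage(stream)
    return list(stream)


result = run_pipeline(
    [{"_id": 1}, {"_id": 2, "birthday": "unknown"}],
    [lambda src: inject_field(src, "birthday", BIRTHDAY)],
)
```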

In the end, not too bad. If anyone has some cool ideas please share!

I Want an Apple Watch

A lot of people I talk to are unsure about the Apple Watch, and the category in general. Me, I’m counting down the days till I get my Apple Watch. In fact, at this point my impatience is so great, the prospect of having to wait another month to get one almost makes me want to go out and buy a Pebble. So, score one for the Apple marketing team, I guess.

Before we get into why, I first want to talk about Apple’s VIP feature. You can mark certain people as VIP, and then you can see emails from just them, limit email notifications to just them, and probably more things I haven’t even tried yet. I have emails from VIPs appear on my phone lock screen. This allows me to quickly glance to see if there is anything I want to read. For better or worse, my habit (addiction) is that I need to look at that fairly often.

So the only things on my lock screen are VIP emails, text messages and my next calendar item. All of those are things I generally want to see very often. Right now, that involves either pulling my phone out of my pocket and looking at it, or keeping it on a table and pressing a button. Oh, and I do like to look at the time on my phone pretty often too.

Those four things all seem to be pretty well served by the basic functionality of the Apple Watch. Time, check. Upcoming appointment, I think check. Text messages, check. VIP emails… well, they haven’t been specific about that, but I’d be really surprised if they didn’t integrate that awesome feature into the watch. For me, being able to accomplish those four things without the interruption of going to the phone seems really appealing. Time will tell if it actually works, but I’m hoping. And being able to dismiss a call while keeping my phone in a pocket will also be really nice.

For these reasons, my excitement is currently all about the core feature set, but I’m also intrigued by all the interesting apps that are likely to appear over the next few years. For a lark I’ve done a little daydreaming about that; maybe I’ll write up a few ideas for a later post.

Gmail Jira Decorator

As discussed in other posts, I spend a lot of time in email, and much of the email I get is related to MongoDB’s Jira. I’ve written before about my Jira summarizer, which maintains a single message in your inbox with a summary of recent activity in projects you watch. In my continuing quest to make Jira email easier to deal with, I wrote a tool to make it easier to quickly assess the email notifications about individual issues.

The tool is a Chrome extension that operates on my Gmail inbox. Every 30 seconds it scrapes the subjects of emails and does a Jira request to get some basic information. (It offloads most of this work to a separate server I wrote.) It then munges the HTML to decorate the subject of the email with the status, assignee, severity and fix version.

This allows me to quickly see things that are blockers or critical, not focus on things that are assigned to someone already, or know that someone has decided that it should be fixed in the next point release vs. at some point in the future.
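The core of the decoration step can be sketched in a few lines. This is an illustration in Python rather than the extension’s actual code: the regular expression, the function names, the label layout, and the `DEMO-42` issue data are all assumptions, and a dictionary stands in for the response from the helper server.

```python
import re

# Jira issue keys look like PROJECT-123.
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def find_issue_key(subject):
    """Return the first Jira issue key in an email subject, or None."""
    match = ISSUE_KEY.search(subject)
    return match.group(1) if match else None


def decorate(subject, issue_info):
    """Prefix the subject with status, assignee, severity and fix version."""
    key = find_issue_key(subject)
    info = issue_info.get(key) if key else None
    if info is None:
        return subject  # not a Jira notification, or nothing known about it
    label = "[{status} | {assignee} | {severity} | {fix_version}]".format(**info)
    return f"{label} {subject}"


# Hypothetical data standing in for the response from the helper server.
ISSUE_INFO = {
    "DEMO-42": {
        "status": "Open",
        "assignee": "Unassigned",
        "severity": "Blocker",
        "fix_version": "next",
    }
}
```

Subjects without a recognizable issue key pass through unchanged, so ordinary email is left alone.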

Gmail Jira Decorator in action

Interested in the project? Feedback on my email-centered workflow? Let me know!

Dengue Fever

Last week I went to Las Vegas for MongoDB’s sales kickoff. The night before I left, Sunday, I came down with a decently high fever. I got a bit nervous, as it came on strong and fast, but I took some Advil, went to bed, and the next morning felt ok to get on a plane. That whole Monday was pretty good with the help of some more Advil. On Tuesday morning the Advil was giving ground, on Tuesday evening it was in full retreat, and Wednesday at 5am I found a helpful MongoDB employee in the hotel to take me to the ER.

Apparently, while in the Dominican Republic for a family vacation, a.k.a. playing with my kids in the water, I was bitten by a mosquito carrying Dengue fever. So I have now officially crossed “Get a tropical disease” off of my bucket list. I’m very excited about that.

I have two take-aways from this experience:

First, I don’t recommend getting Dengue fever. It’s not pleasant. Use a lot of bug spray, really.

Second, if you do have to get Dengue fever, make sure when you get really sick, and are given a fair amount of morphine, that a) you do not write any code, and b) you be administered said morphine in the presence of co-workers, who then have blackmail material for life.

I’m still a bit under the weather, but at least I’m not contagious.

Seriously, though, don’t get Dengue fever.

MongoDB 3.0: Seizing Opportunities

MongoDB 3.0 has landed.

The development cycle for 3.0 has been the most eventful of my entire career. As originally planned, it would have been great, but still incremental in nature. Instead, we wound up acquiring our first company, integrating their next-gen storage engine, and by capitalizing on that unlooked-for opportunity, delivering a release so beyond its original conception that we revved its version number.

Renaming a release in-flight is out of the ordinary, so I wrote about our reasoning when we announced the change. We had originally planned to deliver document-level locking built into MMAPv1, and a storage engine API as an investment in the future, not part of a fully developed integration. That would have been our incremental improvement, in line with our storage engine efforts throughout the 2.x release series. We had already added database-level locking, iterated over many improvements to yielding and scheduling behavior, and refactored a ton of code to decouple components.

At the outset of this development cycle we did several things in parallel. We carved out the code layers to support our storage engine API, started building collection-level locking into MMAPv1, and started designing document-level locking. At the same time, we worked with storage engine builders to put our API through its paces. By the summer of 2014, we had an MMAPv1 prototype for document-level locking, which we demonstrated at MongoDB World. While this was not going to make our use of disks more efficient or solve other MMAPv1 problems, it was nonetheless a huge improvement, and exactly what we were aiming for.

Then the WiredTiger team called us and demonstrated a working integration with MongoDB’s storage engine API. Before long, we realized we had before us an opportunity to shoot the moon. We would have to scale back our plans for MMAPv1 to just collection-level locking, but by doing so, we could completely leapfrog our roadmap and supercharge our team. By delivering MongoDB with WiredTiger, we could offer our users everything we had promised, along with performance MMAPv1 will never match, and features it would take years more to build in. After all, WiredTiger was developed with laser focus on the raw fundamentals of data storage in a modern environment, allowing it to support massive concurrency and other great features like compression.

For all its magnificence, WiredTiger is not yet the default storage engine. We have every confidence in its ability: it is a shipping product in its own right, and has proven its mettle to customers with the most demanding production environments, such as Amazon. We are using it ourselves in production to back MMS. However, the use cases for MongoDB are so broad and varied, we need to gather a wide range of feedback. With that data, we’ll be able to optimize and tune the integration and provide robust guidance on the role of specific metrics in capacity planning, leading to better, more predictive monitoring, and a healthy collection of best practices.

The acquisition of WiredTiger marks an important transition for me as well. Storage engines are incredibly interesting components of a database, but as much as I might like to dig further into them, our goal to make MongoDB the go-to database requires me to be more pragmatic. With a team of world-renowned experts available, who know more about (for example) how to implement MVCC than I ever will, it makes sense to leave storage engines in their capable hands so I can focus on other areas.

MongoDB 3.0 is a great release. I am very proud of the massive team effort that produced it. We will not be resting on our laurels though. There is still a long list of features and improvements our users need to be successful, and with MongoDB 3.0, we expect MongoDB to be used in even more demanding and mission critical projects. Many of those projects will surprise us, and these surprises will create new demands. We are excited to get started on these challenges, further optimizing MongoDB, and extending its capabilities so the pioneers can continue to surprise us.

LiveScribe vs. Phone Camera Update: The NOOP Edition

In my first post on this topic, I said I’d post an update in a week or so. Ok, so that was about 7 weeks ago.

I abandoned the trial of both of these techniques because 2.8.0 is, frankly, more important than my experiments in productivity. I’m going to get back to it, but this is actually an opportunity to say something important about getting derailed from productivity projects by urgent items.

This happens to all of us from time to time when the pressure mounts, and that’s a good thing. The key is to keep your head, focus on the most urgent thing while it’s urgent, and remember to revisit those productivity projects. They are important in the long run, or you wouldn’t want to start on them in the first place. If you find yourself constantly saying “I had to drop that, I got too busy”, it’s time to re-evaluate.

MongoDB 2.8.0-rc0

Today our team made public our first release candidate of MongoDB 2.8, rc0.

Since June, beginning with MongoDB World 2014, I’ve been speaking publicly about MongoDB 2.8, and its headline features: document level locking and pluggable storage engines. What I haven’t said until now is just how related these two features are.

We’ve been working on our storage API for roughly a year, and with MongoDB 2.8 rc0, we’re rolling out the first fully supported and working storage engine integration: WiredTiger.

WiredTiger is a modern storage engine designed from the ground up and optimized to support high write performance, compression, and vertical scalability. By integrating the WiredTiger storage engine, MongoDB 2.8 will add document-level locking and high performance writes.

Migrating existing MongoDB instances to the new storage engine can be done with a rolling upgrade, the same process as is used to upgrade MongoDB versions. And as demonstrated at MongoDB World 2014, 2.8 supports mixed-mode deployments, so teams can test out the new engine before migrating entirely.

Our original storage engine, MMAPv1, will remain the default for this release, and is going from database-level to collection-level locking, offering a major win, completely maintenance free, to teams for which a storage engine upgrade is not a priority.

Please remember, rc0 is a release candidate! We can’t wait for our amazing community to take it for a spin and start giving us feedback, but we do not recommend you use it in production!

With every release of MongoDB, we try to offer our community the most important things they’ve been asking for. With v2.8, we hope we’ve done exactly that.

MongoDB London 2014

On November 6th, I’ll be delivering the keynote address at MongoDB London 2014. I’ll be talking about the upcoming 2.8 release, the future of storage engines in MongoDB, and Automation. Since our last conference (MongoDB Boston 2014), the revamped MMS with Automation has gone from soft launch to wide release, and the response from the MongoDB community has been fantastic. We’re seeing tons of adoption and getting lots of great feedback. We’ve also been hosting meetups in our offices, to demonstrate how easy it is to use Automation to deploy a MongoDB infrastructure at any scale.

So if you’re at MongoDB London, keep an eye out for me… I’ll be walking around, trying to meet as many of you as possible. I’d love to get your feedback on Automation if you’ve tried it out already, or hear what’s preventing you from using it.

The Road to MMS Automation

“MongoDB is as easy to operate at scale as it is to develop with.” From the very beginning of MongoDB, I’ve envisioned making that bold claim. Until today, it’s been a dream.

We just brought it firmly into the realm of the realistic. Today we rolled out a completely revamped MMS built atop Automation, our cloud service for deploying and running MongoDB. Automation works with any infrastructure, from AWS to private cloud to bare metal. It deploys brand new replica sets, adds new shards to clusters, adds replica set members, deploys version upgrades… all at the push of a button. Monitoring and Backups are maintained seamlessly via Automation as well. It also makes julienne fries.1

You can read about the new MMS on the MongoDB blog, or visit mms.mongodb.com to check it out for yourself. What I would like to explain here is the road that led to this point, which began with our mission to create a database that enabled the greatest possible productivity and horizontal scale.

The Consequences of Horizontal Scale

There are many possible designs for horizontally scaling data stores. As with all development, every design choice incurs trade-offs. Sometimes developer productivity comes at a cost of operational complexity. Moreover, MongoDB’s chosen mechanism for horizontal scaling results in an infrastructure where maintenance tasks must be performed in a specific order. We understood the consequences of these choices, and envisioned a layer of tooling above the raw MongoDB processes that would offer an operator a robust yet simple mechanism for maintenance.

The trouble is, no tool existed which could serve as that layer.

Existing Provisioning Tools

General purpose provisioning tools are not designed to manage non-homogeneous, distributed resources for which ordering of maintenance operations matters. Every resource is treated as a stand-alone system, so operations which require ordering of events require complex and error-prone custom code. One example of this is upgrading a sharded cluster. The process is to upgrade the mongos processes (possibly in parallel), then the config servers (serially), and then all the replica sets (sets in parallel, nodes within sets serially). While it is technically possible to express this within the frameworks of today’s provisioning tools, it is far from easy, and the tools do not provide primitives to address the problem domain. An operator with these needs is essentially on their own.
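To show what “ordering matters” means in practice, the sharded cluster upgrade described above can be written down as a list of phases. This is only a sketch of the ordering in Python, not how MMS Automation is implemented. The function name, the data layout, and the host names in the usage note are hypothetical.

```python
def upgrade_plan(mongos, config_servers, replica_sets):
    """Return ordered phases; each phase is a list of groups.

    Phases run one after another. Groups within a phase may run in
    parallel; hosts inside a group are upgraded one at a time.
    """
    phases = []
    # Phase 1: mongos processes, each in its own group, so they can go in parallel.
    phases.append([[host] for host in mongos])
    # Phase 2: config servers as a single group, so they go serially.
    phases.append([list(config_servers)])
    # Phase 3: one group per replica set; sets in parallel, members serially.
    phases.append([list(members) for members in replica_sets.values()])
    return phases
```

For example, `upgrade_plan(["s1", "s2"], ["c1", "c2", "c3"], {"rs0": ["a1", "a2"], "rs1": ["b1", "b2"]})` yields three phases: both mongos hosts as separate groups, then the three config servers as one serial group, then one serial group per replica set.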

We’ve said all along that the complexities of operating a MongoDB cluster could and would be handled with tooling, but unfortunately, several years after MongoDB became by far the most popular non-relational database, that tooling still did not exist… until today.

MMS is not meant to replace provisioning tools like Chef or Puppet. In fact, many people will deploy the MMS automation agent via these tools, after provisioning and configuring their VMs with those same tools. Rather, it is meant to take over where they leave off.

MongoDB ♥ Ops

This is a love letter to operators everywhere. Our new MMS is a huge step forward for running MongoDB infrastructures, making even the most tedious and risk-fraught tasks simple.

Our mission is to take things that are overly complicated and make them simple, even enjoyable. First we did it with data storage programming, and now we’ve done it with our own infrastructure. We hope you enjoy this new chapter of MongoDB. Now go forth and deploy!


  1. Does not actually make julienne fries