Eliot's Ramblings

The Road to MMS Automation

“MongoDB is as easy to operate at scale as it is to develop with.” From the very beginning of MongoDB, I’ve envisioned making that bold claim. Until today, it’s been a dream.

We just brought it firmly into the realm of the realistic. Today we rolled out a completely revamped MMS built atop Automation, our cloud service for deploying and running MongoDB. Automation works with any infrastructure, from AWS to private cloud to bare metal. It deploys brand new replica sets, adds new shards to clusters, adds replica set members, deploys version upgrades… all at the push of a button. Monitoring and Backups are maintained seamlessly via Automation as well. It also makes julienne fries.[1]

You can read about the new MMS on the MongoDB blog, or visit mms.mongodb.com to check it out for yourself. What I would like to explain here is the road that led to this point, which began with our mission to create a database that enabled the greatest possible productivity and horizontal scale.

The Consequences of Horizontal Scale

There are many possible designs for horizontally scaling data stores. As with all development, every design choice incurs trade-offs. Sometimes developer productivity comes at a cost of operational complexity. Moreover, MongoDB’s chosen mechanism for horizontal scaling results in an infrastructure where maintenance tasks must be performed in a specific order. We understood the consequences of these choices, and envisioned a layer of tooling above the raw MongoDB processes that would offer an operator a robust yet simple mechanism for maintenance.

The trouble is, no tool existed which could serve as that layer.

Existing Provisioning Tools

General purpose provisioning tools are not designed to manage non-homogeneous, distributed resources for which the ordering of maintenance operations matters. Every resource is treated as a stand-alone system, so operations that depend on the ordering of events require complex and error-prone custom code. One example of this is upgrading a sharded cluster. The process is to upgrade the mongos processes (possibly in parallel), then the config servers (serially), and then all the replica sets (sets in parallel, nodes within sets serially). While it is technically possible to express this within the frameworks of today’s provisioning tools, it is far from easy, and the tools do not provide primitives to address the problem domain. An operator with these needs is essentially on their own.
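To make the ordering concrete, here is a minimal Python sketch of the three phases of a sharded cluster upgrade. It is not how MMS Automation is implemented: `upgrade_process` is a hypothetical stand-in that only records a name, and the host names in the usage note are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def upgrade_process(name, log):
    """Hypothetical stand-in for upgrading one process; it only records the name."""
    log.append(name)


def upgrade_replica_set(members, log):
    """Upgrade the members of one replica set one at a time."""
    for member in members:
        upgrade_process(member, log)


def upgrade_sharded_cluster(mongos, config_servers, replica_sets):
    """Run the three phases in order and return the order in which work happened."""
    log = []

    # Phase 1: mongos processes, possibly in parallel.
    # Leaving the "with" block waits for every submitted task to finish.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda m: upgrade_process(m, log), mongos))

    # Phase 2: config servers, strictly one after another.
    for server in config_servers:
        upgrade_process(server, log)

    # Phase 3: replica sets in parallel, members within each set serially.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda rs: upgrade_replica_set(rs, log), replica_sets))

    return log
```

Calling `upgrade_sharded_cluster(["mongos1", "mongos2"], ["cfg1", "cfg2", "cfg3"], [["a1", "a2"], ["b1", "b2"]])` returns a log in which both mongos entries come first, the config servers follow in order, and each replica set's members appear in order relative to each other. Even this toy version has to encode the phase boundaries by hand, and doing so is exactly the custom code a general purpose tool leaves to the operator.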

We’ve said all along that the complexities of operating a MongoDB cluster could and would be handled with tooling, but unfortunately, several years after MongoDB became by far the most popular non-relational database, that tooling still did not exist… until today.

MMS is not meant to replace provisioning tools like Chef or Puppet… in fact, many people will deploy the MMS automation agent via these tools, after provisioning and configuring their VMs with those same tools. Rather, it is meant to take over where they leave off.

MongoDB ♥ Ops

This is a love letter to operators everywhere. Our new MMS is a huge step forward for running MongoDB infrastructures, making even the most tedious and risk-fraught tasks simple.

Our mission is to take things that are overly complicated and make them simple, even enjoyable. First we did it with data storage programming, and now we’ve done it with our own infrastructure. We hope you enjoy this new chapter of MongoDB. Now go forth and deploy!


  1. Does not actually make julienne fries

Livescribe vs. Phone Camera

I’ve been using this new toy. Well, it’s for work, but until the novelty wears off, it’s definitely also a toy.

I like taking notes in meetings on paper as much as possible. It’s less distracting, and more friendly. I’ve tried various ways of doing this, but nothing has stuck yet. The closest has been a regular notebook. The biggest problem is that I don’t like carrying things to and from work, or to different places. So I invariably end up with 4 notebooks, and then I can’t get notes from a trip when I’m at the office.

I’ve been checking out Livescribe. They make a special pen with a camera in it (I have the Livescribe 3), which works with special paper that has a teeny (nearly invisible) grid of dots printed on it. The pen talks to an app on my phone via Bluetooth, capturing what I write in real time. The notes sync automatically to Evernote, where I file them.

I can leave a notebook in every location I might want to take notes, and then view them from anywhere. Kind of nifty.

Once I started trying this out, though, I started thinking about the problem in general, and wondered if there wasn’t a way to accomplish what I needed without carrying around another dedicated piece of hardware. Like I said, travelling light is top priority for me.

So I tried making do with hardware I already carry around everywhere – my phone. It’s pretty quick to save a new “snapshot note” in Evernote, which is where my notes wind up, anyhow. That actually works much better than you might think. There are two issues, though. Livescribe will transcribe handwriting to text, which I can then go back and edit, or copy for pasting; a photo gives me none of that. Even more of an issue for me is that I have to remember to take the picture. I’m not great at remembering things when I’m done with one thing and starting to think about the next.

So… remember the pen, or remember to take the picture? Is it going to be worth it enough to me to carry around a special pen all the time? I’m not sure, but I’ll let you know in a week or so where I land.

The Staff Meeting

Everyone with a staff knows they need a staff meeting on a recurring basis, often weekly. And those who don’t have staff are themselves in other people’s staff meetings, making it one of the most common meeting types for anyone to attend. Sadly, there is often ambiguity around what they are for, making them annoying and inefficient.

What I Want out of Staff Meetings

The purpose of these meetings is twofold: 1) status updates, and 2) key decision making or the precursor conversations for decision making. I want status updates to be very efficient, well thought out, and delivered only for items that are really required. I want thoughtful conversation on key topics, and I want to cut short conversation when the topics are sufficiently discussed.

Meeting Format

A meeting should be formatted based on what you want out of it.

After years of iterating, my team and I have landed on a format that makes staff meetings useful and efficient for all attendees. We tried a lot of our own ideas, and took good concepts from others (such as Amazon’s study hall). This isn’t the last iteration (there’s room for improvement), but we’ve finally reached the point where the meetings are highly functional.

Each meeting has a shared Google Doc. These docs live in per-team folders, and have the meeting date in the title. Each contains a status update section and a proposed agenda for the meeting.

Before the Meeting

Before the meeting, team members fill out their status update section and also put agenda items in a list with their initials. Others can “second” an agenda item with their initials and we will only cover items that have received at least a second vote. There is a strong social contract enforcing preparation such that it’s rare that prep work is not done in time.

The First 10 Minutes

For the first 10 minutes of the meeting, everyone gathers in the room (real or virtual) and begins reading the status updates, silently, together. There are two main benefits of doing it this way. The first is that the deadline for writing is clear, and so is the time to start reading. The second is that this reading period ensures that everyone reads the full material, as it is often otherwise skimmed or skipped in the interest of time.

While reading the status updates, meeting participants comment on the updates (using Google Docs comments), add more agenda items, and second other items. This is all done silently.

The Discussion Section

After the silent portion of the meeting, the team begins the discussion portion. First come the commented portions of the status updates. Then we proceed down the list of agenda items (but only those that have at least two sets of initials by them).

During the discussions, notes are taken in the document, in-line. They are written in red text, and are classified rigidly into 3 types of note: action items (with responsible personnel), key facts, and decisions.

The meeting ends on time (usually early), and anything not yet covered is automatically added (without need for seconding) to the top of the agenda for the next meeting.

Sample Agenda

Here’s an example of an agenda:

Status

  • Person 1
    • I did X
    • My team did Y
    • I’ve been thinking about Z
      • Person 2: What about trying Zeta?
  • Person 2
    • More status stuff
    • Cannot get Flooble to compile
      • Person 1: Have you tried --just-work-already?

Agenda

  • Agenda item 1 (initials of proposer)
  • Agenda item 2 (initials of proposer, of supporter, [initials of more supporters])
    • Decided: we will use a producer/consumer model for this component
  • Agenda item 3 (name of proposer, name of supporter, [names of more supporters])
    • Key Fact: We will not be able to have this QA’d until QA finishes with this other team’s crunch project
    • Action Item: Bob will create an ASCII art logo for MongoDB while waiting for QA to free up.

MongoDB 2.6 and the Future

MongoDB 2.6 has been released. For my thoughts on many of the features of the release, please see my blog post on mongodb.org.

Beyond the features, this release means a lot to me. In five years, we’ve gone from four people trying to figure out if a document database was a viable concept, to the fifth most popular database in the world. MongoDB 2.4 and all previous releases proved that the document model can transform how modern applications are developed and deployed. Despite this, we knew many of MongoDB’s core components were imperfect. It was time to address these shortcomings.

MongoDB 2.6 is the first release of the next generation of MongoDB. To smooth out the rough edges, we’ve ripped out and re-written large portions of the code base. We’ve built an entirely new set of tools to complete our vision of how easy it could be to manage a database cluster composed of thousands of machines[1]. We’ve grown the team tremendously, both in terms of size and expertise, and are confident we can continue innovating for years to come.

MongoDB 2.6 is the beginning of the next generation, but is in no way the culmination. We’ll continue rebuilding concurrency, storage, networking and anything else that gets in the way of scale and performance. MongoDB 2.8 will have document-level locking, and MMS Automation will take control of the most challenging MongoDB deployments.

Personally, helping the database world re-invent itself is incredibly rewarding. Even if MongoDB isn’t right for everyone’s project, knowing that we’ve made a contribution to the way people think about data storage in modern applications is truly gratifying. I hope we at MongoDB can make the database that will continue to push the boundaries, and that helps teams focus on making great products rather than worry about storing data.

None of this would be possible without the contributions of the MongoDB community. Their feedback, code, and support are invaluable to the MongoDB team.

Thank you to the entire MongoDB team, whose hard work and dedication to everything that goes into the success of this project and company inspires me to work harder and smarter.

Debugging the Boss: The Martyr

Like The Superhero, The Martyr does their team’s work to make up for not managing. However, whereas The Superhero insists on hogging all the interesting work, The Martyr does work that no-one wants to do.

When a deadline is looming and things are looking down, they will pull all-nighters to finish the work themselves rather than do what a manager should do, such as motivating their team, or fixing the deadline. When things go wrong, they will take all the blame instead of teaching their team.

In the grand scheme of things, The Martyr poses less of a threat to an organization than some other buggy bosses, more analogous to inferior lubrication on moving parts than the active sabotage of The Glory Hog. Nonetheless, they do have negative effects, and often would love to do better if they only knew how.

Subtypes: Some Martyrs won’t delegate because they don’t want to – their personality craves the sacrificial behavior. Others would be willing, but can’t bring themselves to hand off unpleasant work, so doing it themselves feels like their only option.

Behavior in meetings: The Martyr will never miss an opportunity to call attention to their sacrifices. This overt, calculated attention-seeking is the opposite of inspiring.

Impact on team: The good news is that The Martyr does not persecute their team, belittle them, or in any way create an environment that punishes excellence. They do, however, lower the bar. As the team learns that the boss will just cover all the gaps, team members will not feel the need to stretch, or put in an extra, last-mile sprint when it’s crunch time. If The Martyr covers enough of the grunt work, some on their team may become downright spoiled, internalizing the idea that they should never have to do work that they dislike.

Impact on product: The Martyr refuses to delegate crappy tasks, which can bottleneck the team when those tasks are the ones that matter. When The Martyr inevitably reaches the end of their capacity, or even breaks down, product releases will be delayed.

Trait gone wrong: A sense of duty.

Debugging: The right approach will depend on what kind of Martyr you’re dealing with. For those who are simply squeamish about delegating work they know is unpleasant, you can focus on the negative impact their behavior has on the team. As with The Best Friend, realign their view of what’s good for their team members from the short-term dislike of certain tasks to the long-term needs of their careers.

The self-appointed Martyr is a harder debug. While they do care about the happiness of their team, they are most of all driven to seek acknowledgment and approval through sacrifice. They will never miss an opportunity to go unhealthily above and beyond, so they can moan to everyone about how hard it was. They need help to believe that their peers will value them even if they are not burning themselves to cinders. This is a job for a therapist; but peers, reports, and managers of Martyrs can best help by facing this head on with a private, compassionate, but frank conversation. Tell them straight out that they are doing themselves harm, and that they do not need to fear being judged as inadequate for “only” putting in a full week’s work. If they appear receptive to this feedback, you should continue to reinforce this over time. If they do not, and instead become defensive, you can try again, but if you gain no traction, you will have to evaluate if they are doing more harm than good.

Not to be mistaken for: The Superhero, who wants all the work and doesn’t care if they leave none for their team, or The Best Friend, who might do work rather than having their team do it because they want to be liked.

Debugging the Boss: The Superhero

Like The Martyr, The Superhero does their team’s work to make up for not managing. They are super smart, super capable, and they can often do most or all of the jobs that their reports do better than their reports. They also care deeply about the quality of the product their team works on. Unfortunately, they are not inclined to delegate any of the interesting work, because they want it all for themselves. If one of their reports comes to them with a problem, they are more likely to just do the work for them than teach them how to solve the problem.

Essentially, the Superhero would rather be superheroic than excel at managing and mentoring their team. Their heroics may be appreciated by some, but they will make everyone under them feel lousy. You can easily imagine that in their internal monologue, The Superhero’s favorite line of dialogue is “I could do this all myself.”

Behavior in meetings: You barely need a guide to ID this boss. In meetings with their team, they will be very critical, often without suggesting improvements; when they do suggest improvements, they will do so at a pace too rapid to facilitate understanding and in a manner which makes people feel stupid. They will not necessarily hype their own work, however (they’re not necessarily Glory Hogs), and it’s not impossible for Superheroes to be modest. This is a very clear contrast that lets you know you’re not dealing with a Martyr (post coming soon).

Impact on team: This boss is a super de-motivator, because they are always acting like they are better. The Superhero makes this a self-fulfilling prophecy. Firstly, they commit their team to work based not on what they can do, but what they might deliver if they were composed of clones of the Superhero. Secondly, they create an environment where it’s embarrassing to ask for help, or even clarification. The tasks their reports should execute are set up for failure, the work doesn’t come out right, the Superhero redoes it, and no-one is happy.

As with The Martyr, team members will not drive to finish, as the Superhero will just finish their work anyway. The Superhero’s team in particular will stop caring about the quality and timeliness of all the work they do, whereas the Martyr only enables their team to avoid the crappy work.

Potential managers working under a Superhero are likely to internalize this behavior and pass it on in turn.

Impact on product: Note that Superheroism is a trait, not an ability. It’s inevitable that at some point they will fool themselves into thinking they have expertise in areas where they only know enough to be dangerous, because they are so used to being the expert. At that point they are likely to make a disastrous design decision that down the road maims or kills the product, or just costs tons of money and time to fix, while customers complain loudly and the brand tarnishes.

Trait gone wrong: The drive to make things the best they can be.

Debugging: The Superhero doesn’t appreciate that other people don’t share their super powers, and thus can’t understand why no one can do anything as well as they can. They also misunderstand the nature of their responsibilities, by focusing on the work their team does, and ignoring the growth needs of members of their team. Start by teaching them what a manager really does.

Often Superheroes just don’t care that much about the true responsibilities of managing, but it must be made clear to them that every time they do work on a team member’s behalf, they are demonstrating a failure of leadership.

The Superhero might be incorrigible. Return them to individual contributor status, reap the rewards of their abilities, and keep them from harming those around them.

Not to be mistaken for: The Martyr, who loudly heralds their sacrifices, only picks up the work their team leaves behind, and who otherwise may care for their team well enough.

Debugging the Boss: The Politician

The Politician’s main concern is making their bosses and peers think they are doing a great job, and are responsible for every success they can claim, regardless of reality. They are cousin to the Glory Hog, but are far less destructive than them, because their goal is to create a successful environment for themselves. Also, their behavior is driven by confidence, not under-confidence. They are not threatened by their reports’ accomplishments, because they intend to take credit for them. This means that as long as they look good, they don’t mind if other people do too.

General Behavior: Politicians are social creatures, and embody many of the qualities that make good leaders. They are well-liked by their peers, because they invest time in those people they think will make them look good. They speak well and have a natural sense for knowing what people want to hear. They are so attuned, however, that they often seek advancement through this capability alone, putting the good of the team and their mission second.

Behavior in meetings: The Politician will often keep their team out of external meetings, like the Isolationist. This helps them say one thing to their team and another to other stakeholders. Sucking up to the boss is by no means limited to one type of person, but the Politician is particularly adept at it. Watch for them to agree with the boss’ position when they are around, and qualify this agreement in private with their team.

Impact on team: Often their desire to look good leads them to commit their team to impractical goals, but then privately blame them when those goals are not achieved. Their team members will catch on to this eventually and go from adoring their manager to loathing them.

Another impact they can have is to allow high performers or those with great potential to languish if they are not outgoing and charming, because they don’t help the Politician look good. This is the opposite of a manager’s charter – their job is to find those diamonds in the rough and polish them up.

Impact on product: Thankfully the Politician is not a product killer, but they have a near-certain likelihood of keeping it from reaching its potential, because their dedication is to looking good at all times, even when it means they agree with a decision that results in a poor outcome, or making no decision when it would ruffle the wrong feathers.

Trait gone wrong: The ambition to be a key player; also the desire to get along with important people.

Debugging: As with other corrective measures, positive reinforcement is better than negative. Show how their powers can be used for good, rather than ill. Use a positive way to talk about this, by explaining that when politics is used in the service of a goal other than naked ambition, it becomes diplomacy.

Not to be mistaken for: The Isolationist, who seeks to keep their team under the radar for protective reasons. Or the Glory Hog, who may play politics, but is motivated by fear.

Glass as a Presentation Aid?

I’m intrigued by the idea of using Google Glass during a presentation to avoid ever having to look at or touch a computer. I’ve taken a cursory look over the apps that are currently available, and tried out Your Show and Glassentation.

I’m concerned about two things – one, pulling it off at all, meaning making sure that my audience is still focused on my talk and not my gadget, and two, being able to continuously engage the audience while referring to my notes quickly enough to not break the flow.

Any suggestions? At this point I think I’d do well using the Evernote Glass app and putting my notes there. Anyone out there who’s seen this done, or done it themselves?

Debugging the Boss: The Isolationist

The Isolationist manager takes their job as a “crap umbrella” to a dysfunctional extreme. They try to limit interactions between their team members and other people in the organization. They take their responsibility toward their team very seriously, and the isolation they impose is a misguided attempt to make the team more productive.

Behavior in meetings: The Isolationist isn’t so much identified by behavior in meetings, as much as by the influence they have on organizing meetings. They do their utmost to prevent their team members from attending meetings with external teams. This makes them a huge bottleneck.

Impact on team: The Isolationist may have some team members who are very happy to be isolated, some who chafe at their lack of interaction with other teams, and some who are indifferent. The major impact to the team, however, is not lowered morale.

Teams need exposure. Seniority comes with experience, and if a team member’s only point of contact to the world is their manager, they aren’t going to get the experience of working with many people. That makes them poorly positioned to work in any other environment. They also need exposure so that their colleagues know, challenge, and respect them. Without contact with a larger team, they won’t feel like part of a larger team.

Impact on product: Increased risk of failure! Isolated workers are always going to be missing major parts of the story, so they won’t get why what they’re doing matters. These circumstances vastly increase the risk that requirements will not be correctly understood. If, for example, you don’t understand the concerns of your colleagues on the business side, you can’t help catch a mistaken assumption.

Trait gone wrong: Protectiveness, primarily. Isolationism is most often motivated by a sincere desire to help their team. This can be exacerbated by their skepticism of the capabilities of other teams, which, unsurprisingly, may derive from their having been managed by an Isolationist themselves. They might also be overconfident, if they think they can handle the throughput of syncing their team members with everything they need to know from the meetings they don’t get to.

Debugging: The Isolationist is motivated by concern for their team, so debugging requires refocusing that concern on their team members’ long-term effectiveness and career development.

Usually an Isolationist has learned the behavior from a poor example, and it’s important to present them with specific examples of ways that problems that cropped up could have been prevented had their team members been more involved in cross-team communication.

One exercise for de-isolating is to select a single team member, and a single project they are involved in, and send them to one carefully chosen meeting (maybe a recurring one) that they could contribute to, or benefit from. For example, bring an engineer who wrote the code on a project to the next brainstorming meeting for that product.

Not to be mistaken for: The Politician.

Jira Email Summarizer

I’ve written a Python program to do something fancy with Jira that I couldn’t get using built-in facilities. You already get notifications from Jira about the tickets you personally care about, based on your notification settings. My tool will give you, additionally, an hourly email in your inbox summarizing all the changes in projects you care about, skipping the ones you already got direct notifications of. Not only that, but it will make sure that you only ever have one of these summaries in your inbox, by consolidating them when a new summary is generated. It’s only at version 0.2 at the moment, but I’m opening it up today and hopefully some of you will find it useful. Over time, I hope we can polish it some more.

The Motivation

I get Jira notices all day long (at least 300 per day), for a variety of projects. Sometimes I’m in a situation where I want to review and potentially act on these notifications immediately. At other times I take a hiatus from email and focus on meetings, coding, etc., and all those notifications pile up (I’ve written another, more complicated tool to handle these, which I’ll cover later).

When I go from a task back to email mode, and review the ruin that time has wrought on my inbox, one of the things I scan for is the latest Jira developments. However, even with my ticket mail notification pruning tool, the signal-to-noise ratio was not good enough. Plus, I still had to review each update, email by email.

To Hell With Filters

So why not just do the obvious thing and use filters? Well, I’ve tried filters many many times. The problem is, once I filter something, it’s not in my inbox any more. I have so much other mail piling up in my inbox that I never go looking into the filter folders. So filtered mail is just forever dead to me. In some cases that’s actually just fine, and if there’s something I’ve never read but then need to find, I can search for it. With Jira tickets, however, that isn’t sufficient – I have to stay on top of those.

I looked for a Jira plugin that could help, but there wasn’t anything remotely suitable. If I was going to get any help, I was going to have to write something myself.

Now, I’ve got plenty of high priority code-related work on my plate, so taking the time to write a tool like this required a non-trivial decision. What made me pull the trigger was a convergence of a few things. I was about to go on a work trip, and email always builds up when I travel. This coincided with my recently missing a couple of important Jira notifications due to volume, while other non-Jira emails were slipping through the cracks. I’d been consistently declaring email bankruptcy, about once a month, too.

The Summarizer

Since I was going to have to write my own tool, I could consider what I really wanted:

  • Notifications of changes to tickets that I care about to hit my inbox.
  • A managed TODO list of Jira tickets that have changed, with some indication of priority and of whether I should look at them.

I started with this quick algorithm:

  • Find all tickets that have changed in the last X hours (more on X later)
  • Organize them by what field in the ticket changed
  • Find out what’s changed
  • Put that all into a nicely formatted email

That happens once an hour, and that alone would produce a nice hourly summary.
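A rough Python sketch of that first pass might look like the following. This is not the Summarizer's actual code. The ticket dictionaries and their keys (`key`, `summary`, `status`, `updated`, `changed_fields`) are assumptions made for illustration, and fetching tickets from Jira is left out entirely.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def changed_since(tickets, now, hours):
    """Keep only the tickets updated within the last `hours` hours."""
    cutoff = now - timedelta(hours=hours)
    return [t for t in tickets if t["updated"] >= cutoff]


def group_by_changed_field(tickets):
    """Map each changed field name to the tickets in which it changed."""
    groups = defaultdict(list)
    for ticket in tickets:
        for field in ticket["changed_fields"]:
            groups[field].append(ticket)
    return groups


def format_summary(groups):
    """Render the groups as plain text, one upper-cased heading per field."""
    lines = []
    for field in sorted(groups):
        lines.append("**" + field.upper() + "**")
        lines.append("")
        for ticket in groups[field]:
            lines.append(ticket["key"] + " " + ticket["summary"])
            lines.append("   status: " + ticket["status"])
        lines.append("")
    return "\n".join(lines)
```

Chaining the three functions, `format_summary(group_by_changed_field(changed_since(tickets, datetime.now(), 1)))`, gives the body of one hourly email in this simplified model.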

But what if I don’t deal with Jira for 7 hours? There would be 7 of those summaries to read, and worse, the same ticket might be in all 7, driving me nuts.

So, the program does more than just summarize:

  • First read the inbox, and find any summary email still in there
  • Instead of covering only the changes in the last hour, go back far enough to include the time frame of the older summary
  • After sending a new email, automatically archive the older one.

Now I only ever have exactly 1 Jira summary email. To commit bankruptcy: just archive one email and start fresh!
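The consolidation step can be sketched in a few lines of Python. Again, this is an illustration rather than the tool's real code: the inbox is modeled as a list of dictionaries, and the `SUMMARY_SUBJECT` marker and the `window_start` field recorded on each summary are assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical subject line used to recognize earlier summary emails.
SUMMARY_SUBJECT = "Jira summary"


def plan_summary(inbox, now, default_hours=1):
    """Return (window_start, old_summaries) for the next summary email.

    If an unread summary is still in the inbox, the new window reaches back
    to where that summary started, so the new email fully replaces it.
    """
    old_summaries = [m for m in inbox if m.get("subject") == SUMMARY_SUBJECT]
    window_start = now - timedelta(hours=default_hours)
    for message in old_summaries:
        window_start = min(window_start, message["window_start"])
    return window_start, old_summaries
```

The caller would build the new summary from `window_start` onward, send it, and then archive everything in `old_summaries`. In this model, that widened window is what the X in the earlier list turns into.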

The Jira Email Summarizer is available on GitHub.

A sample of the output:

**CREATED**

SERVER-12351 Correct auditing comments in action_types.txt
   https://jira.mongodb.org/browse/SERVER-12351
   status: Open
   reporter: someone@10gen.com
   assignee: someone@10gen.com
   fixVersions: <version>
   priority: <priority>
   components: Logging,Security

SERVER-12350 Failed update with $mul errors with reference to $inc
   https://jira.mongodb.org/browse/SERVER-12350
   status: Open
   reporter: someone@10gen.com
   assignee: NONE
   fixVersions:
   priority: Minor - P4
   components: Updates
   changes
      adamc/adamc@10gen.com
         description
            Trivial to reproduce, just set a field to a string, then try to multiply it and you get the $inc error:

            {code:js}
            db.foo.update({}, {$mul : {a : 2}}, false, {multi : true}, {ordered : false})
            Update WriteResult({
            ****snipped****

SERVER-12348 dropIndex is incorrectly labeled as auditing only in action_types.txt
   https://jira.mongodb.org/browse/SERVER-12348
   status: Resolved
   reporter: someone@10gen.com
   assignee: someone@10gen.com
   fixVersions: 2.5.5
   priority: Major - P3
   components: Security
   changes
      someone@10gen.com
         Link
            This issue is related to QA-341
      someone@10gen.com
         assignee
            An Assignee
      someone@10gen.com
         status
            In Code Review
      someone@10gen.com
         status
            Resolved
         resolution
            Fixed
   comments:
      by: someone@10gen.com
         Code review url: http://codereview.10gen.com/6284539290714112
      by: xgen-internal-githook/internal-tools+githook@10gen.com
         Message: SERVER-12348 removed false comment from action_types.txt
         Branch: master
         https://github.com/mongodb/mongo/commit/...

SERVER-12347 WriteBatchExecutor should re-use an UpdateDriver across all updates
   https://jira.mongodb.org/browse/SERVER-12347
   status: Open
   reporter: someone@10gen.com
   assignee: NONE
   fixVersions: Needs Triage
   priority: Major - P3
   components: Internal Code,Performance


[Many more updates]

below supressed since should have gotten email
SERVER-xxxxx
SERVER-xxxxx
SERVER-xxxxx
SERVER-xxxxx
SERVER-xxxxx
SERVER-xxxxx
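
The suppressed list at the end of the sample comes from skipping tickets that Jira has already emailed about directly. One way to sketch that split in Python is below. The rule used here (you are the reporter, the assignee, or a watcher) is only an assumption standing in for your real notification settings, and the dictionary keys are hypothetical.

```python
def split_suppressed(tickets, me):
    """Split tickets into (tickets_to_summarize, suppressed_ticket_keys).

    Assumption: a direct notification was sent if `me` is the reporter,
    the assignee, or one of the watchers of the ticket.
    """
    to_summarize = []
    suppressed = []
    for ticket in tickets:
        involved = {ticket.get("reporter"), ticket.get("assignee")}
        involved.update(ticket.get("watchers", []))
        if me in involved:
            suppressed.append(ticket["key"])
        else:
            to_summarize.append(ticket)
    return to_summarize, suppressed
```

The first list feeds the detailed sections of the summary, and the second is printed as bare ticket keys at the bottom.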