Eliot's Ramblings

Reading Hundreds of Tickets at Once

I’m trying a relatively new thing these days: working through huge lists of open MongoDB JIRA tickets using a pencil and a big printout. This turns out to be a better way for me to handle this workload than sitting at a browser and doing it interactively. To explain this, I suppose I have to explain why I’m reading all these JIRA tickets.

I’m reading all these JIRA tickets because I don’t want to lose touch with the needs of MongoDB users, in spite of the ever increasing volume of related articles, blog posts, and yes, JIRA tickets. By reading all of these things, I am trying to keep an “on the ground” sense of use cases, issues, complaints, needs, and desires, which is invaluable in decision making. Knowing that this gestalt sense is honed and working well is crucial to my peace of mind. However, this sense is dampened and feels unfocused when the information comes through layers of delegation and summarizing.

Over the years I’ve developed a good ability for pattern recognition in the vast sea of open tickets. For example, while doing this very thing on a flight back from London recently, I saw that there were 10 open tickets related to one query parsing example that would be easy to fix in the new matcher. Reading all the tickets also prevents me from getting too distracted by new issues that attract attention, but aren’t as important as older issues.

So I read tons of JIRA tickets, and lately I’ve been using my favorite method so far: taking a giant printout of cases with me when I fly somewhere. My last batch was a single bucket of tickets, which printed out to 36 pages, containing about 600 tickets. I use a pencil and mark each ticket with the version to put it in, whether it is a duplicate, whether it should be closed, notes about implementation, and which other tickets it might be related to. I generally mark about 20% of the cases in each pass.

During this process, I go into a bit of a zone, not the same as, but similar to, the zone I go into when coding. I’m not trying to triage particular cases or do meaningful work on an individual case, I’m trying to make mental links. So I move through the caseload quickly, keeping the overall view in memory, cross-referencing, looking for patterns. As I go, I’m building up a sense of areas that need more work, or where we can make headway quickly. This all-at-once method lets me observe patterns over time I might not otherwise notice. (If something comes up once every other week, it tends not to leave a dent in my thought process, but if it comes up once every other week for 4 years, that’s probably worth thinking about.)

Of course, none of this works at all without 10gen’s amazing project managers, who take these scribbles and do meaningful things with them.

So for those of you who use jira.mongodb.org, know that your tickets do actually all get read, and even if a ticket is really old, it doesn’t mean that we think it’s not important or that it’s being ignored.

In the end, this process gives me the confidence that when we have to rapidly shift plans, I can intuitively understand what the pros and cons will be. Given my role on the MongoDB project over the next few years, the ability to be agile will remain crucial.

In Praise of SSHFS

Emacs is the only editor I can use effectively at this point. It doesn’t matter if there are better choices (there aren’t ;-), it’s the one I’ve invested all of my muscle memory into. When working on files locally, I use normal emacs, and things are grand. Life, however, dictates that a great deal of my coding is done on remote machines. I had tried a variety of solutions to edit remote files (emacs in a shell, emacs over X, samba, nfs, etc…), none working terribly well for me.

Enter sshfs. I’m not entirely sure when sshfs crossed the divide between merely ok to solid, but it’s been more than a year that I’ve been using it happily. sshfs lets you mount a filesystem from any remote machine you can ssh into using FUSE. So when I’m at home, I can do

sshfs office2:work/mongo localworkmongo
to (and this will be intuitive for users of scp and mount) mount the ~/work/mongo directory on office2 onto the localworkmongo directory on my Mac. Then I can edit files in localworkmongo normally, and I just ssh into the work machine for compilation and execution.

This has been a life changer for me. It has made using a Mac desktop as my only workstation feasible, since most of my development is on a linux desktop sitting next to the Mac under my desk. It makes transitioning between work, home, or a different office painless, and even lets me do tricks like mount ec2 volumes just to poke around if I want to do something locally.

The key for me is that security is all ssh. I already use ssh for everything and go through painstaking efforts to make sure that is both secure and easy. So having to do no additional work to mount things is magic. And just to be clear, the target machine needs no additional software. If I can ssh into it, I can mount files on it.

Also, note that this solution isn’t about using a Mac everywhere, it’s about getting to use your favorite desktop apps to edit your work files no matter where they are… this solution applies just as well to editing Photoshop files mounted on a remote server.

sshfs is in package management on many platforms, including Homebrew for the Mac. As it says in the caveats, make sure you understand the info regarding the FUSE kernel extension before trying to use it.

Mongo’s New Matcher

MongoDB 2.5.0 (an unstable dev build) has a new implementation of the “Matcher”. The Matcher is the bit of code in Mongo that takes a query and decides if a document matches a query expression. It also has to understand indexes so that it can do things like create subsets of queries suitable for index covering. However, the structure of the Matcher code hasn’t changed significantly in more than four years, and until this release it could not be easily extended. It was also structured in such a way that its knowledge could not be reused for query optimization. It was clearly ready for a rewrite.

The “New Matcher” in 2.5.0 is a total rewrite. It contains three separate pieces: an abstract syntax tree (hereafter ‘AST’) for match expressions, a parser from BSON into said AST, and a Matcher API layer that simulates the old Matcher interface while using all new internals. This new version is much easier to extend, easier to reason about, and will allow us to use the same structure for matching as for query analysis and rewriting.
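To make those three pieces concrete, here is a toy sketch in Python. It is not MongoDB's actual code: the node classes (Eq, Gt, And), the parse function, and the matches wrapper are all names invented for this illustration, a plain dict stands in for BSON, and only equality and $gt are handled.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Eq:
    """AST leaf: the field at `path` equals `value`."""
    path: str
    value: Any

    def matches(self, doc: dict) -> bool:
        return doc.get(self.path) == self.value


@dataclass
class Gt:
    """AST leaf: the field at `path` is greater than `value`."""
    path: str
    value: Any

    def matches(self, doc: dict) -> bool:
        found = doc.get(self.path)
        try:
            return found is not None and found > self.value
        except TypeError:
            # Values of incomparable types simply do not match in this toy.
            return False


@dataclass
class And:
    """AST interior node: every child expression must match."""
    children: List[Any]

    def matches(self, doc: dict) -> bool:
        return all(child.matches(doc) for child in self.children)


def parse(query: dict) -> And:
    """Parser: turn a query dict (standing in for BSON) into an AST."""
    children = []
    for path, cond in query.items():
        is_operator_doc = (
            isinstance(cond, dict) and cond
            and all(key.startswith("$") for key in cond)
        )
        if not is_operator_doc:
            children.append(Eq(path, cond))
            continue
        for op, operand in cond.items():
            if op == "$gt":
                children.append(Gt(path, operand))
            else:
                raise ValueError(f"operator not supported in this sketch: {op}")
    return And(children)


def matches(query: dict, doc: dict) -> bool:
    """API layer: a one-call interface on top of the parser and the AST."""
    return parse(query).matches(doc)
```

Because the parsed query is an ordinary tree of values, other code (a planner, say) could walk the same structure that the matching code uses, which is the reuse the rewrite is after.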

This matcher rewrite is part of a larger project to restructure query execution, to optimize queries, and to lay the groundwork for more advanced queries in the future. One planned optimization is index intersection. For example, if you have an index on each of the ‘a’ and ‘b’ attributes, we want a query of the form { a : 5 , b : 6 } to do an index intersection of the two indexes rather than just use one index and discard the documents from that index that don’t match. Index intersection would also be suitable for merging geo-spatial, text and regular indexes together in fun and interesting ways (e.g. a query to return all the users within a 3.5 mile radius of a location with a greater than #x# reputation who are RSVP’ed ‘yes’ for an event).
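The difference between the two plans for { a : 5 , b : 6 } can be sketched with a toy model in Python. Everything here is an assumption made for illustration: an "index" is just a dict from value to a set of document ids, and the function names are invented. It says nothing about how MongoDB stores or intersects real indexes.

```python
from collections import defaultdict


def build_index(docs: dict, field: str) -> dict:
    """Toy index: map each value of `field` to the set of matching doc ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].add(doc_id)
    return index


def scan_one_index(docs, index_a, a_value, b_value):
    """Use only the index on 'a', then fetch each candidate and check 'b'."""
    candidates = index_a.get(a_value, set())
    result = sorted(i for i in candidates if docs[i].get("b") == b_value)
    return result, len(candidates)  # (matching ids, documents fetched)


def intersect_indexes(index_a, a_value, index_b, b_value):
    """Intersect the id sets from both indexes before fetching anything."""
    ids = index_a.get(a_value, set()) & index_b.get(b_value, set())
    return sorted(ids), len(ids)  # only true matches would be fetched


docs = {
    1: {"a": 5, "b": 6},
    2: {"a": 5, "b": 7},
    3: {"a": 4, "b": 6},
    4: {"a": 5, "b": 6},
}
index_a = build_index(docs, "a")
index_b = build_index(docs, "b")
```

On this sample data both plans return documents 1 and 4, but the single-index plan fetches three documents and discards one, while the intersection plan fetches only the two that match.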

A good example of an extension we’d like to enable is self referential queries, such as finding all documents where a = b + c. (This would be written { a : { $sum : [ "$b" , "$c" ] } }.) With the new Matcher, such queries are easy to implement as a native part of the language.
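As a rough illustration of what such a query means, here is a small Python sketch that evaluates the proposed { a : { $sum : [ ... ] } } form against a plain dict. The semantics are my assumption for the sake of the example (a string starting with "$" refers to another field of the same document, and a missing field means no match), and the function names are hypothetical. It is not the Matcher's implementation.

```python
def resolve(operand, doc: dict):
    """Treat "$name" as a reference to doc["name"]; pass literals through."""
    if isinstance(operand, str) and operand.startswith("$"):
        return doc.get(operand[1:])
    return operand


def matches_sum(query: dict, doc: dict) -> bool:
    """Match plain equality plus the hypothetical self-referential $sum."""
    for field, cond in query.items():
        if isinstance(cond, dict) and "$sum" in cond:
            terms = [resolve(term, doc) for term in cond["$sum"]]
            # Assumption: a missing referenced field means the doc cannot match.
            if any(term is None for term in terms):
                return False
            if doc.get(field) != sum(terms):
                return False
        elif doc.get(field) != cond:
            return False
    return True
```

For example, matches_sum({"a": {"$sum": ["$b", "$c"]}}, {"a": 5, "b": 2, "c": 3}) is true, since 2 + 3 equals 5.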

Now that the Matcher re-write is ready for testing, we’d love people to help test it by trying out MongoDB 2.5.0. (Release Notes)

Code

Why Fly to London for 48 Hours

I visited London a few weeks ago to attend and speak at MongoDB London. The event was very successful, and I enjoyed many conversations with attendees and staff during the event. But having the opportunity to spend time with our 10gen London team makes the value of the trips far exceed my contribution to the conference.

Although my time with the team was relatively short since my entire trip to the UK lasted only two days, it provided yet another example of “no substitute for in-person collaboration”.

While I was in the office with the team, some of us began discussing a particular technical topic (related to mutability vs immutability for a specific class hierarchy). This discussion had actually started several weeks before, when a working group was attempting to get a specification for a new feature finalized. However, the geographical distance and time zone differences between the participants had meant that the discussion was drawn out and hard to finalize. During this phase, I had been persuaded of a particular viewpoint.

Working together in person, however, means more than just lower latency. It means better instantaneous understanding. When we met face-to-face, we were able to move rapidly from discussion to quick prototypes and, rather surprisingly, I found myself changing my point of view (as did one of the engineers in London). We therefore changed the spec.

10gen is a very distributed company, with offices in 7 cities and more to come. Maintaining our agility would not be possible without the benefits of teleconferencing in all of its forms; yet as useful as it is, I find no replacement for being in the same room with someone. It may be that I am particularly bad at remote communication. Regardless, I know my frequent trips to other 10gen offices are well worth the air time.

10gen’s New Office

Monday was a big day for 10gen in New York; we moved into our new offices on West 43rd Street. The last time we moved (about 16 months ago), our then-new office seemed quite spacious, and the impression was that it would last quite a while. That turned out to be a bit short-sighted. By January of this year we were bursting at the seams, with every desk full, expansion space taken, and competition for conference rooms straining everyone’s patience.

Our new office is one we built ourselves, and I’m happy to say that because of that, it represents more than just an end to the constraints on our resource scheduling for the moment. It means we had the opportunity to build the type of space that suits our culture – an environment for serious work, but with enough comforts to make life at the office very enjoyable. In some future posts I’ll cover some of the choices we made and why, but for now I’d just like to say “phew!”

Streaming Twitter Into MongoDB

curl http://stream.twitter.com/1/statuses/sample.json -u: | mongoimport -c twitter_live

One thing that you can do with mongo is run one streaming master and one read/write master:

server A:

./mongod --master --dbpath /tmp/a

server B:

./mongod --dbpath /tmp/b --master --slave --source localhost:27017 --port 9999

You can then pipe the stream into server A, and it will only process the live stream.

Server B will replicate all changes. You can also write to it, query on it, etc… This way you can do operations that block writing on server B, but server A will never backlog.