Eliot's Ramblings

Open Source and the SSPL

Two weeks ago I submitted the second version of the Server Side Public License (SSPL) to the OSI for review. The revision was based on feedback from the OSI license-review mailing list, which highlighted areas in the first version that would benefit from clarification. We think open source is important, which is why we chose to remain open source with the SSPL, as opposed to going proprietary or source-available. I am hopeful that the SSPL will also be declared an OSI-approved license, because the business conditions that prompted MongoDB to issue the SSPL are not unique to us and I believe the SSPL can lead to a new era of open source investment.

Open source software is better for its users than proprietary software for several reasons. The open source approach leads to more robust and secure software, because more people are able to look for and fix bugs. It’s much less risky for a business to commit to using open source projects as key components, because it can’t be stranded the same way it could be if the maker of proprietary software it relied on closed up shop. Building applications on open source projects and investing expertise into them means that no matter what, you can still fix bugs, add features, and fork those projects if necessary. The same remedies are available to anyone if a company stewarding an open source project becomes “evil”, or if for any reason it just won’t address their concerns about bugs or missing features.

The OSI opened the door for open source software in the mainstream with pragmatic, business-case advocacy and by introducing a standard, accepted definition of open source. That standard — the Open Source Definition (OSD) — isn’t just well-intentioned, it’s also well-crafted. Its ten criteria are carefully composed to preserve the freedom of those using open source software, to encourage the maximum amount of evolutionary benefit to software that comes from modification and experimentation, and to do both while ensuring that open source appeals to commercial endeavors as much as possible. As the OSI writes in its annotation of OSD criterion 6, No Discrimination Against Fields of Endeavor:

The major intention of this clause is to prohibit license traps that prevent open source from being used commercially. We want commercial users to join our community, not feel excluded from it.

Which brings me back to the SSPL, which was written to address the new landscape where large cloud providers are positioned to capture most of the value created by open source projects, without contributing anything back. As I wrote when I announced the SSPL v1, it should be a time of incredible opportunity for open source, but if we continue to see project after project co-opted into cloud vendor ecosystems, it will not be a sound strategy to invest money into building open source projects, and many projects that would otherwise be open will instead be closed.

So that’s the intention of the Server Side Public License. For pragmatic reasons, MongoDB transitioned to use of SSPL v1 at the same time as we submitted the license to the OSI for approval, but the feedback that has come out of the process has made the SSPL a better license, and if version 2 is approved by OSI, we plan to apply it to the next release of MongoDB.

Parallel Engineering Efforts

Parallel Tracks

One of the most important things an organization can do as it grows out of startup-hood to maturity is to learn to run parallel engineering efforts. A parallel set of projects might implement multiple takes on an overall idea, testing out different approaches. Or it might implement the same idea at multiple points on a time/quality tradeoff continuum: a low-effort, easily delivered prototype, and a more fleshed-out version that takes longer to deliver. This essential institutional skill confers advantages by reducing risk and improving throughput.

It’s well established that when creating a new product, rapid feedback is more important than polish. You need to validate or disprove your assumptions and learn where to focus, preventing you from wasting money, time and effort. This philosophy is enshrined in the concept of the MVP.

Startups are essentially all MVP. A startup can only do one thing at a time. If their experiment works, they survive. If not, they die. (Pivots, runways, blah blah blah…) Because of this existential threat, the technique of releasing an MVP and iterating has become the dominant method of launching a startup – to the extent that those flouting this common wisdom are considered crackpots. And rightly so, in almost every case.

But early, constant feedback is the key to every project’s success. New projects at mature companies aren’t less likely to fail, they’re just less likely to kill the company if they do. You still have to expect your assumptions to be incorrect, for your vision to need tweaking or overhauling, for your users to care about things you didn’t expect them to, and for your initial efforts at implementation to be sub-optimal. Furthermore, an established company generally has a higher bar to pass when releasing a new product than a startup does, as clients expect a greater level of quality, polish, and integration with existing products.

Luckily, a company that is no longer living on the edge can do something that startups can only dream of – it can run multiple experiments at once. By diversifying the field of experimentation, a company can improve its outcome. Different benefits can be realized, depending on what aspect of a project this diversification varies over.

Throwaway and investment projects

Sometimes you have a project where you’re pretty sure you know what you need to build, but building it right will take a long time. This calls for a throwaway project that you can build quickly, in parallel with the more rigorous solution. The throwaway project will help some users, validate your assumptions, and teach you about the space. You will duplicate some efforts, and write code destined for the bit bucket, but it’s totally worth it.

MongoDB used this approach to develop our BI connector. We wanted to let our customers take advantage of the many BI tools out there that visualize data stored in databases. All of the mature solutions were built to work with SQL databases, so the best thing for our customers was to build a translator. There were many possible options for implementing one of these; the top two were:

- Use a PostgreSQL foreign data wrapper -> easy but very limited
- Write a full SQL translation layer for MongoDB -> hard but highly useful
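To make the second option concrete, here is a minimal sketch of the kind of work a SQL translation layer does: turning a simple SQL WHERE predicate into a MongoDB filter document. The parser, operator table, and function name are purely illustrative — this is not the actual BI connector implementation, which handles full SQL.

```javascript
// Illustrative only: map a tiny subset of SQL comparison operators
// to their MongoDB query-document equivalents.
const OPS = { '=': '$eq', '>': '$gt', '<': '$lt', '>=': '$gte', '<=': '$lte' };

// Translate a single "field OP value" predicate into a MongoDB filter.
function translatePredicate(sqlWhere) {
  const m = sqlWhere.match(/^(\w+)\s*(>=|<=|=|>|<)\s*(\S+)$/);
  if (!m) throw new Error('unsupported predicate: ' + sqlWhere);
  const [, field, op, raw] = m;
  // Numbers stay numbers; quoted strings lose their quotes.
  const value = isNaN(Number(raw)) ? raw.replace(/^'|'$/g, '') : Number(raw);
  return { [field]: { [OPS[op]]: value } };
}

// "SELECT * FROM users WHERE age > 30" would be served by:
// db.users.find(translatePredicate('age > 30'))
```

Even this toy version hints at why the full layer is hard: joins, aggregation, and type coercion all need equivalents, which is exactly why the quick PostgreSQL wrapper was worth shipping first.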

We wanted to be able to solve some real problems as soon as we could, but we had no idea how long implementing a full SQL layer would take. So in June of 2015, we started building both. After about a month, we had satisfied ourselves that both solutions would work. We had a good POC of the postgres solution, and a super rough POC of the full one.

We shipped the postgres-based project as v1 in December of 2015 and just shipped the full SQL proxy as v2 in November 2016. v2 is already performing way better than v1 (but can still get much better) and is a lot easier to manage (but can still get much better). v1 was limited, but we were able to ship it an entire year earlier than v2, and it addressed a very real need that a subset of our customers had. Now it’s retired, but using it we were able to validate our approach, initiate partner relationships, and iron out integration wrinkles.

Multiple competing MVPs

Sometimes you need to build something but you don’t know what the right approach is. You may have a few alternatives in mind, but no idea which is the better one, for whichever definition of “better” you value most for that project. You would address this condition by building multiple competing solutions, with the understanding that all but the winning solution will be abandoned. (In the absolute worst case, all your efforts fail, but in that case you are left with less mystery as to what factors affected the outcome, as you have tested more things.)

MongoDB has used the approach of multiple competing solutions, most notably when we were working on document-level locking for MongoDB 3.0. We had built a prototype into the original storage engine, mmapv1; we were looking at WiredTiger; and we were looking at other storage engines to embed as well. Obviously, the WiredTiger solution was the winner: we ended up acquiring WiredTiger, and it is now our default storage engine.

Multiple, coexisting (for now) solutions

Sometimes different audiences want the same type of solution, but in different ways. In that case you can build parallel projects to serve them, and you might wind up running with all of them for a while. Maybe over time things will converge, maybe they won’t, but you’ll be getting feedback from these parallel projects, and you’ll be able to incorporate the learning from all of them to improve them all as well.

Examples of this in action can be found in the different management systems and services we provide for the variety of environments into which MongoDB can be deployed. Regardless of whether that is fully in the cloud, fully on-prem, or some degree of hybrid, we have the same key goals: make MongoDB easy to spin up, put into production, grow, and manage.

In December of 2015, as part of MongoDB 3.2, we released MongoDB Compass, a tool focused on real time interaction with your database. In June of 2016, we released MongoDB Atlas, our database-as-a-service for MongoDB. When released, these products had no overlapping functionality. In the last few months however, Atlas has added some features from Compass. The first was real time server stats, and soon we’ll be adding the first piece of a CRUD roadmap for Atlas. In addition, Atlas features tend to flow into our Cloud Manager and Ops Manager products.

This sets up a bit of a race between Atlas and Compass. That’s ok though! It creates a bit of a competition between these teams in terms of adoption, but they are actually working together to share resources like CSS, design, and user research. We’re not sure where this will fall out, but our focus isn’t on the success of a particular artifact of software, it’s on getting features to our users and acting on what we learn. Over time we’re likely to see more of a convergence, but in the meanwhile we can explore the space with multiple teams, and none of that effort is wasted.

Avoiding the pitfalls and harnessing the benefits

When you run parallel efforts, it’s critical to make sure the teams have a collaborative relationship, not an antagonistic one. Some competition can be good, but it can quickly turn toxic. Diversification isn’t worth losing a team to hard feelings. Furthermore, it’s better for them all to learn from each other than it is for them to gain a marginal productivity boost.

To start with, you need absolute and full transparency. Any level of secrecy about one of the projects is a really bad idea. Not only can it lead to teams undermining each other, it completely wastes one of the core benefits of having multiple things going on in parallel. As long as you’re learning from a broader surface area, you should be maximizing the impact of that learning across all the efforts, not siloing it within each. Parallel efforts should focus on the experiment, confining the competition to the areas where the different approaches are truly distinct, and leveling the playing field everywhere else.

The document-level locking project I mentioned before is a good example of how teams working on competing solutions can collaborate. While the mmapv1 team worked on document-level locking in the existing storage engine, and the WiredTiger team worked on integrating with MongoDB, they both collaborated to enable document-level locking at the layers above the storage engine.

Don’t set things up so there will be a “winner”, and definitely don’t put money on the line. Bear in mind, the only way an experiment can fail is by not generating results; a failed effort is actually a successful experiment. The team that built the “losing” solution did just as much to contribute to the company’s overall success by exploring – and eliminating – some of the search space.

Relish doing it twice

The specifics of these three types of parallel engineering efforts are different, but the unifying principle is that sometimes you aren’t sure which solution will work, and you shouldn’t be scared of doing it twice. Writing code isn’t the hardest part of building software; the hardest part is building the right code, code you can live with for years.

Always Be Working With Customers

Last week I was in Israel for the MongoDBeer meetup and an enterprise event, both hosted by Matrix, one of our partners, as well as a few really great client meetings. One of the things that I don’t get to do often enough these days is work directly with customers on interesting technical challenges, so those client meetings were really quite invigorating.

I was reminded of this recently when I was doing a fireside chat with Albert Wenger at NYCode, an event hosted by NextView Ventures. We were talking about some of the things we did early on at MongoDB that led to the great momentum we now have. Albert said that a major factor was how obsessed I was with making our users successful with their deployments. That’s true, I was completely obsessed. I had this thing about all the questions on our Google group being answered as fast as possible. Day or night, if someone had a problem, I was trying to fix it with them.

As my responsibilities to my team grew, I had to leave that phase of my role behind me. It was hard to do. Albert joked on stage about how it became a board-level priority for me to stop handling support issues.

But my obsession produced more than successful users. In that formative period, those interactions taught me what people wanted from our product, enabling me to steer MongoDB towards where it should be better than I possibly could have without those direct relationships.

Of course, that “formative period” will never end, and my role remains to ensure that MongoDB is always evolving to meet the needs of as many users as possible. This is the sibling to my claim that engineering managers have to keep their hands in the code: a technology leader should never stop working directly on customer issues. If you do, and only get filtered information, you will not be able to help make good decisions, and in fact run the risk of making poor decisions. Does that take a lot of time? Of course it does. But spending that time isn’t a nice-to-have. It’s core to your job, and a well-run team works independently enough that you should have time for it. If you can’t find the time, you need to reevaluate the rest of your commitments, as I periodically have to.

Are you getting enough exposure to your customers’ issues?

DotScale 2016 Talk: The Case for Cross-Service Joins

Back on April 25th I spoke at dotScale in Paris; I gave a talk called “The Case for Cross-Service Joins,” as in queries that join data across multiple third-party services. For example, analytics over data that comes from both Salesforce and Google Analytics. I’ve been thinking a lot about this topic, because MongoDB sits at the middle of a lot of apps that utilize third-party services, and the benefits of building your app on top of such services come at the cost of that data being siloed away and difficult to analyze in a holistic way. My thinking on this topic continues to evolve, and I’ll be writing more about that as well.

DotScale was a great conference to speak at, with a lot of very insightful talks, and beyond that, the gorgeous Théâtre de Paris is easily the classiest venue I’ve ever spoken in.

The video is up now:

Refreshing My Client-Side JavaScript Chops

My overarching goal as CTO of MongoDB is to make building applications easier and easier. Given my day job, it’s actually been a little while since I did any work on the front end.

Since we started MongoDB, client-side JavaScript has really taken off. This pleases me, because our decision to make JavaScript a core part of the MongoDB experience was based on a view that JavaScript would continue to rise in capability and prominence. Back in 2005, I was spending quite a bit of time in JavaScript, but the state of the art has changed dramatically since then.

Wanting to refresh myself, I wrote a proof-of-concept real time dashboard for MongoDB, using JavaScript and Handlebars.

Sure, it’s not much to look at, but I was really excited about how easy it was to do full client rendering with just API calls for data compared to last time I tried. In addition, I finally took ES6 for a test drive, and have to say classes and modules live up to what I had hoped for.
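The pattern I was excited about — fetch JSON from an API and render entirely on the client — can be sketched in a few lines. This sketch uses an ES6 class and a template literal in place of a Handlebars template, and the endpoint name and data shape are made up for illustration; it is not the actual dashboard code.

```javascript
// Render server stats into HTML entirely on the client.
// ES6 class + template literal; a Handlebars template would play the same role.
class Dashboard {
  constructor(stats) {
    this.stats = stats;
  }

  // Produce the HTML for one refresh of the dashboard.
  render() {
    return this.stats
      .map(s => `<div class="server"><b>${s.host}</b>: ${s.opsPerSec} ops/sec</div>`)
      .join('\n');
  }
}

// In the browser the data would come from an API call, e.g.
// fetch('/api/serverStatus').then(r => r.json()) — endpoint name illustrative.
const html = new Dashboard([{ host: 'db1', opsPerSec: 1200 }]).render();
```

The appeal is that the server only ever ships data, never markup, which is exactly the split that made full client rendering feel so easy compared to last time I tried.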

I think next up is React.

Innovate vs. Appropriate

One theme I kept harping on at MongoDB World a few weeks ago was knowing when to innovate around new ideas and when to just reuse what already works well for products that have been successful. This comes up continuously at MongoDB, because having a good understanding of it is a significant competitive advantage. I attribute a large part of MongoDB’s success to our unbending adherence to this discipline.

When we started MongoDB, we had a clear goal - make data and databases easier for developers and operators, so that data and databases serve their users, not the other way around. To that end, there were two key things we wanted to change. First, we wanted to reduce the impedance mismatch between application code and data, which we addressed by building our database around collections of documents, rather than tables of rows. Second, we wanted to make distributed systems accessible and usable by most organizations, which we have done by making them first-class components and intuitive to use, instead of leaving them to higher layers of the stack to build.

Everything else we wanted to leave the same. We would ask ourselves, “Does this need to be different in a document-based distributed database?” For example, indexes in MongoDB have the same semantics as they do in their relational counterparts, because those semantics have worked quite well for decades and match those used to query a document database. Query results composed of documents need the same filtering and/or sorting as those composed of rows, and an index that covers an ordered set of fields (a, b, c) makes an index covering a same-ordered prefix of those fields (a, b) redundant. MongoDB provides a shell that mimics a relational shell, because in our world, the needs to explore databases and collections, do ad hoc queries, create indexes, get stats, and perform analytics are identical to those needs in a relational world. Even the mongodump name came about because I had been using mysqldump for a decade, and making data easy to work with and distributed systems accessible would not be in any way furthered by changing that aspect of a database.
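The index prefix rule above — an index on (a, b, c) makes a separate index on the prefix (a, b) redundant — can be sketched as a small check. This helper is purely illustrative, not MongoDB's query planner:

```javascript
// An index on fields `sub` is redundant if some other index's field list
// starts with exactly those fields, in the same order.
function isRedundant(sub, indexes) {
  return indexes.some(idx =>
    idx.length > sub.length &&
    sub.every((field, i) => idx[i] === field));
}

// Same semantics as a relational database; in the shell,
// db.coll.createIndex({ a: 1, b: 1, c: 1 })
// also serves queries that only filter or sort on (a) or (a, b).
```

Note that the order matters: (b, c) is not a prefix of (a, b, c), so an index on it is not redundant.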

Today, we are constantly improving features and adding new ones to MongoDB. Every time we do, the question is: do we need to invent something new to make this feature fit into the document world or the distributed systems world? If the answer is yes, we innovate to try to make it suitable for MongoDB. If not, we try to find the best solution out there and apply it to MongoDB (in a patent-safe way, of course).

Why is this so important? First off, it takes a lot more thought to invent something than to copy something. Taking semantics or ideas from successful systems focuses design and architecture work where it is needed most. Focusing innovation also makes it easier for users to learn a new system. If everything is needlessly different, it will be more frustrating for your users, so there had better be a good reason. And lastly, every innovation involves risk. You think you are improving something, but if you’re wrong, you’ve wasted time and have to do it all over again.

This is an important concept for all companies to master, both new and old. Like the adage “Is this a core competency” for helping decide if you should build or buy, all product teams adding features should be asking themselves “Does this need to be different in our domain?”

Backend as a Service: Security and Privacy

Almost all modern applications are composed of presentation layers, services executing business logic, and backing stores where the data resides. Developers could be more productive and agile if they could work more directly with the backing data without having to build specific APIs for every access type, but this is quite a challenging problem. An emerging class of solution known as Backend as a Service (BaaS) has tried to address this problem over the last few years, but hasn’t become the norm yet.

In an ideal world, it would be great if your web or mobile app could talk directly to a database. In the real world, though, this is never done, for several reasons. Let’s start today with the security and privacy area: fine-grained access needs to be built in from the ground up, and also be expressive enough to allow any complex application to be built.

Security and privacy challenges are about allowing different users to have access to different data: different documents, and maybe even different fields. One might need to query on computed values of fields without being allowed to see those fields directly. A famous example of this is Yao’s Millionaires’ Problem, in which two millionaires want to determine which one is richer without revealing their net worth. Solving problems like that requires fine-grained access control that lets a user run queries such as “show me all documents where a % 5 == 1” without being able to see the actual value of a. A broad category of problems, of which Yao’s Millionaires’ is one, is called secure multi-party computation, and their solutions rely on offering that kind of access control. If you are building your own REST API for your web app, building in that logic is trivial. If you are trying to build a generic BaaS, it’s a lot more complex.
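The “query on a without seeing a” rule can be sketched in a few lines. In MongoDB the filter itself is expressible today as db.coll.find({ a: { $mod: [5, 1] } }, { a: 0 }); the plain-JavaScript version below (with made-up document shapes) shows the logic a BaaS would have to enforce on every request, which is the hard part:

```javascript
// Filter documents where a % 5 == 1, then strip `a` before returning:
// the caller learns which documents match but never sees the value of a.
function queryWithoutRevealing(docs) {
  return docs
    .filter(doc => doc.a % 5 === 1)
    .map(({ a, ...rest }) => rest);  // project out the protected field
}

const visible = queryWithoutRevealing([{ _id: 1, a: 6 }, { _id: 2, a: 7 }]);
// only _id: 1 matches (6 % 5 == 1), and a is absent from the result
```

In a hand-built API this check lives in one endpoint; a generic BaaS has to let the app declare rules like this for arbitrary fields and predicates, and enforce them on every query path.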

There are a few BaaS providers working on this problem. Parse and Firebase are probably the best examples at the moment. They both definitely have pushed this along pretty well, but I think another big step function is needed. Further pushing the security and privacy model to allow apps to be more expressive will allow BaaS to radically improve time to market for many applications.

CSNYC, the 2016 NYC CS Fair, and Bootstrap

Last Thursday (4/7/2016) I spoke at the 2016 NYC CS Fair. Their number one goal is to encourage public high school students who study CS to stick with it, by showcasing all the great opportunities that await them should they pursue a career there. I talked about being a hacker, how to negotiate CS studies in higher education, the difference between CS and software engineering, and the importance of a good mentor. It was a great group of kids, and if even 25% of them go on to become CS students, I think the industry is going to see some positive change.

The CS Fair is put on by CSNYC, with the mission of “Computer Science for all”. One of the reasons I support CSNYC is that I believe it actually has a chance of moving the needle for diversity in tech. My personal belief is that the only way to make meaningful, lasting change in this area is to get kids excited about the field earlier and earlier. The more children think of CS as their thing, the more they can resist the fear, negative pressure, or bias they contend with, pushing them to drop it.

In order to get kids excited about computer science, a few things need to happen, but the most interesting one to me at the moment is Bootstrap. Bootstrap is a program with an integrated math and computer science curriculum, meant to drop into an existing math class as a better way to teach concepts like functions and variables while also teaching computer science. Quite a few schools are starting to use Bootstrap, and I’m trying to help drive adoption. Others are doing way better and more than me, but I’ll keep trying anyway. If you’re involved in CS education, leave me a comment. How can I help?