Wednesday, August 5, 2020

Neo4j

INTRODUCTION

Martin: This time we are in beautiful San Mateo. Hi, Emil, who are you and what do you do?

Emil: Guten Morgen! That’s it, that’s all the German I know. So my name is Emil Eifrem and I run a company called Neo Technology, and we are a graph database company.

BUSINESS MODEL OF NEO4J

Martin: Cool, what is that?

Emil: So a graph database is a database model that is inspired by the human brain. The human brain is structured in neurons, with synapses connecting the neurons, which build up a big network, and the mathematical word for network is a graph. So what we have built is a database that, rather than using tables, which is sort of the standard model, or used to be the standard model, uses nodes and then relationships between these nodes, which then build up this big graph. People know the word graph now because of Mark Zuckerberg and the social graph, and that’s definitely a very common use case. The nodes are people; the relationships are whether you know each other. But we have a lot of other use cases; in fact, social is not even the most popular use case for us.

So, for example, in fraud detection every node is a transaction or an individual, and then you have relationships connecting all of these and you want to find patterns in the big graph of payments. So there is one use case.

Identity and access management: say you are a big corporation, a big financial institution (we have a lot of big financial institutions), and you want to onboard a new trader. That trader has access to a subset of all of the collateral that the bank has produced, and the specific subset is controlled by what nationality they are, what products they have worked on, even what colleagues they have worked with before, because sometimes you may have insider trading rules if two colleagues who have worked together before have access to the same thing.
So that’s a very big, connected, complicated mess.

Another, final example is recommendation engines: people who bought this have also bought that. Those kinds of things are also very graphy in nature.

Those are some examples of use cases. If you have connected data, you sometimes get ten times faster performance than a relational database, an existing table-based database, but sometimes you even get a thousand times faster or a million times faster, so it is dramatically faster when it comes to this type of connected data operation.

Martin: Emil, you are from Sweden. How did you come up with this idea and how did you start?

Emil: So we actually ran into the problem ourselves. We worked at a start-up in Sweden, the three founders of the project at least. And we worked at an enterprise content management company, which is basically... Can I draw? Will that stick on camera if I draw?

Martin: Yes, I guess so.

Emil: So basically the problem that we had: we were building an enterprise content management system. And enterprise content management is basically like web content management, which is the popular one that everyone knows today. So it is basically a big file system on the web where you have folders, like this, where you have other folders in those, and inside of those folders you have files. This of course is a big tree, but it turns out that when you add security to this, you want to be able to say, “Here is Martin,” over here. He belongs to this group, maybe Product Marketing; let’s say you are in product marketing. This Product Marketing group belongs to the Marketing group. Marketing has read access to this folder, but Product Marketing has write access to this one. So all of a sudden, when Martin logs on and we need to check whether he has access to all these things, we have to look at this big, connected mess over here and this big, connected mess over there and the connections between them.
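
The access check Emil describes can be sketched as a graph traversal. The following is only an illustrative Python sketch, not Neo4j's API or query language: the relationship names (`MEMBER_OF`, `CAN_READ`, `CAN_WRITE`, `CONTAINS`), the folder names, and the helper functions are all hypothetical. It follows group membership upward from the user, then follows folder containment downward from every folder a group has been granted.

```python
from collections import deque

# Hypothetical relationships as (source, relationship type, target) triples.
EDGES = [
    ("Martin", "MEMBER_OF", "Product Marketing"),
    ("Product Marketing", "MEMBER_OF", "Marketing"),
    ("Marketing", "CAN_READ", "Folder A"),
    ("Product Marketing", "CAN_WRITE", "Folder B"),
    ("Folder A", "CONTAINS", "Folder A1"),
]


def neighbors(node, rel_type):
    """Targets directly connected to `node` by a relationship of `rel_type`."""
    return [t for s, r, t in EDGES if s == node and r == rel_type]


def reachable(start, rel_type):
    """`start` plus every node reachable from it via `rel_type` (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def has_access(user, permission, resource):
    """True if the user, or any group the user transitively belongs to,
    holds `permission` on the resource or on a folder that contains it."""
    for principal in reachable(user, "MEMBER_OF"):
        for granted in neighbors(principal, permission):
            if resource in reachable(granted, "CONTAINS"):
                return True
    return False


print(has_access("Martin", "CAN_READ", "Folder A1"))
print(has_access("Martin", "CAN_WRITE", "Folder A"))
```

In a table-based schema, each hop in `reachable` would typically become another join, which is the pain point described next.
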
We had this problem and we tried to store that in normal, square, static tables, which is entirely possible, entirely doable, but it is just really, really hard.

And so what ends up happening is that you end up doing a lot of joins, you end up doing a lot of convoluted things. When we started, we were 10 people in the company, 5 people in the engineering team, and this was like twenty years ago. But a year later it was 50 or 60 people and a twenty-person engineering team, and I was the CTO, and I noticed that about half of my team basically spent the vast majority of their time just fighting with the relational database. At that point we said, “What’s going on here? In all my other projects, the relational database has been my friend. So what is going wrong here?” And then we realized, after we double-clicked on that and really tried to find out what was going on, that it was this mismatch between the shape of the data that we had and the tabular abstractions that were exposed.

So at this point we said, “There has got to be another way.” What if there was a database that had this graph structure, exactly like a regular database but with a graph structure instead of tables? That would be amazing; that would solve all our problems. So we said, “Someone else must have had this problem.” We didn’t google around; we altavisted around, that was the search engine at the time, but basically we didn’t find anything. At that point we said the famous words: “Let’s just build it ourselves. How hard can it be?” It turns out, and this was back in 2000, that fifteen years later it is pretty hard to build a database.

So basically that’s when we decided to build this thing. And we built it for a couple of years, but only as an internal tool. We put it in production in 2003, backing that enterprise content management system. We always thought of it as something that is very generic.
We did not optimize for this particular use case or anything like that, and already from the start we had very high aspirations and felt that this was something that we wanted to unleash upon the world, because it just seemed unlikely that we would be the only people with this problem. From a macro perspective, if you take a step back: are we moving to a more disconnected world or a more connected world? That is kind of a naïve question, given how obvious the answer is, right? Well, but if that’s true, that’s going to have consequences in all parts of the stack, right? And ultimately everything we do with technology ends up in a freaking database. Everything we do, every piece of software that we touch, this mobile phone, everything we touch multiple times per hour, all of that ultimately leads to a consequence in some database somewhere.

And if the world is becoming increasingly connected and there is value in representing those connections, then that’s going to exert a lot of pressure on the existing infrastructure. We just didn’t see that over time we would become less relevant; on the contrary, we felt like we were on the right side of history.

But in the early 2000s there was absolutely zero market acceptance for taking a new type of database to the market. So I don’t know how old you are or if you were around back then, but basically in the mid-nineties there was this surge of object-oriented programming languages, and on the tail end of that there was also a surge of object-oriented databases. The intuition was: we have round objects now, we can’t put them in square tables. Instead, as the world is going to move to an object-oriented paradigm for its programming languages, we are also going to store those objects in object-oriented databases. Makes sense, except it didn’t work at all.
And there were a number of reasons why, and a key contributing factor was one keynote by Larry Ellison at Oracle OpenWorld where he basically wiped out an entire industry with one keynote, Larry Ellison style.

The industry kind of tried out this object-oriented database thing and failed miserably, and so the discourse in the early 2000s was something like: the relational database will always be the only database model. It has now proven itself; people thought of it as a mathematical axiom. We can build things on top of the relational database, but it will always be that fundamental thing. That was the discourse in the industry in the early 2000s. We thought: we have this amazing graph database and it gives us all kinds of benefits, and again, we thought that we were on the right side of history, that macro trends should be in our favor, but we said that there is no acceptance in the market to take this out there. And that changed in ’06 and ’07.

So what happened in ’06 and ’07 was that Amazon published a research paper, an academic paper called Dynamo, where they said, “We are Amazon, we tried a bunch of different things, but we were unable to solve our problems without having to invent our own database.” Invent your own database, right? If your goal is to be an e-commerce site or sell books or sell computing resources, whatever it is that Amazon wants to do, you don’t want to build your own database. You want to use some other database off the shelf so you can invest your energy elsewhere. But the very, very, very smart people at Amazon had concluded that there was nothing off the shelf that worked for them, and then they wrote a paper on how they did that.
And then a little bit later Google announced basically the same thing. They wrote an academic paper called Bigtable where they said, “Hey, we are Google, we have some amount of expertise managing data, and we have also tried the relational database and it has also failed for us, so we also had to invent our own new type of database, and we called it Bigtable.”

And so this caused a lot of stir in the development community, and all of a sudden people started realizing, “Well, actually, maybe the relational database isn’t the only thing that is going to be out there.” And then of course for a while, as with everything, there is a pendulum, so people started thinking that the relational database was going to die and go away completely, but of course that is never true. Now I think we are in a fairly informed state where people generally acknowledge that the era of the one-size-fits-all database is over. We are no longer just going to take all our data and shove it into a single system, which in the past has been the relational database. On the flip side, what we are going to do as data architects is look at our big data set, because all data sets will be big, and say: this part over here in my data is tabular in shape, so let’s put that in a relational database. This part over here is what I call tall, skinny tables, so just key-value pairs, like this; let’s put that in a key-value store. And this part over here is big and messy and connected and chaotic and dynamic; awesome, let’s put this in a graph database.

So we saw that and spun out the company in 2007, took all the IP on the database side and put that into this new company, bootstrapped for a couple of years, and then raised a small seed round in ’09, focused on community and product. We were open source.
And then in 2011 we raised our A round and moved over here and started focusing on building an organization and actual commercial customers, and that’s what we have been doing ever since.

Martin: Cool. Let’s talk about the technology. So imagine I am a company and I believe in the big data paradigm, I have built all my data pipelines, and then, for a specific use case only, I would use Neo4j, and I would only take the subset of the data which I think applies to that use case. Is your database scalable over nodes?

Emil: Yes, so it scales out horizontally. We don’t use the word nodes for that, because nodes mean something else for us. In the graph, the data elements are called nodes, right? It’s a little bit of terminology confusion. But it scales out across machines, so you can scale horizontally on commodity hardware. It runs on top of the JVM, so it really can run wherever the JVM runs, which is most places. It also scales up very well. One of the interesting aspects of a graph database is that you typically don’t want to split up the graph across multiple machines. You can, but it is really hard, and it sometimes leads to problems where, in order to satisfy one query, you are going to need to hop across the network. That’s typically not very fast. So it is awesome if you can fit the entire graph into one machine. You don’t have to, but if you can, that’s good. And so in addition to working a lot on scale-out, we have worked a lot on scale-up, making sure that if there is a lot of memory in a machine, we honor that and we use it very efficiently.

Martin: And is Neo4j only the graph database, or are you also offering tools for pattern analysis, data visualization, etc.?

Emil: We do a little bit of tooling. But say 95 percent of our bandwidth goes into building the core database engine.
That’s just because we are a small team and it is quite a big effort building a database, but there is some amount of tooling offered by us and a lot of tooling offered by the ecosystem. Today we are the most popular graph database by a wide margin. Actually, if you look at some objective measures, we are probably twice as big as all the other graph databases combined, and not necessarily because we are so much smarter or so much better than anyone else; we just got started earlier, and that does lead to a number of really interesting benefits, in particular around the ecosystem. Since we have the largest user base of graph database users, it just makes more sense for any tooling provider to build for our ecosystem. That’s a nice benefit of being the leader in a category, and so we rely a lot on external tooling providers to provide the stuff around the database.

Martin: Cool. What things are you doing in order to foster this kind of ecosystem?

Emil: A couple of things. First off, we are open source, and I think that’s really the key thing. We have a Community edition which is available for free. Wherever you can use MySQL for free, you can use Neo4j Community for free; it’s the same license, GPL. So that’s the key one. Then of course we do a lot of things to try to grow the community and engage the community. Last year, and this is kind of crazy, we ran 500 Neo4j events. 500.

Martin: Only in the US or...?

Emil: Worldwide. So if you go online to http://neo4j.com/events/ today, when you watch this, you are going to see 2-3 events somewhere, probably. And I kind of lied there, because I said we run them and that’s not strictly true: the vast majority of those are run by volunteers, people who love the technology so much and are so fascinated by it that they started a meetup group in Kuala Lumpur or in Onaka or wherever. They just talk about use cases, they talk about customers, they talk about new features, etc.
And so our role in those is typically that we write a check for the pizza or something like this. But we also have big events, so next week we have GraphConnect, which is our big annual conference that we run twice per year; that’s how annual it is. In the fall we run it here in San Francisco and in the spring in London. We are expecting about 1,000 people next Wednesday and Thursday here in San Francisco. So it really ranges from 10, 15, 20 people at a spontaneously, informally organized pizza-and-beer meetup all the way up to a big professional event. Those are some of the things that we are doing to foster and grow and engage the community.

Martin: Emil, you said that you are basically open source. How do you make money?

Emil: We’re open source, and we have the Community edition which is available for free off the website. We also have an Enterprise edition, which has a number of features that you may not strictly need but that, if you are a big company, you really want. Things like the clustering that we discussed before. If you are Walmart, who is a customer of ours, or UPS, who is a customer of ours, and you have a graph database running in production, you don’t want that running on just a single machine. You want that replicated and clustered across a number of machines so that if one goes down, the cluster will still be up and running. There is a large financial institution which uses us for onboarding of traders, the use case I mentioned earlier. And if we are down for minutes, the entire bank stands still. It handles 50 million requests per day. If that’s down for a minute, that’s millions and millions of dollars. So that just cannot happen. So obviously you then want clustering, and that’s available in the Enterprise edition. So that’s how we make money. We sell it in the normal fashion these days, which is a subscription-based model, so you pay every year for your right to use the software.
So that’s how I am able to buy water at Starbucks and things like that.

Martin: And how do you acquire those customers? Is it mainly due to the community aspect, or do you have a direct sales force or maybe even a partner network?

Emil: Yes, that’s a great question. The actual acquisition, I’d say 95+ percent, is organic, inbound through the community work that we are doing. So it’s someone out there who picks up the software, plays around with it, typically during weekends and evenings, likes it, and realizes, “Hey, I actually had a problem last year that this could have solved.” The following year they run into a similar problem and think, “Maybe I’ll try this graph database thing.” Then they try it out, start playing around with it, and realize it does solve the problem. At that point, if they work in a big company, typically they call us, and the moment they call us we have a direct sales force. We are very much a traditional enterprise software company in the sense that we have actual people answering the phones. But we do the vast majority via phone, so it’s not going out and visiting customers; that’s the primary one. But we also do million-dollar deals, all recurring-revenue million-dollar deals, with Global 200 companies, working with the CIOs, and it’s a very big strategic bet for them. And at that point of course we go out there and we shake hands. So that’s how that model works.

Martin: In the beginning of the interview you said, “How hard can it be to build a database?”, right? If you look back, why was it so hard?

Emil: Wow, that’s a great question. I think there are two aspects to that question. Sure, there are multiple nuances, but I will focus on two aspects.

First off, it is technically very difficult to build a database, and we have very high aspirations. There are a number of those NoSQL databases out there that, for example, said the relational database is good at some things, but they threw away a lot of other things.
One of the things they threw away, which we disagree with, is transactions. Transactions mean that if you run a number of operations, if you write to the database and then you say “commit”, then once the database says, “Yup, that’s committed,” the database will guarantee that your data will be there forever. And we think that for a database that’s not a negotiable feature. That has to be there. And actually a lot of people, strangely enough from our perspective, disagree with that, and it’s very popular today to talk about eventual consistency and things like that.

We actually agree with eventual consistency, but we want to do that as a layer on top of a transactional core. My point is that writing this software is really, really hard; really, really hard. I mean, it is the kind of thing where nine women won’t give birth to a baby in one month. It requires calendar time. It requires you to be out in the wild, with customers, in production, for a long time in order to really get the kinks out of the system.

Just as an example, early on we had situations, back in ’03 and ’04, so a long time ago, where someone was writing a transaction to the database and the database crashed. One thing that we do, unlike some other databases today, is that we will always roll back to a safe state, so you will either not see that transaction at all or you will see the full transaction. You will never see half-written data. In order to do that you basically use what’s called a transaction log. And without geeking out too much on this, although I’d love to do that, suffice it to say that you basically first write your data to what’s called a transaction log. Now, what happened to us in ’03 and ’04 was that if the database crashed while you wrote this data, that was fine. After the machine booted up and you started the database, it would just recover and bring it back to a stable state.
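
The all-or-nothing behaviour described above can be illustrated with a toy transaction log. This is a simplified Python sketch under stated assumptions, not how Neo4j's recovery is implemented: the record layout (`"write"` and `"commit"` tuples) and the `recover` function are hypothetical. Writes are appended to the log first, and recovery applies only the writes of transactions that have a commit record.

```python
def recover(log):
    """Rebuild the store from the log, applying only committed transactions.

    Recovery only reads the log and builds a fresh state, so it can simply
    be run again from the start if it is interrupted part-way through.
    """
    committed = {record[1] for record in log if record[0] == "commit"}
    state = {}
    for record in log:
        if record[0] == "write" and record[1] in committed:
            _, _tx_id, key, value = record
            state[key] = value
    return state


# Hypothetical log contents: transaction 1 committed, transaction 2 was
# still in flight when the process crashed (no commit record was written).
log = [
    ("write", 1, "a", 10),
    ("write", 1, "b", 20),
    ("commit", 1),
    ("write", 2, "a", 99),
]

print(recover(log))
```

Because `recover` never modifies the log, running it twice produces the same state, which is the property that matters for the crash-during-recovery case discussed next.
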
Except there is a little bit of a process, called a recovery process, where it reads the logs and tries to figure out what that stable state is. What happens if you crash during that time? Then you need to be able to recover from that.

Martin: It’s an infinite loop.

Emil: Exactly. And that is just one tiny little example of the corner cases. Once you are up and running with tens of thousands of customers in production, you are going to run into all of these kinds of eventualities, and it’s going to be a combinatorial explosion of different versions of the Java virtual machine combined with different versions of the 10, 20, 30 OSes out there and the different versions of disk controllers. That’s a very large combination of things where you need to guarantee that it works, because that’s what we ultimately sell to our customers. It is peace of mind, trustability of the data, and so it must never fail. And writing that kind of software just takes a lot of time. So that’s one aspect.

The second aspect, which I actually alluded to before, is that there was just no market acceptance for a new type of database. And what we have done is one of the hardest things in technology: we’ve created a new category. This is equivalent to what, for example, VMware did back in the late nineties. No one knew what virtualization was. It actually had been invented earlier, in the mainframe era, but basically they took the concept and created a market around virtualization. And Palm Pilot did that when they launched, if you remember Palm Pilot.

Martin: Doesn’t ring a bell.

Emil: Well, that dates you, actually. So they launched this thing that was a “personal digital assistant”, a PDA, right? Then phones ended up killing them. But they created this new category. And we have been able to do that with graph databases. “Graph database” is a term that we put together. There were some academic articles from the ’80s, but they looked nothing like the modern graph database.
So we just took the word graph and database and put it together and started defining it, giving it meaning and popularizing it. And now it isâ€"Forester researchers which is one of the big analysts firm says that 25 percent of enterprise will be running on graph databases in 2017. Garker says that 75 to 80 percent of the leading organizations are going to be piloting and proof of concepting graph databases by 2018. The entire Global 2000, the entire Fortune 500 will be using graph databases in production by the end of this decade. That’s a very much zero to one kind of Peter Thiels terminology; so going from absolutely zero putting those two words together into where we are heading, we are nowhere near done yet but where we are heading is very, very hard and it takes a lot of work.Martin: When I look at entrepreneurs I always think, ok one thing they need is vision and they need to be naïve. And this is a good example because if you have expected how hard it would be you would never have started.Emi l: For sure. That is very, very true and if someone had told me in 2000 that 15 years later you still going to be working on this piece of software, I would be like, “Dude, that’s never going to happen.”Martin: Six months maximum.Emil: Exactly! That’s very true. If we had known how difficult it is to pull off and all the things that could have killed us and should have killed us we never would have even started.Martin: Good.ADVICE TO ENTREPRENEURS FROM EMIL EIFREM In San Mateo (CA), we meet CEO and Co-Founder of Neo4j, Emil Eifrem. Emil talks about his story how he came up with the idea and founded Emil Eifrem, how the current business model works, as well as he provides some advice for young entrepreneurs.INTRODUCTIONMartin: This time we are in beautiful San Mateo. Hi, Emil, who are you and what do you do?Emil: Guten Morgen! That’s it, that’s all the German I know. 
So my name is Emil Eifrem and I run a company called Neo Technology and we are a graph database company.BUSINESS MODEL OF NEO4JMartin: Cool, what is that?Emil: So a graph database is a database model that is inspired by the human brain. The human brain is structured in neurons with synapses connecting neurons which build up the big network and the mathematical word for network is a graph. So what we have built is a database that rather than using tables which is sort of the standard model or it was used to be the standard model, it uses nodes and then relationships between thes e nodes which then builds up this big graph. And people know the word graph now because of Mark Zuckerberg like Social graph and that’s definitely a very common use case. The nodes are people, the relationships are whether you know each other. But we have a lot of other use cases, in fact social is not even the most popular use case for us.So for example fraud detection ends up with every node is a transaction or an individual and then you have relationships connecting all of these and you want to find patterns in the big graph of payments, so there is one use case.Identity and access management, so you are a big corporation and you are a big financial institution, so we have a lot of big financial institutions and you want to onboard a new trader and that trader has access to the subset of all of the collateral that the bank has produced and the specific subset is controlled by what nationality they actually are, what products they worked on, even what colleagues they have worked with before because sometimes you may have insider trading rules if two colleagues who have worked before have access to the same thing. 
So that’s a very big connected, complicated mass.Another final example is recommendation engine, people who bought this have also bought that, those kinds of things are also very graphy in nature.Those are some examples of use cases, if you have connected data, you sometimes get ten times faster performance than relational database and existing table based database but sometimes you even get a thousand times faster or a million times faster, so it is dramatically faster when it comes to this type of connected data operations.Martin: Emil, you are from Sweden. How did you come up with this idea and how did you start?Emil: So we actually ran into the problem ourselves. We worked at a start-up in Sweden, three founders of the project at least. And we worked at the enterprise content management company which is basicallyâ€" Can I draw? Will that sti ck on camera if I draw?Martin: Yes, I guess so.Emil: So basically the problem that we had, we were building an enterprise content management system. And enterprise content management is basically like web content management which is the popular one that everyone knows today. So it is basically a big file system on the web where you have folders, like this, where you have other folders in those and inside of those folders you have files. This of course is a big tree but it turns out that when you add security to this, so you are able to say ‘Here is Martin’, over here. He belongs to this group, maybe Product Marketing, let’s say you are in product marketing. This Product Marketing group belongs to the Marketing group. Marketing has read access to this folder but product marketing has write access to this one. So all of the sudden, when Martin logs on and we need to check whether he has access to all these things, we have to look at all big, connected mess over here and this big connected mess over there and the connections between them. 
As we have this problem and we try to store that in normal square static tables which is entirely possible, entirely doable but it is just really, really hard.And so what ends up happening is that, you end up doing a lot of joins, you end up doing a lot of cumulated things. When we started, we were 10 people in the company, 5 people in the engineering team and I was like twenty years ago. But a year later it was 50 60 people and twenty person engineering team and I was the CTO and I noticed that about the half of my team basically spent the vast majority of the time just fighting with the relational database. At that point we said, “What’s going on here? In all my other projects, the relational database has been my friend. So what is going wrong here?” And then we realized that after we double-click to that and really tried to find out what is going on; it was this miss-match with the shape of data that we had and t he tabular abstractions that were exposed.So at this point we said, “There has got to be another way.” What If there was a database that had this graph structure, exactly like the database but had the graph structure, instead of tables, that would be amazing, that would solve all our problems. So we said, “There has got to be someone else must have had this problem, we didn’t google around, we altavisted around the search engine at the time but basically we didn’t find anything. At that point we saidâ€"the famous words said, “Let’s just build it ourselves. How hard can it be?” It turns out fifteen years, this is back in 2000, fifteen years later it is pretty hard to build a database.So basically that’s when we decided to build this thing. And we built it for a couple of years but only as an internal tool. Put it in production and in 2003, then backing that enterprise content management system. We always thought of it as something that is very generic. 
We did not op timize for this particular use case or anything like that and we really initially hadâ€" already from the start we had very high inspirations and felt that this was something that we wanted to unleash upon the world because it just seemed unlikely that we would be the only people with this problem. At a macro perspective, if you take a step back are we moving to a more disconnected world or a more connected world? That is kind of a naïve question how obvious it is, right? Well but if that’s true, that’s going to get me consequences in all parts of the stack right and ultimately everything we do with technology ends up in a freaking database. Everything we do â€" every software that we touch, this mobile phone, everything we touch multiple times per hour, all of that ultimately leads to a consequence in some database somewhere.And if the world is becoming increasingly connected and there is value in representing those connections, then that’s going to exert a lot of pressure o n the existing infrastructure and we just didn’t see that over time, we would become less relevant, on the contrary, we felt like we were serving on the right side of history.But in the early 2000s there was absolutely zero market acceptances for taking a new type of database to the market. So I don’t know how old you are or if you were around back then but basically in the mid-nineties there was this surge of object oriented programing languages, and on the tail end of that there was also a surge of object oriented databases and the inertia was that we have round objects now, we can’t put them in square tables. Instead as the world is going to move to an object oriented paradigm for their programing languages we are going also to store those objects in object oriented databases. Makes sense, except it didn’t work at all. 
And there were a number of reasons why, and the key contributing factor was one keynote by Larry Ellison at Oracle OpenWorld where he basically wiped out an entire industry with one keynote, Larry Ellison style.
The industry kind of tried out this object-oriented database thing, failed miserably, and so the discourse in the early 2000s was something like: the relational database will always be the only database model. It has now proven itself; people thought of it as a mathematical axiom. We can build things on top of the relational database, but it will always be that fundamental thing. And that was the discourse in the industry in the early 2000s. We thought, we have this amazing graph database and it gives us all kinds of benefits, and again we thought that we were on the right side of history, that macro trends should be in our favor, but we said that there is no acceptance in the market to take this out there. And that changed in ’06 and ’07.
So what happened in ’06 and ’07 was that Amazon published a research paper, an academic paper on Dynamo, where they said, “We are Amazon, we tried a bunch of different things but we were unable to solve our problems without having to invent our own database.” Invent your own database, right? And if your goal is to be an e-commerce site or sell books or sell computing resources, whatever it is that Amazon wants to do, you don’t want to build your own database. You want to use some other database off the shelf so you can invest your energy elsewhere. But the very, very, very smart people at Amazon had concluded that there was nothing off the shelf that worked for them, and then they wrote a paper on how they did that.
And then a little bit later Google announced basically the same thing. They wrote an academic paper called Bigtable where they say, “Hey, we are Google, we have some amount of expertise managing data and we have also tried the relational database and it has also failed for us, so we also had to invent our own new type of database and we called it ‘Bigtable’.”
And so this caused a lot of stir in the development community, and all of a sudden people started realizing that, “Well, actually maybe the relational database isn’t the only thing that is going to be out there.” And then of course for a while, as with everything, there is a pendulum, so people then started thinking that the relational database is going to die and go completely away, but of course that is never true. And now I think we are in a fairly informed state where people generally acknowledge the fact that the era of the one-size-fits-all database is over. We are no longer just going to take all our data and shove it into a single system, which in the past has been the relational database. On the flip side, what we are going to do as data architects is look at our big data set, because all data sets will be big, and say: this part over here in my data is tabular in shape, so let’s put that in a relational database. This part over here is what I call tall skinny tables, so just key-value pairs, like this; let’s put that in a key-value store. And this part over here is big and messy and connected and chaotic and dynamic; awesome, let’s put this in a graph database. So we saw that and spun out the company in 2007, took all the IP on the database side and put that into this new company, bootstrapped for a couple of years and then raised a small seed round in ’09, focused on community and product; we were open source.
And then in 2011, we raised our A round and moved over here and started focusing on building an organization and on actual commercial customers, and that’s what we have been doing ever since.
Martin: Cool. Let’s talk about the technology. So imagine I am a company and I believe in the big data paradigm, I have built all my data pipelines, and then, for a specific use case only, I would use for example Neo4j, and I would only take a subset of the data which I think applies to the use case. Is your database scalable over nodes?
Emil: Yes, so it scales out horizontally. We don’t use the word nodes because nodes mean something else for us. In the graph the data elements are called nodes, right? It’s a little bit of terminology confusion. But it scales out across machines, so you can scale horizontally on commodity hardware. It runs on top of the JVM, so it really can run wherever the JVM runs, which is most places. It also scales up very well. So one of the interesting aspects of a graph database is that you typically don’t want to split up the graph across multiple machines. You can, but it is really hard, and it sometimes leads to problems where, in order to satisfy one query, you are going to need to hop across the network. That’s typically not very fast. So it is awesome if you can fit the entire graph into one machine. You don’t have to, but if you can, that’s good. And so in addition to working a lot on scale-out, we have worked a lot on scale-up, to make sure that if there is a lot of memory in a machine, we honor that and we use that very efficiently.
Martin: And is Neo4j only the graph database, or are you also offering tools for pattern analysis, data visualization, etc.?
Emil: We do a little bit of tooling. But say 95 percent of our bandwidth goes into building the core database engine.
That is just because we are a small team and it is quite a big effort building a database, but there is some amount of tooling offered by us and a lot of tooling offered by the ecosystem. Today we are the most popular graph database by a wide margin. Actually, if you look at some objective measures, we are probably twice as big as all the other graph databases combined, and not necessarily because we are so much smarter or so much better than anyone else, but we just got started earlier, and that does lead to a number of really interesting benefits, in particular around the ecosystem. Since we have the largest user base of graph database users, it just makes more sense for any tooling provider to build for our ecosystem. That’s a nice benefit of being the leader in a category, and so we rely a lot on external tooling providers to provide the stuff around the database.
Martin: Cool. What things are you doing in order to foster this kind of ecosystem?
Emil: A couple of things. First off, we are open source, and I think that’s really the key thing. We have a Community edition which is available for free. Wherever you can use MySQL for free, you can use Neo4j Community for free; it’s the same license, the GPL. So that’s the key one. Then of course we do a lot of things to try to grow the community and engage the community. Last year we ran, this is kind of crazy, we ran 500 Neo4j events last year, 500.
Martin: Only in the US or...?
Emil: Worldwide. So if you go online to http://neo4j.com/events/ today, when you watch this, you are going to see 2 or 3 events somewhere, probably. And I kind of lied there because I said we run them, and that’s not strictly true, because the vast majority of those are just run by volunteers: people who love the technology so much and are so fascinated by it that they started a meetup group in Kuala Lumpur or in Onaka or wherever. They just talk about use cases, they talk about customers, they talk about new features, etc.
And so our role in those is typically that we write a check for the pizza or something like this. But we also have big events, so next week we have GraphConnect, which is our big annual conference that we run twice per year; that’s how annual it is. In the fall we run it here in San Francisco and in spring in London. We are expecting about 1,000 people next Wednesday and Thursday, here in San Francisco. So it really ranges from 10, 15, 20 people at a spontaneously, informally organized pizza-and-beer meetup, all the way up to a big professional event. Those are some of the things that we are doing to foster and grow and engage the community.
Martin: Emil, you said that you are basically open source. How do you make money?
Emil: We’re open source, so we have the Community edition, which is available for free from the website. We also have an Enterprise edition which has a number of features that, if you are a big company, you don’t need but you really want. Things like the clustering that we discussed before: if you are Walmart, who is a customer of ours, or UPS, who is also a customer of ours, and you have a graph database running in production, you don’t want that running on just a single machine. You want that replicated and clustered across a number of machines so that if one goes down, the cluster will still be up and running. There is a large financial institution which uses us for onboarding of traders, the use case I mentioned earlier. And if we are down for minutes, the entire bank stands still. It handles 50 million requests per day. If that’s down for a minute, that’s millions and millions of dollars. So that just cannot happen. So obviously, then you want clustering, and that’s available in the Enterprise edition. So that’s how we make money. We sell it in the normal fashion these days, which is a subscription-based model, so you pay every year for your right to use the software.
So that’s how I am able to buy water at Starbucks and things like that.
Martin: And how do you acquire those customers? Is it mainly due to the community aspect, or is it that you have a direct sales force or maybe even a partner network?
Emil: Yes, that’s a great question. The actual acquisition, I’d say 95+ percent, is organic, inbound through the community work that we are doing. So it’s someone out there who picks up the software, plays around with it, typically during weekends and evenings, likes it, realizes, “Hey, I actually had a problem last year that this could have solved,” and the following year they run into a similar problem and think, “Maybe I’ll try this graph database thing.” Then they try it out, start playing around with it, and realize it does solve the problem. At that point, if they work in a big company, typically they call us, and the moment they call us we have a direct sales force. We are very much a traditional enterprise software company in the sense that we have actual people answering the phones. But we do the vast majority via phone, so it’s not going out and visiting customers; the phone is the primary channel. But we do million-dollar deals, all recurring-revenue million-dollar deals, with Global 200 companies, working with the CIOs, and it’s a very big strategic bet for them. And at that point of course we go out there and we shake hands. So that’s how that model works.
Martin: In the beginning of the interview you said, “How hard can it be to build a database?”, right? If you look back, why was it so hard?
Emil: Wow, that’s a great question. I think there are two aspects to that question. Surely there are multiple nuances, but I will focus on two aspects.
First off, it is technically very difficult to build a database, and we have very high aspirations. There are a number of NoSQL databases out there that, for example, said that the relational database is good at some things, but they threw away a lot of other things.
One of the things they threw away that we disagree with is transactions. Transactions mean that if you run a number of operations, if you write to the database and then you say “commit”, then once the database says, “Yup, that’s committed,” the database will guarantee that your data will be there forever. And we think for a database that’s not a negotiable feature. That has to be there. And actually a lot of people, strangely enough from our perspective, disagree with that, and it’s very popular today to talk about eventual consistency and things like that.
We actually agree with eventual consistency, but we want to do that as a layer on top of a transactional core. My point is that writing this software is really, really hard; really, really hard. I mean, it is the kind of thing where nine women can’t give birth to a baby in one month. It requires calendar time. It requires you to be out in the wild, with customers, in production for a long time in order to really get the kinks out of that system.
Just as an example, early on, back in ’03 and ’04, so a long time ago, we had situations where someone was writing a transaction to the database and the database crashed. One thing that we do, unlike some other databases today, is that we will always roll back to a safe state, so you will either not see that transaction at all or you will see the full transaction. You will never see half-written data. In order to do that you basically use what’s called a transaction log. And without geeking out too much on this, although I’d love to do that, suffice it to say that basically what’s called a transaction log records your data as it is written. Now what happened to us in ’03 and ’04 was that if the database crashed while you wrote this data, that was fine. When the machine booted back up and you started the database, it would just recover and bring itself back to a stable state.
Except there is a little bit of a process, called a recovery process, where it reads the logs and tries to figure out what that stable state is. What happens if you crash during that time? Then you will need to be able to recover from that.
Martin: It’s an infinite loop.
Emil: Exactly. And that is just one tiny little example of the loopholes. Once you are up and running with tens of thousands of customers in production, you are going to run into all of these kinds of eventualities, and it’s going to be a combinatorial explosion of different versions of the Java virtual machine combined with different versions of the 10, 20, 30 OSes out there and the different versions of disk controllers, and that’s a very large combination of things for which you need to guarantee that it works, because that’s what we ultimately sell to our customers. It is peace of mind, trustability of the data, and so it must never fail. And writing that kind of software just takes a lot of time. So that’s one aspect.
The second aspect, which I actually alluded to before, is that there was just no market acceptance for a new type of database. And what we have done is one of the hardest things in technology: we’ve created a new category. This is equivalent to what, for example, VMware did back in the late nineties. No one knew what virtualization was. It actually had been invented earlier, in the mainframe era, but basically they took the concept and created a market around virtualization. And Palm Pilot did that when they launched, if you remember Palm Pilot.
Martin: Doesn’t ring a bell.
Emil: Well, that dates you, actually. So they launched this thing that was a “personal digital assistant”, a PDA, right? It was phones that ended up killing them. But they created this new category. And we have been able to do that with graph databases. “Graph database” is a term that we put together; there were some academic articles from the ’80s, but that looked nothing like the modern graph database.
So we just took the words graph and database and put them together and started defining the term, giving it meaning and popularizing it. And now Forrester Research, which is one of the big analyst firms, says that 25 percent of enterprises will be running on graph databases in 2017. Gartner says that 75 to 80 percent of the leading organizations are going to be piloting and proof-of-concepting graph databases by 2018. The entire Global 2000, the entire Fortune 500 will be using graph databases in production by the end of this decade. That’s very much zero to one, in Peter Thiel’s terminology. So going from absolutely zero, putting those two words together, to where we are heading (and we are nowhere near done yet) is very, very hard and it takes a lot of work.
Martin: When I look at entrepreneurs I always think, OK, one thing they need is vision, and they need to be naïve. And this is a good example, because if you had expected how hard it would be, you would never have started.
Emil: For sure. That is very, very true, and if someone had told me in 2000, “15 years later you are still going to be working on this piece of software,” I would have been like, “Dude, that’s never going to happen.”
Martin: Six months maximum.
Emil: Exactly! That’s very true. If we had known how difficult it is to pull off, and all the things that could have killed us and should have killed us, we never would have even started.
Martin: Good.
ADVICE TO ENTREPRENEURS FROM EMIL EIFREM
Martin: Emil, what start-up advice could you give to first-time entrepreneurs so they can make fewer avoidable errors?
Emil: So first off, on start-up advice: I think start-up advice is really hard and really dangerous, because so much is contextual, and I actually think that some of the most brilliant things in life, but let’s focus on building companies, some of the most brilliant things in building companies come from people who go completely 180 degrees from common wisdom.
And so I try to refrain from giving generic start-up advice. Having said that, I think the thing that has helped me is the obvious thing, which everyone says: passion for what you do. I’ve been doing this for 15 years, sure, the company for 7 or 8 years, but I have worked on the technology for 15 years, and every freaking year I’ve had more fun than the previous year. When we were two guys and hadn’t had a salary for a year and we were just so completely dirt poor, I still had so much fun. And then when we grew the team to like 6 people, I was like, “Oh wow, we actually have a team now!” It’s just amazing. And then we became 15 and 20 and it was like, “We need some kind of management or something here,” all the way up to now; I guess we are 110 or 120 people and we are across 12 to 14 countries. I am still having as much fun as I’ve ever had in my entire life. So I think that has to be there, just because it’s so hard that if you are not crazily passionate about what you do, you just don’t have the persistence to do it. So that would be the first and obvious one.
And then there are the tactics: stay close to your customers. If you aren’t the customer yourself (and some of the best technologies in the world, I think, have been written by people who are themselves the customer), then really, really have empathy for the customers, stay close to the customer.
Both are truisms, both are things that everyone is saying. They have helped me a lot.
Martin: Emil, thank you so much for your time!
Emil: Pleasure! Dankeschön!
Martin: And next time when you are thinking about starting a company, you have to be passionate. But you need to think about ideas that are contrary to what the mainstream is thinking, and you have to be right. And then you are starting a successful company.
Emil: Contrarian and right.
Martin: Right. But contrarian and wrong is not such a good choice. OK, thank you so much. Great!
Emil: Awesome!
