Betting on Hadoop for big data

In its own words, MapR Technologies “delivers on the promise” of open source big data framework Hadoop with a “proven, enterprise-grade platform that supports a broad set of mission-critical and real-time production uses”. In just five years, MapR has built a 700-strong global customer base and counts among its partners Amazon, Cisco, Google, Teradata and HP. Freya Purnell spoke with MapR CEO and co-founder John Schroeder on a recent visit to Australia about why he decided to bet on Hadoop and what customers want to manage big data now.

Freya Purnell: What was the background to the foundation of MapR?

John Schroeder: I have been in BI and database for over 20 years. I sold the last company that I ran to Microsoft in 2008. Then I spent all of 2008 chasing big data. I have a pretty good network to C-level executives, and I just took many, many meetings, asking “What are your top priorities? If you get them done, what does it mean to your company? If you don’t get them done, what does it mean?”

From that, you could see that big data was the top priority for all these companies for many different reasons. If you’re selling to customers, the old ways to reach them are not working any more. With data, you can do much more to target customers, but the noise is really high. In other industries, they were looking for the ability to do things like surveillance or manufacturing optimisation. So that really framed up the market opportunity.

Then we asked what technology they were thinking about using, what they liked about it, what they didn’t like about it, and ideally how they would like it to work. You could see Hadoop was going to be the big winner in the market. To be successful, you need to be able to do everything from an analytics operation to the interactive world, so Hadoop had the foundation to be able to grow in that direction.

But what we heard from the customers was that they didn’t like that Hadoop wasn’t enterprise grade: it crashes, it’s hard to get data in and out, and it’s difficult to build applications.

So we developed the value proposition to build something that closes the gaps in Hadoop. We vetted it with that same group of customers, and those customers later became our beta accounts. We’ve got a lot of technical abilities of our own, but we also really listened to what customers needed and what was the highest priority, and that was how we put the company together. We’ve raised US$174 million, and our last round was led by Google.

FP: And have those key priorities around big data named by your original customers five years ago changed since then?

JS: The feedback we got then has been very lasting as far as where we set the strategy. The first phase was really to make Hadoop enterprise grade – so it’s high availability for everything, disaster recovery, and much better performance so you could trust it to be a platform for critical apps and data storage. Up until then, customers didn’t trust it.

Now we support the best SQL database in the industry as part of our Hadoop distribution, and that’s because our customers said they also need to be able to do things in real time. So we released that phase a year later.

Then there’s interactive traditional BI – data exploration, dashboarding and reporting using tools like Tableau, MicroStrategy and BusinessObjects. We went through a project called Apache Drill, which marries the best of the old world with the best of the new. What we wanted from the old world is standard SQL, so those tools will work, but you don’t want to limit data centres to traditional structured data. So we added capabilities to also build these queries through semi-structured data using technologies like JSON.

Now we will move down to expanding the platform. Whenever you’re talking about a platform, you’re talking about security, governance, multi-tenancy. You can secure, provision and operate the unified data platform through the Hadoop APIs, the SAP HANA APIs, and more on apps like SAP Mobility, which you can’t do with other Hadoop distributions. With them, you have to build connectors and pipe data between platforms, which is expensive and there’s latency. It has been a long journey there and that’s been foundational.

FP: For a business user new to Hadoop, how do you explain it?

JS: At a high level, the way you can explain it is if you are a dealer in a casino with one deck of cards, you could shuffle and deal. If you had three decks, you could still do it, but it’s starting to get difficult, you would probably drop cards on the floor. If you have the shoe with eight decks, obviously you can’t shuffle that. It’s the same way Hadoop works – if I get that many cards and that much data, I give a little bit to you, a little bit to a few other people, I say shuffle these, and then I will take them and combine the results.

Instead of trying to grow monolithically and vertically with thousands of servers, I am going to take your application and data and we will share it across a large customer base, so all those servers will process the data and then you will shuffle together the results.
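
The split-and-combine idea in the card analogy can be illustrated with a minimal Python sketch. This is not Hadoop's API or MapR's implementation; it runs on a single machine, and the function names (`split`, `map_count`, `combine`, `count_suits`) and the eight-deck shoe are assumptions made purely for illustration. Each worker counts the suits in its own pile, and the partial counts are then merged into one result.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(cards, n_workers):
    """Deal the cards into n_workers piles, round-robin style."""
    return [cards[i::n_workers] for i in range(n_workers)]


def map_count(pile):
    """Each worker counts the suits in its own pile only."""
    # The suit is the last character of the card code, e.g. "10S" -> "S".
    return Counter(card[-1] for card in pile)


def combine(partials):
    """Merge the per-worker counts into one overall result."""
    return reduce(lambda a, b: a + b, partials, Counter())


def count_suits(cards, n_workers=4):
    """Split the work, process the piles concurrently, then combine."""
    piles = split(cards, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_count, piles))
    return combine(partials)


# Hypothetical input: a shoe of eight standard 52-card decks.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = "SHDC"
shoe = [rank + suit for rank in ranks for suit in suits] * 8

totals = count_suits(shoe)
print(totals)
```

The result does not depend on how many workers share the piles, which is the property that lets this style of processing scale out by adding machines rather than by buying a bigger one.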

The easiest way [to help business users understand the technology] is talking about use cases and extrapolating the use cases from other companies that are bigger challenges and opportunities. It’s another scale that they’ve never seen before, and it allows you to actually observe behaviour around sampling or after the fact analysis.

FP: How are customers approaching big data challenges at the moment?

JS: You can really cut the market in half. There are the companies that have had some experience with big data and Hadoop, and they understand the technology, have gotten through the learning curve. Sometimes that’s driven by IT, but most likely it is where the business users have pushed their agenda through the company. You’ll have a visionary that says, “I want to revolutionise risk within a credit card company, and I also want to monetise by serving offers”. For the companies who have been through that, it’s a much shorter conversation, if they understand the nuances of the technology.

The other side of the market is those new to Hadoop, and they need less technology, and more training and consulting and hand-holding on what are the use cases that are interesting to them. In those projects, we’ve done a lot of training with the IT staff on the new tools and technologies.

FP: For organisations undertaking big data initiatives, are they bringing in specialised skills or resources to help them structure those?

JS: If you look at the business side of the house, they generally have the vertical industry experience in what they want to do. So it’s not really a new skill-set as much as the understanding they can do something at scale. One of the concepts is you don’t need better algorithms for analytics, you need to be able to run it at scale. Algorithms are 30 years old – the difference is today we are combining those with petabytes of data. Two things then happen – you can adjust the parameters. If you’re looking at cardholders, for example, you can say “I thought gender, age and pays on time is most important, but now gender is not as important, it’s really location”. You can change those parameters and run it again, and it’s going to give you feedback on how many clusters there are.

For IT, they need some training, they need some help on data engineering services.

FP: What developments should we be looking for next from MapR?

JS: We are really always ahead of the curve. Most organisations are still doing batched analytics. Now they have the capability to do real-time analytics through SQL, and they have the ability to do more traditional business intelligence across structured and unstructured data at a bigger scale. So I think we’re going to see over the next few years a continued push on the technology, but there’s going to be an uptick in broader usage of Hadoop across a broader set of users.