Spark catches alight with users

User adoption of open source big data platform Apache Spark is growing fast due to its ease of use, rapid deployment, reliable fast performance, and suitability for real-time and advanced analytics, according to the 2015 Spark User Survey.

The survey, conducted by Databricks, a company founded by the creators of Apache Spark, found that 91 per cent of those surveyed adopted Spark because of its performance, 77 per cent cited ease of programming, 71 per cent cited ease of deployment, 64 per cent cited advanced analytics capabilities, and 52 per cent cited real-time streaming capabilities.

The research also revealed the most popular use cases, with 52 per cent of users deploying Spark for data warehousing, 68 per cent for business intelligence, 40 per cent for processing application and system logs, 48 per cent to build recommendation engines, 36 per cent for user-facing services, and 29 per cent for fraud detection and security.

Spark’s streaming capabilities are proving particularly popular, with 56 per cent more streaming users in 2015 than in 2014, and the production use of advanced analytics for applications such as machine learning and graph processing increased from 11 per cent in 2014 to 15 per cent in 2015.

From a user perspective, both data scientists (22 per cent) and data engineers (41 per cent) are using Spark collaboratively to solve data problems, employing a variety of languages including Scala, Python, SQL, Java and R.

In a significant shift, the survey found that Spark is outgrowing Hadoop, with the number of standalone deployments of Spark (48 per cent) exceeding those running Spark on YARN within Hadoop (40 per cent), and the proportion of Spark users not using any Hadoop components more than doubling in 2015. Fifty-one per cent of respondents said they now run Spark on a public cloud.

“The continued growth of Spark has been highly encouraging, as companies are going into production to obtain real business value, and they are doing so in a wide range of environments beyond Hadoop clusters,” said Matei Zaharia, creator of Apache Spark and CTO of Databricks.

“Databricks and our partners are 100 per cent committed to the long-term growth of Spark and we’ll continue to make improvements based on this survey data and our ongoing community feedback, to make the most complete big data analytics toolkit accessible to all businesses.”