Research: Apache Spark continues to attract new users

The open source project increased the number of code contributors by 67 percent in 12 months

Apache Spark, an open source distributed computing framework used for advanced analytics and machine learning, continues to grow in popularity. According to a survey by Databricks, the number of Spark deployments in public cloud environments has increased by 10 percent since 2015, while the number of on-premises deployments has decreased.

Databricks collected 1,600 responses from 900 organizations and found that the framework has seen massive growth among data frame users following the introduction of the relevant APIs last year. It is also attracting a new, less technical audience, including business analysts who develop Spark-based applications for Windows.

Shining bright

Spark is a cluster computing engine that relies on in-memory processing. It was born at the AMPLab of UC Berkeley in 2009, as a PhD thesis by computer scientist Matei Zaharia. Zaharia also co-created the Apache Mesos cluster manager (commercialized by Mesosphere), and played an important part in the early development of Apache Hadoop. He now serves as Chief Technologist at Databricks.

In certain applications, Spark performs much faster than the popular MapReduce framework and its implementations, such as Apache Hadoop. It is especially suitable for projects that involve thousands of individual servers, and has been applied in fields like machine learning and cognitive computing.

According to the latest annual Apache Spark Survey, there’s been a 67 percent increase in the number of code contributors to the project in the past 12 months.

The banking sector saw the highest rate of Spark adoption since 2015, as did the health, biotech and pharmacology industries.

Respondents showed an increased move towards building real-time applications using the Spark Streaming framework, and over half said streaming functionality was vital for developing modern apps.

The largest yearly increase in popularity was among data frame users (153 percent), followed by Spark SQL users (67 percent) and Streaming users (57 percent).

Asked which Apache Spark components developers use to build complex solutions for their use cases, 74 percent of respondents said they use two or more components (e.g. Spark SQL, MLlib, YARN, Mesos) to build different types of products.
