
Research: Apache Spark continues to attract new users


The open source project increased the number of code contributors by 67 percent in 12 months

Apache Spark, an open source distributed computing framework used for advanced analytics and machine learning, continues to grow in popularity. According to a survey by Databricks, the number of Spark deployments in public cloud environments has increased by 10 percent since 2015, while the number of on-premises deployments has decreased.

Databricks collected 1,600 responses from 900 organizations and found that the framework has seen massive growth among data frame users following the introduction of the relevant APIs last year, and that it is also attracting a new, less technical audience, including business analysts who develop Spark-based applications for Windows.

Annual Apache Spark Survey 2016

Source: Databricks

Shining bright

Spark is a cluster computing engine that relies on in-memory processing. It was born at the AMPLab of UC Berkeley in 2009, as part of the PhD research of computer scientist Matei Zaharia. Zaharia also co-created the Apache Mesos cluster manager (commercialized by Mesosphere), and played an important part in the early development of Apache Hadoop. He now serves as the Chief Technologist at Databricks.

In certain applications, Spark performs much faster than the popular MapReduce model and its implementations, such as Apache Hadoop. It is especially suitable for projects that involve thousands of individual servers, and has been applied in fields like machine learning and cognitive computing.

According to the latest annual Apache Spark Survey, there’s been a 67 percent increase in the number of code contributors to the project in the past 12 months.

The banking sector, along with the health, biotech and pharmacology industries, saw the highest rates of Spark adoption since 2015.

Respondents showed an increased move towards building real-time applications using the Spark Streaming framework, and over half said streaming functionality was vital for developing modern apps.

The largest yearly increase in popularity was among data frame users (153 percent), followed by Spark SQL users (67 percent) and Streaming users (57 percent).

Asked which Apache Spark components developers use to build complex solutions for their use cases, 74 percent of respondents said they use two or more components (e.g. Spark SQL, MLlib, YARN, Mesos) to build different types of products.
