IBM has announced a number of initiatives aimed at advancing the Apache Spark cluster computing framework, saying it has the potential to become “the most significant open source project of the next decade.”

IBM plans to embed Spark into its analytics and commerce platforms, and to offer Spark in the cloud through its Bluemix development service.

The company will open a dedicated Spark Technology Center in San Francisco, and donate its SystemML machine learning technology to the open source community.

“We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” said Beth Smith, general manager of the IBM Analytics Platform business.

Research muscle

IBM is a founding member of the AMPLab at UC Berkeley, where Spark was invented in 2009. The software was released to the open source community in 2010.

In certain applications, Spark performs much faster than the popular MapReduce framework and implementations of it such as Apache Hadoop. It is especially well suited to machine learning algorithms, an area of special interest for IBM as it continues development of its Watson cognitive computing platform.

IBM says it plans to leverage Spark as part of its Watson Health Cloud – a recently announced service which will analyze clinical, research and social data from a diverse range of health sources, and make the results available to both patients and medical professionals.

In total, IBM will put more than 3,500 of its researchers and developers to work on Spark-related projects at more than a dozen labs worldwide, including the brand-new Spark Technology Center in Silicon Valley.

The company also plans to educate more than one million data scientists and data engineers on Spark through extensive partnerships with organizations like DataCamp and Big Data University.

Earlier this month, IBM launched SuperVessel – a free, cloud-based application development service for Power processors and the OpenPower ecosystem.