The Complete Magazine on Open Source

Apache Spark 2.2 brings improved support for the R language

Apache Spark, the multipurpose in-memory data processing framework, has been upgraded to v2.2. The latest version enhances the existing R language support.

With the latest tweaks, Apache Spark now supports a total of 10 distributed algorithms in the R language. This brings R support within Spark closer in line with what is available to Java code. Furthermore, the platform uses these algorithms to offer a wider range of machine learning capabilities.

In addition to the extended R support, version 2.2 brings Structured Streaming out of its experimental status. The feature lets Spark process data streams using its native batch-based data handling metaphors, such as DataFrames and SQL queries. The Apache Spark project is likely to focus sharply on the feature and enhance it in coming versions.

The new version also enables Structured Streaming to use Apache Kafka as a source or a sink, reading data from Kafka or writing data to it. The platform also offers lower latency for a Kafka connection. While Kafka is usually paired with Apache Storm for stream processing, Spark can offer more functionality with fewer APIs for developers to learn.
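As a rough illustration of the Kafka source and sink described above, the PySpark sketch below forwards records from one topic to another. The helper names (kafka_source_options, kafka_sink_options, copy_topic) and all argument values are hypothetical and not part of the release. The sketch also assumes the separate spark-sql-kafka-0-10 connector package is on the classpath and that the caller supplies a running SparkSession.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for reading one Kafka topic as a streaming source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
    }


def kafka_sink_options(bootstrap_servers, topic, checkpoint_dir):
    """Options for writing a stream to a Kafka topic (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "topic": topic,
        # Structured Streaming needs a checkpoint directory to track progress.
        "checkpointLocation": checkpoint_dir,
    }


def copy_topic(spark, servers, in_topic, out_topic, checkpoint_dir):
    """Start a query that forwards records from in_topic to out_topic.

    `spark` is an existing SparkSession supplied by the caller.
    """
    source = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(servers, in_topic))
        .load()
    )
    # The Kafka sink expects a string or binary "value" column
    # and, optionally, a "key" column.
    records = source.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )
    return (
        records.writeStream.format("kafka")
        .options(**kafka_sink_options(servers, out_topic, checkpoint_dir))
        .start()
    )
```

Calling copy_topic(spark, "localhost:9092", "events-in", "events-out", "/tmp/checkpoints") would start the query against a local broker; the broker address, topic names and checkpoint path here are placeholders.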

The triggering mechanism that is an integral part of Structured Streaming has also been tweaked so that a streaming job can run once and then quit. This execution model is designed to be more efficient than running Spark batch jobs at intervals.
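The run-and-quit trigger can be sketched in PySpark as shown below. This is a minimal example under stated assumptions rather than a reference implementation: trigger_settings and run_job are hypothetical helper names, the JSON input and Parquet output formats are chosen only for illustration, and the caller is assumed to provide a SparkSession, a schema and the directory paths.

```python
def trigger_settings(run_once, interval="1 minute"):
    """Keyword arguments for DataStreamWriter.trigger() (hypothetical helper)."""
    if run_once:
        # Process whatever data is available, then stop the query.
        return {"once": True}
    # Otherwise start a micro-batch at a fixed processing-time interval.
    return {"processingTime": interval}


def run_job(spark, input_dir, output_dir, checkpoint_dir, schema, run_once=True):
    """Read JSON files as a stream and write them out as Parquet.

    `spark` is an existing SparkSession; `schema` describes the JSON records.
    """
    stream = spark.readStream.schema(schema).json(input_dir)
    query = (
        stream.writeStream.format("parquet")
        .option("path", output_dir)
        # The checkpoint records which input has already been processed,
        # so a later run picks up only the new files.
        .option("checkpointLocation", checkpoint_dir)
        .trigger(**trigger_settings(run_once))
        .start()
    )
    if run_once:
        # With a one-time trigger the query ends by itself after one batch.
        query.awaitTermination()
    return query
```

Because the checkpoint tracks progress between runs, a scheduler could invoke run_job periodically and each run would handle only the data that arrived since the previous one.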