SparkR: Where R meets Spark

R is one of the most widely used statistical tools. Its popularity has grown quickly in recent years, and it is forecast to overtake SPSS and SAS in the near future.

R is

Open source, used and supported by millions of people
Backed by thousands of packages, most of which are developed and used by people working in statistics. (I feel most other products are developed by people who do not truly understand the end use case.)

Cons

By default, it runs on a single processor core
It requires all data to be held in RAM
It does not handle big data processing

This is where Spark comes in. Spark is designed for high-speed and real-time big data processing.

High-speed processing, up to 100x faster than Hadoop MapReduce for in-memory workloads
Ease of use
Shark, MLlib, and Spark Streaming make Spark really powerful

SparkR is a lightweight interface to Spark from R. It brings big data processing into R, and the results can be used for further statistical analysis.

How to install SparkR

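1. SparkR requires the devtools and rJava packages. Install them if they are missing, then load them:

install.packages(c("devtools", "rJava"))

library(devtools)

library(rJava)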

2. Install SparkR. This is a fairly easy step:

install_github("amplab-extras/SparkR-pkg", subdir="pkg")
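Once the install finishes, a first session might look like the sketch below. It assumes the RDD-style functions that the SparkR-pkg README documents (sparkR.init, parallelize, lapply, collect, reduce) and a local Spark master; the variable names are only illustrative, and later SparkR releases may expose a different API, so treat it as a starting point rather than a reference.

```r
library(SparkR)

# Start a Spark context that runs locally on this machine (assumed setup).
sc <- sparkR.init(master = "local")

# Distribute the integers 1..10 across 2 partitions as an RDD.
numbers <- parallelize(sc, 1:10, 2)

# Square each element on the Spark side; the result is still an RDD.
squared <- lapply(numbers, function(x) x * x)

# Bring the results back into the R session as an ordinary R list.
squares <- collect(squared)

# Aggregate on the Spark side and return a single number to R.
total <- reduce(squared, "+")

# From here on, plain R statistics apply to the collected values.
summary(unlist(squares))
```

The heavy lifting (parallelize, lapply, reduce) happens in Spark, and only the collected results need to fit in R's memory, which is how SparkR works around the single-machine limits listed earlier.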

More information is available on GitHub or the AMPLab SparkR page.
