R is one of the most widely used statistical tools. In recent years it has gained a lot of popularity, and it is forecast to overtake SPSS and SAS in the near future.
R is
- Open source, used and supported by millions of people
- Thousands of packages, most of which are developed and used by people working in statistics. (I feel most products are developed by people who do not truly understand the end use case.)
Cons
- It runs on a single processor core by default
- Requires all data to fit in RAM
- Does not handle big data processing on its own
This is where Spark comes in. Spark is designed for high-speed big data processing, including real-time big data processing.
- High-speed processing, reported to be up to 100x faster than Hadoop MapReduce for in-memory workloads
- Ease of use
- Shark, MLlib, and Spark Streaming make Spark really powerful.
SparkR is a lightweight interface to Spark from R. It brings big data processing into R, and the results can be used for further statistical analysis.
How to install SparkR
1. SparkR requires the devtools and rJava packages:
require(devtools)
require(rJava)
2. Install SparkR. This is a fairly easy step:
install_github("amplab-extras/SparkR-pkg", subdir="pkg")
More information is available on GitHub or the AMPLab SparkR page.
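Once the package is installed, a first session might look like the sketch below. It is only an illustration, not taken from the SparkR documentation. It assumes the AMPLab SparkR-pkg functions sparkR.init, parallelize, lapply and reduce, plus a working Java installation. The helper name square and the "local[2]" master setting are choices made for this example. The code skips the Spark part if the package is not installed.

```r
# Sketch only: assumes the AMPLab SparkR-pkg API (sparkR.init, parallelize,
# lapply, reduce) and a working Java installation.

# Plain R function that will be applied to every element (hypothetical helper)
square <- function(x) x * x

if (requireNamespace("SparkR", quietly = TRUE)) {
  library(SparkR)

  # Start a local Spark context with 2 worker threads (assumed setting)
  sc <- sparkR.init(master = "local[2]")

  # Distribute the numbers 1..100 across 2 partitions
  rdd <- parallelize(sc, 1:100, 2L)

  # Square each element in Spark, then add the results together
  squares <- lapply(rdd, square)
  total <- reduce(squares, function(a, b) a + b)

  # total is an ordinary R value and can feed into further statistics
  print(total)
} else {
  message("SparkR is not installed; skipping the Spark example")
}
```

The heavy lifting happens in Spark, and only the small final result comes back into the R session, which is how SparkR works around R's memory limit.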