Skip to main content

Spark: Next big thing in Big Data?

Spark indeed has gained a lot of popularity. Its mailing list one of the most active in all of the big data projects(It shows lot of people are banging it). As one of the big data enthusiasts, I'm going to dig in ;-)



   Watch out this space for benchmarking + optimization factors in spark. Lets play spark

Comments

Popular posts from this blog

Common issues on Shark with CDH5-beta2

Issues on Shark with CDH5-beta2 1. IncompatibleClassChangeError: Implementing class Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.ClassLoader.defineC...

Upgrading nodejs in ubuntu 14.04

My machine has 5.x installed and had lot of trouble updating it to 8.x. Below are the steps I followed to upgrade nodejs from 5.x to 8.x #add the new source list sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 68576280  sudo apt-add-repository "deb https://deb.nodesource.com/node_8.x $(lsb_release -sc) main" sudo apt-get update #Remove the previous installation sudo apt-get purge nodejs npm  #Verify if proper version is going to be installed apt-cache policy <package> #Install new version sudo apt-get install -y nodejs

spark java.lang.IllegalArgumentException: java.net.UnknownHostException: user

Today I faced an error while trying to use Spark shell. This is how I resolved. scala> val file = sc.textFile("hdfs://...") 14/10/21 13:34:23 INFO MemoryStore: ensureFreeSpace(217085) called with curMem=0, maxMem=309225062 14/10/21 13:34:23 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 212.0 KB, free 294.7 MB) file: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12 scala> file.count() java.lang.IllegalArgumentException: java.net.UnknownHostException: user     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)     at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)     at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141) This error can be fixed by giving proper hostname and port sc.textFile("hdfs://{hostname}:8020/{filepath}...") scala> file.count() 14/10/21 13:44:23 IN...