Today I faced an error while trying to use the Spark shell. Here is how I resolved it.
Reading a file with a bare hdfs:// URI and then counting its lines failed with an UnknownHostException:
scala> val file = sc.textFile("hdfs://...")
14/10/21 13:34:23 INFO MemoryStore: ensureFreeSpace(217085) called with curMem=0, maxMem=309225062
14/10/21 13:34:23 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 212.0 KB, free 294.7 MB)
file: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
scala> file.count()
java.lang.IllegalArgumentException: java.net.UnknownHostException: user
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
This error can be fixed by giving the proper NameNode hostname and port in the URI:
sc.textFile("hdfs://{hostname}:8020/{filepath}...")
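The error message itself hints at the cause: with no host in the URI, everything between `//` and the next `/` is parsed as the URI authority, so Hadoop treated the first path segment (here apparently `user`) as the NameNode hostname and tried to resolve it. A minimal sketch with `java.net.URI` illustrates this; the `cdh.node1` host and the paths below are placeholders, not values from the original session:

```scala
import java.net.URI

// With no host in the URI, the first segment after "//" becomes the
// authority, so Hadoop resolves "user" as the NameNode and fails.
val bad = new URI("hdfs://user/hadoop/input.txt") // hypothetical path
println(bad.getHost)  // prints: user

// With an explicit NameNode host and port, the path stays intact.
val good = new URI("hdfs://cdh.node1:8020/user/hadoop/input.txt") // hypothetical host
println(good.getHost) // prints: cdh.node1
println(good.getPort) // prints: 8020
```

This is why the fix is simply to spell out the NameNode host and port (8020 is the common CDH default) rather than starting the path right after `hdfs://`.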
scala> file.count()
14/10/21 13:44:23 INFO FileInputFormat: Total input paths to process : 1
14/10/21 13:44:23 INFO SparkContext: Starting job: count at <console>:15
14/10/21 13:44:23 INFO DAGScheduler: Got job 0 (count at <console>:15) with 2 output partitions (allowLocal=false)
14/10/21 13:44:23 INFO DAGScheduler: Final stage: Stage 0(count at <console>:15)
14/10/21 13:44:23 INFO DAGScheduler: Parents of final stage: List()
14/10/21 13:44:23 INFO DAGScheduler: Missing parents: List()
14/10/21 13:44:23 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[5] at textFile at <console>:12), which has no missing parents
14/10/21 13:44:23 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[5] at textFile at <console>:12)
14/10/21 13:44:23 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/10/21 13:44:23 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 3: cdh.node2 (NODE_LOCAL)
14/10/21 13:44:23 INFO TaskSetManager: Serialized task 0.0:0 as 1795 bytes in 3 ms
14/10/21 13:44:23 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 1: cdh.node7 (NODE_LOCAL)
14/10/21 13:44:23 INFO TaskSetManager: Serialized task 0.0:1 as 1795 bytes in 1 ms
14/10/21 13:44:26 INFO TaskSetManager: Finished TID 1 in 2337 ms on cdh.node7 (progress: 1/2)
14/10/21 13:44:26 INFO DAGScheduler: Completed ResultTask(0, 1)
14/10/21 13:44:26 INFO DAGScheduler: Completed ResultTask(0, 0)
14/10/21 13:44:26 INFO TaskSetManager: Finished TID 0 in 2363 ms on cdh.node2 (progress: 2/2)
14/10/21 13:44:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/10/21 13:44:26 INFO DAGScheduler: Stage 0 (count at <console>:15) finished in 2.369 s
14/10/21 13:44:26 INFO SparkContext: Job finished: count at <console>:15, took 2.455767724 s
res2: Long = 39
A similar-looking failure, reported in the comments, came from reading a local file, where the Snappy compression codec failed to initialize:
scala> val textFile = sc.textFile("README.md")
java.lang.IllegalArgumentException
at org.apache.spark.io.SnappyCompressionCodec.&lt;init&gt;(CompressionCodec.scala:152)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
at org.apache.spark.broadcast.TorrentBroadcast.&lt;init&gt;(TorrentBroadcast.scala:79)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1070)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:763)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:591)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.&lt;init&gt;(&lt;console&gt;:21)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.&lt;init&gt;(&lt;console&gt;:26)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.&lt;init&gt;(&lt;console&gt;:28)
at $iwC$$iwC$$iwC$$iwC$$iwC.&lt;init&gt;(&lt;console&gt;:30)
at $iwC$$iwC$$iwC$$iwC.&lt;init&gt;(&lt;console&gt;:32)
at $iwC$$iwC$$iwC.&lt;init&gt;(&lt;console&gt;:34)
at $iwC$$iwC.&lt;init&gt;(&lt;console&gt;:36)
at $iwC.&lt;init&gt;(&lt;console&gt;:38)
at &lt;init&gt;(&lt;console&gt;:40)
at .&lt;init&gt;(&lt;console&gt;:44)
at .&lt;clinit&gt;(&lt;console&gt;)
at .&lt;init&gt;(&lt;console&gt;:7)
at .&lt;clinit&gt;(&lt;console&gt;)
at $print(&lt;console&gt;)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
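This second error is different from the URI problem above: the Snappy compression codec is failing to initialize, which typically happens when snappy-java's native library cannot load in the environment. One workaround I have seen, offered as a sketch rather than a confirmed fix for this exact trace, is to switch Spark's internal compression codec to the pure-Java LZF implementation in conf/spark-defaults.conf:

```
# spark.io.compression.codec is a standard Spark setting;
# lzf avoids the Snappy native library entirely.
spark.io.compression.codec    lzf
```

Restart spark-shell after changing it so the new codec takes effect.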