
spark java.lang.IllegalArgumentException: java.net.UnknownHostException: user

Today I faced an error while trying to read a file from HDFS in the Spark shell. This is how I resolved it.
scala> val file = sc.textFile("hdfs://...")
14/10/21 13:34:23 INFO MemoryStore: ensureFreeSpace(217085) called with curMem=0, maxMem=309225062
14/10/21 13:34:23 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 212.0 KB, free 294.7 MB)
file: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> file.count()
java.lang.IllegalArgumentException: java.net.UnknownHostException: user
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)

The exception occurs because the hdfs:// URI does not contain a valid NameNode host, so Hadoop parses the first path component (here, user) as the hostname and fails to resolve it. The fix is to give the proper NameNode hostname and port in the URI:

scala> val file = sc.textFile("hdfs://{hostname}:8020/{filepath}...")
scala> file.count()
14/10/21 13:44:23 INFO FileInputFormat: Total input paths to process : 1
14/10/21 13:44:23 INFO SparkContext: Starting job: count at <console>:15
14/10/21 13:44:23 INFO DAGScheduler: Got job 0 (count at <console>:15) with 2 output partitions (allowLocal=false)
14/10/21 13:44:23 INFO DAGScheduler: Final stage: Stage 0(count at <console>:15)
14/10/21 13:44:23 INFO DAGScheduler: Parents of final stage: List()
14/10/21 13:44:23 INFO DAGScheduler: Missing parents: List()
14/10/21 13:44:23 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[5] at textFile at <console>:12), which has no missing parents
14/10/21 13:44:23 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[5] at textFile at <console>:12)
14/10/21 13:44:23 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/10/21 13:44:23 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 3: cdh.node2 (NODE_LOCAL)
14/10/21 13:44:23 INFO TaskSetManager: Serialized task 0.0:0 as 1795 bytes in 3 ms
14/10/21 13:44:23 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 1: cdh.node7 (NODE_LOCAL)
14/10/21 13:44:23 INFO TaskSetManager: Serialized task 0.0:1 as 1795 bytes in 1 ms
14/10/21 13:44:26 INFO TaskSetManager: Finished TID 1 in 2337 ms on cdh.node7 (progress: 1/2)
14/10/21 13:44:26 INFO DAGScheduler: Completed ResultTask(0, 1)
14/10/21 13:44:26 INFO DAGScheduler: Completed ResultTask(0, 0)
14/10/21 13:44:26 INFO TaskSetManager: Finished TID 0 in 2363 ms on cdh.node2 (progress: 2/2)
14/10/21 13:44:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/10/21 13:44:26 INFO DAGScheduler: Stage 0 (count at <console>:15) finished in 2.369 s
14/10/21 13:44:26 INFO SparkContext: Job finished: count at <console>:15, took 2.455767724 s
res2: Long = 39
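
If you would rather not repeat the NameNode host and port in every path, you can set the default filesystem on the SparkContext's Hadoop configuration instead. A minimal sketch, assuming a NameNode at namenode.example.com:8020 and a sample file path (both placeholders; substitute your own):

// Set the default filesystem once, so bare hdfs:/// URIs resolve
// against the NameNode. Host and path below are placeholders.
sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://namenode.example.com:8020")

// The URI can now omit the host and port:
val file = sc.textFile("hdfs:///user/data/sample.txt")
file.count()

An explicit host:port in the URI, as in the example above, still overrides this default; on a properly configured cluster fs.defaultFS is normally picked up from core-site.xml automatically.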


Comments

  1. scala> val textFile = sc.textFile("README.md")
    java.lang.IllegalArgumentException
    at org.apache.spark.io.SnappyCompressionCodec.<init>(CompressionCodec.scala:152)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
    at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:79)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1070)
    at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:763)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:591)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:21)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
    at $iwC$$iwC$$iwC.<init>(<console>:34)
    at $iwC$$iwC.<init>(<console>:36)
    at $iwC.<init>(<console>:38)
    at <init>(<console>:40)
    at .<init>(<console>:44)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
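
    This SnappyCompressionCodec error is a different problem from the UnknownHostException above: it usually means the snappy-java native library could not be loaded, for example because an older snappy-java was pulled in from the Hadoop classpath or java.io.tmpdir is mounted noexec. A possible workaround, assuming Spark 1.x where snappy is the default codec, is to switch to the pure-JVM LZF codec, either by starting the shell with --conf spark.io.compression.codec=lzf or, in a standalone application, on the SparkConf:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch for a standalone app: select the LZF codec so the
    // native Snappy library is never loaded.
    val conf = new SparkConf()
      .setAppName("lzf-example")
      .set("spark.io.compression.codec", "lzf")
    val sc = new SparkContext(conf)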

