Upgrading Node.js on Ubuntu 14.04

My machine had Node.js 5.x installed, and I had a lot of trouble updating it to 8.x. Below are the steps I followed to upgrade Node.js from 5.x to 8.x.

    # Add the new source list
    sudo apt-key adv --keyserver --recv 68576280
    sudo apt-add-repository "deb $(lsb_release -sc) main"
    sudo apt-get update

    # Remove the previous installation
    sudo apt-get purge nodejs npm

    # Verify that the proper version is going to be installed
    apt-cache policy <package>

    # Install the new version
    sudo apt-get install -y nodejs
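The verification step is easy to skip, so here is a small Python sketch of how it could be scripted. It reads the `Candidate:` line from `apt-cache policy` output and extracts the major version. The function names and the sample output are made up for illustration; the sample is not real output from my machine.

```python
import re


def candidate_version(policy_output):
    """Return the version on the 'Candidate:' line of `apt-cache policy` output, or None."""
    match = re.search(r"^\s*Candidate:\s*(\S+)", policy_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def major_version(version):
    """Extract the leading major number from a Debian version string such as '8.0.0-1'."""
    # Drop an optional epoch prefix ("1:") before reading the major number.
    upstream = version.split(":", 1)[-1]
    return int(upstream.split(".", 1)[0])


# Hypothetical sample output, made up for illustration only.
SAMPLE = """\
nodejs:
  Installed: (none)
  Candidate: 8.0.0-1example
  Version table:
"""

if __name__ == "__main__":
    version = candidate_version(SAMPLE)
    print(version, major_version(version))
```

If the candidate is `(none)`, `major_version` raises a `ValueError`, which is a reasonable signal that the new source list was not picked up.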
Recent posts

Spark & Open Street Data | How to read PBF data

Recently I started playing with OpenStreetMap (OSM) data in Spark. Here are the steps to load the data into Spark:

1. Convert the PBF data into Parquet format.
2. Read the data in Spark, with this setting applied:

    spark.sqlContext.setConf("spark.sql.parquet.binaryAsString","true")

This ensures that tags are read as strings instead of binary objects.

Common Issues with Solr Data Import Handler (DIH)

1. Could not load driver: org.postgresql.Driver

    org.apache.solr.common.SolrException; Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: org.postgresql.Driver

Solution: Put the RDBMS driver (in my case, the Postgres driver) in the $SOLR_HOME/dist folder and point to it in solrconfig.xml:

    <lib dir="${solr.install.dir:../../../..}/dist/" regex="postgresql.*\.jar" />

2. ERROR StreamingSolrClients

    org.apache.solr.common.SolrException: Bad Request
    request: http://host:7574/solr/collection_shard2_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fhost%3A8983%2Fsolr%2Fcollection_shard1_replica2%2F&wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurr...
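For issue 1 above, it helps to check which jar names the `regex` attribute would actually pick up before restarting Solr. The Python sketch below assumes the pattern has to match the whole file name. That is my assumption about how Solr applies it, and Python's `re` is only a stand-in for Java's regex engine here. The file names are hypothetical.

```python
import re

# Pattern copied from the <lib .../> element in the post.
LIB_REGEX = r"postgresql.*\.jar"


def matching_jars(filenames, pattern=LIB_REGEX):
    """Return the file names that the pattern matches in full (assumed semantics)."""
    compiled = re.compile(pattern)
    return [name for name in filenames if compiled.fullmatch(name)]


# Hypothetical directory listing, for illustration only.
CANDIDATES = [
    "postgresql-9.4.jar",
    "postgresql-9.4.jar.bak",
    "mysql-connector.jar",
    "solr-core.jar",
]

if __name__ == "__main__":
    print(matching_jars(CANDIDATES))
```

Under that assumption only `postgresql-9.4.jar` is selected, so a renamed or backed-up jar would not be loaded.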

Solr 5.1 DIH

Recently I had to use the Data Import Handler to index data from a Postgres database. Unfortunately I ran into a few issues, so I'm blogging the steps and the issues I faced.

Setting up DataImportHandler

Edit your solrconfig.xml to add the request handler:

    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
      </lst>
    </requestHandler>

Create a data-config.xml file as follows and save it to the conf dir:

    <dataConfig>
      <dataSource type="JdbcDataSource" driver="org.postgresql.Driver" url="jdbc:postgresql://host:port/dbname" user="username" password="password"/>
      <document>
        <entity name="col_id" query="select * from report_ks"> ...
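Once the `/dataimport` handler is registered, an import is started by calling it with a `command` parameter. Here is a minimal Python sketch that only builds the request URL, assuming the handler path shown above. The base URL, the core name `mycore` and the helper name are placeholders I made up, and no request is sent.

```python
from urllib.parse import urlencode


def dataimport_url(base_url, core, command="full-import", **params):
    """Build the URL for a DataImportHandler command such as full-import or status."""
    query = urlencode({"command": command, **params})
    return f"{base_url.rstrip('/')}/{core}/dataimport?{query}"


if __name__ == "__main__":
    # Hypothetical host and core name, for illustration only.
    print(dataimport_url("http://localhost:8983/solr", "mycore", clean="true"))
    print(dataimport_url("http://localhost:8983/solr", "mycore", command="status"))
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, should trigger the import. The `status` command is handy for polling while a full import runs.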

rWordCloud - An htmlwidget interface for D3 word cloud

With htmlwidgets, it has become easy to bind D3 scripts to R. rWordCloud is one such package.

To install rWordCloud:

    require(devtools)
    install_github('adymimos/rWordCloud')

The two main functions in rWordCloud are:

d3TextCloud - this function takes strings as input and performs a word count. Before the word count, it does stemming and stop word removal.

    content <- c('R is a programming language and software environment for statistical computing and graphics open source','The R language is widely used among statisticians and data miners for developing statistical software and data analysis','Polls, surveys of data miners,and studies of scholarly literature databases show that R popularity has increased substantially in recent years','languages programming study open source, analysis')
    label <- c('a1','a2','a3','a4')
    d3TextCloud(content = content, label = label )

d3Cloud - Function accepts word and it...

spark java.lang.IllegalArgumentException: user

Today I faced an error while trying to use the Spark shell. This is how I resolved it.

    scala> val file = sc.textFile("hdfs://...")
    14/10/21 13:34:23 INFO MemoryStore: ensureFreeSpace(217085) called with curMem=0, maxMem=309225062
    14/10/21 13:34:23 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 212.0 KB, free 294.7 MB)
    file: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

    scala> file.count()
    java.lang.IllegalArgumentException: user
        at
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(

This error can be fixed by giving the proper hostname and port:

    sc.textFile("hdfs://{hostname}:8020/{filepath}...")

    scala> file.count()
    14/10/21 13:44:23 IN...
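One plausible reading of the `user` in the exception is that the original path began with `hdfs://user/...`, so the first path segment was taken as the namenode host. This is my assumption, since the original path is not shown. The Python sketch below illustrates that behaviour with generic URI parsing. `urlparse` is only a stand-in for Hadoop's own URI handling, and the example paths and the host `namenode.example.com` are hypothetical.

```python
from urllib.parse import urlparse


def describe_hdfs_uri(uri):
    """Split an hdfs:// URI into (hostname, port, path) using generic URI rules."""
    parsed = urlparse(uri)
    return parsed.hostname, parsed.port, parsed.path


if __name__ == "__main__":
    # No namenode given: whatever follows "//" is treated as the host.
    print(describe_hdfs_uri("hdfs://user/data/file.txt"))
    # Explicit host and port, as in the fix above.
    print(describe_hdfs_uri("hdfs://namenode.example.com:8020/user/data/file.txt"))
```

In the first call the host comes out as `user` and the path loses its first segment, which is consistent with the error message. In the second, the host, port and full path are all what you would expect.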

/lib/spark/bin/ No such file or directory in CDH-5.2

In the latest version, CDH 5.2, trying to run spark-shell produces this error:

    user@spark-master:~# spark-shell
    /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/bin/../lib/spark/bin/spark-shell: line 44: /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark/bin/ No such file or directory

Solution: can be downloaded from github. I'm not sure this is the perfect solution, but things seem to be working after putting the file in place.

1. get the file from
2. copy to /opt/cloudera/parcels/CDH/lib/spark/bin/

    user@spark-master:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark/bin# spark-shell
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/spark-assembly-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/clo...