
Solr 5.1 DIH

Recently I had to use the Data Import Handler (DIH) to index data from a PostgreSQL database. I ran into a few issues along the way, so I'm blogging the steps I followed and the issues I faced.


Setting up DataImportHandler
  • Edit your solrconfig.xml to add the request handler

    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
      </lst>
    </requestHandler>
  • Create a data-config.xml file as follows and save it to the conf dir

    <dataConfig>
      <dataSource type="JdbcDataSource" 
                  driver="org.postgresql.Driver"
                  url="jdbc:postgresql://host:port/dbname" 
                  user="username" 
                  password="password"/>
      <document>
        <entity name="col_id" 
                query="select * from report_ks">
        </entity>
      </document>
    </dataConfig>
  • Add the table's columns as fields in schema.xml
  • Put the RDBMS JDBC driver (in my case, the PostgreSQL driver) in the $SOLR_HOME/dist folder and point to it in solrconfig.xml

    <lib dir="${solr.install.dir:../../../..}/dist/" regex="postgresql.*\.jar" />
  • Add the DataImportHandler jars in solrconfig.xml

    <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
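
For the schema.xml step above, a minimal sketch might look like the following. The column names (id, title, description) are hypothetical, since the actual columns of report_ks aren't listed here, and the field types are assumed to be ones already defined in your schema.xml (string and text_general are in the Solr 5.x example schemas).

    <!-- Hypothetical columns of report_ks; replace with your own column names -->
    <!-- Skip any field that your schema.xml already defines (e.g. id) -->
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="description" type="text_general" indexed="true" stored="true"/>

When data-config.xml has no explicit <field> mappings, DIH maps each result column to the schema field with the same name, so the field names here should match the column names returned by the query.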
