Recently I started playing with OpenStreetMap (OSM) data in Spark. Here are the steps to load the data into Spark:

1. Convert the PBF data into Parquet format with osm-parquetizer: https://github.com/adrianulbona/osm-parquetizer
2. Read the data in Spark. Before reading the Parquet files, set:

    spark.sqlContext.setConf("spark.sql.parquet.binaryAsString","true")

This ensures that the tags are read as strings instead of binary objects.
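A minimal sketch of step 2 follows. It assumes a few things not stated above. The converter writes a separate Parquet file for nodes, whose path is passed as the first argument. That file has an `id` column and a `tags` column of type `array<struct<key, value>>`. The names `OsmTags` and `explodeTags` are hypothetical. The sketch sets the flag, reads the nodes, and flattens the tags into one row per key/value pair:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object OsmTags {
  // Hypothetical helper: turns the tags column (assumed array<struct<key, value>>)
  // into one row per (id, key, value).
  def explodeTags(entities: DataFrame): DataFrame =
    entities
      .select(col("id"), explode(col("tags")).as("tag"))
      .select(col("id"), col("tag.key").as("key"), col("tag.value").as("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("osm-parquet")
      .master("local[*]")
      .getOrCreate()

    // Must be set before reading, so the binary tag columns come back as strings.
    spark.sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

    // args(0): path to the nodes Parquet file produced by the converter (assumed layout).
    val nodes = spark.read.parquet(args(0))
    nodes.printSchema()

    explodeTags(nodes).show(10, truncate = false)
    spark.stop()
  }
}
```

On Spark 2.x and later, `spark.conf.set("spark.sql.parquet.binaryAsString", "true")` should set the same SQL option through the session's runtime config. The same `explodeTags` helper ought to work for ways and relations if their `tags` column has the same shape.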