In addition to indexing HBase updates in near-real-time, it’s also possible to run a batch indexing job that will index data already contained within an HBase table. The batch indexing tool operates with the same indexing semantics as the near-real-time indexer, and it is run as a MapReduce job.
In its most basic mode, the batch indexer runs as a set of map tasks, one per HBase region, that scan the table and write documents directly to a live Solr cluster, as follows:
hadoop jar hbase-indexer-mr-*-job.jar \
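Only the first line of the command survives above. A fuller invocation might look like the following sketch; the flag names (`--hbase-indexer-file`, `--zk-host`, `--collection`, `--reducers`) come from the HBase Indexer MapReduce tool, while the file name, ZooKeeper quorum, and collection name are placeholders:

    hadoop jar hbase-indexer-mr-*-job.jar \
      --hbase-indexer-file indexer-config.xml \  # mapping of HBase columns to Solr fields
      --zk-host zk01:2181/solr \                 # ZooKeeper quorum for the SolrCloud cluster
      --collection myCollection \                # target Solr collection
      --reducers 0                               # 0 reducers: map tasks write directly to live Solr

With `--reducers 0`, no offline shards are built; each map task sends its documents straight to the running Solr servers.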
It is also possible to generate offline index shards in HDFS by supplying -1 or a positive integer for the --reducers argument, as shown below:
hadoop jar hbase-indexer-mr-*-job.jar \
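Again only the first line remains. A plausible full command for offline shard generation is sketched below; the `--output-dir` path, ZooKeeper quorum, and collection name are placeholders, and the flag names follow the HBase Indexer MapReduce tool:

    hadoop jar hbase-indexer-mr-*-job.jar \
      --hbase-indexer-file indexer-config.xml \
      --zk-host zk01:2181/solr \
      --collection myCollection \
      --reducers -1 \                            # -1 or a positive integer: build shards in reducers
      --output-dir hdfs:///user/solr/outdir      # offline index shards are written here, not to Solr

The resulting shards sit in HDFS and are not visible to queries until they are loaded into a Solr cluster.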
Finally, index shards can be generated offline and then merged into a running SolrCloud cluster using the --go-live flag, as follows:
hadoop jar hbase-indexer-mr-*-job.jar \
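As with the previous listings, the command is truncated; one possible complete form is sketched below. It is the offline-shard invocation with `--go-live` added, which the tool uses to merge the freshly built shards into the live SolrCloud cluster once the job finishes; host names, paths, and the collection name are placeholders:

    hadoop jar hbase-indexer-mr-*-job.jar \
      --hbase-indexer-file indexer-config.xml \
      --zk-host zk01:2181/solr \
      --collection myCollection \
      --reducers -1 \
      --output-dir hdfs:///user/solr/outdir \
      --go-live                                  # merge the offline shards into the running cluster

This mode combines the throughput of offline index building with the convenience of serving the result immediately, at the cost of a merge step against the live cluster.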