CrazyAirhead

A crazy fool, a fool gone crazy — only a fool can persevere, only the crazy can focus!


batch-indexing - Batch Indexing

In addition to indexing HBase updates in near-real-time, it’s also possible to run a batch indexing job that will index data already contained within an HBase table. The batch indexing tool operates with the same indexing semantics as the near-real-time indexer, and it is run as a MapReduce job.

In its most basic mode, the batch indexing job runs mappers over the HBase regions and writes data directly to Solr, as follows:

hadoop jar hbase-indexer-mr-*-job.jar \
--hbase-indexer-zk zk01 \
--hbase-indexer-name docindexer \
--reducers 0
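Because this mode writes straight into the live Solr collection, a quick spot check after the job finishes is simply a query against it. A minimal sketch — the helper name, the host `solr01:8983`, and the use of the indexer name as the collection name are all assumptions for illustration:

```shell
# Hypothetical helper: build the Solr URL used to spot-check how many
# documents the batch job wrote (host and collection are assumptions).
solr_count_url() {
  local host="$1" collection="$2"
  echo "http://${host}/solr/${collection}/select?q=*:*&rows=0&wt=json"
}

# Example usage: curl "$(solr_count_url solr01:8983 docindexer)"
solr_count_url solr01:8983 docindexer
```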

It is also possible to generate offline index shards in HDFS by supplying -1 or a positive integer for the --reducers argument, as shown below:

hadoop jar hbase-indexer-mr-*-job.jar \
--hbase-indexer-zk zk01 \
--hbase-indexer-name docindexer \
--reducers -1 \
--output-dir hdfs://namenode/solroutput
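The --reducers value is what selects the mode. A tiny sketch of that mapping — the helper name is hypothetical, but the semantics follow the two examples above: 0 writes directly to Solr, while -1 or a positive integer produces offline shards under --output-dir:

```shell
# Hypothetical classifier mirroring the --reducers semantics above:
#   0              -> mappers write directly to live Solr
#   -1 or positive -> offline index shards are written to HDFS
reducer_mode() {
  case "$1" in
    0) echo "direct-to-solr" ;;
    *) echo "offline-shards-in-hdfs" ;;
  esac
}

reducer_mode 0    # direct-to-solr
reducer_mode -1   # offline-shards-in-hdfs
```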

Finally, index shards can be generated offline and then merged into a running SolrCloud cluster using the --go-live flag, as follows:

hadoop jar hbase-indexer-mr-*-job.jar \
--hbase-indexer-zk zk01 \
--hbase-indexer-name docindexer \
--go-live
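All three invocations share the same skeleton and differ only in their trailing flags. A hedged sketch of a wrapper that makes this explicit — the function itself is hypothetical, while the jar name and argument names are taken from the examples above:

```shell
# Hypothetical wrapper: assemble the hadoop invocation common to the three
# examples; extra flags (--reducers, --output-dir, --go-live) are appended.
indexer_cmd() {
  local zk="$1" name="$2"; shift 2
  echo "hadoop jar hbase-indexer-mr-*-job.jar" \
       "--hbase-indexer-zk $zk --hbase-indexer-name $name" "$@"
}

indexer_cmd zk01 docindexer --go-live
```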
