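The HBase cluster in our test environment went down because of a time-synchronization problem. After some repair work the cluster is running again. The repair went roughly as follows: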
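1. hdfs fsck / -delete : delete the files with missing blocks
2. hadoop dfsadmin -report : check the state of the Hadoop cluster
3. zkCli.sh : log in to ZooKeeper
4. rmr /hbase : delete the /hbase znode
5. Stop all RegionServers
6. hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair : rebuild the meta data
7. Once the repair finishes, start the Master
8. Start the RegionServers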
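The steps above can be sketched as a small script. This is only an illustration under assumptions, not the exact procedure that was run: the `run` and `repair_plan` helpers, the `DRY_RUN` switch, and the `ZK_QUORUM` address are hypothetical names introduced here, and passing `rmr /hbase` directly on the `zkCli.sh` command line replaces the interactive login. Several of these commands are destructive, so the sketch only prints them unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch of the recovery sequence. By default it only prints the commands;
# set DRY_RUN=0 to really execute them.
set -e

DRY_RUN="${DRY_RUN:-1}"
ZK_QUORUM="${ZK_QUORUM:-localhost:2181}"   # assumption: replace with your ZooKeeper address

run() {
  # Print the command, and execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

repair_plan() {
  run hdfs fsck / -delete                        # destructive: removes files with missing blocks
  run hadoop dfsadmin -report                    # check HDFS state before touching HBase
  run zkCli.sh -server "$ZK_QUORUM" rmr /hbase   # destructive: removes HBase state in ZooKeeper
  run hbase-daemon.sh stop regionserver          # repeat on every RegionServer host
  run hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
  run hbase-daemon.sh start master
  run hbase-daemon.sh start regionserver         # repeat on every RegionServer host
}
```

Calling `repair_plan` prints the seven commands in order so they can be reviewed first. On newer ZooKeeper releases `rmr` is deprecated in favor of `deleteall`, so that line may need adjusting.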
At this point the HBase cluster starts and the HBase shell works normally, but any scan of table data fails, as shown here:
hbase(main):010:0> scan 'ApplicationIndex'
ROW    COLUMN+CELL

ERROR: No server address listed in hbase:meta for region ApplicationIndex,,1517480748304.31070658ad5552726d4b24c47f47a727. containing row
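The error suggests that the table exists but its region has not been assigned to any server, so hbase:meta holds no server information for it. The following command confirms this: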
hbase(main):011:0> scan 'hbase:meta', {LIMIT=>10, FILTER=>"PrefixFilter('ApplicationIndex')"}
ROW    COLUMN+CELL
 ApplicationIndex,,1517480748304.31070658ad5552726d4b24c47f47a727.    column=info:regioninfo, timestamp=1523277765889, value={ENCODED => 31070658ad5552726d4b24c47f47a727, NAME => 'ApplicationIndex,,1517480748304.31070658ad5552726d4b24c47f47a727.', STARTKEY => '', ENDKEY => ''}
1 row(s) in 0.0360 seconds
The row contains only info:regioninfo and no server record. For comparison, regions that are assigned normally look like this:
hbase(main):013:0> scan 'hbase:meta', {LIMIT=>10, FILTER=>"PrefixFilter('hbase')"}
ROW    COLUMN+CELL
 hbase:namespace,,1517480288768.ab4a22852177a867ae631aa109fe2b81.    column=info:regioninfo, timestamp=1523278150687, value={ENCODED => ab4a22852177a867ae631aa109fe2b81, NAME => 'hbase:namespace,,1517480288768.ab4a22852177a867ae631aa109fe2b81.', STARTKEY => '', ENDKEY => ''}
 hbase:namespace,,1517480288768.ab4a22852177a867ae631aa109fe2b81.    column=info:seqnumDuringOpen, timestamp=1523278150687, value=\x00\x00\x00\x00\x00\x00\x00\x05
 hbase:namespace,,1517480288768.ab4a22852177a867ae631aa109fe2b81.    column=info:server, timestamp=1523278150687, value=OP-APM-08:16020
 hbase:namespace,,1517480288768.ab4a22852177a867ae631aa109fe2b81.    column=info:serverstartcode, timestamp=1523278150687, value=1523278113491
 hbase:namespace,,1523276566308.73f980027b0b2d3badb4627dc1fc5c67.    column=info:regioninfo, timestamp=1523278150793, value={ENCODED => 73f980027b0b2d3badb4627dc1fc5c67, NAME => 'hbase:namespace,,1523276566308.73f980027b0b2d3badb4627dc1fc5c67.', STARTKEY => '', ENDKEY => ''}
 hbase:namespace,,1523276566308.73f980027b0b2d3badb4627dc1fc5c67.    column=info:seqnumDuringOpen, timestamp=1523278150793, value=\x00\x00\x00\x00\x00\x00\x00\x0C
 hbase:namespace,,1523276566308.73f980027b0b2d3badb4627dc1fc5c67.    column=info:server, timestamp=1523278150793, value=OP-APM-06:16020
 hbase:namespace,,1523276566308.73f980027b0b2d3badb4627dc1fc5c67.    column=info:serverstartcode, timestamp=1523278150793, value=1523277988017
2 row(s) in 0.0160 seconds
Compared with these rows, the ApplicationIndex region has lost its info:server cell. The fix is simple:
First disable the table:
hbase(main):014:0> disable 'ApplicationIndex'
Then enable it again:
hbase(main):015:0> enable 'ApplicationIndex'
A server is now assigned to the region automatically. Checking the result:
hbase(main):016:0> scan 'hbase:meta', {LIMIT=>10, FILTER=>"PrefixFilter('ApplicationIndex')"}
ROW    COLUMN+CELL
 ApplicationIndex,,1517480748304.31070658ad5552726d4b24c47f47a727.    column=info:regioninfo, timestamp=1523325348714, value={ENCODED => 31070658ad5552726d4b24c47f47a727, NAME => 'ApplicationIndex,,1517480748304.31070658ad5552726d4b24c47f47a727.', STARTKEY => '', ENDKEY => ''}
 ApplicationIndex,,1517480748304.31070658ad5552726d4b24c47f47a727.    column=info:seqnumDuringOpen, timestamp=1523325348714, value=\x00\x00\x00\x00\x00\x00\x03\xB8
 ApplicationIndex,,1517480748304.31070658ad5552726d4b24c47f47a727.    column=info:server, timestamp=1523325348714, value=OP-APM-06:16020
 ApplicationIndex,,1517480748304.31070658ad5552726d4b24c47f47a727.    column=info:serverstartcode, timestamp=1523325348714, value=1523277988017
1 row(s) in 0.0190 seconds
The table can now be queried normally.
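If several tables are affected, the manual check used here (a region that has info:regioninfo in hbase:meta but no info:server) could be scripted. The following is a rough sketch, not part of the original repair: `tables_missing_server` is a hypothetical helper that reads the text output of a `scan 'hbase:meta'` on stdin and assumes each cell line starts with the row key, with the table name as the part before the first comma. Because the shell wraps long rows, it counts cells per table rather than per region.

```shell
# Reads `scan 'hbase:meta'` output on stdin and prints every table that has
# more info:regioninfo cells than info:server cells, i.e. at least one
# region without an assigned server.
tables_missing_server() {
  awk '
    # The first field is the (possibly truncated) row key: "table,startkey,ts.encoded."
    / column=info:regioninfo,/ { split($1, p, ","); regions[p[1]]++ }
    # The trailing comma keeps info:serverstartcode from matching here.
    / column=info:server,/     { split($1, p, ","); servers[p[1]]++ }
    END {
      for (t in regions)
        if (servers[t] + 0 < regions[t])
          print t
    }
  ' | sort
}
```

One possible way to use it is `echo "scan 'hbase:meta'" | hbase shell 2>/dev/null | tables_missing_server`. Each table it prints is a candidate for the disable and enable sequence shown above.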