FusionInsight HD C30L: clients cannot access newly added nodes after an HBase cluster expansion

Published: 2016-12-06
Problem Description

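1. An HBase expansion was performed on a FusionInsight HD C30L cluster.

2. Before the expansion, the /etc/hosts configuration file on the client server (a node outside the cluster) was not updated first, as the guide requires. (The IP addresses and hostnames of the nodes being added must be entered there.)

3. After the expansion completed, /etc/hosts on the client was edited to add the IP address and hostname of each new node.

4. The client still could not access the newly added HBase nodes and threw this exception: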

Caused by: java.net.UnknownHostException: unknown host: pdccsfbdpsvr340
       at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
       at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
       at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)



Alarm Information
       at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:129)
       at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
       at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:282)
       at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:187)
       at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:182)
       at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:109)
       at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:741)
       at com.icbc.pcrm.hbase.TableScan.getCustHisAttrsByRecNum(Unknown Source)
       at com.icbc.pcrm.rpcserver.HbaseHandler.getCustHisAttrsByRctNum_M(Unknown Source)
       at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91)
       at org.apache.avro.ipc.Responder.respond(Responder.java:151)
       at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
       at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
       at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
       at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
       at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
       at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43)
       at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: unknown host: pdccsfbdpsvr340
       at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
       at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
       at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)
       at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
       at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
       at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
       at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29900)
       at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:308)
       at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:164)
       at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
       at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
       ... 23 more
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Thu Oct 20 11:31:37 CST 2016, org.apache.hadoop.hbase.client.RpcRetryingCaller@40bb70d, java.net.UnknownHostException: unknown host: pdccsfbdpsvr333
Thu Oct 20 11:31:37 CST 2016, org.apache.hadoop.hbase.client.RpcRetryingCaller@40bb70d, java.net.UnknownHostException: unknown host: pdccsfbdpsvr333
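Troubleshooting Process

1. Check whether the network between the client and the newly added HBase nodes is reachable (passed).

2. Check the connection log. The client does discover the new nodes but cannot resolve their hostnames: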

Caused by: java.net.UnknownHostException: unknown host: pdccsfbdpsvr340

3. Analyze the code on the access path

Connection(ConnectionId remoteId, final Codec codec, final CompressionCodec compressor)
    throws IOException {
      if (remoteId.getAddress().isUnresolved()) {
        throw new UnknownHostException("unknown host: " + remoteId.getAddress().getHostName());
      }

Caused by: java.net.UnknownHostException: unknown host: pdccsfbdpsvr340
       at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
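The check in step 3 relies on standard JDK behavior: `new InetSocketAddress(host, port)` attempts the name lookup once, inside the constructor, and an instance whose lookup failed stays unresolved for its whole lifetime. The sketch below illustrates only that behavior. The class `UnresolvedAddressDemo` and the port value are placeholders invented for this example, not taken from HBase or from the customer's code.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class, not part of HBase.
final class UnresolvedAddressDemo {
    // Reports an address the same way the check in step 3 would see it.
    static String describe(InetSocketAddress addr) {
        return addr.isUnresolved()
                ? "unknown host: " + addr.getHostName()
                : "resolved: " + addr.getAddress().getHostAddress();
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "pdccsfbdpsvr340";
        // The lookup happens here, exactly once. If it fails, the object is
        // flagged unresolved; editing /etc/hosts later does not change it.
        InetSocketAddress addr = new InetSocketAddress(host, 16020); // placeholder port
        System.out.println(describe(addr));
    }
}
```

Running it on the client before and after editing /etc/hosts shows whether a freshly created address resolves; an address object created before the edit would still need to be discarded and rebuilt.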

4. Follow the error message into the source code

Caused by: java.net.UnknownHostException: unknown host: pdccsfbdpsvr340
       at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
       at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
       at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)
       at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
       at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
       at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
       at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29900)
       at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:308)

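The C30L source code turns out to lack the content marked by the red box below:



In fact, a connection exception should be thrown at this point. Because C30L lacks this code, the method does not throw, and the connection is then written into the stubs cache.

 

As a result, unless the client process is restarted, later calls fetch the cached entry from stubs, so the UnknownHost exception keeps being logged.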
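The caching behavior described above can be modeled in a few lines. This is a simplified sketch, not HBase's actual implementation: `StubCacheSketch`, its `resolver` parameter, and the `rejectUnresolved` switch are all hypothetical names, and the cache stores bare addresses where the real client caches stubs. With the check disabled, the first failed lookup is cached and returned forever; with it enabled, nothing is cached until the name resolves.

```java
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Simplified, hypothetical model of a per-server stub cache.
final class StubCacheSketch {
    private final Map<String, InetSocketAddress> stubs = new ConcurrentHashMap<>();
    private final Function<String, InetSocketAddress> resolver;
    private final boolean rejectUnresolved;

    StubCacheSketch(Function<String, InetSocketAddress> resolver, boolean rejectUnresolved) {
        this.resolver = resolver;
        this.rejectUnresolved = rejectUnresolved;
    }

    InetSocketAddress getStub(String host) {
        InetSocketAddress cached = stubs.get(host);
        if (cached != null) {
            return cached; // a stale unresolved entry is returned as-is
        }
        InetSocketAddress addr = resolver.apply(host);
        if (rejectUnresolved && addr.isUnresolved()) {
            // Fail before caching, so the next call performs a fresh lookup.
            // Wrapped in an unchecked exception only to keep the sketch short.
            throw new UncheckedIOException(new UnknownHostException("unknown host: " + host));
        }
        stubs.put(host, addr);
        return addr;
    }

    public static void main(String[] args) {
        Map<String, String> hosts = new HashMap<>(); // stands in for /etc/hosts
        Function<String, InetSocketAddress> resolver = h -> hosts.containsKey(h)
                ? new InetSocketAddress(hosts.get(h), 16020)      // placeholder port
                : InetSocketAddress.createUnresolved(h, 16020);
        StubCacheSketch withoutCheck = new StubCacheSketch(resolver, false);
        withoutCheck.getStub("pdccsfbdpsvr340");       // caches the unresolved address
        hosts.put("pdccsfbdpsvr340", "10.0.0.40");     // made-up IP: "edit" the hosts file
        System.out.println(withoutCheck.getStub("pdccsfbdpsvr340").isUnresolved());
    }
}
```

In this model the only ways out of the stale state are to discard the cache (restart the client) or to avoid caching the unresolved entry in the first place.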


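5. Confirmed: the connection was cached and never refreshed, which produces the exception seen here.

Workaround: restart the client connection or the HBase component service.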

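Root Cause

This issue has been confirmed as a product bug (a community bug), tracked at https://issues.apache.org/jira/browse/HBASE-15856

What separates editing hosts before the expansion from editing it afterward is a caching bug in the C30 release. If the file is edited after the expansion, an abnormal connection has already been cached, so later changes to /etc/hosts never trigger a refresh.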



Solution

Restart the HBase component service or the HBase client connection pool.


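Recommendations and Summary
Method 1: Follow the documentation. Update the hosts information on the nodes where the application is deployed first, and only then expand the cluster.
Method 2: Upgrade to FusionInsight C60, which resolves the community issue.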
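
For Method 1, a small pre-flight check run on the client node before the expansion can confirm that every planned hostname already has an entry in the hosts file. The following is a minimal sketch under assumptions: the class `HostsPreflight` and the method `missingHosts` are invented for illustration, the comparison is case-sensitive for simplicity, and it only inspects the file text; it does not query the system resolver.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of FusionInsight or HBase.
final class HostsPreflight {
    // Returns the required hostnames that have no entry in the given hosts-file lines.
    static List<String> missingHosts(List<String> hostsLines, List<String> required) {
        Set<String> known = new HashSet<>();
        for (String line : hostsLines) {
            int hash = line.indexOf('#'); // strip trailing comments
            String body = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (body.isEmpty()) {
                continue;
            }
            String[] fields = body.split("\\s+");
            // fields[0] is the IP address; the remaining fields are names and aliases.
            known.addAll(Arrays.asList(fields).subList(1, fields.length));
        }
        List<String> missing = new ArrayList<>();
        for (String host : required) {
            if (!known.contains(host)) {
                missing.add(host);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java HostsPreflight pdccsfbdpsvr333 pdccsfbdpsvr340
        List<String> lines = Files.readAllLines(Paths.get("/etc/hosts"));
        System.out.println("missing: " + missingHosts(lines, Arrays.asList(args)));
    }
}
```

An empty result means the client's hosts file already names every node to be added, so the expansion can proceed without the client caching an unresolved address.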
