
On the problem that files produced by the Flume HDFS Sink cannot be cat'ed, copied, etc.

Exception:

When the files landed on HDFS were migrated to Tencent Cloud COS storage with hadoop distcp, the job failed with the error shown below.
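For reference, the distcp invocation looked roughly like the following sketch; the source path is taken from the error message below, while the COS bucket name is left as a placeholder:

    hadoop distcp \
      hdfs://mycluster/user/hive/warehouse/ods/up_event/dt=2021-06-03 \
      cosn://<cos-bucket>/miot/up_event/dt=2021-06-03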

Error: java.io.IOException: File copy failed: hdfs://mycluster/user/hive/warehouse/ods/up_event/dt=2021-06-03/event-node2901.1622649601925.lzo.tmp --> cosn://bd-backup-1300889962/miot/up_event/dt=2021-06-03/event-node2901.1622649601925.lzo.tmp
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:217)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/user/hive/warehouse/ods/up_event/dt=2021-06-03/event-node2901.1622649601925.lzo.tmp to cosn://bd-backup-1300889962/miot/up_event/dt=2021-06-03/event-node2901.1622649601925.lzo.tmp
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:256)
    ... 10 more
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock{BP-1690677239-172.18.4.7-1599186972756:blk_1075224544_1514131; getBlockSize()=237; corrupt=false; offset=134217728; locs=[DatanodeInfoWithStorage[172.18.4.17:50010,DS-9169c8f0-df0b-41e3-81d3-ba0974b42e91,DISK], DatanodeInfoWithStorage[172.18.4.3:50010,DS-af458d19-b4ef-416d-850e-aee679d073b2,DISK]]}
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.getInputStream(RetriableFileCopyCommand.java:348)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:277)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:193)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:123)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
    ... 11 more
Caused by: org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock{BP-1690677239-172.18.4.7-1599186972756:blk_1075224544_1514131; getBlockSize()=237; corrupt=false; offset=134217728; locs=[DatanodeInfoWithStorage[172.18.4.17:50010,DS-9169c8f0-df0b-41e3-81d3-ba0974b42e91,DISK], DatanodeInfoWithStorage[172.18.4.3:50010,DS-af458d19-b4ef-416d-850e-aee679d073b2,DISK]]}

Copying the file locally within HDFS failed with the same error:

cp: Cannot obtain block length for LocatedBlock{BP-1690677239-172.18.4.7-15991862972756:blk_10752245434_1514131; getBlockSize()=237; corrupt=false; offset=134217728; locs=[DatanodeInfoWithStorage[172.18.4.17:50010,DS-9169c8f0-df0b-41e3-81d3-ba0974b42e91,DISK], DatanodeInfoWithStorage[172.18.4.3:50010,DS-af458d19-b4ef-416d-850e-aee679d073b2,DISK]]}

[hdfs@aiot-bigdata-6 ~]$ hdfs debug recoverLease /user/hive/warehouse/ods/up_event/dt=2021-06-03/event-node2901.162272161985345.lzo.tmp -retries 5
You must supply a -path argument to recoverLease.

(This first recoverLease attempt failed only because the -path flag was omitted; the correct syntax is shown further down.)

**Troubleshooting**

hdfs fsck <file path>

fsck reported that the file was not corrupt.

hdfs fsck <file path> -openforwrite

This showed that the HDFS file was still in the OPENFORWRITE state, i.e. it was still considered open for writing. The directory already contained a corresponding file without the .tmp suffix, so to be safe the .tmp file was not deleted; instead the plan was to repair its state first. The file had apparently never been closed properly, so the .tmp file stayed OPENFORWRITE and the length of its last block could not be determined, which is what produces the CannotObtainBlockLengthException.
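For a closer look at the open file's blocks, fsck can be run with more verbose flags (a sketch using standard fsck options; the path is the .tmp file from the distcp error above):

    hdfs fsck /user/hive/warehouse/ods/up_event/dt=2021-06-03/event-node2901.1622649601925.lzo.tmp \
        -openforwrite -files -blocks -locations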
Suspected cause:
HDFS hit an exception while the .tmp file was being written, so the lease Flume held on the file was never released. The lease is therefore recovered manually with

hdfs debug recoverLease -path <file path> -retries <retry count>

The -retries option is included because a single attempt may fail.
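Applied to the file from the error above, the command looks like the sketch below; once the lease is recovered and the file is closed, cat, cp and distcp on it should work again. The loop at the end is only a sketch for recovering every .tmp file that fsck still reports as OPENFORWRITE; the grep/awk parsing of the fsck output is an assumption and may need adjusting for your Hadoop version.

    # Recover the lease on the single stuck .tmp file (up to 5 attempts).
    hdfs debug recoverLease \
        -path /user/hive/warehouse/ods/up_event/dt=2021-06-03/event-node2901.1622649601925.lzo.tmp \
        -retries 5

    # Sketch: recover the lease on every .tmp file in the partition that is
    # still open for write. The parsing of the fsck output is an assumption.
    hdfs fsck /user/hive/warehouse/ods/up_event/dt=2021-06-03 -openforwrite -files \
        | grep 'OPENFORWRITE' | grep '\.tmp' | awk '{print $1}' \
        | while read f; do
              hdfs debug recoverLease -path "$f" -retries 5
          done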
