hive:MoveTask
2014-02-14 14:58 680人阅读 评论(1) 收藏 举报
hive
运行SQL时出了个错:
SQL: INSERT OVERWRITE DIRECTORY 'result/testConsole' select count(1) from nutable;
错误信息:
Failed with exception Unable to rename: hdfs://indigo:8020/tmp/hive-root/hive_2013-08-22_17-35-05_006_3570546713731431770/-ext-10000 to: result/testConsoleFAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
另一个SQL的错误,这是日志中的:
1042 2013-08-22 17:08:54,411 INFO exec.Task (SessionState.java:printInfo(412)) - Moving data to: result/userName831810250/54cbcd2980a64fe78cf54abb3116d2dc from hdfs://indigo:8020/tmp/hive-hive/hive_2013-08-22_17-08-40_062_3976325306495167351/-ext-10000
1043 2013-08-22 17:08:54,414 ERROR exec.Task (SessionState.java:printError(421)) - Failed with exception Unable to rename: hdfs://indigo:8020/tmp/hive-hive/hive_2013-08-22_17-08-40_062_3976325306495167351/-ext-10000 to: result/userName831810250/54cbcd2980a64fe78cf54abb3116d2dc
下面看看出现异常的地方。
执行SQL时,最后一个任务是MoveTask,它的作用是将运行SQL生成的Mapeduce任务结果文件放到SQL中指定的存储查询结果的路径中,具体方法就是重命名
下面是 org.apache.hadoop.hive.ql.exec.MoveTask 中对结果文件重命名的一段代码:
- //这个sourcePath参数就是存放Mapeduce结果文件的目录,所以它的值可能是
- //hdfs://indigo:8020/tmp/hive-root/hive_2013-08-22_18-42-03_218_2856924886757165243/-ext-10000
- if (fs.exists(sourcePath)) {
- Path deletePath = null;
- // If it multiple level of folder are there fs.rename is failing so first
- // create the targetpath.getParent() if it not exist
- if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
- deletePath = createTargetPath(targetPath, fs);
- }
- //这里targetPath的值就是指定的放置结果文件的目录,值可能是 result/userName154122639/4e574b5d9f894a70b074ccd3981ca0f1
- if (!fs.rename(sourcePath, targetPath)) {//上面产生的异常就是因为这里rename失败,进了if,throw了个异常
- try {
- if (deletePath != null) {
- fs.delete(deletePath, true);
- }
- } catch (IOException e) {
- LOG.info("Unable to delete the path created for facilitating rename"
- + deletePath);
- }
- throw new HiveException("Unable to rename: " + sourcePath
- + " to: " + targetPath);
- }
- }
其实之前已经检查和创建targetPath了:
- private Path createTargetPath(Path targetPath, FileSystem fs) throws IOException {
- Path deletePath = null;
- Path mkDirPath = targetPath.getParent();
- if (mkDirPath != null & !fs.exists(mkDirPath)) {
- Path actualPath = mkDirPath;
- while (actualPath != null && !fs.exists(actualPath)) {
- deletePath = actualPath;
- actualPath = actualPath.getParent();
- }
- fs.mkdirs(mkDirPath);
- }
- return deletePath;//返回新创建的最顶层的目录,万一失败用来删除用
- }
Apache出现过这个 问题,已经解决掉了
CDH 竟然加了个参数 hive.insert.into.multilevel.dirs,默认是false,意思是我还有这BUG呢哈。
当你被坑了,想打个patch时,会发现改个配置就可以了。
意思是我保留这个BUG,但你要是被坑了也不能说我有BUG,自己改配置好了。$+@*^.!"?......
目前还没发现其他地方用到了这个参数,在这里唯一作用就是限制SQL中指定存放结果文件不存在的目录的深度不能大于1.
不过也没发现这有什么好处。
折腾半天,加个配置就可以了:
- <property>
- <name>hive.insert.into.multilevel.dirs</name>
- <value>true</value>
- </property>
后来 建立上层路径搞定了
转:http://blog.csdn.net/johnny_lee/article/details/19200357