When Sqoop imports data from Oracle into Hive, columns of type RAW (and similar unmapped types) cause an error:
2013-09-17 19:33:12,184 ERROR org.apache.sqoop.tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive does not support the SQL type for column RAW_TYPE_ID
    at org.apache.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:195)
    at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:187)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:425)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Seeing this error, I first assumed Sqoop simply did not recognize the RAW type during import. But some research shows that Oracle's RAW type is reported through JDBC as java.sql.Types.BINARY or java.sql.Types.VARBINARY, and Sqoop handles both of these when converting SQL types to Java types: it maps them to BytesWritable, the Hadoop writable that Sqoop uses specifically for byte[] data.
The SQL-to-Java type mapping in ConnManager:
public String toJavaType(int sqlType) {
    // Mappings taken from:
    // http://java.sun.com/j2se/1.3/docs/guide/jdbc/getstart/mapping.html
    if (sqlType == Types.INTEGER) {
      return "Integer";
    } else if (sqlType == Types.VARCHAR) {
      return "String";
    } else if (sqlType == Types.CHAR) {
      return "String";
    } else if (sqlType == Types.LONGVARCHAR) {
      return "String";
    } else if (sqlType == Types.NVARCHAR) {
      return "String";
    } else if (sqlType == Types.NCHAR) {
      return "String";
    } else if (sqlType == Types.LONGNVARCHAR) {
      return "String";
    } else if (sqlType == Types.NUMERIC) {
      return "java.math.BigDecimal";
    } else if (sqlType == Types.DECIMAL) {
      return "java.math.BigDecimal";
    } else if (sqlType == Types.BIT) {
      return "Boolean";
    } else if (sqlType == Types.BOOLEAN) {
      return "Boolean";
    } else if (sqlType == Types.TINYINT) {
      return "Integer";
    } else if (sqlType == Types.SMALLINT) {
      return "Integer";
    } else if (sqlType == Types.BIGINT) {
      return "Long";
    } else if (sqlType == Types.REAL) {
      return "Float";
    } else if (sqlType == Types.FLOAT) {
      return "Double";
    } else if (sqlType == Types.DOUBLE) {
      return "Double";
    } else if (sqlType == Types.DATE) {
      return "java.sql.Date";
    } else if (sqlType == Types.TIME) {
      return "java.sql.Time";
    } else if (sqlType == Types.TIMESTAMP) {
      return "java.sql.Timestamp";
    } else if (sqlType == Types.BINARY || sqlType == Types.VARBINARY) {
      return BytesWritable.class.getName();
    } else if (sqlType == Types.CLOB) {
      return ClobRef.class.getName();
    } else if (sqlType == Types.BLOB || sqlType == Types.LONGVARBINARY) {
      return BlobRef.class.getName();
    } else {
      // TODO(aaron): Support DISTINCT, ARRAY, STRUCT, REF, JAVA_OBJECT.
      // Return null indicating database-specific manager should return a
      // java data type if it can find one for any nonstandard type.
      return null;
    }
}
Reading the source further, it turns out the error is actually raised while creating the Hive table, at the point where the Oracle column type is translated to its Hive equivalent:
In TableDefWriter.getCreateTableStmt():
// userMapping holds the per-column overrides supplied via --map-column-hive;
// it is consulted before the built-in SQL-to-Hive mapping.
String hiveColType = userMapping.getProperty(col);
if (hiveColType == null) {
  hiveColType = connManager.toHiveType(inputTableName, col, colType);
}
if (null == hiveColType) {
  throw new IOException("Hive does not support the SQL type for column "
      + col);
}
Digging further confirms it: org.apache.sqoop.hive.HiveTypes indeed has no handling for BINARY or VARBINARY:
public static String toHiveType(int sqlType) {
    switch (sqlType) {
      case Types.INTEGER:
      case Types.SMALLINT:
        return "INT";
      case Types.VARCHAR:
      case Types.CHAR:
      case Types.LONGVARCHAR:
      case Types.NVARCHAR:
      case Types.NCHAR:
      case Types.LONGNVARCHAR:
      case Types.DATE:
      case Types.TIME:
      case Types.TIMESTAMP:
      case Types.CLOB:
        return "STRING";
      case Types.NUMERIC:
      case Types.DECIMAL:
      case Types.FLOAT:
      case Types.DOUBLE:
      case Types.REAL:
        return "DOUBLE";
      case Types.BIT:
      case Types.BOOLEAN:
        return "BOOLEAN";
      case Types.TINYINT:
        return "TINYINT";
      case Types.BIGINT:
        return "BIGINT";
      default:
        // TODO(aaron): Support BINARY, VARBINARY, LONGVARBINARY, DISTINCT,
        // BLOB, ARRAY, STRUCT, REF, JAVA_OBJECT.
        return null;
    }
}
So the problem is pinpointed:
When Sqoop creates the Hive table by default, it cannot find a Hive type matching Oracle's RAW column, falls through to the default branch, returns null, and the IOException above is thrown.
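Incidentally, the TODO in toHiveType() shows that binary support was simply never implemented upstream. Purely as an illustration (this fragment is NOT in the shipped Sqoop source, and using it would mean patching and rebuilding Sqoop), the missing branch could in principle look like this, given that Hive has offered a native BINARY type since 0.8.0:

      case Types.BINARY:
      case Types.VARBINARY:
        // Hypothetical addition: map raw byte columns to Hive's native BINARY type.
        return "BINARY";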
Patching Sqoop is hardly practical here, though, so the solution:
1. Supply the column's Hive type yourself via --map-column-hive.
For example:
--map-column-hive RAW_TYPE_ID=STRING
This explicitly maps RAW_TYPE_ID to the Hive type STRING; because userMapping is checked before connManager.toHiveType() in getCreateTableStmt(), the override bypasses the failing lookup. A sketch of the full command follows.
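Here is a minimal sketch of the complete import command with the override in place. The JDBC URL, credentials, and table names are hypothetical placeholders; only --map-column-hive RAW_TYPE_ID=STRING is the actual fix:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username SCOTT \
  --password tiger \
  --table RAW_TYPE_TABLE \
  --hive-import \
  --hive-table raw_type_table \
  --map-column-hive RAW_TYPE_ID=STRING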
Ugh, Sqoop insists on generating the Hive table on every single import, and there is no way to skip that step automatically...