
PySpark SQL data types

Published: 2023-12-19 02:59:24.0

1. PySpark data types

“DataType”, “NullType”, “StringType”, “BinaryType”, “BooleanType”, “DateType”,
“TimestampType”, “DecimalType”, “DoubleType”, “FloatType”, “ByteType”, “IntegerType”,
“LongType”, “ShortType”, “ArrayType”, “MapType”, “StructField”, “StructType”

2. Example: StructField

    class StructField(DataType):
        """A field in :class:`StructType`.

        :param name: string, name of the field.
        :param dataType: :class:`DataType` of the field.
        :param nullable: boolean, whether the field can be null (None) or not.
        :param metadata: a dict from string to simple type that can be toInternald to JSON automatically
        """

        def __init__(self, name, dataType, nullable=True, metadata=None):
            """
            >>> (StructField("f1", StringType(), True)
            ...      == StructField("f1", StringType(), True))
            True
            >>> (StructField("f1", StringType(), True)
            ...      == StructField("f2", StringType(), True))
            False
            """
            assert isinstance(dataType, DataType), "dataType should be DataType"
            assert isinstance(name, basestring), "field name should be string"
            if not isinstance(name, str):
                name = name.encode('utf-8')
            self.name = name
            self.dataType = dataType
            self.nullable = nullable
            self.metadata = metadata or {}

3. Specifying types for a DataFrame

Use a schema to specify the data type of each column in a DataFrame. The snippet below uses the Scala API.

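    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val schema = StructType(List(
      StructField("id", IntegerType, true),
      StructField("name", StringType, true),
      StructField("age", IntegerType, true)
    ))
    // Map the RDD to rowRDD (personRDD is assumed to hold arrays of strings)
    val rowRDD = personRDD.map(p => Row(p(0).toInt, p(1).trim, p(2).toInt))
    // Apply the schema to rowRDD
    val personDataFrame = sqlContext.createDataFrame(rowRDD, schema)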

References:

  1. Source code for pyspark.sql.types