The collect method of JavaPairRDD explained
Official documentation
/**
 * Return an array that contains all of the elements in this RDD.
 *
 * @note this method should only be used if the resulting array is expected to be small, as
 * all the data is loaded into the driver's memory.
 */
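What it means
collect returns every element of the RDD to the caller as a single collection.
Note: use this method only when the result is expected to be small, because all of the data is loaded into the memory of the driver program (not spread across the worker nodes).
Method signature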
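// Scala (declared in JavaRDDLike; for JavaPairRDD[K, V] the element type is (K, V))
/**
 * Return an array that contains all of the elements in this RDD.
 */
def collect(): java.util.List[(K, V)]
// Java (called on an instance; the generated Javadoc lists it as "static java.util.List<T> collect()")
public java.util.List<scala.Tuple2<K, V>> collect()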
Example
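import java.util.List;

import com.google.common.collect.Lists;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class Collect {
    public static void main(String[] args) {
        // Location of the local Hadoop installation (machine-specific path)
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.7.1");
        SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("Spark_DEMO");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        JavaPairRDD<String, String> javaPairRDD1 = sc.parallelizePairs(Lists.newArrayList(
                new Tuple2<String, String>("1", "abc11"),
                new Tuple2<String, String>("2", "abc22"),
                new Tuple2<String, String>("3", "33333")));
        // collect() returns every element to the driver as a list
        List<Tuple2<String, String>> list = javaPairRDD1.collect();
        // Iterate over the list and print each pair
        for (Tuple2<String, String> stringStringTuple2 : list) {
            System.out.println(stringStringTuple2);
        }
    }
}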
Output
19/03/19 15:19:38 INFO DAGScheduler: Job 0 finished: collect at Collect.java:22, took 0.834232 s
19/03/19 15:19:38 INFO SparkContext: Invoking stop() from shutdown hook
(1,abc11)
(2,abc22)
(3,33333)
19/03/19 15:19:38 INFO SparkUI: Stopped Spark web UI at http://10.124.209.6:4040
19/03/19 15:19:38 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 10.124.209.6:57881 in memory (size: 897.0 B, free: 357.6 MB)
19/03/19 15:19:38 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/03/19 15:19:38 INFO MemoryStore: MemoryStore cleared
19/03/19 15:19:38 INFO BlockManager: BlockManager stopped
19/03/19 15:19:38 INFO BlockManagerMaster: BlockManagerMaster stopped
19/03/19 15:19:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/03/19 15:19:38 INFO SparkContext: Successfully stopped SparkContext
19/03/19 15:19:38 INFO ShutdownHookManager: Shutdown hook called
19/03/19 15:19:38 INFO ShutdownHookManager: Deleting directory C:\Users\Administrator\AppData\Local\Temp\spark-5762ad13-6044-421b-96f4-08fa3685b17f
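Note
Do not call collect when the dataset is large: every element is copied into the driver's memory, which can cause an out-of-memory error.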
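When only a sample of the data is needed on the driver, or the data can be processed one element at a time, a bounded alternative to collect may be enough. The following is a minimal sketch, not a definitive recipe: it assumes the same local Spark setup as the example above, and the class name CollectAlternatives and the helper names firstN and countLocally are made up for illustration. It uses take(n), which returns at most n elements, and toLocalIterator(), which brings the data to the driver one partition at a time.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Hypothetical helper class, named here only for illustration.
public class CollectAlternatives {

    // Returns at most n elements to the driver instead of the whole RDD.
    static List<Tuple2<String, String>> firstN(JavaPairRDD<String, String> rdd, int n) {
        return rdd.take(n);
    }

    // Walks the RDD through a local iterator, so the driver holds
    // roughly one partition at a time rather than the full dataset.
    static long countLocally(JavaPairRDD<String, String> rdd) {
        long seen = 0;
        Iterator<Tuple2<String, String>> it = rdd.toLocalIterator();
        while (it.hasNext()) {
            it.next(); // process the element here instead of storing it
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("Spark_DEMO");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            List<Tuple2<String, String>> data = Arrays.asList(
                    new Tuple2<String, String>("1", "abc11"),
                    new Tuple2<String, String>("2", "abc22"),
                    new Tuple2<String, String>("3", "33333"));
            JavaPairRDD<String, String> rdd = sc.parallelizePairs(data);

            // Print a bounded sample
            for (Tuple2<String, String> pair : firstN(rdd, 2)) {
                System.out.println(pair);
            }
            // Stream through all elements without materializing them as one list
            System.out.println("elements seen: " + countLocally(rdd));
        } finally {
            sc.stop();
        }
    }
}
```

Note that toLocalIterator() still needs enough driver memory for the largest single partition, and it runs one job per partition, so it trades speed for a smaller memory footprint.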