scala - Using Apache Spark to union two Lists of Tuples
I'm attempting to union two RDDs:

    val u1 = sc.parallelize(List(("a", (1, 2)), ("b", (1, 2))))
    val u2 = sc.parallelize(List(("a", ("3")), ("b", (2))))

I receive this error:

    scala> u1 union u2
    <console>:17: error: type mismatch;
     found   : org.apache.spark.rdd.RDD[(String, Any)]
     required: org.apache.spark.rdd.RDD[(String, (Int, Int))]
    Note: (String, Any) >: (String, (Int, Int)), but class RDD is invariant in type T.
    You may wish to define T as -T instead. (SLS 4.5)
           u1 union u2
              ^

The String type in each of the above tuples is the key.
Is it possible to union these two types?
Once u1 and u2 are unioned, my intent is to use groupBy to group each item according to its key.
The issue you're facing is explained by the compiler: you're trying to bring together values of type (Int, Int) with values of type Any. Any arises as the common superclass of String and Int in the statement sc.parallelize(List(("a", ("3")), ("b", (2)))). That might be the error, or it might be intended.
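The widening to Any can be seen with plain Scala collections, no Spark needed (a sketch, with Lists standing in for the RDDs above):

```scala
// ("3") is just the String "3" and (2) is just the Int 2 — parentheses
// around a single value do not create a Tuple1.
val l1 = List(("a", (1, 2)), ("b", (1, 2)))   // List[(String, (Int, Int))]
val l2 = List(("a", ("3")), ("b", (2)))       // inferred as List[(String, Any)]

// Plain Lists are covariant, so ++ silently widens the value type to Any;
// RDD[T] is invariant in T, which is why the equivalent union fails to compile.
val both: List[(String, Any)] = l1 ++ l2
println(both)
```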
In any case, I would try to make the values converge to a common type before the union. Given that Tuple1 and Tuple2 are different types, I'd consider some other container that is easier to transform.
Assuming "3" above is meant to be 3 (an Int):
    val au1 = sc.parallelize(List(("a", Array(1, 2)), ("b", Array(1, 2))))
    val au2 = sc.parallelize(List(("a", Array(3)), ("b", Array(2))))

    au1 union au2
    org.apache.spark.rdd.RDD[(String, Array[Int])] = UnionRDD[10] at union at <console>:17

    res: Array[(String, Array[Int])] = Array((a,Array(1, 2)), (b,Array(1, 2)), (a,Array(3)), (b,Array(2)))

You said: "Once u1 and u2 are unioned my intent is to use groupBy to group each item according to its key."
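That grouping step can be sketched with plain collections (a stand-in for the union-then-groupByKey shape on RDDs; the helper names here are illustrative):

```scala
// Stand-ins for the array-valued RDDs above, using plain Scala Lists.
val au1 = List(("a", Array(1, 2)), ("b", Array(1, 2)))
val au2 = List(("a", Array(3)), ("b", Array(2)))

// Union, then group by the String key, flattening the arrays per key —
// the same shape that an RDD union followed by grouping would produce.
val grouped: Map[String, List[Int]] =
  (au1 ++ au2).groupBy(_._1).map { case (k, pairs) =>
    k -> pairs.flatMap(_._2.toList)
  }
println(grouped)
```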
If you intend to group both RDDs by key, you may consider using join instead of union. It gets the job done at once:
    au1 join au2
    res: Array[(String, (Array[Int], Array[Int]))] = Array((a,(Array(1, 2),Array(3))), (b,(Array(1, 2),Array(2))))

If "3" above is really "3" (a String), I'd consider mapping the values to a common type first: either all Strings or all Ints. That will make the data easier to manipulate than carrying an Any type around. Life will be easier.
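One way to sketch that map-to-a-common-type step, again with plain collections standing in for the RDDs (the normalisation to List[String] here is one illustrative choice, not the only one):

```scala
// Mixed value types, as in the question: "3" is a String, 2 is an Int.
val s1 = List(("a", (1, 2)), ("b", (1, 2)))
val s2: List[(String, Any)] = List(("a", "3"), ("b", 2))

// Normalise both sides to (String, List[String]) before combining,
// so the merged collection has one concrete value type instead of Any.
val n1 = s1.map { case (k, (x, y)) => (k, List(x.toString, y.toString)) }
val n2 = s2.map { case (k, v) => (k, List(v.toString)) }

val combined: List[(String, List[String])] = n1 ++ n2
println(combined)
```

With RDDs, the same mapValues-style normalisation would run on each RDD before the union, after which grouping by key is straightforward.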
scala apache-spark