scala - Using Apache Spark to union two Lists of Tuples -
I'm attempting to union two RDDs:
val u1 = sc.parallelize(List(("a", (1, 2)), ("b", (1, 2))))
val u2 = sc.parallelize(List(("a", ("3")), ("b", (2))))
I receive this error:
scala> u1 union u2
<console>:17: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(String, Any)]
 required: org.apache.spark.rdd.RDD[(String, (Int, Int))]
Note: (String, Any) >: (String, (Int, Int)), but class RDD is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
       u1 union u2
          ^
The String in each of the above tuples is the key.
Is it possible to union these two types?
Once u1 and u2 are unioned, the intent is to use groupBy to group each item according to its key.
The issue you are facing is explained by the compiler: you are trying to bring together values of type (Int, Int) with values of type Any. Any comes in as the common superclass of String and Int in the statement:

sc.parallelize(List(("a", ("3")), ("b", (2))))

This might be an error, or it might be intended.
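A minimal plain-collections sketch (no Spark needed, names are illustrative) of how the compiler widens mixed value types: "3" is a String and 2 is an Int, so their least upper bound is Any. Note that List is covariant, so the concatenation below compiles by widening to Any, whereas RDD is invariant in T, which is why the equivalent union is a compile error:

```scala
// "mixed" holds a String value and an Int value, so the element type
// is inferred as (String, Any)
val ints  = List(("a", (1, 2)), ("b", (1, 2)))  // List[(String, (Int, Int))]
val mixed = List(("a", "3"), ("b", 2))          // List[(String, Any)]

// Compiles for List (covariant), but the result is widened to Any;
// RDD refuses the same widening because RDD[T] is invariant in T.
val widened: List[(String, Any)] = ints ++ mixed
```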
In any case, you should make the values converge to a common type before the union. Given that Tuple1 and Tuple2 are different types, I'd consider some other container that is easier to transform.
assuming "3"
above 3
(int
):
val au1 = sc.parallelize(List(("a", Array(1, 2)), ("b", Array(1, 2))))
val au2 = sc.parallelize(List(("a", Array(3)), ("b", Array(2))))
au1 union au2
// org.apache.spark.rdd.RDD[(String, Array[Int])] = UnionRDD[10] at union at <console>:17
// res: Array[(String, Array[Int])] = Array((a,Array(1, 2)), (b,Array(1, 2)), (a,Array(3)), (b,Array(2)))
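A sketch of the intended grouping after such a union, using plain collections to stand in for RDDs (RDD offers groupByKey for this; List offers groupBy):

```scala
// Same data as the Spark example above, as plain Lists
val au1 = List(("a", Array(1, 2)), ("b", Array(1, 2)))
val au2 = List(("a", Array(3)), ("b", Array(2)))

// Union, then group by the String key; each key maps to the list of
// value arrays contributed by both sides
val grouped: Map[String, List[Array[Int]]] =
  (au1 ++ au2).groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2)) }
// grouped("a") holds Array(1, 2) and Array(3)
```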
Once u1 and u2 are unioned the intent is to use groupBy to group each item according to its key.
If you intend to group both RDDs by key, you may consider using join instead of union. That gets the job done in one step:
au1 join au2
// res: Array[(String, (Array[Int], Array[Int]))] = Array((a,(Array(1, 2),Array(3))), (b,(Array(1, 2),Array(2))))
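A sketch of what an inner join by key produces, using Maps in place of pair RDDs: only keys present on both sides survive, and each surviving key maps to a pair of the two sides' values.

```scala
// Same data as above, keyed by String
val left  = Map("a" -> Array(1, 2), "b" -> Array(1, 2))
val right = Map("a" -> Array(3), "b" -> Array(2))

// Inner join: intersect the key sets, then pair up the values per key
val joined: Map[String, (Array[Int], Array[Int])] =
  left.keySet.intersect(right.keySet).map(k => k -> (left(k), right(k))).toMap
```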
if "3"
above "3"
(string
): i'd consider map values first mutual type. either strings or ints. create info easier manipulate having any
type. life easier.
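A sketch of normalizing the mixed values to one type before combining, again with a plain List standing in for the RDD (pair RDDs offer a similar mapValues). Here everything becomes a String; converting to Int would work too when the strings are parseable:

```scala
// Mixed values: "3" is a String, 2 is an Int, so the inferred
// element type is (String, Any)
val v2 = List(("a", "3"), ("b", 2))

// Normalize every value to String so the collection has one concrete
// value type and unions/joins type-check cleanly
val normalized: List[(String, String)] =
  v2.map { case (k, v) => (k, v.toString) }
```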
scala apache-spark