Garbage collection - Scala: splitting strings from a Stream[String] => GC overhead limit exceeded
I don't understand why splitting a Stream[String] produces "GC overhead limit exceeded" depending on whether str in Stream[String].flatMap { str => str.split(" ") } is invariant or randomly generated.
When str is invariant, no overhead occurs; in the random case, it does.
I am not referencing objects in the looping blocks.
I use def to declare the streams in order to produce non-accumulating streams.
Thanks for any insights.
Here's the code:
    import scala.util.Random

    object DataOps {
      val randomGen: Random = new Random()

      def randomText: String = (0 to 300).map(x => randomGen.nextString(10)).mkString(" ")

      val text: String = Array.fill(300)(randomGen.nextString(10)).mkString(" ")

      // return a stream of strings that all use the same 'text: String'
      def infiniteInvariantDataStream(cnt: Int): Stream[String] = {
        if (cnt > 0) text #:: infiniteInvariantDataStream(cnt - 1) else Stream[String]()
      }

      // return a stream of random strings
      def infiniteDataStream(cnt: Int): Stream[String] = {
        if (cnt > 0) randomText #:: infiniteDataStream(cnt - 1) else Stream[String]()
      }
    }

    object BasicOps {
      def dummyStringStreamSplit(dataStream: Stream[String]) = {
        dataStream
          .flatMap(txt => txt.split(" "))
          .foreach(word => word)
      }
    }

    object ScalaOverflow extends App {
      val n_lines: Int = 1000000

      println("splitting looping on invariant text")
      def dataStream1: Stream[String] = DataOps.infiniteInvariantDataStream(n_lines)
      BasicOps.dummyStringStreamSplit(dataStream1)
      println("invariant line split ok: no heap overflow")

      println("splitting looping on random text")
      def dataStream3: Stream[String] = DataOps.infiniteDataStream(n_lines)
      BasicOps.dummyStringStreamSplit(dataStream3)
      println("random line split ok: no heap overflow")
    }

And here's the error:
    splitting looping on invariant text
    invariant line split ok: no heap overflow
    splitting looping on random text
    java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.String.valueOf(String.java:2840)
        at java.lang.Character.toString(Character.java:2136)
        at java.lang.String.valueOf(String.java:2826)
        at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:198)
        at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:350)
        at scala.collection.immutable.List.foreach(List.scala:383)
        at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:343)
        at scala.collection.AbstractTraversable.addString(Traversable.scala:104)
        at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:309)
        at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
        at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:311)
        at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
        at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:313)
        at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
        at scala.util.Random.nextString(Random.scala:89)
        at DataOps$$anonfun$randomText$1.apply(ScalaOverflow.scala:5)
        at DataOps$$anonfun$randomText$1.apply(ScalaOverflow.scala:5)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.Range.foreach(Range.scala:160)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at DataOps$.randomText(ScalaOverflow.scala:5)
        at DataOps$.infiniteDataStream(ScalaOverflow.scala:16)
        at DataOps$$anonfun$infiniteDataStream$1.apply(ScalaOverflow.scala:16)
        at DataOps$$anonfun$infiniteDataStream$1.apply(ScalaOverflow.scala:16)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1117)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1107)
        at scala.collection.immutable.Stream$$anonfun$flatMap$1.apply(Stream.scala:458)
        at scala.collection.immutable.Stream$$anonfun$flatMap$1.apply(Stream.scala:458)
        at scala.collection.immutable.Stream.append(Stream.scala:241)
        at scala.collection.immutable.Stream$$anonfun$append$1.apply(Stream.scala:241)

Update
Actually, the reason for streaming is rooted in the method below. The whole point is to turn a Java while loop into a functional-friendly stream:
    import java.sql.{Connection, ResultSet, Statement, DriverManager}

    def sqlStream(psqlResult: ResultSet, colName: String): Stream[(Int, String)] = {
      val state: Boolean = psqlResult.next()
      if (state && psqlResult.getString(colName) != null)
        (psqlResult.getRow(), psqlResult.getString(colName)) #:: sqlStream(psqlResult, colName)
      else if (state)
        sqlStream(psqlResult, colName)
      else
        Stream[(Int, String)]()
    }

Should I have considered a better alternative?
Thanks.
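The parameter dataStream in dummyStringStreamSplit acts as a val and keeps a reference to the head of the passed-in Stream. Because a Stream memoizes every element it has computed, every element forced so far stays reachable while foreach walks the stream, which causes unbounded memory use and eventually the GC overhead limit exceeded error. This is also why only the random case fails: in the invariant case every retained element is a reference to the same text string, so the retained cells are cheap, while in the random case every element is a distinct string of about 3,300 characters (301 random ten-character strings plus 300 spaces), so a million retained lines need gigabytes of heap.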
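A small sketch of the memoization described above (MemoDemo and its evaluations counter are hypothetical, added only for illustration): a Stream that is still referenced computes each element once and then keeps it, so a second traversal recomputes nothing. It uses Stream to match the question; on Scala 2.13+ Stream is deprecated in favour of LazyList, which memoizes in the same way.

```scala
object MemoDemo {
  // Hypothetical counter used only to observe when elements are computed.
  var evaluations = 0

  // Built the same way as the question's streams: the tail is computed lazily.
  def numbers(n: Int): Stream[Int] =
    if (n <= 0) Stream.empty
    else {
      evaluations += 1 // runs once per element actually computed
      n #:: numbers(n - 1)
    }

  def main(args: Array[String]): Unit = {
    val held = numbers(5) // a local val keeps the head, like the dataStream parameter
    println(held.sum)     // forces every element: prints 15
    println(held.sum)     // memoized cells are reused: prints 15
    println(evaluations)  // prints 5, not 10: the cells were retained, not recomputed
  }
}
```

Every cell reachable from held stays in memory for as long as held is in scope; with a million large strings instead of five small integers, that retention is what exhausts the heap.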
There is no way to make a method that takes a Stream and computes something from every element (rather than returning a new Stream) safe. At least, there is no way to guarantee that the client code didn't pass a Stream that is being held in a variable somewhere.
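One way around this, swapped in here rather than taken from the answer, is to consume the data through an Iterator instead of a Stream. An Iterator does not memoize, so a consumed element is no longer reachable through it, and a method that consumes an Iterator cannot keep the whole input alive through its parameter. This is a hypothetical sketch: IteratorSplit, randomLines and countWords are illustrative names, and randomText uses Random.alphanumeric instead of the question's nextString so that the word count per line is predictable.

```scala
import scala.util.Random

object IteratorSplit {
  private val rnd = new Random()

  // Hypothetical stand-in for the question's randomText: 301 random
  // ten-character alphanumeric words joined by single spaces.
  def randomText: String =
    (0 to 300).map(_ => rnd.alphanumeric.take(10).mkString).mkString(" ")

  // Produces a fresh line on each call to next(); consumed lines are not kept.
  def randomLines(count: Int): Iterator[String] = Iterator.fill(count)(randomText)

  // Consumes the iterator once and returns the number of words produced.
  def countWords(lines: Iterator[String]): Long =
    lines.flatMap(_.split(" ")).foldLeft(0L)((n, _) => n + 1)

  def main(args: Array[String]): Unit = {
    // One million lines, as in the question. Memory use should stay roughly
    // constant because no line is retained after its words are counted.
    println(countWords(randomLines(1000000))) // prints 301000000
  }
}
```

The trade-off is that an Iterator can be traversed only once; if the data must be read twice, it has to be regenerated (for example by calling randomLines again).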
If you instead define dummyStringStreamSplit like this:

    def dummyStringStreamSplit(dataStream: Stream[String]) = dataStream.flatMap(txt => txt.split(" "))

then you can do:

    println("splitting looping on random text")
    def dataStream3: Stream[String] = DataOps.infiniteDataStream(n_lines)
    def dataStream3Split = BasicOps.dummyStringStreamSplit(dataStream3)
    dataStream3Split.foreach(word => word)
    println("random line split ok: no heap overflow")

and you won't get the GC overhead limit exceeded error.
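On the update's question of a better alternative to sqlStream: the same reasoning suggests wrapping the ResultSet in an Iterator rather than a Stream, since a ResultSet is itself a single-pass cursor and an Iterator does not memoize rows. The sketch below assumes the question's semantics (return the row number with the value, skip rows whose column is NULL); sqlIterator is a hypothetical name, and only the standard java.sql.ResultSet methods next, getRow and getString are used. Unlike the recursive version, NULL rows are skipped by filter in a loop rather than by recursion.

```scala
import java.sql.ResultSet

object SqlIterator {
  // Hypothetical one-pass replacement for the question's sqlStream.
  // Yields (row number, column value) and skips rows where the column is NULL.
  // The iterator is only usable while the ResultSet (and its statement) is open.
  def sqlIterator(rs: ResultSet, colName: String): Iterator[(Int, String)] =
    Iterator
      .continually(rs.next())                          // advance the cursor once per step
      .takeWhile(hasRow => hasRow)                     // stop when next() returns false
      .map(_ => (rs.getRow(), rs.getString(colName)))  // read the row just moved to
      .filter { case (_, value) => value != null }     // drop NULL values, as sqlStream does
}
```

Usage would look like SqlIterator.sqlIterator(resultSet, "name").foreach(println), run before the statement and connection are closed.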