garbage collection - scala splitting strings from a Stream[String] => GC overhead limit exceeded -

I don't understand why splitting a Stream[String] produces a "GC overhead limit exceeded" error depending on whether str in Stream[String].flatMap { str => str.split(" ") } is invariant or randomly generated.

When str is invariant, no overhead occurs, but it does in the random case.

I am not referencing objects in the looping blocks.

I use def to declare the streams in order to produce non-accumulating streams.

Thanks for any insights.

Here's the code:

    import scala.util.Random

    object DataOps {
      val randomGen: Random = new Random()
      def randomText: String = (0 to 300).map(x => randomGen.nextString(10)).mkString(" ")
      val text: String = Array.fill(300)(randomGen.nextString(10)).mkString(" ")

      // return a stream of strings, every element being the same 'text: String'
      def infiniteInvariantDataStream(cnt: Int): Stream[String] = {
        if (cnt > 0) text #:: infiniteInvariantDataStream(cnt - 1)
        else Stream[String]()
      }

      // return a stream of random strings, a new one generated for every element
      def infiniteDataStream(cnt: Int): Stream[String] = {
        if (cnt > 0) randomText #:: infiniteDataStream(cnt - 1)
        else Stream[String]()
      }
    }

    object BasicOps {
      def dummyStringStreamSplit(dataStream: Stream[String]) = {
        dataStream
          .flatMap(txt => txt.split(" "))
          .foreach(word => word)
      }
    }

    object ScalaOverflow extends App {
      val N_LINES: Int = 1000000

      println("splitting looping on invariant text")
      def dataStream1: Stream[String] = DataOps.infiniteInvariantDataStream(N_LINES)
      BasicOps.dummyStringStreamSplit(dataStream1)
      println("invariant line split ok: no heap overflow")

      println("splitting looping on random text")
      def dataStream3: Stream[String] = DataOps.infiniteDataStream(N_LINES)
      BasicOps.dummyStringStreamSplit(dataStream3)
      println("random line split ok: no heap overflow")
    }

And here's the error:

    splitting looping on invariant text
    invariant line split ok: no heap overflow
    splitting looping on random text
    java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.String.valueOf(String.java:2840)
        at java.lang.Character.toString(Character.java:2136)
        at java.lang.String.valueOf(String.java:2826)
        at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:198)
        at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:350)
        at scala.collection.immutable.List.foreach(List.scala:383)
        at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:343)
        at scala.collection.AbstractTraversable.addString(Traversable.scala:104)
        at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:309)
        at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
        at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:311)
        at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
        at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:313)
        at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
        at scala.util.Random.nextString(Random.scala:89)
        at DataOps$$anonfun$randomText$1.apply(ScalaOverflow.scala:5)
        at DataOps$$anonfun$randomText$1.apply(ScalaOverflow.scala:5)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.Range.foreach(Range.scala:160)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at DataOps$.randomText(ScalaOverflow.scala:5)
        at DataOps$.infiniteDataStream(ScalaOverflow.scala:16)
        at DataOps$$anonfun$infiniteDataStream$1.apply(ScalaOverflow.scala:16)
        at DataOps$$anonfun$infiniteDataStream$1.apply(ScalaOverflow.scala:16)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1117)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1107)
        at scala.collection.immutable.Stream$$anonfun$flatMap$1.apply(Stream.scala:458)
        at scala.collection.immutable.Stream$$anonfun$flatMap$1.apply(Stream.scala:458)
        at scala.collection.immutable.Stream.append(Stream.scala:241)
        at scala.collection.immutable.Stream$$anonfun$append$1.apply(Stream.scala:241)

Update

Actually, the reason for streaming is rooted in the method below. The whole point is to turn a Java while loop into a functional-friendly stream:

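    import java.sql.{Connection, ResultSet, Statement, DriverManager}

    // Turns the rows of a JDBC ResultSet into a lazy Stream of (row number, column value),
    // skipping rows where the column is NULL.
    def sqlStream(psqlResult: ResultSet, colName: String): Stream[(Int, String)] = {
      val state: Boolean = psqlResult.next() // advance the cursor; false when no rows remain
      if (state && psqlResult.getString(colName) != null)
        (psqlResult.getRow(), psqlResult.getString(colName)) #:: sqlStream(psqlResult, colName)
      else if (state) sqlStream(psqlResult, colName) // NULL value: skip this row
      else Stream[(Int, String)]()
    }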

Should I have considered a better alternative?

Thanks.

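The parameter dataStream in dummyStringStreamSplit acts as a val and keeps a reference to the head of the passed-in stream. Since a Stream memoizes every element it evaluates, everything foreach has already walked through stays reachable, which causes unbounded memory use and, eventually, the GC overhead limit exceeded error. The difference between your two cases most likely comes down to what each retained element costs: in the invariant case every element points to the same text string, while in the random case each one is a distinct string of about 3,300 characters, so a million of them quickly exhaust the heap.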
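A small sketch of the memoization that makes holding the head expensive. The hypothetical MemoDemo object below counts how often an element is generated; traversing the same stream twice through one reference generates each element only once, because the second pass reuses the cached cells. The val s plays the role of the dataStream parameter.

```scala
// Sketch with hypothetical names: shows that a Stream caches evaluated cells.
object MemoDemo {
  // Number of times an element has been generated so far.
  var generated = 0

  // Builds a stream of n elements; each head is computed when its cell is
  // created, and the tail is only built when it is first accessed.
  def makeStream(n: Int): Stream[Int] =
    if (n > 0) { generated += 1; n } #:: makeStream(n - 1)
    else Stream.empty

  // Traverses one stream twice through the same reference and reports how
  // many elements were generated: 5, because the second pass hits the cache.
  def demo(): Int = {
    generated = 0
    val s = makeStream(5) // kept alive for the whole method, like dataStream
    s.foreach(_ => ())
    s.foreach(_ => ())
    generated
  }
}
```

On Scala 2.13 and later, Stream is deprecated in favor of LazyList, which memoizes in the same way, so the same reasoning applies there.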

There is no way to write a method that takes a stream and computes something from every element (rather than returning a new stream) safely. At least, there is no way to guarantee that the client code didn't pass a stream that is being held in a variable somewhere.
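If the data only needs to be traversed once, one option (a different technique from the fix proposed here) is to make the method take an Iterator[String] instead of a Stream[String]. An Iterator does not cache the elements it has already produced, so the method can consume every element even though its parameter stays reachable for the whole call. A minimal sketch; IteratorOps and countWords are hypothetical names:

```scala
object IteratorOps {
  // Splits every line into words and counts them. The Iterator keeps no
  // reference to lines it has already produced, so processed lines can be
  // garbage-collected while the loop is still running.
  def countWords(lines: Iterator[String]): Long = {
    var count = 0L
    lines.flatMap(_.split(" ")).foreach(_ => count += 1)
    count
  }
}
```

Fed from the question's generator this would look like IteratorOps.countWords(Iterator.fill(1000000)(DataOps.randomText)). The trade-off is that an Iterator can be traversed only once.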

If you instead define dummyStringStreamSplit like this:

    def dummyStringStreamSplit(dataStream: Stream[String]) = dataStream.flatMap(txt => txt.split(" "))

you can do:

    println("splitting looping on random text")
    def dataStream3: Stream[String] = DataOps.infiniteDataStream(N_LINES)
    def dataStream3Split = BasicOps.dummyStringStreamSplit(dataStream3)
    dataStream3Split.foreach(word => word)
    println("random line split ok: no heap overflow")

and you won't get the GC overhead limit exceeded error.
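On the follow-up question about a better alternative to sqlStream: one option, not part of this answer, is to expose the ResultSet as an Iterator instead of a Stream. An Iterator reads rows on demand and caches none of them, so nothing accumulates no matter who holds the reference. A sketch under two assumptions: the ResultSet stays open until the iterator has been fully consumed, and it is traversed only once. cursorIterator and sqlIterator are hypothetical helper names; the cursor logic is kept generic so it can be tried without a database.

```scala
import java.sql.ResultSet

object SqlIterators {
  // Generic one-pass cursor. advance() moves to the next row and returns
  // false once there are no more rows; read() is called once per row and
  // returns None for rows that should be skipped.
  def cursorIterator[A](advance: () => Boolean, read: () => Option[A]): Iterator[A] =
    Iterator
      .continually(advance()) // re-evaluates advance() on every step
      .takeWhile(identity)    // stop at the first false
      .flatMap(_ => read())   // emit the row's value, or nothing if skipped

  // Iterator counterpart of sqlStream: yields (row number, column value) and
  // skips rows whose column is SQL NULL, without caching earlier rows.
  def sqlIterator(rs: ResultSet, colName: String): Iterator[(Int, String)] =
    cursorIterator(
      () => rs.next(),
      () => Option(rs.getString(colName)).map(value => (rs.getRow(), value))
    )
}
```

Because the iterator is lazy, the Statement and ResultSet must not be closed before it has been consumed, and calling toList on it would hold every row in memory again, so it is best consumed with foreach or a fold.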

