algorithm - Removing duplicate strings with limited memory -



algorithm - Removing duplicate strings with limited memory -

say have list of strings , can't load entire list in memory, can load parts of list file. best way solve this?

one approach utilize external sort sort file, , remove duplicates single iteration on sorted list. approach requries little space , o(nlogn) accesses disk.

another approach based on hashing: utilize hashcode of string, , load sublist contains strings whom hashcode in specific range. guaranteed if x loaded, , has duplicate - dupe loaded same bucket well. this requires o(n*#buckets) accesses disk, might require more memory. can invoke procedure recursively (with different hash functions) if needed.

algorithm data-structures

Comments

Popular posts from this blog

php - Android app custom user registration and login with cookie using facebook sdk -

django - Access session in user model .save() -

php - .htaccess Multiple Rewrite Rules / Prioritizing -