algorithm - Removing duplicate strings with limited memory -

say have list of strings , can't load entire list in memory, can load parts of list file. best way solve this?

one approach utilize external sort sort file, , remove duplicates single iteration on sorted list. approach requries little space , o(nlogn) accesses disk.

another approach based on hashing: utilize hashcode of string, , load sublist contains strings whom hashcode in specific range. guaranteed if x loaded, , has duplicate - dupe loaded same bucket well. this requires o(n*#buckets) accesses disk, might require more memory. can invoke procedure recursively (with different hash functions) if needed.

algorithm data-structures

Search This Blog

Three

algorithm - Removing duplicate strings with limited memory -

Comments

Post a Comment

Popular posts from this blog

php - Android app custom user registration and login with cookie using facebook sdk -

ruby on rails - Devise Logout Error in RoR -

c# - Create a Notification Object (Email or Page) At Run Time -- Dependency Injection or Factory -