algorithm - Removing duplicate strings with limited memory -
algorithm - Removing duplicate strings with limited memory -
say have list of strings , can't load entire list in memory, can load parts of list file. best way solve this?
one approach utilize external sort sort file, , remove duplicates single iteration on sorted list. approach requries little space , o(nlogn)
accesses disk.
another approach based on hashing: utilize hashcode of string, , load sublist contains strings whom hashcode in specific range. guaranteed if x
loaded, , has duplicate - dupe loaded same bucket well. this requires o(n*#buckets)
accesses disk, might require more memory. can invoke procedure recursively (with different hash functions) if needed.
algorithm data-structures
Comments
Post a Comment