python - Strings in a file do not match strings in a set



I have a file with one word on each line and a set of words. I want to add to the file the words from the set that are not already in it (collected in a set called 'out'). Here is part of the code:

    def createnextu(self):
        print "adding words to final file"
        # Create the file if it does not exist yet.
        if not os.path.exists(self.finalfile):
            open(self.finalfile, 'a').close()
        fin = open(self.finalfile, "r")
        out = set()
        for line in self.lines_seen:  # lines_seen is a set of words
            if line not in fin:
                out.add(line)
            else:
                print line
        fin.close()
        fout = open(self.finalfile, "a+")
        for line in out:
            fout.write(line)

But it matches only a few of the words that really are equal. I run it with the same dictionary of words, and on each run it adds the repeated words to the file again. What am I doing wrong? What is happening? I tried both the '==' and 'is' comparators and got the same result.

Edit 1: I am working with huge files (finalfile) that can't be fully loaded into RAM, so I think I should read the file line by line.

Edit 2: I found that the big problem was the file pointer:

    def createnextu(self):
        print "adding words to final file"
        # Create the file if it does not exist yet.
        if not os.path.exists(self.finalfile):
            open(self.finalfile, 'a').close()
        out = set()
        out.clear()
        with open(self.finalfile, "r") as fin:
            for word in self.lines_seen:
                # With this line the speed drops to 40 lines/second;
                # without it, the code doesn't work.
                fin.seek(0, 0)
                if word in fin:
                    self.totalmatches = self.totalmatches + 1
                else:
                    out.add(word)
                self.totallines = self.totallines + 1
        fout = open(self.finalfile, "a+")
        for line in out:
            fout.write(line)

If I put the lines_seen loop before opening the file, so that the file is opened once for each line in lines_seen, the speed only goes up to 30k lines/second. With set() I get 200k lines/second at worst, so I think I will load the file in parts and compare using sets. Is there a better solution?

Edit 3: Done!

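fin is a file handle, so you can't reliably test against it with "if line not in fin". That test iterates over whatever lines are left in the file, each still carrying its trailing newline, and it moves the file position as it goes, so later lookups start from wherever the previous one stopped. The content needs to be read first.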

    with open(self.finalfile, "r") as fh:
        fin = fh.read().splitlines()  # fin is now a list of the words in finalfile

    for line in self.lines_seen:  # lines_seen is a set of words
        if line not in fin:
            out.add(line)
        else:
            print line
    # Remove fin.close(); the with block already closes the file.
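
To see why the membership test on the handle misbehaves, here is a small sketch. It is written for Python 3 and runs against a throwaway temporary file; the three sample words are made up for illustration and are not from the question.

```python
import os
import tempfile

# Hypothetical sample data standing in for finalfile.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

with open(path, "r") as fin:
    # Lines read from a file keep their trailing newline, so the bare
    # word never compares equal to any of them.
    bare_match = "alpha" in fin
    # The failed test above iterated to the end of the file, so the
    # handle is exhausted and even an exact line is no longer found.
    after_exhaustion = "beta\n" in fin
    # Rewinding makes the lines visible again.
    fin.seek(0)
    after_rewind = "beta\n" in fin
    # The search stopped right after the match, so only the rest remains.
    remaining = fin.read()

os.remove(path)
print(bare_match, after_exhaustion, after_rewind, repr(remaining))
```

This is also why the seek(0, 0) version in the question is so slow: every lookup rescans the file from the start.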

Edit:

Since lines_seen is a set, why not try creating a new set from the words in finalfile and then diff the two sets?

    file_set = set()
    with open(self.finalfile, "r") as fh:
        for f_line in fh:
            file_set.add(f_line.strip())

    # This gives the words in finalfile that are not in lines_seen.
    print file_set.difference(self.lines_seen)
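
If finalfile is too big to hold in memory, the same set idea can be turned around. The following is only a sketch in Python 3, not the asker's final solution: it starts from the words in lines_seen (which are already in memory), streams the file one line at a time, and crosses off every word it meets. The function name append_missing_words and the demo data are hypothetical, and the sketch assumes the file holds one word per line and ends with a newline.

```python
import os
import tempfile


def append_missing_words(final_path, lines_seen):
    """Append to final_path each word of lines_seen it does not contain.

    Returns the set of words that were appended.
    """
    # Assume every word is missing, then cross off the ones we find.
    missing = {word.strip() for word in lines_seen}
    missing.discard("")  # ignore blank entries

    if os.path.exists(final_path):
        with open(final_path, "r") as fin:
            for line in fin:  # only one line of the file is in memory at a time
                missing.discard(line.strip())

    # Write each new word on its own line so the next run can match it.
    with open(final_path, "a") as fout:
        for word in sorted(missing):
            fout.write(word + "\n")
    return missing


# Demo on a throwaway file with made-up words.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "final.txt")
    with open(demo_path, "w") as f:
        f.write("alpha\nbeta\n")
    first_run = append_missing_words(demo_path, {"beta", "gamma"})
    second_run = append_missing_words(demo_path, {"beta", "gamma"})
    with open(demo_path) as f:
        final_contents = f.read()

print(first_run, second_run, repr(final_contents))
```

The file is read once per run instead of once per word, and memory use depends on the size of lines_seen rather than the size of finalfile. Writing the newline explicitly matters too: if the words in the set have no line ending, fout.write(line) glues them together and they never match on the next run.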
