Python - strings in a file do not match strings in a set
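I have a file with one word on each line and a set of words. I want to put the words from the set that are not already in the file into a set called 'out', and then append them to the file. Here is part of my code: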
    def createnextu(self):
        print "adding words to final file"
        if not os.path.exists(self.finalfile):
            open(self.finalfile, 'a').close
        fin = open(self.finalfile, "r")
        out = set()
        for line in self.lines_seen:  # lines_seen is a set of words
            if line not in fin:
                out.add(line)
            else:
                print line
        fin.close()
        fout = open(self.finalfile, "a+")
        for line in out:
            fout.write(line)
But it only matches a few of the words that really are equal. I run it with the same dictionary of words, and it adds repeated words to the file on every run. What am I doing wrong? What is happening? I tried using the '==' and 'is' comparators and got the same result.
Edit 1: I am working with huge files (finalfile) that can't be fully loaded into RAM, so I think I should read the file line by line.
Edit 2: I found that the big problem was the file pointer:
    def createnextu(self):
        print "adding words to final file"
        if not os.path.exists(self.finalfile):
            open(self.finalfile, 'a').close
        out = set()
        out.clear()
        with open(self.finalfile, "r") as fin:
            for word in self.lines_seen:
                # With this line the speed drops to 40 lines/second;
                # without it the code doesn't work.
                fin.seek(0, 0)
                if word in fin:
                    self.totalmatches = self.totalmatches + 1
                else:
                    out.add(word)
                    self.totallines = self.totallines + 1
        fout = open(self.finalfile, "a+")
        for line in out:
            fout.write(line)
If I put the loop over lines_seen before opening the file, so that the file is reopened for each line in lines_seen, the speed only goes up to 30k lines/second. With set() I get 200k lines/second at worst, so I think I will load the file in parts and compare using sets. Is there a better solution?
Edit 3: Done!
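fin is a file handle, so you can't compare with 'if line not in fin'. That test iterates over the remaining lines of the file, which still carry their trailing newline, and leaves the file pointer at the end after a miss. The content needs to be read first.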
    with open(self.finalfile, "r") as fh:
        fin = fh.read().splitlines()  # fin is now a list of the words in finalfile

    for line in self.lines_seen:  # lines_seen is a set of words
        if line not in fin:
            out.add(line)
        else:
            print line
    # fin.close() should be removed; the with block closes the file
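To see why the test against the open file handle misbehaves, here is a minimal sketch in Python 3 (the code above uses Python 2 print statements). The temporary file and the sample words are made up for illustration and are not from the question.

```python
import os
import tempfile

# Hypothetical sample data: a small "final file" with one word per line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

with open(path, "r") as fin:
    # Lines read from a file keep their newline, so the bare word never matches.
    bare_word_found = "beta" in fin
    # The failed search consumed the whole file, so even an exact line
    # is not found again until we seek back to the start.
    exact_line_found_after_miss = "beta\n" in fin
    fin.seek(0)
    exact_line_found_after_seek = "beta\n" in fin

# Reading the content first gives newline-free words that compare as expected.
with open(path, "r") as fh:
    words_in_file = fh.read().splitlines()

os.remove(path)

print(bare_word_found)              # False
print(exact_line_found_after_miss)  # False
print(exact_line_found_after_seek)  # True
print("beta" in words_in_file)      # True
```

This also suggests why seeking back to the start made the question's second version work but slow: every lookup rescans the file from the beginning.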
Edit:
Since lines_seen is a set, why not try creating a new set with the words from finalfile and then diffing the sets?
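    file_set = set()
    with open(self.finalfile, "r") as fh:
        for f_line in fh:
            file_set.add(f_line.strip())
    # This gives the words in finalfile that are not in lines_seen.
    # For the words in lines_seen that are missing from finalfile,
    # use self.lines_seen.difference(file_set) instead.
    print file_set.difference(self.lines_seen)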
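Since finalfile may be too large to fit in RAM (Edit 1 of the question), the difference can also be computed without building a set from the file. The sketch below, in Python 3, streams the file once and discards every word it meets from a copy of the in-memory set; whatever remains is missing from the file and gets appended. The function name append_missing_words and its parameters are hypothetical. It assumes one word per line, that the words in lines_seen carry no trailing newline, and that an existing file ends with a newline.

```python
import os


def append_missing_words(final_path, lines_seen):
    """Append to final_path the words of lines_seen that it does not contain.

    Returns the set of words that were appended.
    """
    missing = set(lines_seen)  # copy, so the caller's set is left untouched
    if os.path.exists(final_path):
        with open(final_path, "r") as fh:
            for f_line in fh:  # one line at a time; the file is never fully loaded
                missing.discard(f_line.rstrip("\n"))
                if not missing:  # every word is already present, stop early
                    break
    with open(final_path, "a") as fout:
        for word in sorted(missing):  # sorted only to make the output order stable
            fout.write(word + "\n")
    return missing
```

Memory use is bounded by the size of lines_seen, which is already held in memory, and the file is read only once per call. Calling the function a second time with the same set appends nothing, since every word is then found in the file.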
python string set comparison equals