optimization - Python speed up: checking if value in list
I have a program that processes a CSV file. The contents of the CSV are as follows:

    lines = [
        [id_a, val1, val2, ..., valn],
        [id_a, val1, val2, ..., valn],
        [id_b, val1, val2, ..., valn],
        [id_b, val1, val2, ..., valn],
        [id_b, val1, val2, ..., valn],
        [id_b, val1, val2, ..., valn],
        [id_c, val1, val2, ..., valn],
        [id_c, val1, val2, ..., valn],
    ]
I am building a dictionary that looks like this:

    my_dict = {
        'id_a': ['many', 'values'],
        'id_b': ['many', 'more', 'values'],
        'id_c': ['some', 'other', 'values'],
    }
My current implementation looks like this:

    for line in lines:
        log_id = line[0]
        if log_id not in my_dict.keys():
            datablock = line[1:]
            my_dict[log_id] = datablock
        else:
            my_dict[log_id].append(line[1:])
With close to 1,000,000 lines in the CSV, the program starts to slow down once there are a couple of thousand entries in the dictionary. I have been debugging with a smattering of print statements, and the bottleneck seems to be here, in the `if log_id not in my_dict.keys():` line.
I tried using a separate list to keep track of the IDs in the dictionary, but that did not seem to help. Could using a set work here, or is that option out since it changes with each loop and would need to be reconstructed?
You are creating a list of keys each time. Remove the dict.keys() call; it is slowing you down and is not needed:
if log_id not in my_dict:
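Applied to the loop from the question, the fix might look like the following minimal sketch. The sample rows are made up for illustration, and it assumes each ID should map to a list of rows (one inner list per CSV row), so the first row is wrapped in a list rather than stored bare.

```python
# Hypothetical sample rows: the first column is the ID, the rest are values.
lines = [
    ["id_a", "1", "2"],
    ["id_a", "3", "4"],
    ["id_b", "5", "6"],
]

my_dict = {}
for line in lines:
    log_id = line[0]
    # Test membership against the dict itself: a hash lookup, O(1) on average.
    if log_id not in my_dict:
        my_dict[log_id] = [line[1:]]  # start a new list of rows for this ID
    else:
        my_dict[log_id].append(line[1:])  # add this row to the existing list

print(my_dict)  # {'id_a': [['1', '2'], ['3', '4']], 'id_b': [['5', '6']]}
```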
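Dictionaries support membership testing directly, and do so in O(1) time on average. In Python 2, however, dict.keys() returns a new list, and membership testing on a list is not efficient (it takes O(n) time). For each membership test, your code loops over all keys to produce a new list object, then loops over that list again to find a match. (In Python 3, dict.keys() returns a view that supports fast membership tests, but the call is still unnecessary.)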
You can simplify the code a bit by using dict.setdefault():

    for line in lines:
        log_id = line[0]
        my_dict.setdefault(log_id, []).append(line[1:])
dict.setdefault() returns the value associated with the given key, and if the key is missing, uses the second argument as the default value (adding the key and that value to the dictionary).
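A small sketch of both cases, using made-up keys and rows, shows why chaining .append() onto setdefault() works: the returned object is the very list stored in the dictionary.

```python
my_dict = {}

# Key missing: setdefault() inserts [] for 'id_a' and returns that same list,
# so append() modifies the list stored in the dict.
my_dict.setdefault("id_a", []).append(["1", "2"])

# Key present: the existing list is returned and the new default [] is ignored.
my_dict.setdefault("id_a", []).append(["3", "4"])

print(my_dict)  # {'id_a': [['1', '2'], ['3', '4']]}
```

One small cost of this pattern is that the default [] is built on every call, even when the key already exists.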
Alternatively, use a collections.defaultdict() object instead of a plain dictionary:

    from collections import defaultdict

    my_dict = defaultdict(list)
    for line in lines:
        log_id = line[0]
        my_dict[log_id].append(line[1:])
A defaultdict is a simple dict subclass that calls a configured factory every time a key is missing; here list() is called to create a new value for a missing key the moment you try to access it.
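Put together with the standard csv module, the whole job might look like this sketch. The CSV text is hypothetical and read from an in-memory io.StringIO so the example is self-contained; with a real file you would pass something like open("data.csv", newline="") instead, where "data.csv" is a placeholder name.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV content standing in for the real file.
sample = io.StringIO(
    "id_a,1,2\n"
    "id_a,3,4\n"
    "id_b,5,6\n"
)

my_dict = defaultdict(list)
for line in csv.reader(sample):
    log_id = line[0]
    # A missing key triggers list(), so the first row for each ID needs no special case.
    my_dict[log_id].append(line[1:])

print(dict(my_dict))  # {'id_a': [['1', '2'], ['3', '4']], 'id_b': [['5', '6']]}

# Note: even a plain lookup of a missing key inserts an empty list.
print(my_dict["id_z"])  # []
print("id_z" in my_dict)  # True
```

The last two lines show the trade-off of a defaultdict: a read of an unknown key silently adds it, so membership checks should use `in` rather than indexing.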