optimization - Python speed up: checking if value in list -




I have a program that processes a CSV file. The contents of the CSV are as follows:

    lines = [
        [id_a, val1, val2, ..., valn],
        [id_a, val1, val2, ..., valn],
        [id_b, val1, val2, ..., valn],
        [id_b, val1, val2, ..., valn],
        [id_b, val1, val2, ..., valn],
        [id_b, val1, val2, ..., valn],
        [id_c, val1, val2, ..., valn],
        [id_c, val1, val2, ..., valn],
    ]

I am building a dictionary that looks like this:

    my_dict = {
        'id_a': ['many', 'values'],
        'id_b': ['many', 'more', 'values'],
        'id_c': ['some', 'other', 'values'],
    }

My current implementation looks like this:

    for line in lines:
        log_id = line[0]
        if log_id not in my_dict.keys():
            datablock = line[1:]
            my_dict[log_id] = datablock
        else:
            my_dict[log_id].append(line[1:])

With close to 1,000,000 lines in the CSV, the program starts to slow down once there are a couple of thousand entries in the dictionary. I have been debugging with a spattering of print statements, and the bottleneck seems to be here, in the if log_id not in my_dict.keys(): line.

I tried using a separate list to keep track of the ids in the dictionary, but that did not seem to help.

Could using a set here work, or is that alternative out since it changes on each loop and would need to be reconstructed?

You are creating a list of all the keys each time. Remove the dict.keys() call; it is slowing you down and is not needed:

if log_id not in my_dict:

Dictionaries support membership testing directly, and in O(1) time. In Python 2, dict.keys() returns a new list, however, and membership testing on a list is not efficient (it takes O(n) time). For each membership test, your code would loop over all the keys to produce a new list object, then loop over that list one more time to find a match.
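As a rough illustration of that difference, here is a minimal sketch. The ids and the dictionary size are made up, and list(my_dict.keys()) stands in for the list that dict.keys() builds on Python 2 (on Python 3, keys() returns a view whose membership test is already fast). Exact timings will vary by machine.

```python
import timeit

# Hypothetical data: 10,000 made-up ids standing in for the log ids.
my_dict = {"id_%d" % i: [] for i in range(10000)}
missing = "id_missing"


def test_dict():
    # Hash lookup directly on the dictionary.
    return missing in my_dict


def test_key_list():
    # Build a list of all keys, then scan it from start to end.
    return missing in list(my_dict.keys())


dict_time = timeit.timeit(test_dict, number=200)
list_time = timeit.timeit(test_key_list, number=200)
print("dict membership:     %.6f s" % dict_time)
print("key-list membership: %.6f s" % list_time)
```

Both functions give the same answer; only the amount of work per test differs, and the gap grows with the number of keys.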

You can simplify your code a bit by using dict.setdefault():

    for line in lines:
        log_id = line[0]
        my_dict.setdefault(log_id, []).append(line[1:])

dict.setdefault() returns the value associated with the given key, and if the key is missing, uses the second argument as the default value (adding the key and value to the dictionary).
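To make the two cases concrete, here is a small sketch with a made-up key and values; it only shows how setdefault() behaves for a missing key versus an existing one.

```python
my_dict = {}

# Missing key: the default list is stored in the dictionary and returned.
values = my_dict.setdefault("id_a", [])
values.append("first")

# Existing key: the stored list is returned and the second argument is ignored.
same_values = my_dict.setdefault("id_a", ["ignored"])
same_values.append("second")

print(my_dict)
```

Because the returned object is the list stored in the dictionary, appending to it updates the dictionary entry in place.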

Alternatively, use a collections.defaultdict() object instead of a plain dictionary:

    from collections import defaultdict

    my_dict = defaultdict(list)

    for line in lines:
        log_id = line[0]
        my_dict[log_id].append(line[1:])

A defaultdict is a simple dict subclass that calls the configured factory every time a key is missing; here list() is called to create a new value for missing keys the moment you try to access one.
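A short sketch of that behaviour, using a few made-up rows in the same [id, val1, val2, ...] shape as the CSV lines (the values are placeholders, not real data):

```python
from collections import defaultdict

# Hypothetical rows shaped like the CSV lines: [id, val1, val2, ...]
lines = [
    ["id_a", "1", "2"],
    ["id_a", "3", "4"],
    ["id_b", "5", "6"],
]

my_dict = defaultdict(list)
for line in lines:
    # A missing id gets a fresh list from list() before the append runs.
    my_dict[line[0]].append(line[1:])

# Merely reading a missing key also calls list() and stores the result.
empty = my_dict["id_c"]
print(dict(my_dict))
```

Note that a membership test such as "id_d" in my_dict does not create an entry; only indexing with a missing key triggers the factory.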

python optimization
