python - Filtering out csv rows by column data -
I'm not sure what to call this. I have CSV data like this:

...|address    | date       |...
...|abraham st.| 01/01/2008 |...
...|abraham st.| 02/02/2007 |...
...|abraham st.| 03/03/2011 |...

I want to keep only the newest entry (in this case row 4, dated 03/03/2011), and I'm having a problem bending my mind around this.

My initial thought was to read the info from the CSV into a list of rows, then convert the date strings to datetime objects and go through every row, comparing it against every other row with the same name to find the highest date, and save that date's row. Is there a better way to approach this?
Just use the builtin max with a key function that extracts the date field and converts it to a datetime object. I assume dates are mm/dd/yyyy.
    import csv
    from datetime import datetime

    date_column = 1

    with open('input.csv') as f:
        reader = csv.reader(f, delimiter='|')
        next(reader)  # skip the CSV header row
        # dates are assumed to be mm/dd/yyyy, hence '%m/%d/%Y'
        most_recent = max(reader, key=lambda x: datetime.strptime(x[date_column].strip(), '%m/%d/%Y'))

    >>> print(most_recent)
    ['abraham st.', ' 03/03/2011']
I think your intent is to group by the "address" column and select the most recent date from the "date" column, in which case you can use itertools.groupby() like this:
    import csv
    from itertools import groupby
    from datetime import datetime

    address_column = 0
    date_column = 1
    most_recent = []

    with open('input.csv') as f:
        reader = csv.reader(f, delimiter='|')
        next(reader)  # skip the CSV header row
        # groupby only groups adjacent rows, so the rows are sorted first
        for k, g in groupby(sorted(reader), lambda x: x[address_column]):
            # dates are assumed to be mm/dd/yyyy, hence '%m/%d/%Y'
            most_recent.append(max(g, key=lambda x: datetime.strptime(x[date_column].strip(), '%m/%d/%Y')))

    >>> print(most_recent)
    [['abraham st.', ' 03/03/2011'], ['moses rd.', ' 10/12/2013'], ['smith st.', ' 01/01/1999']]
This assumes input.csv contains this:

    address |date
    abraham st.| 01/01/2008
    abraham st.| 02/02/2007
    abraham st.| 03/03/2011
    moses rd.| 10/12/2013
    moses rd.| 11/11/2011
    smith st.| 01/01/1999
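If sorting the whole file is undesirable, a dictionary keyed by address can track the newest row per address in a single pass. This is only a sketch under the same assumptions as above (pipe-delimited file, address in column 0, date in column 1, dates in mm/dd/yyyy). The names parse_date and newest_per_address are made up for illustration, and io.StringIO stands in for the real file so the example runs on its own.

```python
import csv
import io
from datetime import datetime

# Sample rows copied from the input.csv listing; StringIO stands in for the file.
SAMPLE = """address |date
abraham st.| 01/01/2008
abraham st.| 02/02/2007
abraham st.| 03/03/2011
moses rd.| 10/12/2013
moses rd.| 11/11/2011
smith st.| 01/01/1999
"""

ADDRESS_COLUMN = 0
DATE_COLUMN = 1


def parse_date(row):
    # Assumption: dates are mm/dd/yyyy, possibly padded with spaces.
    return datetime.strptime(row[DATE_COLUMN].strip(), '%m/%d/%Y')


def newest_per_address(lines):
    """Return one row per address, keeping the row with the latest date."""
    reader = csv.reader(lines, delimiter='|')
    next(reader)  # skip the CSV header row
    newest = {}
    for row in reader:
        if not row:  # ignore blank lines
            continue
        key = row[ADDRESS_COLUMN].strip()
        # Keep this row if the address is new or its date is later.
        if key not in newest or parse_date(row) > parse_date(newest[key]):
            newest[key] = row
    return list(newest.values())


result = newest_per_address(io.StringIO(SAMPLE))
for row in result:
    print(row)
```

With a real file, pass the open file object instead of the StringIO, e.g. newest_per_address(open('input.csv')) inside a with block. Unlike groupby, this does not need the rows to be sorted or adjacent, and the result keeps the order in which each address first appears.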
python csv filter