python - Keeping the latest date when counting uniques in another column
I have the following DataFrame:
              date           name
    0   20/06/2014     allan watt
    1   20/06/2014     cindy mark
    2   20/06/2014  luisa mostert
    3   19/06/2014     allan watt
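To experiment with this, the frame can be rebuilt from the table values, as in the sketch below. One assumption worth flagging: the `date` values look like day-first `dd/mm/yyyy` strings. Taking `max` of strings compares them character by character, so `'01/07/2014'` would sort before `'20/06/2014'` even though July comes after June. Parsing the column with `pd.to_datetime` makes any later `max` chronological.

```python
import pandas as pd

# Sample data copied from the table in the question; dates are dd/mm/yyyy strings.
df = pd.DataFrame({
    'date': ['20/06/2014', '20/06/2014', '20/06/2014', '19/06/2014'],
    'name': ['allan watt', 'cindy mark', 'luisa mostert', 'allan watt'],
})

# Assumption: dates are day-first. An explicit format avoids ambiguity
# (e.g. 01/07/2014 being read as January 7th).
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
print(df.dtypes)
```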
I would like to end up with a DataFrame that counts the occurrences of each unique value in 'name' and keeps the latest date for each one. For example:
       latest_date           name  count
    0   20/06/2014     allan watt      2
    1   20/06/2014     cindy mark      1
    2   20/06/2014  luisa mostert      1
Currently, I add the 'count' column by doing:
    df = pd.DataFrame({'count': df.groupby(['name']).size()}).reset_index()

                name  count
    0     allan watt      2
    1     cindy mark      1
    2  luisa mostert      1
but this drops the date column completely, whereas:
    df = pd.DataFrame({'count': df.groupby(['name', 'date']).size()}).reset_index()
obviously groups by date as well, which leaves me with:
       latest_date           name  count
    0   20/06/2014     allan watt      1
    1   20/06/2014     cindy mark      1
    2   20/06/2014  luisa mostert      1
    3   19/06/2014     allan watt      1
What is the optimal approach to get the intended result?
You can do it like this:
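    import pandas as pd

    # Give every row a count of 1 so that summing per name yields the row count.
    df['count'] = 1
    # Per name: total the counts and keep the largest (latest) date.
    df = df.groupby('name').agg({'count': 'sum', 'date': 'max'})
    df = df.rename(columns={'date': 'latest_date'})
    # Turn the 'name' index back into a regular column.
    df = df.reset_index()
    print(df)

                name  count latest_date
    0     allan watt      2  20/06/2014
    1     cindy mark      1  20/06/2014
    2  luisa mostert      1  20/06/2014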
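On newer pandas versions (0.25 and later), the same result can be written in one step with named aggregation, which avoids the helper 'count' column and the rename. The sketch below assumes the dates are day-first strings and parses them first, so that 'max' picks the chronologically latest date rather than the largest string.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['20/06/2014', '20/06/2014', '20/06/2014', '19/06/2014'],
    'name': ['allan watt', 'cindy mark', 'luisa mostert', 'allan watt'],
})
# Assumption: dd/mm/yyyy strings; parse so that max() is chronological.
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')

# Named aggregation: each keyword becomes an output column,
# built from a (source column, aggregation function) pair.
result = (
    df.groupby('name', as_index=False)
      .agg(latest_date=('date', 'max'), count=('date', 'size'))
)
print(result)
```

'size' counts every row in the group, whereas 'count' would skip rows whose date is missing; pick whichever matches how missing dates should be treated.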