python - Keeping the latest date when counting uniques in another column




I have the following dataframe:

             date           name
    0  20/06/2014     allan watt
    1  20/06/2014     cindy mark
    2  20/06/2014  luisa mostert
    3  19/06/2014     allan watt

I want to end up with a dataframe that counts the unique values in 'name' and keeps the latest date value for each name. For example:

      latest_date           name  count
    0  20/06/2014     allan watt      2
    1  20/06/2014     cindy mark      1
    2  20/06/2014  luisa mostert      1

Currently, I am adding the 'count' column by doing:

    df = pd.DataFrame({'count': df.groupby(['name']).size()}).reset_index()

                name  count
    0     allan watt      2
    1     cindy mark      1
    2  luisa mostert      1

But this drops the date column completely. Whereas:

    df = pd.DataFrame({'count': df.groupby(['name', 'date']).size()}).reset_index()

obviously groups by date as well, which leaves me with:

      latest_date           name  count
    0  20/06/2014     allan watt      1
    1  20/06/2014     cindy mark      1
    2  20/06/2014  luisa mostert      1
    3  19/06/2014     allan watt      1

What is the optimal approach to accomplish the intended result?

You can do it like this:

    # Give every row a count of 1, then sum the counts and take the max date per name.
    df['count'] = 1
    df = df.groupby('name').agg({'count': sum, 'date': max})
    df = df.rename(columns={'date': 'latest_date'})
    df = df.reset_index()
    print(df)

                name  count latest_date
    0     allan watt      2  20/06/2014
    1     cindy mark      1  20/06/2014
    2  luisa mostert      1  20/06/2014
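One caveat with the answer above: if 'date' holds dd/mm/yyyy strings, max compares them as text, so for example '20/06/2014' would sort after '01/07/2014'. The sketch below is one possible variant, not the original answerer's code. It assumes the dates are day-first strings and that a pandas version with named aggregation (0.25 or later) is available. It parses the column first and then computes both values in a single agg call.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["20/06/2014", "20/06/2014", "20/06/2014", "19/06/2014"],
    "name": ["allan watt", "cindy mark", "luisa mostert", "allan watt"],
})

# Assumption: dates are day-first strings. Parse them so that max()
# compares real dates instead of comparing text character by character.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# One row per name: the latest date and the number of rows for that name.
result = (
    df.groupby("name")
      .agg(latest_date=("date", "max"), count=("date", "size"))
      .reset_index()
)

# Optional: turn the dates back into the original dd/mm/yyyy text form.
result["latest_date"] = result["latest_date"].dt.strftime("%d/%m/%Y")

print(result)
```

Named aggregation also removes the need for the helper 'count' column and the separate rename step.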

python pandas dataframes
