python - Multiindexed Pandas groupby, ignore a level?




I'm running a groupby operation on a MultiIndexed DataFrame similar to this one:

                                         0         1  ...
categories features subfeatures
cat1       feature1 subfeature1  -0.224487 -0.227524
                    subfeature2  -0.591399 -0.799228
           feature2 subfeature1   1.190110 -1.365895  ...
                    subfeature2   0.720956 -1.325562
cat2       feature1 subfeature1   1.856932       NaN
                    subfeature2  -1.354258 -0.740473
           feature2 subfeature1   0.234075 -1.362235  ...
                    subfeature2   0.013875  1.309564
cat3       feature1 subfeature1        NaN       NaN
                    subfeature2  -1.260408  1.559721  ...
           feature2 subfeature1   0.419246  0.084386
                    subfeature2   0.969270  1.493417
...                                    ...       ...

It can be generated using the following code:

import pandas as pd
import numpy as np

np.random.seed(seed=90)
results = np.random.randn(3, 2, 2, 2)
results[2, 0, 0, :] = np.nan   # cat3 / feature1 / subfeature1: both columns missing
results[1, 0, 0, 1] = np.nan   # cat2 / feature1 / subfeature1: second column missing
results = results.reshape((-1, 2))
index = pd.MultiIndex.from_product(
    [["cat1", "cat2", "cat3"], ["feature1", "feature2"], ["subfeature1", "subfeature2"]],
    names=["categories", "features", "subfeatures"])
df = pd.DataFrame(results, index=index)

I am attempting to select the groups whose maximum difference between the two subfeature arrays is greater than a threshold, but I'm having a problem with the groupby:

df.groupby(level=['categories','features'])

This gives me the following groups:

{('cat1', 'feature1'): [('cat1', 'feature1', 'subfeature1'), ('cat1', 'feature1', 'subfeature2')], ('cat1', 'feature2'): [('cat1', 'feature2', 'subfeature1'), ('cat1', 'feature2', 'subfeature2')], ('cat2', 'feature1'): [('cat2', 'feature1', 'subfeature1'), ('cat2', 'feature1', 'subfeature2')], ('cat2', 'feature2'): [('cat2', 'feature2', 'subfeature1'), ('cat2', 'feature2', 'subfeature2')], ('cat3', 'feature1'): [('cat3', 'feature1', 'subfeature1'), ('cat3', 'feature1', 'subfeature2')], ('cat3', 'feature2'): [('cat3', 'feature2', 'subfeature1'), ('cat3', 'feature2', 'subfeature2')]}

Is there a way to have the subfeatures level ignored by the groupby function? The reason is that I need both subfeature1 and subfeature2 together; in separate groups they're worthless.

So ideally I want the groupby to return this:

{('cat1', 'feature1'): [('cat1', 'feature1')], ('cat1', 'feature2'): [('cat1', 'feature2')], ('cat2', 'feature1'): [('cat2', 'feature1')], ('cat2', 'feature2'): [('cat2', 'feature2')], ('cat3', 'feature1'): [('cat3', 'feature1')], ('cat3', 'feature2'): [('cat3', 'feature2')]}

How about this?

In [20]: df.reset_index(level='subfeatures').groupby(level=['categories','features']).groups
Out[20]:
{('cat1', 'feature1'): [('cat1', 'feature1'), ('cat1', 'feature1')], ('cat1', 'feature2'): [('cat1', 'feature2'), ('cat1', 'feature2')], ('cat2', 'feature1'): [('cat2', 'feature1'), ('cat2', 'feature1')], ('cat2', 'feature2'): [('cat2', 'feature2'), ('cat2', 'feature2')], ('cat3', 'feature1'): [('cat3', 'feature1'), ('cat3', 'feature1')], ('cat3', 'feature2'): [('cat3', 'feature2'), ('cat3', 'feature2')]}
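For the actual goal in the question (keeping only the groups where the two subfeature rows differ by more than a threshold), resetting the index may not even be needed. The `.groups` dict only lists the row labels in each group; every group produced by grouping on `categories` and `features` already holds both subfeature rows. Below is a rough sketch of that selection using `groupby(...).filter(...)`. The helper names `max_abs_diff` and `select_groups` and the threshold value are made up for illustration, and it assumes exactly two rows per group, with subfeature1 first and subfeature2 second.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
np.random.seed(seed=90)
results = np.random.randn(3, 2, 2, 2)
results[2, 0, 0, :] = np.nan
results[1, 0, 0, 1] = np.nan
results = results.reshape((-1, 2))
index = pd.MultiIndex.from_product(
    [["cat1", "cat2", "cat3"], ["feature1", "feature2"], ["subfeature1", "subfeature2"]],
    names=["categories", "features", "subfeatures"],
)
df = pd.DataFrame(results, index=index)


def max_abs_diff(group):
    """Largest absolute column-wise difference between the two rows of a group.

    Assumes the group has exactly two rows (subfeature1, subfeature2).
    NaN differences are skipped; if every difference is NaN the result is NaN.
    """
    return (group.iloc[0] - group.iloc[1]).abs().max()


def select_groups(frame, threshold):
    """Keep the rows of every (categories, features) group whose
    maximum absolute difference exceeds `threshold`."""
    grouped = frame.groupby(level=["categories", "features"])
    # A NaN maximum compares as False, so all-NaN groups are dropped.
    return grouped.filter(lambda g: max_abs_diff(g) > threshold)


THRESHOLD = 1.0  # hypothetical value, pick whatever suits the data
selected = select_groups(df, THRESHOLD)
print(selected)
```

The result keeps the original three-level index, so both subfeature rows of every surviving group stay together. A group such as cat3 / feature1, where one subfeature row is entirely NaN, has no usable difference and is dropped by this sketch; whether that is the desired handling of missing values depends on the use case.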

python filter pandas
