cluster analysis - interpreting the results of OPTICSxi Clustering -




I am interested in detecting clusters in areas of varying density, such as user-generated data in cities, and for this I adopted the OPTICS algorithm.

Unlike DBSCAN, the OPTICS algorithm does not produce a strict cluster partition, but an augmented ordering of the database. To produce a cluster partition, I use OPTICSXi, an algorithm that produces a classification based on the output of OPTICS. There are few libraries capable of extracting a cluster partition from the output of OPTICS, and ELKI's OPTICSXi implementation is one of them.
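To make the “augmented ordering” concrete, here is a minimal, brute-force sketch of OPTICS in plain Python. It is not ELKI's implementation; the function name, the O(n²) neighbour search and the convention that minpts counts the point itself are assumptions of this sketch. It returns the cluster order and the reachability distance of each point in that order, i.e. the reachability plot that OPTICSXi then works on.

```python
import heapq
import math

UNDEFINED = math.inf  # reachability/core distance that is not (yet) defined


def optics(points, eps, min_pts):
    """Minimal OPTICS sketch for 2-D points.

    Returns (ordering, reachability): the cluster order as a list of point
    indices, and the reachability distance of each point in that order.
    """
    n = len(points)

    def neighbours(i):
        # Brute-force eps-range query (includes i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_distance(i, nbrs):
        # Distance to the min_pts-th nearest neighbour (counting i itself),
        # or UNDEFINED if i is not a core point.
        if len(nbrs) < min_pts:
            return UNDEFINED
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    processed = [False] * n
    reach = [UNDEFINED] * n
    ordering = []

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(reach[start], start)]  # min-heap keyed by reachability
        while seeds:
            r, p = heapq.heappop(seeds)
            if processed[p] or r > reach[p]:
                continue  # stale heap entry
            processed[p] = True
            ordering.append(p)
            nbrs = neighbours(p)
            cd = core_distance(p, nbrs)
            if cd == UNDEFINED:
                continue  # only core points expand the ordering
            for q in nbrs:
                if not processed[q]:
                    new_r = max(cd, math.dist(points[p], points[q]))
                    if new_r < reach[q]:
                        reach[q] = new_r
                        heapq.heappush(seeds, (new_r, q))

    return ordering, [reach[p] for p in ordering]
```

For two well-separated unit squares, `optics([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)], eps=2, min_pts=3)` yields an infinite reachability at the first point of each square and 1.0 inside them; these jumps in the plot are what an extraction method such as OPTICSXi cuts at.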

It is clear to me how to interpret the results of DBSCAN (although it is not easy to set “meaningful” global parameters): DBSCAN detects a “prototype” of cluster, characterized by a density, expressed as a number of points per area (minpts/epsilon). The results of OPTICSXi seem a bit harder to interpret.
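The DBSCAN “prototype” can be made explicit in a few lines. A minimal sketch, assuming 2-D Euclidean coordinates in metres and the convention that minpts includes the point itself; the helper names and the 100 m / minpts = 10 values are illustrative only:

```python
import math


def is_core(points, i, eps, min_pts):
    """DBSCAN core-point test: at least min_pts points (i included) within eps of i."""
    return sum(1 for q in points if math.dist(points[i], q) <= eps) >= min_pts


def density_threshold(eps, min_pts):
    """Points per unit area that a core point's eps-disc must at least contain (2-D)."""
    return min_pts / (math.pi * eps ** 2)


# eps = 100 m, minpts = 10: about 318 points per km^2.
per_km2 = density_threshold(100, 10) * 1_000_000
```

OPTICS keeps minpts but uses epsilon only as an upper bound on the neighbourhood radius, which is why its output is an ordering covering a range of densities rather than a single threshold like this one.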

There are two phenomena I observe in the outputs of OPTICSXi which I am not able to explain. One is the appearance of “spike” clusters that link parts of the map. I cannot explain them, because they seem to be made of few points, and I don't understand how the algorithm decides to group them in the same cluster. Do they represent a “corridor” of density variation? Looking at the underlying data, it does not seem like that. You can see these “spikes” in the image below.

The other phenomenon I cannot explain is the fact that sometimes there are "overlapping" clusters of the same hierarchical level. OPTICSXi is based on the OPTICS ordering of the database (comparable to a dendrogram), and there are no repeated points in that diagram.

Since this is a hierarchical clustering, I consider that clusters of a lower level contain clusters of a higher level, and I thought this was enforced when building the convex hulls. However, I don't see a justification for clusters intersecting other clusters on the same hierarchical level, since in practice this would mean that points have a double cluster “membership”. In the image below, you can see intersecting clusters of the same hierarchical level (0).
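One way to check whether two same-level clusters really share points, or whether the overlap only appears once their convex hulls are drawn, is to compare memberships and hulls directly. A minimal sketch, assuming each cluster is given as an inclusive (start, end) interval of positions in the OPTICS cluster order (the way ξ-clusters are defined) and that `ordering` maps positions to point indices. `diagnose` and the other names are hypothetical, and the hull test only checks whether a point of one cluster lies inside another cluster's hull, so it can miss hulls that cross without containing each other's points.

```python
from collections import Counter


def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(pts):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p is inside or on a counter-clockwise convex polygon."""
    return all(cross(hull[i], hull[(i + 1) % len(hull)], p) >= 0
               for i in range(len(hull)))


def diagnose(points, ordering, siblings):
    """siblings: inclusive (start, end) positions in the cluster order, all on
    the same hierarchy level. Returns (indices of points that belong to more
    than one sibling, whether a point of one sibling lies in another's hull)."""
    members = [[ordering[k] for k in range(s, e + 1)] for s, e in siblings]
    counts = Counter(i for m in members for i in m)
    shared = sorted(i for i, c in counts.items() if c > 1)
    hulls = [convex_hull([points[i] for i in m]) for m in members]
    overlap = any(
        inside_hull(hulls[a], points[i])
        for a in range(len(members)) if len(hulls[a]) >= 3
        for b in range(len(members)) if b != a
        for i in members[b])
    return shared, overlap


# Four outer points around three inner ones: no shared point, but nested hulls.
pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (2, 3), (3, 2)]
shared, overlap = diagnose(pts, list(range(7)), [(0, 3), (4, 6)])
```

In this toy example `shared` is empty while `overlap` is `True`, i.e. intersecting hulls alone do not prove double membership; a non-empty `shared` list would.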

Finally, the most important thought/question I want to leave you with is: what should we expect to see in an OPTICSXi clustering classification? This question is closely linked to the task of parametrizing OPTICSXi.

Since I see hardly any studies with runs of OPTICSXi on a particular clustering problem, I struggle to find what the optimal clustering classification would be, i.e. one that provides meaningful/useful results and adds value to a DBSCAN clustering. To help me answer this question, I performed many runs of OPTICSXi with different combinations of parameters, and selected three to discuss below.

In the first run, I used a large value of epsilon (2 km), meaning that clusters can be large (up to 2 km). Since the algorithm “merges” clusters, we end up with big clusters that have a low density. I find this output interesting, because it exposes the hierarchical construction of the classification, and it reminds me of several runs of DBSCAN with different combinations of parameters (for different densities), which is the advertised “strength” of OPTICS. As mentioned before, smaller clusters correspond to higher levels in the hierarchical scale, and to higher densities.

In the second run, we see a large number of clusters, even though the “contrast” parameter (xi) is the same as in the previous run. This is because I chose a low value of minpts, which allows clusters with a low number of points. Since epsilon is shorter in this case, we don't see the big clusters occupying a large part of the map. I find this output less interesting than the previous one because, even though it has a hierarchical construction, there are many clusters at the same level, and many of them intersect. In terms of interpretation, we can see that the overall “shape” is similar to the previous one, but discretized into lots of little clusters that would otherwise be overlooked as “noise”.

The third run has a parameter selection similar to the previous one, except that minpts is larger; as a consequence, we not only find fewer clusters, but they also overlap less at the same level.

From the perspective of adding value to DBSCAN, I would opt for the first combination of parameters, since it provides a hierarchical image of the data, exposing the areas that are more dense. IMHO the last combination of parameters fails to provide an idea of the global distribution of density, since it finds similar clusters all over the study area. I would be interested to read other opinions.

The problem with extracting clusters from the OPTICS plot is the first and last elements of a cluster. From the plot alone, you cannot (to my understanding) decide whether the last element should belong to the previous cluster or not.

Consider a plot like this:

[ASCII reachability plot of the points A to H in cluster order; the bar heights were lost in extraction]

This can be a cluster with A right in the middle, B-E nearby, and F the nearest element of a different cluster. For example, the data set might look like this:

[ASCII sketch of such a data set; the layout was lost in extraction]

Or, A is at the rim of the first cluster, B-D are part of the cluster, whereas E is an outlier element bridging the gap to the cluster F-H. A data set causing such an effect could look like this:

[ASCII sketch of such a data set; the layout was lost in extraction]

OPTICSXi operates visually on the plot. If F is the "steeper" point to split at, E will in both cases be part of the first cluster. This is literally the best guess OPTICSXi can make without looking at the data points.
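That only the plot matters can be seen from the steepness definitions of the original OPTICS paper (Ankerst et al., 1999): position i is ξ-steep upward if r(i) ≤ r(i+1)·(1 − ξ), and ξ-steep downward if r(i)·(1 − ξ) ≥ r(i+1). A minimal sketch, not ELKI's code; the reachability values for A to H are invented for illustration (a valley A-E, a jump at F, a second valley G-H):

```python
import math


def steep_points(reach, xi):
    """Return (up, down): positions i in the cluster order where the
    reachability plot is xi-steep upward / downward between i and i + 1."""
    up, down = [], []
    for i in range(len(reach) - 1):
        if reach[i] <= reach[i + 1] * (1 - xi):
            up.append(i)
        if reach[i] * (1 - xi) >= reach[i + 1]:
            down.append(i)
    return up, down


labels = "ABCDEFGH"
reach = [math.inf, 1.0, 1.1, 1.2, 1.3, 3.0, 1.0, 1.0]
up, down = steep_points(reach, xi=0.1)
# up -> [4] (E), down -> [0, 5] (A, F)
```

The only steep-upward position is E, i.e. the plot rises between E and F, which is where the left valley is closed off. The function never looks at coordinates, so it gives the same answer for both data sets sketched above.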

This effect is causing the spikes you have been observing.

I see four options:

Improve OPTICSXi yourself. If you are interested, we can discuss heuristics that might make it possible to distinguish the two cases above.

Implement one of the other extraction methods, such as inflexion points (but these may suffer from the same effects, as they also work on the plot, AFAICT).

Use HDBSCAN (sorry, it is not yet included in ELKI, although I have a version that appears to be working). Update: it is in ELKI 0.7.0.

Apply post-processing to the clusters. In particular, test the first and last few points of each cluster in the order: decide whether you want to include them in the cluster, move them to the next one, or move them to the parent cluster. Maybe use the average distance to the cluster...
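The post-processing option could look roughly like this. A minimal sketch of one possible heuristic, not an ELKI feature: `refine_tail`, `k`, `factor` and the average-distance rule are assumptions made for illustration, only the tail of a cluster is handled (the first few points could be treated symmetrically), and the “parent” is simply returned as a separate list.

```python
import math
from statistics import mean


def refine_tail(points, cluster, next_cluster, k=2, factor=1.5):
    """Hypothetical post-processing of the last k points of `cluster`.

    `cluster` and `next_cluster` (non-empty) are lists of point indices in
    cluster order. Each tail point moves to next_cluster if it is on average
    closer to it, goes to the parent if its average distance to the rest of
    its cluster exceeds `factor` times the typical one, and otherwise stays.
    """
    core = cluster[:-k] if len(cluster) > k else cluster[:1]

    def avg_dist(i, members):
        return mean(math.dist(points[i], points[j]) for j in members if j != i)

    # Typical average distance between members of the (untouched) core.
    typical = mean(avg_dist(i, core) for i in core) if len(core) > 1 else 0.0
    keep, into_next, to_parent = list(core), [], []
    for i in cluster[len(core):]:
        d_own, d_next = avg_dist(i, core), avg_dist(i, next_cluster)
        if d_next < d_own:
            into_next.append(i)
        elif typical and d_own > factor * typical:
            to_parent.append(i)
        else:
            keep.append(i)
    return keep, into_next + list(next_cluster), to_parent


# A unit square, one bridging point at (4, 4), then a small cluster far away.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (4, 4), (9, 9), (9, 10), (10, 9)]
result = refine_tail(pts, [0, 1, 2, 3, 4], [5, 6, 7], k=1)
# -> ([0, 1, 2, 3], [5, 6, 7], [4]): the bridging point goes to the parent
```

Moving the candidate point to (7, 7) would instead hand it to the next cluster, and a point at (1, 2) would stay in the first one.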

