Abstract:
All clustering methods have to assume some
cluster relationship among the data objects that they are
applied to. Similarity between a pair of objects can be
defined either explicitly or implicitly. In this paper, we
introduce a novel multi-viewpoint based similarity measure
and two related clustering methods. The major difference
between a traditional similarity measure and ours is that the
former uses only a single viewpoint, which is the
origin, while the latter utilizes many different
viewpoints, which are objects assumed not to be in the
same cluster as the two objects being measured. Using
multiple viewpoints, a more informative assessment of
similarity can be achieved. The approach also combines the
neighbourhood preservation capability of multidimensional
content with the familiar optimal snippet-based
representation by employing a multidimensional content to
derive two-dimensional layouts of the query search results
that preserve text similarity relations, or neighbourhoods.
Theoretical analysis and an empirical study are conducted to
support the claim that multiple viewpoints yield a more
informative similarity assessment. Two criterion functions for document
clustering are proposed based on this new measure. We
compare them with several well-known clustering
algorithms that use other popular similarity measures on
various document collections to verify the advantages of
our proposal.
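To make the contrast between a single-viewpoint and a multi-viewpoint measure concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes unit-length term-frequency (TF) vectors and takes the multi-viewpoint similarity of two documents in the same cluster to be the average dot product of their difference vectors, measured from every document outside that cluster. The function names tf_unit, cosine and mvs, and the toy term counts, are hypothetical illustrations.

```python
import numpy as np


def tf_unit(counts):
    """Turn raw term counts into a unit-length term-frequency (TF) vector."""
    v = np.asarray(counts, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def cosine(u, v):
    """Single-viewpoint similarity: the angle between u and v seen from the origin."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return float(u @ v / (nu * nv))


def mvs(i, j, X, labels):
    """Hypothetical multi-viewpoint similarity between rows i and j of X.

    Assumes rows i and j share a cluster label. Every row outside that
    cluster acts as a viewpoint d_h, and the result is the mean of the
    dot products (d_i - d_h) . (d_j - d_h) over all such viewpoints.
    """
    r = labels[i]
    if labels[j] != r:
        raise ValueError("both objects must belong to the same cluster")
    viewpoints = X[labels != r]  # all documents outside cluster r
    if len(viewpoints) == 0:
        raise ValueError("need at least one viewpoint outside the cluster")
    di, dj = X[i], X[j]
    return float(np.mean([(di - dh) @ (dj - dh) for dh in viewpoints]))


# Toy corpus: four documents over a four-term vocabulary (made-up counts).
X = np.array([tf_unit(c) for c in [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 4, 1], [0, 1, 3, 2]]])
labels = np.array([0, 0, 1, 1])

print("cosine (origin viewpoint):", cosine(X[0], X[1]))
print("MVS (viewpoints = docs 2 and 3):", mvs(0, 1, X, labels))
```

If the only viewpoint is the zero vector, each difference vector is just the document itself, so mvs reduces to the plain dot product, which for unit vectors is the cosine. That is the single-viewpoint (origin) case the abstract contrasts against; adding real out-of-cluster documents as viewpoints is what changes the assessment.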
Keywords:
Multi-viewpoint, term frequency (TF), clustering, Euclidean distance