Constraint-based discriminative dimension selection for high-dimensional stream clustering

Bibliographic Details
Main Authors: Waiyamai, Kitsana (Author), Kangkachit, Thanapat (Author)
Format: EJournal Article
Published: Universitas Ahmad Dahlan, 2018-11-11.
Description
Summary: Clustering data streams is an active research topic in data mining. However, the runtime of existing stream clustering algorithms increases and their performance drops when the data has a large number of dimensions. The complexity of stream clustering methods also increases when they are applied to data with a large number of dimensions. To reduce clustering complexity, one possible solution is to determine an appropriate subset of cluster dimensions via dimension projection. SED-Stream is an efficient clustering algorithm that supports high-dimensional data streams. The aim of this paper is to improve the performance of SED-Stream in terms of both clustering quality and execution time. To improve the clustering process, background or domain expert knowledge is integrated as "constraints" in SEDC-Stream. The new algorithm, SEDC-Stream, supports the evolving characteristics of dynamic constraints, namely activation, fading, outdating, and prioritization. SEDC-Stream is able to reduce cluster splitting time and to place incoming points in their suitable clusters. Compared with SED-Stream on three real-world stream datasets, SEDC-Stream achieves better clustering performance in terms of both purity and F-measure.
Item Description: https://ijain.org/index.php/IJAIN/article/view/271