Constraint-based discriminative dimension selection for high-dimensional stream clustering

Clustering data streams is an active research topic in data mining. However, the runtime of existing stream clustering algorithms increases and their performance drops as the number of dimensions grows. The complexity of stream clustering methods increases when they are applied to data with lar...

Bibliographic Details
Main Authors: Waiyamai, Kitsana (Author), Kangkachit, Thanapat (Author)
Format: EJournal Article
Published: Universitas Ahmad Dahlan, 2018-11-11.
LEADER 02543 am a22002773u 4500
001 IJAIN_271_ijainijain_v4i3_p167-179
042 |a dc 
100 1 0 |a Waiyamai, Kitsana  |e author 
100 1 0 |e contributor 
700 1 0 |a Kangkachit, Thanapat  |e author 
245 0 0 |a Constraint-based discriminative dimension selection for high-dimensional stream clustering 
260 |b Universitas Ahmad Dahlan,   |c 2018-11-11. 
500 |a https://ijain.org/index.php/IJAIN/article/view/271 
520 |a Clustering data streams is an active research topic in data mining. However, the runtime of existing stream clustering algorithms increases and their performance drops as the number of dimensions grows. The complexity of stream clustering methods increases when they are applied to data with a large number of dimensions. One possible way to reduce the clustering complexity is to determine an appropriate subset of cluster dimensions via dimension projection. SED-Stream is an efficient clustering algorithm that supports high-dimensional data streams. The aim of this paper is to improve the performance of SED-Stream in terms of both clustering quality and execution time. To improve the clustering process, background or domain-expert knowledge is integrated as "constraints" in a new algorithm, SEDC-Stream. SEDC-Stream supports the evolving characteristics of dynamic constraints, namely activation, fading, outdating, and prioritization. SEDC-Stream is able to reduce cluster splitting time and to place new incoming points into suitable clusters. Compared with SED-Stream on three real-world stream datasets, SEDC-Stream achieves better clustering performance in terms of both purity and F-measure. 
540 |a Copyright (c) 2018 Kitsana Waiyamai, Thanapat Kangkachit 
540 |a https://creativecommons.org/licenses/by-sa/4.0 
546 |a eng 
690 |a Incremental stream clustering; High-dimensional data streams; Dimension selection; Projected clustering; Constraint-based clustering 
655 7 |a info:eu-repo/semantics/article  |2 local 
655 7 |a info:eu-repo/semantics/publishedVersion  |2 local 
655 7 |2 local 
786 0 |n International Journal of Advances in Intelligent Informatics; Vol 4, No 3 (2018): November 2018; 167-179 
786 0 |n 2548-3161 
786 0 |n 2442-6571 
787 0 |n https://ijain.org/index.php/IJAIN/article/view/271/ijainijain_v4i3_p167-179 
856 4 1 |u https://ijain.org/index.php/IJAIN/article/view/271/ijainijain_v4i3_p167-179  |z Get Fulltext