Interactive Clustering: A Comprehensive Review
A common task in data analysis is to find groups of objects in a dataset that share similar characteristics. In doing so, users gain insight into their data, understand it better, and can even reduce its dimensionality. These conceptual groups are commonly referred to as clusters. Automatic clustering is a technique that discovers “natural” structures hidden in the data in an unsupervised way. It consists of automatically grouping a set of unlabeled data samples into clusters so that samples in the same cluster are more similar to each other than to samples assigned to other clusters. Although a cluster is an inherently subjective structure with no precise, formal definition, a large number of clustering methods have been developed, each with its own strengths and weaknesses.
Traditionally, clustering methods and tools have been designed offline and then deployed in a variety of application domains. However, because such tools lack domain-specific and user-specific input, they are not always as relevant or convenient to the end-user as they could be. There are several reasons for this. First, unlike classification tasks, which are evaluated against well-defined target labels, clustering is an intrinsically subjective task: it depends on the interpretation, needs, and interests of users. Real-world data may contain several plausible groupings, and a fully unsupervised method has no way to select the grouping that suits the user’s needs, because doing so requires external domain knowledge. Second, the quality of a clustering outcome depends heavily on extracting appropriate features and on specifying appropriate similarity measures. In addition, several parameters are typically required, such as the number of intended clusters or the minimum cluster size. Given these requirements, a real-world clustering task can be too complex to be solved fully automatically. Fortunately, a small amount of user input can often significantly improve clustering quality. Third, a large part of “understanding data” is understanding the clustering process by which these conceptual clusters are formed. For this purpose, end-users are usually motivated and willing to interact with both the system and the data in a way that lets them gain knowledge from the clustering task.
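The dependence on user-supplied parameters can be illustrated with a minimal sketch. It assumes a plain k-means (Lloyd) procedure on one-dimensional data, chosen here only for illustration and not prescribed by the review. The function name `kmeans_1d`, the evenly spread initialization, and the sample values are all hypothetical.

```python
# Illustrative sketch only: a tiny 1-D k-means, not a method taken from the review.

def kmeans_1d(points, k, iterations=20):
    """Group 1-D points into k clusters using plain Lloyd iterations."""
    ordered = sorted(points)
    # Assumed deterministic initialization: k values spread evenly over the sorted data.
    if k == 1:
        centers = [ordered[0]]
    else:
        centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in ordered:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return clusters


# Hypothetical data with three visually separate groups.
data = [1.0, 1.2, 0.8, 4.0, 4.3, 3.9, 9.0, 9.2]

two_groups = kmeans_1d(data, k=2)    # the user asks for two clusters
three_groups = kmeans_1d(data, k=3)  # the user asks for three clusters

print(two_groups)
print(three_groups)
```

With k = 2 the low and middle values are merged into a single cluster, while k = 3 separates them. Both groupings are plausible, and only the user's intent, expressed here through the choice of k, decides between them.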
American Journal of Computer Science and Engineering Survey