Abstract

In recent years, data clustering with metaheuristic methods has become popular in the field of data mining. All of these methods suffer from an optimization problem that we address in this article. The problem occurs when the cluster centroids encoded in an individual of the population (a particle in this paper) do not actually play the role of cluster centers. We use the law of gravity to solve this problem. After each particle clusters the data, its centroids are moved towards the center of mass of the data in the desired cluster through a process based on the law of gravity: each data point in a cluster exerts a force on the cluster's centroid, dragging it towards the cluster's center of mass. After this improvement, the particles are evaluated by a selected internal cluster validity index (CVI). We examine several CVIs and find that Xu, Du, and WB are the most accurate. The proposed method is compared, via the Jaccard index, to several clustering methods, including particle-swarm clustering methods and well-known classical clustering methods. The results show that our method is more accurate.

Introduction

The purpose of clustering is to group similar samples into the same cluster and dissimilar samples into different clusters. Various methods have been proposed for data clustering, and they fall into several branches: partitioning, hierarchical, density-based, and grid-based approaches are the main families. Partitioning methods are highly regarded, and the most popular of them is the K-means method (Jain, 2010). K-means has several disadvantages, the most basic of which are: it cannot use an arbitrary objective function, it may converge to a local optimum, and the number of clusters must be specified in advance.
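To make the first limitation concrete, the quantity K-means minimizes can be sketched as follows. This is a minimal illustration (function name and toy data are our own, not from the cited works): the objective is purely the within-cluster sum of squares, with no term rewarding separation between clusters.

```python
import numpy as np

def kmeans_wcss(X, centroids, labels):
    """Within-cluster sum of squares: the only quantity K-means minimizes.
    Note that inter-cluster separation does not appear in this objective."""
    return sum(np.sum((X[labels == j] - c) ** 2)
               for j, c in enumerate(centroids))

# Toy data: two obvious groups on a line.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
centroids = np.array([[0.5], [10.5]])
labels = np.array([0, 0, 1, 1])
print(kmeans_wcss(X, centroids, labels))  # 1.0
```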
The K-means objective function only takes the intra-cluster distance into account and ignores the distance between clusters. On the other hand, numerous cluster validity indices (CVIs) have been introduced that consider both the intra-cluster and the inter-cluster distance. We can therefore use such a CVI as the objective function of a clustering method, which addresses the first problem mentioned above. For the second problem, we can use a global optimizer that rarely gets stuck in a local optimum. If the optimizer can also choose the best number of clusters based on the objective function, the last problem is solved. Metaheuristic methods such as Particle Swarm Optimization (PSO) (van der Merwe & Engelbrecht, 2003) and its variants (Cura, 2012; Valente de Oliveira, Szabo, & de Castro, 2017), the Genetic Algorithm (GA) (Maulik & Bandyopadhyay, 2000) and its variants, Artificial Bee Colony (ABC) optimization (Ozturk, Hancer, & Karaboga, 2015; Yan, Zhu, Zou, & Wang, 2012), and the Gravitational Search Algorithm (GSA) (Dowlatshahi & Nezamabadi-pour, 2014) have been proposed for these problems. All of these methods suffer from another problem, which we address in this article. The problem occurs when the cluster centroids encoded in an individual of the population do not serve the role of cluster centers. For example, in Fig. 1 you can see 3 clusters and 2 sets of centroids (squares and circles) extracted from 2 different particles in PSO. The two particles produce exactly the same grouping of the data, yet their fitness values can differ. This means that particles performing the same clustering can have different physical shapes, which causes optimization problems: it affects the diversity of the population and therefore the exploration and exploitation behaviour of the optimizer.

1-2 PSO

The basic form of the PSO algorithm was introduced in (Kennedy & Eberhart, 1995) and subsequently modified in (Shi & Eberhart, 1998).
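The redundancy problem described above can be reproduced with a toy example (our own illustration, not the paper's Fig. 1): two particles whose position vectors differ, here simply by a permutation of the same centroids, induce exactly the same partition of the data, yet the optimizer treats them as distinct points in the search space.

```python
import numpy as np

def assign(X, centroids):
    """Label each point with the index of its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
p1 = np.array([[0.1, 0.05], [5.05, 4.95]])  # particle 1
p2 = p1[::-1].copy()                        # particle 2: same centroids, permuted

l1, l2 = assign(X, p1), assign(X, p2)
# Same partition (labels merely swapped), different position vectors:
same_partition = all((l1[i] == l1[j]) == (l2[i] == l2[j])
                     for i in range(len(X)) for j in range(len(X)))
print(same_partition)  # True
```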
In the algorithm, a swarm of particles S flies stochastically through an N-dimensional search space, where the position of each particle represents a potential solution to an optimization problem. Each particle p, with current position xp and current velocity vp, remembers its personal best solution so far, bp, and the swarm remembers the best solution achieved globally so far, bS. The particles are attracted towards these best solutions, and after some time the swarm typically converges towards an optimum. Due to its stochastic nature, PSO can escape some local optima. However, for the basic form of the PSO algorithm, premature convergence to a local optimum is a common problem. Therefore, several modifications or extensions of the basic form have been introduced (Poli, Kennedy, & Blackwell, 2007), such as Perturbed PSO (Xinchao, 2010), Orthogonal Learning PSO (Zhan, Zhang, Li, & Shi, 2011), and different local-neighbourhood topologies, for example the Fully Informed PSO (Mendes, Kennedy, & Neves, 2004). In clustering, as in other PSO applications, the position of each particle should represent a potential solution to the problem. Most often this is accomplished by encoding the position of particle p as xp = {mp,1, …, mp,j, …, mp,K}, where mp,j represents the jth (potential) cluster centroid in an N-dimensional data space and K is the number of clusters. Each element of the particle's K-dimensional position xp is thus an N-dimensional position in the data space. Other particle encodings have also been proposed, such as partition-based encoding (Jarboui, Cheikh, Siarry, & Rebai, 2007), where each particle is a vector of n integers, n being the number of data elements to be clustered, and the ith element represents the cluster label assigned to element i, i ∈ {1, …, n}. The main limitation of this method was the need to define the number of clusters, K, a priori.
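The centroid-based encoding described above can be sketched as follows. This is a minimal decoding step under our own naming (the cited works do not prescribe an implementation): a particle's flat position vector of length K × N is reshaped into K centroids, which then induce a clustering of the data by nearest-centroid assignment.

```python
import numpy as np

def decode(position, K, N):
    """Reshape a particle's flat position vector into K centroids in R^N."""
    return position.reshape(K, N)

def cluster_labels(X, position, K, N):
    """Assign each data point to the nearest of the decoded centroids."""
    centroids = decode(position, K, N)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

K, N = 2, 2
position = np.array([0.0, 0.0, 5.0, 5.0])  # two 2-D centroids, concatenated
X = np.array([[0.1, -0.1], [4.9, 5.2]])
print(cluster_labels(X, position, K, N))  # [0 1]
```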
Another clustering technique, proposed in (Omran, Salman, & Engelbrecht, 2006), overcame this limitation by using binary PSO to select which of a particle's potential centroids should be included in the final solution; however, this technique uses the K-means algorithm to refine the centroid locations. Another particle encoding for PSO clustering was proposed in (Das, Abraham, & Konar, 2008). Given a user-defined maximum number of clusters, Kmax, the position of particle p is encoded as a vector of dimension Kmax + Kmax × N, xp = {Tp,1, …, Tp,Kmax, mp,1, …, mp,j, …, mp,Kmax}, where Tp,j, j ∈ {1, …, Kmax}, is an activation threshold in the interval [0, 1] and mp,j represents the jth (potential) cluster centroid. If Tp,j > 0.5, the corresponding jth centroid is included in the solution; otherwise, the cluster defined by the jth centroid is inactive. The minimum number of clusters is defined to be two. If fewer than two clusters are active in a solution, one or two randomly selected activation thresholds Tp,j are reinitialized so that at least two clusters become active.

1-3 Gravitational clustering

A method that uses Newton's universal law of gravitation for clustering was introduced in (Bahrololoum, Nezamabadi-Pour, & Saryazdi, 2015). Each data point, Xi = (Xi,1, …, Xi,N), is assumed to be located in an N-dimensional space, where N is the number of features. The clusters are compact, and a representative point (centroid) is used to represent each cluster. The main idea of the algorithm is to treat each data point as an object with gravity.
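The gravitational idea can be sketched with a simplified update (our own illustration; the cited paper and our method define their own force law and constants). If every data point in a cluster is given unit mass, the net pull on the cluster's centroid points toward the cluster's center of mass, so a misplaced centroid is dragged there step by step.

```python
import numpy as np

def gravity_refine(X, centroid, labels, j, step=0.5, iters=20):
    """Drag the jth centroid toward the center of mass of its cluster.
    Simplified: unit masses, so the net pull aims at the cluster mean."""
    members = X[labels == j]
    if len(members) == 0:
        return centroid
    com = members.mean(axis=0)  # center of mass of the cluster's points
    for _ in range(iters):
        centroid = centroid + step * (com - centroid)  # damped pull
    return centroid

X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
labels = np.zeros(3, dtype=int)   # one cluster for illustration
c = np.array([5.0, 5.0])          # badly placed centroid
c = gravity_refine(X, c, labels, j=0)
# After repeated pulls the centroid sits at the cluster's center of mass.
```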