# Algorithm

# Introduction

The main methodological contribution is an algorithm that heuristically solves the weighted n-cluster editing problem. The algorithms is motivated by the well-known physics-inspired graph layout algorithm of Fruchterman and Reingold. This algorithms seeks to arrange all nodes of a graph in a two-dimensional plane such that "similar" nodes are located more close to each other than to others. n-CluE, afterwards, assignes the nodes from each "dense" part of the graph layout to one n-cluster by single-linkage clustering (SLC) or k-means clustering based on the Euclidean distances. This can be done iteratively for different parameters and the solution with minimal n-cluster editing costs is picked.

The algorithm is carried out in a three-step procedure:

- Layout generation
- n-cluster partition
- Post-processing

# I. Layout generation

In this stage, the coordinates of nodes are generated and re-arranged in a way that the nodes with higher similarities are located next to each other, and far away from those that are dissimilar. The layout is like a "self-arranging" network: the similar nodes attract each other and the dissimilar nodes repulse each other.n-CluE starts the initial layout by locating all points "almost" evenly on a 3D sphere (or a 2D circle, if preferred by the user). The attracting/repulsing forces are computed by the following formula: The final layout is generated in a iterative manner. In each iteration, the "movement" of each nodes is the joint force of the effects coming from all other nodes. See Fig. 1 and 2 for an illustration.

# II. Partition phase

In this phase, the nodes are partitioned into different clusters by Single Linkage Clustering or K-means Clustering (user choice). This clustering is repeated for different parameters and the clustering that minimizes the cluster editing costs is selected.

# III. Post-processing

In post-processing, the n-CluE tries to further reduce the editing costs in two steps.

- N-cluster merging
- Random singleton node moving