Index

Introduction

n-CluE is a high-throughput bioinformatics tool to integrate heterogeneous data from different sources. n-CluE is based on n-Cluster Editing, where the input n-partite graph is converted into a transitive n-partite graph through edge insertions and deletions with minimal costs for edge modifications (depending on edge weights and a user-given threshold).

Bi-clustering and Bi-cluster Editing

Bi-clustering was firsted introduced by Hartigan et al., named "block clustering" (aka. co-clustering). Cheng and Church presented amongst the first bioinformatics bi-clustering methods and applied it to analyzing microarray data sets. Genes (rows) and the conditions (columns) are clustered simultaneously, while traditional microarray cluster analysis either partitions the genes based on the complete expression profiles, or partitions the conditions based on all genes in the experiment. Fig. 1 illustrates this distinction. clust-biclust

Fig. 1. The comparison between clustering and biclustering.

Since Cheng and Church, many different bi-clustering algorithms have been developed and applied to a wide spectrum of biological problems. Since microarray data is usually represented by matrix, one common strategy to solve the biclustering problem on matrix is by transforming the matrix bi-clustering problem to a graph clustering problem on bipartite graphs. Elements of each row and column are modelled as nodes and the matrix values as pairwise similarities (see Fig. 2). matrix-bipartite

Fig. 2. Conversion from an expression matrix to a weighted bipartite graph.

Now, given such a bi-partite graph, bi-cluster editing is transforming the graph into a "transitive" one, i.e. a graph where each disjoint component is a "bi-clique" (a bi-clique is a complete bi-partite graph). This is done by deleting and adding edges to the graph. Each such operation comes with editing costs, which are defined as the absolute value of the difference between the threshold and the edge weight. Weighted Bi-cluster editing now aims to find a transitive solution with minimal edge modification costs (see Fig. 3).

biclust-editing

Fig. 3 illustrates the Bi-Cluster Editing principle. If all editing costs were set to 1 (unweighted cluster editing), we'd select the lower-right solution.

n-Cluster Editing

Biclustering analysis can be regarded as an analysis integrating data coming from two different sources. To address more complex problems, such as drug repositioning where three databases have to be included and connected by trinary links, we extended the bi-cluster editing model to n-partite graphs.

The model of n-Cluster Editing now aims to convert an input n-partite graph into a graph only consisting of disjoint n-cliques with minimal editing costs. Fig. 4 shows an example of n-Cluster Editing. n-Cluster Editing

Fig. 4 illustrates the N-Cluster Editing principle for drug-repositioning integrating genes, diseases and drugs. If all editing costs were set to 1 (unweighted n-cluster editing), we'd select the lower-left solution.