The Format of the Input

Input files for graph clustering can be written in one of two formats: (1) plain text or (2) XML.

(1) Plain Text Format

The plain text input format is arranged as follows, with columns separated by tabs:

node_1_name node_1_set node_2_name node_2_set edge_weight

For example:
   node_1   0   node_2   1   3.5
   node_2   1   node_3   0   8.5
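
As an illustration of the column layout above, here is a minimal Python sketch of a reader for the plain text format. The names Edge and read_plain_text_edges are hypothetical and not part of the tool; the sketch also assumes that the node set columns are integers and that blank lines and leading indentation may be ignored.

```python
import csv
import io
from typing import NamedTuple


class Edge(NamedTuple):
    """One line of the plain text format (hypothetical helper type)."""
    node_1: str
    set_1: int
    node_2: str
    set_2: int
    weight: float


def read_plain_text_edges(handle):
    """Parse tab-separated lines laid out as:
    node_1_name, node_1_set, node_2_name, node_2_set, edge_weight."""
    edges = []
    for row in csv.reader(handle, delimiter="\t"):
        # Assumption: empty fields come only from indentation or stray tabs.
        fields = [field.strip() for field in row if field.strip()]
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {row!r}")
        name_1, set_1, name_2, set_2, weight = fields
        edges.append(Edge(name_1, int(set_1), name_2, int(set_2), float(weight)))
    return edges


# The two example lines from above, written with explicit tabs.
sample = "node_1\t0\tnode_2\t1\t3.5\nnode_2\t1\tnode_3\t0\t8.5\n"
edges = read_plain_text_edges(io.StringIO(sample))
for edge in edges:
    print(edge)
```

Raising an error on a line that does not have exactly five columns is a design choice of this sketch: it catches lines that were separated by spaces instead of tabs.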

(2) XML Format (Recommended)

The plain text format is straightforward, but it contains a large amount of redundancy for large graphs. For example, a node incident to 1,000 edges requires 1,000 lines, each repeating the node's name and set. The XML format stores the edge weights as matrices, which greatly reduces the size of the input file.
Here is an example of an XML-formatted input file:

<document>
 <entity levels="3">
  node_00 node_01 node_02
  node_10 node_11 node_12
  node_20 node_21
 </entity>
 <matrix matrixLevel="0 1" >
  5.4 5.1 2.3
  3.1 3.2 4.5
  6.0 7.1 8.2
 </matrix>
 <matrix matrixLevel="1 2" >
  3.1 3.0
  4.5 4.2
  5.1 3.0
 </matrix>
 <matrix matrixLevel="0 2" >
  0.5 3.5
  7.2 4.5
  6.6 6.7
 </matrix>
</document>

  • The root element of the input XML file is "<document>".
  • The "<entity>" tag contains the names of the nodes.
    The attribute "levels" (here levels="3") gives the number of node sets in the graph.
  • The names of the nodes belonging to the same node set must be placed on the same line, separated by tabs.
  • The number of lines is equal to the number of node sets. (See Fig. 1)
  • The "<matrix>" tag encloses the weights of the edges between the two node sets named in its "matrixLevel" attribute.
    E.g. <matrix matrixLevel="0 1" > indicates that this tag contains the edge weights between node set 0 and node set 1. See Fig. 2 for more details.
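  • The edge weight matrix is arranged so that the rows correspond to the nodes in the "former" node set (the first one listed in "matrixLevel") and the
    columns correspond to the nodes in the "latter" node set (the second one listed). In the "0 1" matrix above, for instance, 5.1 is the weight of the edge between the first node of set 0 and the second node of set 1.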
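
The rules above can be sketched as a small Python reader that expands the matrices back into one edge per weight. This is only an illustration, not the tool's own parser: the function names rows and read_xml_edges are hypothetical, and the sample is a well-formed variant of the example in which the closing tags and the "<document>" root are written out and the third node of set 0 is named node_02 so that all names are unique. The sketch further assumes that node names contain no whitespace and that matrix rows follow the order of the names in the "<entity>" tag.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed variant of the example file (tabs between columns).
SAMPLE = """<document>
<entity levels="3">
node_00\tnode_01\tnode_02
node_10\tnode_11\tnode_12
node_20\tnode_21
</entity>
<matrix matrixLevel="0 1">
5.4\t5.1\t2.3
3.1\t3.2\t4.5
6.0\t7.1\t8.2
</matrix>
<matrix matrixLevel="1 2">
3.1\t3.0
4.5\t4.2
5.1\t3.0
</matrix>
<matrix matrixLevel="0 2">
0.5\t3.5
7.2\t4.5
6.6\t6.7
</matrix>
</document>
"""


def rows(text):
    """Split element text into non-empty lines, each split on whitespace."""
    return [line.split() for line in text.strip().splitlines() if line.strip()]


def read_xml_edges(xml_text):
    """Return (row_node, row_set, col_node, col_set, weight) tuples."""
    root = ET.fromstring(xml_text)
    entity = root.find("entity")
    node_sets = rows(entity.text)  # one line of names per node set
    if len(node_sets) != int(entity.get("levels")):
        raise ValueError("number of entity lines must equal 'levels'")

    edges = []
    for matrix in root.findall("matrix"):
        # matrixLevel="a b": rows are nodes of set a, columns nodes of set b.
        former, latter = (int(level) for level in matrix.get("matrixLevel").split())
        weights = rows(matrix.text)
        if len(weights) != len(node_sets[former]):
            raise ValueError(f"matrix {former} {latter}: wrong number of rows")
        for row_name, row in zip(node_sets[former], weights):
            if len(row) != len(node_sets[latter]):
                raise ValueError(f"matrix {former} {latter}: wrong number of columns")
            for col_name, weight in zip(node_sets[latter], row):
                edges.append((row_name, former, col_name, latter, float(weight)))
    return edges


edges = read_xml_edges(SAMPLE)
print(len(edges))  # 21 edges: 3*3 + 3*2 + 3*2
print(edges[1])    # ('node_00', 0, 'node_11', 1, 5.1)
```

Each tuple produced here carries the same five values as one line of the plain text format, which makes the size difference between the two formats easy to see.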

Fig. 1 The number of lines in the "entity" tag is equal to the number of node sets.


Fig. 2 The format of the matrix within the "matrix" tag.