generate_data_instances_set.py

Script to generate random network instances.

A network instance consists of a transition matrix, a binary adjustment matrix and an objective index.

The transition matrices (i.e., networks) can be generated by two methods: (1) filling a matrix with uniformly distributed random values and normalising the matrix row-wise, or (2) generating a matrix Q with uniformly distributed random values and a matrix P according to a directed Erdős-Rényi graph, combining these matrices as alpha*P+(1-alpha)*Q and normalising the resulting matrix row-wise.
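
The two methods can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names are hypothetical, and P is assumed to be the 0/1 adjacency matrix of a directed Erdős-Rényi graph without self-loops, drawn here directly with NumPy rather than through networkx.

```python
import numpy as np


def random_uniform_matrix(n, rng):
    """Method (1): uniform random entries, normalised so each row sums to 1."""
    m = rng.random((n, n))
    return m / m.sum(axis=1, keepdims=True)


def mixed_matrix(n, alpha, prob, rng):
    """Method (2): blend an Erdős-Rényi adjacency matrix P with a uniform matrix Q."""
    q = rng.random((n, n))
    # Assumption: P holds a 1 for every directed edge, each present with
    # probability `prob`, and has no self-loops.
    p = (rng.random((n, n)) < prob).astype(float)
    np.fill_diagonal(p, 0.0)
    m = alpha * p + (1.0 - alpha) * q
    # Note: with alpha = 1 a node without outgoing edges yields an all-zero
    # row, which this sketch does not handle.
    return m / m.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
m_uniform = random_uniform_matrix(5, rng)
m_mixed = mixed_matrix(5, alpha=0.5, prob=0.3, rng=rng)
```

After row-wise normalisation every row sums to 1, so each matrix can be read as a transition matrix.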

The binary adjustment matrix is generated by filling a matrix with random 0 and 1 values, each drawn with probability 0.5. To avoid self-loops, the diagonal is filled with zeroes.
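
A minimal sketch of this step, assuming NumPy is used for the random draws (the function name is hypothetical and not taken from the script):

```python
import numpy as np


def binary_adjustment_matrix(n, rng):
    """Fill an n x n matrix with 0/1 values (equal probability), zero diagonal."""
    # integers(0, 2) draws from {0, 1}; the upper bound is exclusive.
    c = rng.integers(0, 2, size=(n, n))
    np.fill_diagonal(c, 0)  # no self-loops
    return c


c_matrix = binary_adjustment_matrix(6, np.random.default_rng(1))
```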

The objective index of the instance is randomly selected between 0 and the network size.

Per instance, a plot with two histograms that show the edge weight distribution is provided.

The parameters of the script are explained below.

Parameters:
  • START_SEED_VALUE (int) – The start seed value of the pseudo-random number generator. The first network instance is generated with seed START_SEED_VALUE, the second with seed START_SEED_VALUE+1, etc.

  • INSTANCE_SIZE_RANGE (list[int]) – The allowed network size range. The left boundary specifies the minimum network size, the right boundary the maximum network size. If you want to generate a network of a specific size, simply provide the same left and right boundary. For example, if you want to generate a network of size 100, provide [100, 100].

  • NR_INSTANCES (int) – The number of network instances that will be generated.

  • NETWORK_TYPE (str) – Specifies the method you want to use to generate the network instances. Method (1) above corresponds to “random_uniform”, and method (2) above corresponds to “networkx”.

  • ALPHA (float) – If method (2) is used to generate the network instances, the matrices Q and P are combined by ALPHA*P+(1-ALPHA)*Q.

  • PROB (float) – If method (2) is used to generate the network instances, the probability that an edge is created in the Erdős-Rényi graph.

  • SAVE_OUTPUT (bool) – Whether the output should be saved or not.

  • DATA_DIRECTORY (str) – The folder, relative to the ROOT_DIRECTORY (automatically determined), where the resulting data will be saved if SAVE_OUTPUT=True.

  • INSTANCE_NAME (str) – The base name of the network instance(s). This name is currently used to define DATA_FILE_NAME_M_MATRIX, DATA_FILE_NAME_C_MATRIX, INSTANCE_PARAMETERS_FILE_NAME and RUN_PARAMETERS_FILE_NAME.

  • DATA_FILE_NAME_M_MATRIX (str) – The base name of the transition matrices. The generated matrices will be saved according to this name with a suffix “i” where “i” stands for the instance number starting at 0.

  • DATA_FILE_NAME_C_MATRIX (str) – The base name of the binary adjustment matrices. The generated matrices will be saved according to this name with a suffix “i” where “i” stands for the instance number starting at 0.

  • INSTANCE_PARAMETERS_FILE_NAME (str) – The file name of the table that will be saved if SAVE_OUTPUT=True that contains information on the generated network instances. The table contains the following columns: Instance_name, Random_M, Random_C, Seed, Problem_size, Objective_index. The column Instance_name contains the name of each network. The columns Random_M and Random_C specify whether the transition matrices and binary adjustment matrices were generated randomly or not, respectively. The column Seed specifies the seed that was used for generating the instance. The column Problem_size specifies the network size. The column Objective_index specifies the randomly selected network node that will be optimised.

  • RUN_PARAMETERS_FILE_NAME (str) – The name of the text file in which the script parameters are saved if SAVE_OUTPUT=True.
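
The interplay of START_SEED_VALUE, INSTANCE_SIZE_RANGE and NR_INSTANCES can be sketched as below. This is an illustration under stated assumptions, not the script's implementation: the function name, the "name_i" naming pattern, and the choice to draw the objective index from 0 to size-1 are all hypothetical, and only a subset of the table columns is shown.

```python
import numpy as np


def instance_parameters(start_seed, size_range, nr_instances, instance_name):
    """Build one parameter row per instance, seeding instance i with start_seed + i."""
    rows = []
    for i in range(nr_instances):
        seed = start_seed + i
        rng = np.random.default_rng(seed)
        # Both boundaries of the size range are allowed, so [100, 100]
        # always yields size 100.
        size = int(rng.integers(size_range[0], size_range[1] + 1))
        # Assumption: the objective index is a valid zero-based node index.
        objective_index = int(rng.integers(0, size))
        rows.append({
            "Instance_name": f"{instance_name}_{i}",  # hypothetical naming pattern
            "Seed": seed,
            "Problem_size": size,
            "Objective_index": objective_index,
        })
    return rows


rows = instance_parameters(start_seed=10, size_range=[5, 8],
                           nr_instances=3, instance_name="demo")
```

Because each instance has its own seed, rerunning with the same START_SEED_VALUE reproduces the same instances, and any single instance can be regenerated from the Seed column alone.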