Random-Key Genetic Algorithm (RKGA) is a class of John Holland's Genetic Algorithms (GA). This class, described by Bean [1], works similarly to classical GAs, applying the Darwinian principle of survival of the fittest, but differs in how the generations are encoded, decoded, and manipulated: the possible solutions of an optimization problem are represented as vectors of random keys that decode into permutations.
A RKGA is an evolutionary metaheuristic for discrete and global optimization. Each solution is encoded as an array of n random keys, where a random key is a real number, randomly generated, in the continuous interval [0, 1).
A decoder maps each array of random keys to a solution of the optimization problem being solved and computes its cost.
The algorithm starts with a population of p arrays of random keys. At each iteration, the arrays are partitioned into two sets: a smaller set of elite solutions (the best-valued ones) and a set with the remaining non-elite solutions.
All elite elements are copied, without change, to the next population. A small number of new random-key arrays (the mutants) are added to the population of the next iteration. The remaining elements of the next population are generated by combining pairs of arrays with the parametrized uniform crossover of Spears and DeJong (On the virtues of parameterized uniform crossover. In: Proceedings of the Fourth International Conference on Genetic Algorithms, San Mateo, pp. 230–236, 1991).
The algorithm starts by generating a population of p vectors of n random keys, which is referred to as the 1st generation.
In the k-th generation, the p vectors of the population are partitioned into two sets: a small set of p_e < p/2 vectors corresponding to the best solutions (called elite set) and a bigger set with the remainder of the population (called non-elite set).
All elite vectors of the current k-th generation are copied, unchanged, to the population of the (k+1)-th (next) generation. Next, p_m vectors of random keys are introduced into the population of the (k+1)-th generation. These vectors, called mutants, play the same role as the mutation operators of classical genetic algorithms.
To complete the p elements of the population of the (k+1)-th generation, p - p_e - p_m vectors are generated by combining pairs of solutions from the population of the k-th (current) generation with a parametrized uniform crossover. This process is described below:
Let a and b be the vectors selected for mating and let c be the offspring produced. In the crossover of Spears and DeJong, c[i], the i-th component of the offspring vector, receives the i-th key of one of its parents: it receives the key a[i] with probability p_a and the key b[i] with probability p_b = 1 - p_a.
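A minimal sketch of this crossover in C++ (the function name `crossover` and the use of `std::mt19937` are our choices, not necessarily those of brkga.cpp):

```cpp
#include <random>
#include <vector>

// Parametrized uniform crossover: the i-th key of the child is inherited
// from parent a with probability p_a, and from parent b otherwise.
std::vector<double> crossover(const std::vector<double>& a,
                              const std::vector<double>& b,
                              double p_a, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::vector<double> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        c[i] = (coin(rng) < p_a) ? a[i] : b[i];
    return c;
}
```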
A decoder is a deterministic algorithm that takes a random-key vector as input and returns a feasible solution of the optimization problem together with its cost. The decoder proposed by Bean (1994) simply sorts the elements of the vector of random keys, thus producing the permutation given by the indices of the sorted elements.
Thus, a random-key GA searches the solution space indirectly by searching the space of random keys and using the decoder to evaluate the fitness of the vector.
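A minimal sketch of this sorting decoder (names are ours; brkga.cpp may organize it differently):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Sorting decoder: the permutation is given by the indices that sort the keys.
std::vector<int> decode(const std::vector<double>& keys) {
    std::vector<int> perm(keys.size());
    std::iota(perm.begin(), perm.end(), 0);   // 0, 1, ..., n-1
    std::sort(perm.begin(), perm.end(),
              [&keys](int i, int j) { return keys[i] < keys[j]; });
    return perm;
}
```

For example, the key vector (0.46, 0.91, 0.33, 0.75) decodes to the permutation (2, 0, 3, 1).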
Recommended parameter values:
- Size of population: a function of N, say N or 2N
- Size of elite partition: 15-25% of population
- Size of mutant set: 5-15% of population
- Child inheritance (from elite) probability: > 0.5, say 0.7
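Putting these pieces together, below is a hedged sketch of one generation of the evolutionary loop under the recommendations above, assuming a minimization problem and the biased parent selection implied by the last bullet (one elite and one non-elite parent). All names (`Individual`, `evolve`, `decode_cost`) are illustrative and not taken from brkga.cpp:

```cpp
#include <algorithm>
#include <random>
#include <vector>

struct Individual {
    std::vector<double> keys;  // random-key chromosome
    double cost;               // value computed by the decoder
};

// One generation: copy the elite, inject mutants, fill the rest by crossover.
// Typical settings following the list above: p_e = 0.20*p, p_m = 0.10*p, p_a = 0.7.
void evolve(std::vector<Individual>& pop, std::size_t p_e, std::size_t p_m,
            double p_a, std::mt19937& rng,
            double (*decode_cost)(const std::vector<double>&)) {
    // Rank the current population (minimization: lowest cost first).
    std::sort(pop.begin(), pop.end(),
              [](const Individual& x, const Individual& y) { return x.cost < y.cost; });

    std::uniform_real_distribution<double> key(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> elite(0, p_e - 1);
    std::uniform_int_distribution<std::size_t> non_elite(p_e, pop.size() - 1);
    const std::size_t n = pop[0].keys.size();

    // Elite individuals are copied unchanged.
    std::vector<Individual> next(pop.begin(), pop.begin() + p_e);

    // Mutants: brand-new random-key vectors.
    for (std::size_t m = 0; m < p_m; ++m) {
        Individual ind;
        ind.keys.resize(n);
        for (double& k : ind.keys) k = key(rng);
        ind.cost = decode_cost(ind.keys);
        next.push_back(ind);
    }

    // Offspring: parametrized uniform crossover of an elite and a non-elite parent.
    while (next.size() < pop.size()) {
        const Individual& a = pop[elite(rng)];
        const Individual& b = pop[non_elite(rng)];
        Individual c;
        c.keys.resize(n);
        for (std::size_t i = 0; i < n; ++i)
            c.keys[i] = (key(rng) < p_a) ? a.keys[i] : b.keys[i];
        c.cost = decode_cost(c.keys);
        next.push_back(c);
    }
    pop.swap(next);
}
```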
A usual way to collect data in a Wireless Sensor Network (WSN) is with the support of a special agent, called a data mule, that moves among the sensor nodes and performs all communication between them. A data mule is a mobile node that moves inside the field (network) and collects data from the sensors within it. The Data Mule Scheduling Problem (DMSP) consists in planning the route of the data mule, the order in which the sensors are attended, and the velocity at which the mule moves. Each sensor in the network generates some amount of information per unit of time and has a limited amount of memory to retain the information generated; when the amount of information generated exceeds the memory capacity, information is lost. The data mule's job is to prevent this loss by visiting the sensors and retrieving their information, thereby freeing their memory.
We can decompose the problem into the following three subproblems:
- Path selection: which trajectory the data mule follows
- Speed control: how the data mule changes the speed during the travel
- Job scheduling: from which sensor the data mule collects data at each time point
Path selection is to determine the trajectory of the data mule in the network. To collect data from each particular sensor, the data mule needs to go within the sensor's communication range at least once. Depending on the mobility capacity of the mule, there can be some constraints on the path, such as minimum turning radius.
Once the path is chosen, 2-D/3-D data mule scheduling problems are reduced to a 1-D data mule scheduling problem, in which the communication range of each node is given as an interval on the location axis. Speed control is to determine how the data mule changes its speed along the chosen path. The data mule needs to change its speed so that it stays within each node's communication range long enough to collect all the data from it. The objective in this subproblem is to find an optimal time-speed profile satisfying that constraint.
Once we have a time-speed profile, we have a mapping from each location to a time point. Thus we obtain a scheduling problem by regarding data collection from each sensor as a job. Each job has one or more intervals in which it can be executed; for example, if a sensor's communication range projects onto the path as the segment from 10 m to 30 m and the mule traverses the path at a constant 2 m/s starting at t = 0 s, the corresponding job can only be executed in the time window from 5 s to 15 s. Job scheduling is to determine the allocation of time slots to jobs so that all jobs can be completed.
The problem instances used in this work represent networks with different configurations, with the (original) node count varying from X to Y. Below we show a graphical representation of such an instance, followed by its description in a textual format.
The textual description of the network contains all information regarding the original nodes within the network (the sensors), such as position, demand, and communication range, as well as metadata regarding the artificial nodes and edges created after pre-processing and the network as a whole.
6 41 2 0.001 100.0 //1st
150.000 150.000 0.000 7.000 0.000 //2nd
188.000 170.000 38.000 1.000 18.000
294.000 77.000 30.000 3.000 4.000
244.000 188.000 45.000 7.000 5.000
148.000 166.000 5.000 5.000 14.000
86.000 71.000 48.000 10.000 5.000
0 0 0 0 //3rd
0 1 42.942 2
1 150.00 150.00 154.37 152.30 4.94 0 //4th
2 154.37 152.30 188.00 170.00 38.00 1 1
...
The whole file can be divided into two main parts, separated by the first blank line: before it there is only information regarding the entire network and its original nodes; after it we have all the data associated with each pair of original nodes and the artificial data introduced after pre-processing. Following is a detailed listing of the values in each of the tagged lines.
- 1st - Original Nodes Count, Total Nodes Count, Number of Allowed Velocities, List of Allowed Velocities.
- 2nd - Sensor X Coordinate, Sensor Y Coordinate, Communication Range, Transmission Rate, Demand.
- 3rd - Sensor A, Sensor B, Distance Between Sensors, Artificial Edges Count (between A and B).
- 4th - Artificial Edge a, "Start" X Coordinate, "Start" Y Coordinate, "End" X Coordinate, "End" Y Coordinate, Edge Length, Sensors that Can be Served While in a, Sensor that Can be Served IDs List.
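As an illustration of how this format could be read (a hedged sketch, not the parser in brkga.cpp; the `Sensor` struct and all names are ours, and we assume the //-style tags shown above are annotations for this document rather than part of the actual file):

```cpp
#include <fstream>
#include <iostream>
#include <vector>

// One original node (sensor), as described by a "2nd"-type line.
struct Sensor {
    double x, y;    // position
    double range;   // communication range
    double rate;    // transmission rate
    double demand;  // amount of data to collect
};

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    std::ifstream in(argv[1]);

    // "1st" line: original node count, total node count, number of allowed
    // velocities, followed by the list of allowed velocities.
    int originals, totals, n_vel;
    in >> originals >> totals >> n_vel;
    std::vector<double> velocities(n_vel);
    for (double& v : velocities) in >> v;

    // "2nd" lines: one per original node (the first one is the base station).
    std::vector<Sensor> sensors(originals);
    for (Sensor& s : sensors)
        in >> s.x >> s.y >> s.range >> s.rate >> s.demand;

    std::cout << originals << " original nodes, " << n_vel
              << " allowed velocities\n";
    return 0;
}
```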
The solution is represented as an array of length at most n + 1 (where n is the number of sensors), corresponding to the case where the mule passes by all of the sensors before returning to the base station, and at least 3, when visiting a single sensor is enough to serve all the others. The array represents the sequence of sensors visited by the mule.
[0, 2, 0, 1, 3, 4] // Sensors 1, 3 and 4 are not used
[0, 1, 2, 0, 3, 4] // Sensors 3 and 4 are not used
[0, 1, 2, 3, 4, 0] // All sensors are used
The attendance service ends with the mule returning to the base station, from which it departed. Thus, the sensors appearing after the base station ID (always 0) in the solution are the ones not used in that solution.
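A hedged sketch of how such an array can be interpreted (the function name `visited_sequence` is ours): everything up to and including the second occurrence of the base station ID 0 is the tour actually traveled, and the remaining entries are the unused sensors.

```cpp
#include <iostream>
#include <vector>

// Returns the part of the solution array that is actually traveled:
// from the departure at the base station (0) until the return to it.
std::vector<int> visited_sequence(const std::vector<int>& solution) {
    std::vector<int> tour;
    tour.push_back(solution[0]);          // departure from the base station
    for (std::size_t i = 1; i < solution.size(); ++i) {
        tour.push_back(solution[i]);
        if (solution[i] == 0) break;      // returned to the base station
    }
    return tour;
}

int main() {
    std::vector<int> solution = {0, 2, 0, 1, 3, 4};  // sensors 1, 3 and 4 unused
    for (int node : visited_sequence(solution)) std::cout << node << ' ';
    std::cout << '\n';                               // prints: 0 2 0
    return 0;
}
```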
Compilation
g++ brkga.cpp -o <out_binary>
Execution
<out_binary> #execs #max_iter #pop_size #mating_type #mating_ls #lso #sensors #instance
- J. C. Bean. Genetic algorithms and random keys for sequencing and optimization. ORSA J. on Computing, 6:154–160, 1994
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.567.6832&rep=rep1&type=pdf
- http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0101-74382017000200387
- https://pdfs.semanticscholar.org/f339/7ec243dce5fd5869cf523a3ee1a77c54957a.pdf
- https://link.springer.com/referenceworkentry/10.1007%2F978-3-319-07153-4_30-1