What are genetic algorithms? They are optimization algorithms that draw inspiration from biology. Let's take an example to understand this.
This will be dangerously close to a plot from the X-Men Comics.
Imagine there is a race between the mutants and human clans to be at the top of the evolutionary pyramid.
A mutant-only meeting is held at a coffee shop in an unknown location, where they plan their strategy. The strongest of them all comes up with a plan to take over.
Insert Hypothetical Diabolical Plot Here:
- Consider the world population and a function that classifies everyone as either a mutant or a human.
- The selected Mutants then modify human genes to increase the mutant population.
- This process continues in an iterative manner till the goal is met.
Genetic Marker | Difficulty | Mutant Genes |
---|---|---|
X | 10 | 10 |
Y | 5 | 6 |
Z | 8 | 4 |
Look at the table above as an optimization problem where we have some constraints and factors that we are trying to maximize. Given the difficulty of bringing about a change in each genetic marker, the objective here is to maximize the Mutant Genes.
Genetic Algorithms are an optimization technique: they iteratively search for the best solution. To put this paradigm in a more formal sense, Genetic Algorithms work as follows:
Initialization:
Initialize the population. To continue with the above-mentioned example, we can define a vector of binary elements that flag which genetic markers in the DNA are targeted.
For Subject A1:
Genetic Marker | Flag |
---|---|
X | 1 |
Y | 0 |
Z | 1 |
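As a sketch, the initialization step could look like the following in Python (the function name, population size, and seed are hypothetical choices for illustration):

```python
import random

# Each chromosome is a binary vector: a 1 flags that the corresponding
# genetic marker (X, Y, Z) is targeted in that subject.
MARKERS = ["X", "Y", "Z"]

def init_population(pop_size, n_markers, seed=None):
    """Create pop_size random binary chromosomes of length n_markers."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_markers)]
            for _ in range(pop_size)]

population = init_population(pop_size=6, n_markers=len(MARKERS), seed=42)
# Subject A1 above would be encoded as [1, 0, 1] (X and Z targeted)
```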
Fitness Function:
Genetic Marker | Difficulty | Mutant Genes |
---|---|---|
X | 10 | 10 |
Z | 8 | 4 |
Total | 18 | 14 |
We are calculating the objective function (Mutant Genes) and trying to identify the data points that provide the optimum objective value. Similarly, consider subject A2 with a total of 10 Mutant Genes obtained at a difficulty level of 13. Hence, A1 is more mutant than A2.
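A minimal sketch of this fitness calculation, using the Difficulty and Mutant Genes values from the table (the helper names are hypothetical):

```python
# Marker data from the table above, in the order X, Y, Z
DIFFICULTY   = [10, 5, 8]
MUTANT_GENES = [10, 6, 4]

def fitness(chromosome):
    """Total Mutant Genes contributed by the targeted (flag = 1) markers."""
    return sum(g for flag, g in zip(chromosome, MUTANT_GENES) if flag)

def difficulty(chromosome):
    """Total difficulty incurred by the targeted markers."""
    return sum(d for flag, d in zip(chromosome, DIFFICULTY) if flag)

a1 = [1, 0, 1]  # subject A1 targets X and Z
print(fitness(a1), difficulty(a1))  # 14 Mutant Genes at difficulty 18
```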
Selection:
Based on the fitness scores, we select data points out of our population and identify which genetic markers need to be targeted. Methods such as Stochastic Universal Sampling or tournament selection can be used here. Basically, the idea is to score the fitness function, that is the total Mutant Genes, for each member of the mutant clan, and then identify the best mutants and their respective genes.
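As an illustration, tournament selection (one of the methods mentioned above) can be sketched as follows; the fitness used is the hypothetical Mutant Genes total from the table:

```python
import random

def tournament_select(population, fitness_fn, k=2, rng=None):
    """Tournament selection: sample k subjects, return the fittest."""
    rng = rng or random.Random()
    return max(rng.sample(population, k), key=fitness_fn)

MUTANT_GENES = [10, 6, 4]  # payoff for markers X, Y, Z
def score(chrom):
    return sum(g for flag, g in zip(chrom, MUTANT_GENES) if flag)

population = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
winner = tournament_select(population, score, k=3, rng=random.Random(0))
# with k equal to the population size, the overall fittest subject wins
```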
Crossover:
Subject | X1 | X2 | Z1 | Z | Y | Y1 |
---|---|---|---|---|---|---|
A1 | 1 | 1 | 1 | 0 | 1 | 0 |
A2 | 1 | 0 | 0 | 1 | 0 | 1 |
In one word: reproduction. Consider that genetic markers X1, X2, Z1 and Y are active in mutant A1, and X1, Z and Y1 in mutant A2, and that these have been identified as the ideal combinations of genetic markers to maximize the Mutant Genes.
An example is shown in the table above. Consider A1 and A2 as the primary subjects selected for crossover, producing the offspring H1 with the following genetic markers.
Subject | X1 | X2 | Z1 | Z | Y | Y1 |
---|---|---|---|---|---|---|
H1 | 1 | 1 | 1 | 1 | 0 | 1 |
This method is one-point crossover: we split the columns into a head (X1, X2, Z1) and a tail (Z, Y, Y1), with the crossover point at position 3, and simply swap the tails of the two parents (A1, A2) to get the offspring (H1). In crossover, we are exploring different combinations; essentially, we are hypothesizing that offspring solutions formed from the best parents might also be optimal. There are several other crossover methods that can be used here.
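The one-point crossover described above can be sketched directly; the A1, A2 and H1 rows match the tables:

```python
def one_point_crossover(parent1, parent2, point):
    """Swap the tails of two parents after the crossover point."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

a1 = [1, 1, 1, 0, 1, 0]  # markers X1, X2, Z1, Z, Y, Y1
a2 = [1, 0, 0, 1, 0, 1]
h1, h2 = one_point_crossover(a1, a2, point=3)
print(h1)  # [1, 1, 1, 1, 0, 1], the offspring H1 from the table
```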
Mutation:
The idea here is to prevent premature convergence in the solution space and to ensure diversity. In the previous step, we generated offspring, that is, a new set of solutions to evaluate for optimality (maximizing Mutant Genes).
Mutation helps ensure that the solutions stay diverse, that is, the offspring are not too similar to one another. Common ways to accomplish this include bit-flip mutation (randomly toggling a bit in the chromosome) and swap mutation (exchanging the positions of two genes).
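A sketch of bit-flip mutation, one common approach (the mutation rate is a hypothetical choice):

```python
import random

def mutate(chromosome, rate=0.1, rng=None):
    """Bit-flip mutation: toggle each bit independently with probability rate."""
    rng = rng or random.Random()
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]

print(mutate([1, 0, 1], rate=0.0))  # [1, 0, 1]: nothing flips
print(mutate([1, 0, 1], rate=1.0))  # [0, 1, 0]: every bit flips
```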
Termination:
After the mutation step, we calculate the fitness function for the new set of solutions and repeat the process of selection, crossover and mutation. This iterative process continues until the population starts to converge, that is, no new offspring are being created that differ from the parent population.
We can also set a predefined number of iterations (say 100) for the algorithm to run. In that case, the solution is available either at convergence or at the 100th iteration, whichever comes sooner.
This gives the optimal solution, that is, the combination of genetic markers with minimum difficulty and maximum Mutant Genes.
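Putting the steps together, a toy end-to-end sketch might look like this. All parameter values, the convergence check, and the single-objective fitness are simplifying assumptions; difficulty is ignored here for brevity:

```python
import random

MUTANT_GENES = [10, 6, 4]  # payoff for markers X, Y, Z

def fitness(chrom):
    return sum(g for flag, g in zip(chrom, MUTANT_GENES) if flag)

def run_ga(pop_size=10, generations=100, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Initialization: random binary chromosomes
    pop = [[rng.randint(0, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: tournament of size 2 for each parent slot
        parents = [max(rng.sample(pop, 2), key=fitness) for _ in range(pop_size)]
        # Crossover: one-point crossover between consecutive parent pairs
        children = []
        for p1, p2 in zip(parents[::2], parents[1::2]):
            point = rng.randint(1, 2)
            children += [p1[:point] + p2[point:], p2[:point] + p1[point:]]
        # Mutation: flip each bit with a small probability
        pop = [[b ^ 1 if rng.random() < mutation_rate else b for b in c]
               for c in children]
        # Termination: stop early once every subject is identical (convergence)
        if all(c == pop[0] for c in pop):
            break
    return max(pop, key=fitness)

best = run_ga()
```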
What are Genetic Algorithms (GAs)?
These algorithms are an optimization technique based on natural selection and can be used for solving both unconstrained and constrained optimization problems.
With gradient-based approaches there is a possibility of ending up at a local optimum for certain initializations. Genetic Algorithms are better at facilitating the search for the global optimum.
GAs can also be used to solve multi-objective optimization problems where multiple Pareto-optimal solutions exist. We will discuss the utility of GAs in such situations using the Non-Dominated Sorting Genetic Algorithm (NSGA) in the next post.
About Us
Data science discovery is a step on the path of your data science journey. Please follow us on LinkedIn to stay updated.
About the writers:
- Ankit Gadi: A knack and passion for data science, coupled with a strong foundation in Operations Research and Statistics, helped me embark on my data science journey.