Genetic algorithm-based descriptor selection
The genetic algorithm-based descriptor selection method comprises four phases: initialization, evaluation, exploitation, and exploration.
The initialization phase involves constructing an initial population of randomly selected descriptor subsets.
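The initialization step can be sketched as follows. This is a minimal illustration, not the method's reference implementation: it assumes each descriptor subset is stored as a 0/1 mask over the descriptor columns, and the function name `init_population` and the parameters `pop_size` and `p_include` are hypothetical.

```python
import random

def init_population(n_descriptors, pop_size, p_include=0.1, seed=None):
    """Build pop_size random descriptor subsets as 0/1 masks (assumed encoding)."""
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        # Include each descriptor independently with probability p_include.
        mask = [1 if rng.random() < p_include else 0 for _ in range(n_descriptors)]
        if not any(mask):
            # Avoid empty subsets by switching on one random descriptor.
            mask[rng.randrange(n_descriptors)] = 1
        population.append(mask)
    return population

population = init_population(n_descriptors=50, pop_size=20, seed=0)
```

A bit-mask encoding is only one option; it is chosen here because it makes later crossover and mutation steps simple to express.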
During the evaluation phase, each descriptor subset is evaluated by calculating its fitness score, which indicates the relevance of a descriptor subset to the biological property.
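The text does not say how the fitness score is computed. As a stand-in for illustration only, the sketch below scores a subset by the mean absolute Pearson correlation between each selected descriptor and the property; in practice the score would more likely come from the predictive performance of a model built on the subset. The names `pearson`, `fitness`, `X`, and `y` are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def fitness(mask, X, y):
    """Placeholder fitness: mean |correlation| of the selected descriptor columns with y."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    total = sum(abs(pearson([row[j] for row in X], y)) for j in selected)
    return total / len(selected)

# Toy data: 4 compounds, 2 descriptors; descriptor 0 tracks the property exactly.
X = [[1, 5], [2, 3], [3, 6], [4, 1]]
y = [2, 4, 6, 8]
score_first = fitness([1, 0], X, y)
score_second = fitness([0, 1], X, y)
```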
In the exploitation phase, the descriptor subsets are first ranked by their fitness score. Higher-ranked descriptor subsets are given a higher probability of being chosen for reproduction. The top x selected descriptor subsets then replace the x lowest-ranked descriptor subsets in the population. These x new descriptor subsets, together with the y highest-ranked descriptor subsets in the current generation, form a new generation of descriptor subsets.
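One possible reading of the exploitation phase is sketched below. It assumes linear rank weighting (the best subset gets the largest weight) and sampling with replacement; the text only states that higher-ranked subsets are more likely to be chosen, so both choices, as well as the function name `select_next_generation`, are assumptions.

```python
import random

def select_next_generation(population, scores, x, y, seed=None):
    """Return (elites, parents): the y best subsets and x rank-selected copies."""
    rng = random.Random(seed)
    # Rank subsets from best to worst by fitness score.
    order = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    ranked = [population[i] for i in order]
    # Linear rank weights: best subset gets len(ranked), worst gets 1.
    weights = list(range(len(ranked), 0, -1))
    # Sample x subsets with replacement, favoring higher ranks; copy so that
    # later crossover and mutation do not alter the originals.
    parents = [list(p) for p in rng.choices(ranked, weights=weights, k=x)]
    # The y highest-ranked subsets are carried over unchanged.
    elites = [list(s) for s in ranked[:y]]
    return elites, parents

toy_population = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
toy_scores = [0.1, 0.9, 0.5, 0.3]
elites, parents = select_next_generation(toy_population, toy_scores, x=2, y=2, seed=1)
```

Under this reading the next generation holds x + y subsets: the elites as they are, plus the selected copies after they pass through crossover and mutation.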
In the final phase, exploration, the x new descriptor subsets are subjected to one-point crossover and mutation to increase the diversity of the population. During mutation, descriptors may be randomly added to or deleted from a descriptor subset. After the exploration phase, the genetic algorithm returns to the evaluation phase, and the cycle repeats until at least n generations have passed and the highest-ranked descriptor subset has remained the same for s generations. The highest-ranked descriptor subset is then used to construct the final QSAR/qSAR model.
Note: x, y, n, and s are defined by the user.
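The exploration phase and the stopping rule can be sketched as follows, again assuming a 0/1 mask encoding in which flipping a bit adds or deletes a descriptor. The mutation rate, the pairing of consecutive subsets for crossover, and the names `one_point_crossover`, `mutate`, `explore`, and `should_stop` are hypothetical.

```python
import random

def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length masks (length >= 2) at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(mask, rate, rng):
    """Flip each bit with probability rate: 0 -> 1 adds a descriptor, 1 -> 0 deletes one."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]

def explore(parents, rate=0.02, seed=None):
    """Apply one-point crossover to consecutive pairs, then mutate every child."""
    rng = random.Random(seed)
    children = []
    for i in range(0, len(parents) - 1, 2):
        children.extend(one_point_crossover(parents[i], parents[i + 1], rng))
    if len(parents) % 2:
        # An unpaired last subset skips crossover but is still mutated.
        children.append(list(parents[-1]))
    return [mutate(child, rate, rng) for child in children]

def should_stop(generation, stable_generations, n, s):
    """Stop once at least n generations have passed and the best subset is unchanged for s."""
    return generation >= n and stable_generations >= s

children = explore([[1, 1, 1, 1], [0, 0, 0, 0]], rate=0.0, seed=0)
```

With the mutation rate set to zero, as in the usage line, the two children are exact complements of each other, which makes the effect of the crossover easy to inspect.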