Table 2 The general flow of feature selection by the genetic algorithm

From: An evaluation of time series summary statistics as features for clinical prediction tasks

Divide data into 5 folds
for k = 1 to 5
    Assign
        A = test data (1 fold, reserved for testing the random forest)
        B = train data (3 folds, used to train the random forest)
        C = validation data (1 fold, used to validate the random forest)
    Repeat using the train (B) and validation (C) data
        step 1: Encode features as binary chromosomes
        step 2: Randomly generate a population of 20 chromosomes
        step 3: Evaluate the AUROC of the random forest for each chromosome from step 2
        step 4: Determine whether the termination conditions are met
            if yes:
                Terminate
            else:
                step 5.1: Apply single-point crossover with probability 0.6
                step 5.2: Apply uniform mutation with probability 0.1
                step 5.3: Calculate the AUROC of the new chromosomes with the random forest and compare it with step 3
                step 5.4: Select the chromosomes with the highest fitness
                step 5.5: Replace the chromosomes with the lowest fitness, then go back to step 4
    Train the random forest on data (B+C) using the statistics selected by the genetic algorithm
    Test the random forest on data (A)
    Calculate AUROC for fold k
End for

Calculate average AUROC for 5 folds
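
The flow in Table 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `fold_roles` and `run_ga` are hypothetical. The table does not say which fold serves as validation, how the termination conditions are defined, or whether the mutation probability applies per bit or per chromosome. The sketch assumes the validation fold is the one after the test fold, a fixed generation budget, and per-bit flips. The fitness function is passed in as a callable so the sketch stays self-contained.

```python
import random
from typing import Callable, List, Tuple

Chromosome = Tuple[int, ...]  # one bit per feature: 1 = keep, 0 = drop


def fold_roles(k: int, n_folds: int = 5) -> Tuple[int, int, List[int]]:
    """Return (test, validation, train) fold indices for outer iteration k (0-based).

    Assumption: the validation fold is the fold after the test fold.
    """
    test = k
    validation = (k + 1) % n_folds
    train = [f for f in range(n_folds) if f not in (test, validation)]
    return test, validation, train


def run_ga(
    n_features: int,
    fitness: Callable[[Chromosome], float],
    pop_size: int = 20,          # step 2: population of 20
    p_crossover: float = 0.6,    # step 5.1
    p_mutation: float = 0.1,     # step 5.2 (assumed to apply per bit)
    max_generations: int = 30,   # assumed termination condition
    seed: int = 0,
) -> Tuple[Chromosome, float]:
    rng = random.Random(seed)
    # Steps 1-2: random binary chromosomes.
    population = [
        tuple(rng.randint(0, 1) for _ in range(n_features))
        for _ in range(pop_size)
    ]
    # Step 3: score the initial population (cached by chromosome).
    scores = {c: fitness(c) for c in population}

    for _ in range(max_generations):  # step 4
        parents = population[:]
        rng.shuffle(parents)
        children = []
        # Step 5.1: single-point crossover on random pairs.
        for a, b in zip(parents[::2], parents[1::2]):
            if n_features > 1 and rng.random() < p_crossover:
                cut = rng.randint(1, n_features - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children.extend([a, b])
        # Step 5.2: flip each bit independently with probability p_mutation.
        children = [
            tuple(bit ^ 1 if rng.random() < p_mutation else bit for bit in c)
            for c in children
        ]
        # Step 5.3: score chromosomes not seen before.
        for c in children:
            if c not in scores:
                scores[c] = fitness(c)
        # Steps 5.4-5.5: keep the fittest, dropping the lowest-fitness ones.
        population = sorted(
            population + children, key=scores.__getitem__, reverse=True
        )[:pop_size]

    best = population[0]
    return best, scores[best]
```

In the setting of Table 2, `fitness` would train a random forest on fold set B using only the features whose bit is 1 and return its AUROC on fold C. That part is omitted here because it depends on the data and on the chosen random forest library.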