Using generative modelling to perform diversifying data augmentation

Deep learning algorithms have become increasingly prevalent in real-world applications. With this growth, bias has been observed in the predictions these algorithms make. One cause is that the algorithm captures bias present in the data set it is trained on.

This research investigates how using generative adversarial networks (GANs) as a gender-to-gender data pre-processing step influences the bias and accuracy measured for a VGG-16 gender classification model. A cycle-consistent generative adversarial network (CycleGAN) is trained on the Adience data set to perform the gender-to-gender data augmentation. This architecture allows for an unpaired domain mapping and yields two generators, which together double the number of training images by generating a male image for every female image and vice versa.

The VGG-16 gender classification model is trained on this data, and its accuracy is used to indicate its performance. In addition, the model’s fairness is quantified using demographic parity and equalized odds to indicate its bias. The evaluation of the proposed methodology shows that accuracy decreases when CycleGAN pre-processing is applied. Bias also decreases, especially when measured on an imbalanced data set. However, the decrease in bias would need to be more substantial to change our evaluation of the model from unfair to fair, showing the proposed methodology to be effective but insufficient to remove bias from the data set.
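
As an illustration of the two fairness measures named above, the following minimal Python sketch computes a demographic parity gap and an equalized odds gap for binary predictions. It is not the evaluation code used in this research. The function names, the 0/1 encoding of labels and groups, and the choice to report absolute differences (taking the larger of the two per-label gaps for equalized odds) are assumptions made for the example.

```python
def positive_rate(y_pred, mask):
    """Fraction of positive predictions among the samples selected by mask."""
    selected = [p for p, keep in zip(y_pred, mask) if keep]
    if not selected:
        raise ValueError("no samples selected; the rate is undefined")
    return sum(selected) / len(selected)


def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive prediction rate between groups 0 and 1."""
    rate_0 = positive_rate(y_pred, [g == 0 for g in group])
    rate_1 = positive_rate(y_pred, [g == 1 for g in group])
    return abs(rate_0 - rate_1)


def equalized_odds_gap(y_true, y_pred, group):
    """Larger of the between-group gaps in false positive rate (true label 0)
    and true positive rate (true label 1)."""
    gaps = []
    for label in (0, 1):
        rates = []
        for g in (0, 1):
            mask = [t == label and a == g for t, a in zip(y_true, group)]
            rates.append(positive_rate(y_pred, mask))
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)


# Toy data, invented for illustration only (assumed 0/1 encoding).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

dp_gap = demographic_parity_gap(y_pred, group)
eo_gap = equalized_odds_gap(y_true, y_pred, group)
print(dp_gap, eo_gap)
```

On this toy data both groups receive positive predictions at the same rate, so the demographic parity gap is zero, while the error rates differ between groups and the equalized odds gap does not vanish. This is one reason to report both measures rather than either alone.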