Wasserstein Generative Adversarial Networks for Bacterial Hemoglobin-Like Proteins Prediction


Authors

Soumiya Hamena, Constantine 2 University, Algeria

Abstract

Under low-oxygen conditions or oxidative and nitrosative stress, bacteria express three structurally distinct classes of hemoglobin proteins: flavohemoglobins (FlavoHb), truncated hemoglobins (TruncHb), and single-domain hemoglobins (SingHb). When expressed in various heterologous hosts, these proteins have been shown to enhance growth and productivity, which makes them of considerable interest to researchers. At present, only a small number of bacterial hemoglobin-like proteins have been experimentally annotated, so a data augmentation method capable of generating high-quality synthetic sequences is valuable. In this study, we therefore propose a model that combines a Wasserstein Generative Adversarial Network (WGAN), which generates novel bacterial hemoglobin sequences, with a Support Vector Machine (SVM), which predicts and classifies these proteins. A comparison of the proposed model with the existing method under fivefold cross-validation demonstrates the efficiency and effectiveness of our model. Performance was evaluated using Accuracy (Acc), Precision, Recall, F1 score, Cohen's Kappa (Kappa), and the Matthews Correlation Coefficient (MCC). We also plot the learning curves and Receiver Operating Characteristic (ROC) curves. All experimental results indicate that the proposed model outperforms the existing method.

Keywords

Bacterial hemoglobin-like proteins, FlavoHb, TruncHb, SingHb, WGAN, SVM, Prediction, Classification
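The abstract reports six evaluation metrics. As a minimal sketch, assuming a binary task (for example, one hemoglobin class versus all other sequences) summarized by a 2x2 confusion matrix, they can be computed as below. The function name `binary_metrics` and the example counts are hypothetical and do not come from the study, which may have used a library implementation or a multiclass variant of these metrics.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Hypothetical helper: the six metrics named in the abstract,
    computed from a binary (2x2) confusion matrix."""
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: observed agreement (accuracy) corrected for the agreement
    # expected by chance from the predicted and actual class totals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (acc - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews correlation coefficient: uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"acc": acc, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa, "mcc": mcc}


# Hypothetical counts for illustration only (not results from the study).
m = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print({k: round(v, 3) for k, v in m.items()})
```

Kappa and MCC are included alongside accuracy because both account for class balance: kappa discounts agreement that would occur by chance, and MCC stays informative when one class dominates, which is relevant when real annotated sequences are scarce.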