Under low-oxygen conditions or oxidative and nitrosative stress, bacteria express three structurally distinct types of hemoglobin proteins: flavohemoglobins (FlavoHb), truncated hemoglobins (TruncHb) and single-domain hemoglobins (SingHb). These proteins have been expressed in different heterologous hosts, where they have been shown to enhance growth and productivity, which makes them attractive to researchers. At present, only a small number of bacterial hemoglobin-like proteins have been experimentally annotated. It is therefore beneficial to develop a data augmentation method capable of generating high-quality synthetic sequences. In this study, we propose a model that combines a Wasserstein Generative Adversarial Network (WGAN), which generates novel bacterial hemoglobin sequences, with a Support Vector Machine (SVM), which predicts and classifies these proteins. The proposed model was compared with the existing method using fivefold cross-validation, and the comparison demonstrated the efficiency and effectiveness of our model. Performance was evaluated with Accuracy (Acc), Precision, Recall, F1 score, Cohen’s Kappa (Kappa) and Matthews Correlation Coefficient (MCC). We also plotted the learning curves and the Receiver Operating Characteristic (ROC) curves. All experimental results indicate that the proposed model outperforms the existing method.
Bacterial hemoglobin-like proteins, FlavoHb, TruncHb, SingHb, WGAN, SVM, Prediction, Classification
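
To illustrate the evaluation metrics named in the abstract, the following minimal Python sketch computes Acc, Precision, Recall, F1 score, Kappa and MCC from predicted and true labels. It is not the authors' implementation. It assumes a binary (one-vs-rest) labelling, whereas the study may treat the three protein classes jointly. The function names `confusion_counts` and `binary_metrics` are hypothetical, and the toy labels are invented for illustration only and are not results from the study.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Return the six metrics listed in the abstract (binary assumption)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    n = tp + tn + fp + fn

    acc = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (acc - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"acc": acc, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa, "mcc": mcc}


# Toy labels for illustration only (1 = target class, 0 = any other class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

In a fivefold cross-validation setting, a function of this kind would be applied to the held-out fold of each split and the five scores averaged. Kappa and MCC are useful alongside accuracy because both account for agreement expected by chance and for class imbalance.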