Data and scripts for Bayesian prediction of microbial oxygen requirement of selected bacteria from the NCBI genome database

dataset

posted on 2013-09-16, 10:05 authored by Dan B Jensen, David W Ussery

Additional file 1: Text-format (.txt). One-step prediction results. The classification predictions for all included genomes when using the one-step method and N-fold cross-validation.

Additional file 2: Text-format (.txt). Two-step prediction results. The classification predictions for all included genomes when using the two-step method and N-fold cross-validation.

Additional file 3: Text-format (.txt). Domains distinguishing anaerobes from respiring bacteria. The Pfam-A domains found significantly more frequently in bacteria which are capable of respiration (aerobes/facultative anaerobes) than in anaerobes, and vice versa.

Additional file 4: Text-format (.txt). Domains distinguishing aerobes and facultative anaerobes. The Pfam-A domains found significantly more frequently in aerobes than in facultative anaerobes, and vice versa.

Additional file 5: Text-format (.txt). Predictions of aerobes and anaerobes only. The predictions for all included genomes that are either aerobes or anaerobes, excluding the facultative anaerobes. These predictions were made only to allow direct comparison with results in the literature.

Additional file 6: Text-format (.txt). Protein domain presence/absence matrix. The presence/absence profiles with respect to Pfam-A domains of all included genomes.

Additional file 7: Python-format (.py). Get likelihoods from Pfam-domains. A Python script used to identify the Pfam-A domains that are over-represented in one class compared to the others in the training set, and to construct from them the likelihood files used for predictions.
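
As an illustration of the kind of calculation such a script might perform, the following is a minimal sketch, not the authors' code. It estimates the probability that a domain is present given a class from presence/absence profiles of a training set. The function name `domain_likelihoods`, the input layout (a dict of genome to set of domains), and the additive pseudocount are all assumptions made for this example.

```python
from collections import defaultdict


def domain_likelihoods(profiles, labels, pseudocount=1.0):
    """Estimate P(domain present | class) from training genomes.

    profiles: dict mapping genome id -> set of domain ids present (assumed layout)
    labels:   dict mapping genome id -> class label
    pseudocount: additive smoothing so no probability is exactly 0 or 1 (assumption)
    """
    class_counts = defaultdict(int)
    present_counts = defaultdict(lambda: defaultdict(int))
    domains = set()

    # Count genomes per class and, per class, genomes containing each domain.
    for genome, present in profiles.items():
        cls = labels[genome]
        class_counts[cls] += 1
        for domain in present:
            present_counts[cls][domain] += 1
            domains.add(domain)

    # Smoothed frequency of each domain within each class.
    likelihoods = {}
    for cls, n in class_counts.items():
        likelihoods[cls] = {
            domain: (present_counts[cls][domain] + pseudocount) / (n + 2 * pseudocount)
            for domain in domains
        }
    return likelihoods
```

Domains whose estimated frequency differs strongly between classes are the candidates for "over-represented" domains; the significance test actually used is not described here and is not reproduced in this sketch.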

Additional file 8: Python-format (.py). Predictor. A Python script used to predict the oxygen requirement classification of genomes in the test set, based on their protein domain profiles and the likelihood files created by Additional file 7.
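
A Bayesian prediction from a domain profile and per-class likelihoods can be sketched roughly as follows. This is a naive Bayes illustration under stated assumptions, not the authors' script: the function name `predict_class`, the dict-based inputs, and the treatment of domains as conditionally independent are all assumptions of the example.

```python
import math


def predict_class(present, likelihoods, priors):
    """Return the most probable class and the log score of every class.

    present:     set of domain ids found in the genome (assumed layout)
    likelihoods: dict class -> dict domain -> P(domain present | class),
                 with every probability strictly between 0 and 1
    priors:      dict class -> prior probability of the class
    """
    scores = {}
    for cls, domain_probs in likelihoods.items():
        # Work in log space to avoid underflow with thousands of domains.
        score = math.log(priors[cls])
        for domain, p in domain_probs.items():
            # Presence contributes p; absence contributes 1 - p.
            score += math.log(p if domain in present else 1.0 - p)
        scores[cls] = score
    best = max(scores, key=scores.get)
    return best, scores
```

The scores are unnormalised log posteriors, so only their ranking is meaningful unless they are renormalised across classes.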

Additional file 9: Python-format (.py). Predictive evaluations. A Python script used to evaluate the predictor's performance by calculating a Matthews correlation coefficient for each of the classifications in the predictions made by Additional file 8.
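
For reference, a per-class Matthews correlation coefficient can be computed by treating each class in turn as the positive class and all others as negative. The sketch below is illustrative only and is not taken from Additional file 9; the function name `mcc_for_class` and the list-based inputs are assumptions, as is the convention of returning 0.0 when the denominator is zero.

```python
import math


def mcc_for_class(true_labels, predicted_labels, cls):
    """Matthews correlation coefficient for one class, one-vs-rest."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == cls and truth == cls:
            tp += 1
        elif pred == cls:
            fp += 1
        elif truth == cls:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report 0.0 when the coefficient is undefined.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Calling the function once per class (for example aerobe, facultative anaerobe, and anaerobe) gives one coefficient per classification, ranging from -1 (total disagreement) to 1 (perfect prediction).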