Dataset
We constructed the dataset from the ProCaff database by selecting entries for which binding affinity is available for both the wild type and the mutant, and calculated ΔΔG(mut-wt) in kcal/mol. The dataset contains 318 mutations from 156 protein-carbohydrate complexes. We then doubled the dataset using reverse mutations [1], obtaining 636 mutations. The dataset is used to predict the binding free energy change of protein-carbohydrate complexes upon mutation from sequence- and structure-based features. The complete dataset lists the protein name, UniProt ID, sugar name, classification of protein and carbohydrate, mutation, binding free energy change (ΔΔG), and reference (PubMed ID).
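The two steps described above (computing ΔΔG and doubling the dataset with reverse mutations) can be sketched as follows. This is a minimal illustration, not the pipeline used to build the dataset. It assumes that binding affinity is given as a dissociation constant (Kd), that the temperature is 298.15 K, that mutations are written in the form "W62Y", and that the reverse mutation takes the negated ΔΔG (the usual antisymmetry assumption). The function names, record keys, and example values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd(kd_wt, kd_mut, temperature=298.15):
    """ΔΔG(mut-wt) = RT ln(Kd_mut / Kd_wt), in kcal/mol.

    Assumes both Kd values are in the same units and measured
    at the same temperature (an assumption, not stated in the source).
    """
    return R_KCAL * temperature * math.log(kd_mut / kd_wt)


def reverse_mutation(mutation):
    """Swap wild-type and mutant residues: 'W62Y' -> 'Y62W'."""
    wt, pos, mut = mutation[0], mutation[1:-1], mutation[-1]
    return f"{mut}{pos}{wt}"


def add_reverse_mutations(records):
    """Return the forward records plus one reverse record for each.

    The reverse record swaps the residues and negates ΔΔG.
    """
    reverse = [
        {**r, "mutation": reverse_mutation(r["mutation"]), "ddg": -r["ddg"]}
        for r in records
    ]
    return records + reverse


# Made-up values for illustration only; not taken from the dataset.
forward = [
    {"mutation": "W62Y", "ddg": 1.2},
    {"mutation": "D101A", "ddg": -0.4},
]
augmented = add_reverse_mutations(forward)
```

Applied to the 318 forward mutations, this kind of augmentation yields the 636 entries reported above.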
References:
[1] Sanavia, T., Birolo, G., Montanucci, L., Turina, P., Capriotti, E., & Fariselli, P. (2020). Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J., 18, 1968-1979.