CPAD 2.0 integrates 4 databases. Each database can be accessed from the home page by clicking the respective button. If you want to contribute to the current dataset, please download the template CSV files from the home page. Four template CSV files are provided, one for each dataset; the header columns differ between datasets. Please fill in the data in the respective CSV file and upload it using the UPLOAD button (below the template file option). Users can also download protein-wise details from any of the datasets in CSV or JSON format.
For bulk downloads, please contact the authors.
This dataset contains information about experimentally known amyloid and non-amyloid peptides/proteins. Users can filter amyloid and non-amyloid entries using the "Select a class" option. The "Field to filter" and "query-based search" options are also provided. Queries can be based on protein sequence, protein name, peptide length, or PMID. The peptide length field accepts only numerical values. "*" can be used for unknown residue(s) (for example, "V*Y*" or "**YT"). The details of the columns are given below:
We have also calculated some additional properties for the peptide sequences. Each row in the result table is clickable and provides additional information about the peptide. The result page looks like the image below:
Residues with a green background are the aggregating/non-aggregating peptide sequences marked in the source protein. Red-colored residues in the flanks on both sides of the peptide (3 residues upstream and 3 residues downstream) are the gatekeeper residues. The remaining charged and proline residues in the sequence are marked in orange.
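The flank rule described above can be illustrated with a minimal Python sketch. It is not the CPAD implementation: the function names are hypothetical, the set of "charged" residues is assumed to be D, E, K and R (the page does not say whether histidine is included), and residues inside the peptide itself are assumed to be excluded from the orange set.

```python
# Assumption: gatekeepers are charged residues (D, E, K, R) or proline (P)
# located within 3 residues on either side of the highlighted peptide.
GATEKEEPER_RESIDUES = set("DEKRP")
FLANK_SIZE = 3


def find_gatekeepers(protein, start, end, flank=FLANK_SIZE):
    """Return 0-based positions of gatekeeper residues flanking protein[start:end]."""
    left = range(max(0, start - flank), start)
    right = range(end, min(len(protein), end + flank))
    return [i for i in list(left) + list(right)
            if protein[i] in GATEKEEPER_RESIDUES]


def other_marked_residues(protein, start, end, flank=FLANK_SIZE):
    """Return charged/proline positions outside the peptide and outside the flanks.

    Assumption: these correspond to the residues shown in orange.
    """
    gatekeepers = set(find_gatekeepers(protein, start, end, flank))
    return [i for i, residue in enumerate(protein)
            if residue in GATEKEEPER_RESIDUES
            and i not in gatekeepers
            and not (start <= i < end)]


if __name__ == "__main__":
    sequence = "AKPGVVVVVDERAAK"   # made-up sequence; peptide is sequence[4:9]
    print(find_gatekeepers(sequence, 4, 9))
    print(other_marked_residues(sequence, 4, 9))
```

Positions are 0-based and the peptide is given as a half-open interval, which keeps the flank arithmetic simple at the ends of the sequence.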
The derived information on this page is explained below:
This dataset contains information about experimentally known aggregation-prone regions (APRs) in amyloidogenic proteins. Users can choose a specific protein from the list of proteins and view its APR(s). A query-based search option is also provided. "*" can be used for unknown residue(s) (for example, "V*Y*" or "**YT"). The APR information is provided in 3 segments: (1) Protein information, (2) Peptide (APR) information, and (3) Literature information. The detailed information about each property is given below:
Each row in the result table is clickable and provides additional information and visualization about the peptide. The result page looks like the image below:
Residues with a green background are the amyloidogenic peptide sequences marked in the source protein. Red-colored residues in the flanks on both sides of the peptide (3 residues upstream and 3 residues downstream) are the gatekeeper residues. The remaining charged and proline residues in the sequence are marked in orange.
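The "*" wildcard accepted by the sequence query (for example "V*Y*" or "**YT") can be pictured with a small Python sketch. This only illustrates the idea and is not the server-side search: it assumes that each "*" stands for exactly one unknown residue, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(query):
    """Translate a wildcard query into a compiled regular expression.

    Assumption: each "*" matches exactly one residue (any uppercase letter).
    The lookahead lets overlapping matches be reported as well.
    """
    body = "".join("[A-Z]" if ch == "*" else re.escape(ch)
                   for ch in query.upper())
    return re.compile("(?=(" + body + "))")


def find_matches(query, sequence):
    """Return (0-based start, matched segment) pairs for every hit in sequence."""
    pattern = wildcard_to_regex(query)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up sequence used only for demonstration.
    print(find_matches("V*Y*", "AVKYTVLYG"))
    print(find_matches("**YT", "AVKYTVLYG"))
```

If the website instead treats "*" as "zero or more residues", the `[A-Z]` fragment would need to become `[A-Z]*`; the page does not specify which reading applies.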
This dataset contains information about experimental aggregation kinetics reported in the literature. The database is divided into 3 categories:
(1) "Aggregation rates (kagg)": contains precalculated aggregation rates. The change in aggregation rate relative to the respective wild-type protein has also been calculated (for experiments performed under identical conditions).
(2) "Intensity: Time-dependent aggregation experiments": aggregation intensity from the kinetics (time-dependent) experiments.
(3) "Intensity: Other aggregation experiments": aggregation intensity from other experiments (pH-dependent, mutation-dependent, temperature-dependent, etc.). The intensities are calculated at a particular time point, either at equilibrium or at maximum intensity.
Please note that some time-kinetics entries contain data with low or negligible intensity values. These are included in the dataset for comparison and to give users the complete detail from the literature. Such data might not fit the classical sigmoidal curve.
Choose the dataset: Choose the dataset from the above options. Only one dataset can be searched at a time. "Intensity: Time-dependent aggregation experiments" is selected by default.
Specify a protein: Select the protein from the drop-down to get the kinetics. (The list of proteins refreshes for the respective dataset.)
Measurement: Select the experimental method used for the visualization of aggregation kinetics.
pH or Temperature: Specify the range of pH or temperature.
Mutation type: Filter by wild-type protein, number of mutations, or chemically modified residues.
Experimental method: Choose the method used for estimating the aggregation kinetics.
Year: Choose the results based on the year they were published.
Display options: Choose the columns to show in the results.
The result page shows a table with one experiment per row. The default experimental details and other general information provided in the columns are described here:
For the time kinetics (time-dependent aggregation intensities), users can visualize the aggregation kinetics as shown in the picture. The respective intensity and time-point values are given in the table (left-hand side). Please note that these values were extracted from graphs published in the literature and might contain marginal errors. The accuracy of the data depends on the quality of the graph and the number of data points it shows. For the other aggregation intensities (not time kinetics) and the aggregation rate dataset, aggregation graphs are not presented; however, other information can be obtained by clicking the respective row. The detailed information for each experiment looks like the following image (example from time-dependent aggregation experiments):
The aggregation intensity is plotted against time on the left-hand side. The numerical values are shown in the table on the right-hand side.
The detailed information for the kinetics dataset is explained in 4 parts:
(1) Protein information: Similar to the other datasets, it contains basic information about the protein.
(2) Experimental condition: Contains the experimental conditions used in the aggregation experiment.
(3) Measurement: Contains the details of how the data were measured. Time-kinetics data have the measurement details in the form of a graph and a table (presented at the top of the page). The aggregation rate dataset has the details of the aggregation rate in this section (since no graph is plotted): the measured aggregation rate (kagg) is given for each protein, along with how it differs from the aggregation rate of the wild-type protein (if applicable). The other aggregation intensity dataset is not uniform, so the data are presented as is; users can see each data point in the dataset as a single row.
(4) Literature: Contains the literature information for the experiment, so users can trace it back to the original paper.
1. This dataset contains information about structures of aggregation-related proteins/mutants, fibril structures, and amyloidogenic protein complexes with ligands that inhibit or enhance the aggregation rate. Please read the structure description and other details carefully for each entry. Users can filter structures using the "Select a structure type (6 classification type)" option and/or use a sequence query as a filter.
2. All ligands inhibiting amyloid formation were considered in the non-amyloid class. However, inhibition by a ligand is concentration-dependent. Users should read the reference paper carefully before relying on the information provided.
The table shown in the structure database contains the following columns:
Users can get detailed information on each protein and visualize the structure by clicking the respective row:
The top-left column shows the structure of the protein and the top-right column shows the sequence. The secondary structure information is color-coded in both the structure and the sequence:
Helix: Magenta
Strand: Yellow
Coil and Turn: White
The charged residues and proline residues are highlighted in blue in the sequence.
Users can also view the contact map for the PDB structure (contacts are residue pairs at a distance of <= 8 Å) by clicking the "click to view the interactive contact network" link below the structure.
The chains are shown in different colors; 8 different colors are used in rotation. Each node in the contact map represents a residue. Node labels follow the format [chain]-[residue name in 3-letter code][position]. The edges connect residues within 8 Å of each other, and the number labelled on each edge is the actual distance. Please note that the length of an edge does not correspond to the actual distance between the residues.
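For readers who want to reproduce a similar contact list from their own coordinates, the 8 Å rule and the node label format can be sketched in Python as follows. This is a sketch under stated assumptions, not the method used by the website: the page does not say which atom represents a residue, so the example simply takes one representative coordinate per residue (for instance the C-alpha atom), and the function names and input layout are hypothetical.

```python
import math
from itertools import combinations

CUTOFF = 8.0  # distance threshold in Å, as stated in the text above


def node_label(chain, residue_name, position):
    """Format a node as [chain]-[residue name in 3-letter code][position]."""
    return f"{chain}-{residue_name}{position}"


def contact_edges(residues, cutoff=CUTOFF):
    """Return (node_a, node_b, distance) for residue pairs within the cutoff.

    residues: list of (chain, residue_name, position, (x, y, z)) tuples.
    Assumption: one representative coordinate per residue (e.g. C-alpha).
    """
    edges = []
    for a, b in combinations(residues, 2):
        distance = math.dist(a[3], b[3])
        if distance <= cutoff:
            edges.append((node_label(*a[:3]), node_label(*b[:3]),
                          round(distance, 2)))
    return edges


if __name__ == "__main__":
    # Made-up coordinates used only for demonstration.
    demo = [
        ("A", "VAL", 1, (0.0, 0.0, 0.0)),
        ("A", "TYR", 2, (3.0, 4.0, 0.0)),
        ("B", "GLU", 5, (0.0, 0.0, 20.0)),
    ]
    for edge in contact_edges(demo):
        print(edge)
```

The distance is stored on the edge rather than encoded in its length, which mirrors the note above that edge length in the drawing carries no meaning.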
The information on the structure is divided into 5 parts:
(1) Protein description: This contains a brief description of the protein taken from the respective literature.
(2) Protein information: This contains basic information about the protein.
(3) Structure information: This contains information related to the structure of the protein.
(4) Ligand information: This contains information related to the ligand present in the PDB structure.
(5) Literature: This contains information about the publication in which the structure of the protein was solved.
Options are also provided to download the complete details of the protein, the contact map (within 8 Å distance), and PDB files in various formats.