CPAD 2.0

About the database:

CPAD 2.0 integrates 4 databases. Each database can be accessed from the home page by clicking the respective button. If you want to contribute to the current dataset, please download the template CSV files from the home page. We provide 4 CSV files, one for each dataset; the header columns differ between datasets. Please fill in the data in the respective CSV file and upload it using the UPLOAD button (below the template file option).

Users can also download protein-wise details from any of the datasets in CSV or JSON format.

For bulk downloads, please contact the authors.

Disclaimer: We have taken utmost care while collecting the data. However, there is a possibility of errors. Please notify us in such cases.
Users can verify the data from the corresponding literature.

Amyloid Proteins and Peptide

This dataset contains information about experimentally known amyloid and non-amyloid peptides/proteins. Users can filter amyloid and non-amyloid entries with the "Select a class" option. We also provide "Field to filter" and "query-based search" options. Queries can be based on protein sequence, protein name, peptide length, or PMID. The peptide length field accepts only numerical values. "*" can be used for unknown residue(s) (for example, "V*Y*" or "**YT"). The columns are described below:

Column/Property	Description
Entry	Unique IDs are given to each data point
Peptide	Peptide sequence
Length	Peptide length
Class	Classification: amyloid-forming or non-amyloid-forming peptides
Protein Name	Name of the source protein
UniProt ID	UniProt ID of the source protein
Mutant	Information on the mutations incorporated in the source protein
Reference	Short reference for the experimental details
PMID	PubMed ID
Source	Source database
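
The "*" wildcard described above can be illustrated with a small sketch. It assumes that each "*" stands for exactly one unknown residue and that a query may match anywhere within a sequence; the server's actual matching rules are not documented here, and the function names `wildcard_to_regex` and `matches` are hypothetical.

```python
import re


def wildcard_to_regex(query: str) -> re.Pattern:
    """Translate a query such as 'V*Y*' into a regular expression.

    Assumption: each '*' matches exactly one residue (any uppercase letter).
    """
    parts = ["[A-Z]" if ch == "*" else re.escape(ch) for ch in query.upper()]
    return re.compile("".join(parts))


def matches(query: str, sequence: str) -> bool:
    """Return True if the wildcard query occurs anywhere in the sequence."""
    return wildcard_to_regex(query).search(sequence.upper()) is not None


if __name__ == "__main__":
    for query, seq in [("V*Y*", "KLVFYA"), ("**YT", "AAYT"), ("V*Y*", "VY")]:
        print(query, seq, matches(query, seq))
```

Under these assumptions, "V*Y*" matches any four-residue stretch with V in the first position and Y in the third.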

We have also calculated some additional properties for the peptide sequences. Each row in the result table is clickable and provides additional information about the peptide. The result page looks like the image below:

Residues with a green background mark the aggregating/non-aggregating peptide sequence within the source protein. Residues colored red in the flanks on both sides (3 residues upstream and downstream) are the gatekeeper residues. The remaining charged and proline residues in the sequence are marked in orange.
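
As a rough illustration of the coloring rule, the sketch below lists the gatekeeper positions in the 3-residue flanks of a peptide. It assumes that "charged" means D, E, K and R (whether histidine is counted is not stated) and uses 0-based indices; the function name `gatekeeper_positions` is hypothetical.

```python
# Assumption: gatekeepers are charged residues (D, E, K, R) plus proline.
GATEKEEPERS = set("DEKRP")


def gatekeeper_positions(protein: str, start: int, end: int, flank: int = 3) -> list:
    """Return 0-based indices of gatekeeper residues flanking protein[start:end].

    `start` is the first index of the peptide and `end` is one past its last
    index. Only `flank` residues on each side of the peptide are inspected.
    """
    left = range(max(0, start - flank), start)
    right = range(end, min(len(protein), end + flank))
    return [i for i in list(left) + list(right) if protein[i] in GATEKEEPERS]


if __name__ == "__main__":
    protein = "AAKPGVVVVVDEAA"
    # The peptide VVVVV occupies indices 5 to 9.
    print(gatekeeper_positions(protein, 5, 10))
```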

The derived information on this page is explained below:

Column/Property	Description
Net Charge	Net charge of the peptide, considering both positive and negative charges
Absolute charge	Total charge of the peptide (sum of positively and negatively charged residues)
Hydrophobicity	Percentage of hydrophobic residues (A,C,F,I,L,M,P,V,W,Y)
NuAPRpred	Aggregation propensity predicted using NuAPRpred
Tango	Aggregation propensity predicted using Tango
Normalized Aggregation Propensity (AGGRESCAN)	Aggregation propensity calculated using AGGRESCAN
Area of the profile Above Threshold (AGGRESCAN)	Area of the aggregation propensities above threshold (residue-wise)
Best Energy Score (PASTA 2.0)	Aggregation propensity score predicted using PASTA 2.0 server
Aggregate Orientation (PASTA 2.0)	The orientation of the fibril structure predicted using PASTA 2.0
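
The three sequence-derived properties in the table (net charge, absolute charge, and hydrophobicity) can be sketched as follows. The hydrophobic residue set is taken from the table; the charge sets (K and R positive, D and E negative, histidine not counted) are an assumption, and the function name `peptide_properties` is hypothetical. The predictor scores (NuAPRpred, Tango, AGGRESCAN, PASTA 2.0) come from external tools and are not reproduced here.

```python
POSITIVE = set("KR")             # assumption: histidine is not counted
NEGATIVE = set("DE")
HYDROPHOBIC = set("ACFILMPVWY")  # residue set listed in the table


def peptide_properties(peptide: str) -> dict:
    """Compute net charge, absolute charge and percent hydrophobic residues."""
    seq = peptide.upper()
    pos = sum(res in POSITIVE for res in seq)
    neg = sum(res in NEGATIVE for res in seq)
    hyd = sum(res in HYDROPHOBIC for res in seq)
    return {
        "net_charge": pos - neg,       # positive minus negative residues
        "absolute_charge": pos + neg,  # all charged residues
        "hydrophobicity": 100.0 * hyd / len(seq) if seq else 0.0,
    }


if __name__ == "__main__":
    print(peptide_properties("KLVFFAED"))
```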

Aggregation-prone regions (APRs)

This dataset contains information about experimentally known aggregation-prone regions (APRs) in amyloidogenic proteins. Users can choose a specific protein from the list of proteins and view its APR(s). We also provide a query-based search option. "*" can be used for unknown residue(s) (for example, "V*Y*" or "**YT").
The APR information is provided in 3 segments:
(1) Protein information
(2) Peptide (APR) information
(3) Literature information

The detailed information about each property is given below:

Column/Property	Description
Entry	Unique IDs are given to each data point
Details of Protein	Name of the amyloidogenic protein, UniProt name, alternate names, mutation and UniProt ID
Category	Protein category (functional/pathogenic)
Prion	Whether the protein is a prion protein
Structure	PDB IDs of the structure, if available
Species	The source organism
Position	Sequence position of the APR
Region	Sequence detail of the APR
Region length	Sequence length of the APR
PMID	PubMed ID
Reference	Article information in short format
Source	Source database

Each row in the result table is clickable and provides additional information and a visualization of the peptide. The result page looks like the image below:

Residues with a green background mark the amyloidogenic peptide sequence within the source protein. Residues colored red in the flanks on both sides (3 residues upstream and downstream) are the gatekeeper residues. The remaining charged and proline residues in the sequence are marked in orange.

Aggregation Kinetics

This dataset contains information about experimental aggregation kinetics reported in the literature. The database is divided into 3 categories:
(1) "Aggregation rates (k_agg)" contains precalculated aggregation rates. We have also calculated the change in aggregation rate with respect to the wild-type protein (for experiments performed under identical conditions).
(2) "Intensity: Time-dependent aggregation experiments" contains aggregation intensities from time-dependent kinetics experiments.
(3) "Intensity: Other aggregation experiments" contains aggregation intensities from other experiments (pH-dependent, mutation-dependent, temperature-dependent, etc.). The intensities correspond to a particular time point, either at equilibrium or at maximum intensity.

Please note that some time-kinetics entries contain low or negligible intensity values. These are included in the dataset for comparison and to give users the complete detail from the literature. Such data might not fit the classical sigmoidal curve.
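
For orientation, a sigmoidal aggregation curve is often written as a logistic function of time. The sketch below shows one common form; it is not necessarily the model used in CPAD or in the original papers, and the parameter names `y_max`, `k` and `t_half` are illustrative assumptions.

```python
import math


def sigmoid_intensity(t: float, y_max: float, k: float, t_half: float) -> float:
    """Logistic curve: intensity rises from near 0 to y_max around t_half.

    y_max  : plateau intensity (assumed parameter name)
    k      : apparent growth rate, in 1/time units
    t_half : time at which the intensity reaches half of y_max
    """
    return y_max / (1.0 + math.exp(-k * (t - t_half)))


if __name__ == "__main__":
    # Evaluate an example curve at a few time points.
    for t in (0, 5, 10, 15, 20):
        print(t, round(sigmoid_intensity(t, y_max=100.0, k=0.5, t_half=10.0), 2))
```

A trace whose intensity stays close to zero throughout has no clear lag, growth, or plateau phase, which is why such entries may not be described well by a curve of this shape.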

Users have multiple filter options:

Choose the dataset: Choose the dataset from the above options. Only one dataset can be searched at a time.
"Intensity: Time-dependent aggregation experiments" is selected by default.
Specify a protein: Select the protein from the drop-down to get the kinetics.
(The list of proteins refreshes for the respective dataset.)
Measurement: Select the experimental method used for the visualization of aggregation kinetics.
pH or Temperature: Users can specify the range of pH or temperature from this column.
Mutation type: Users can select the wild-type protein, the number of mutations, or chemically modified residues from this option.
Experimental method: Choose the method used for estimating the aggregation kinetics.
Year: Filter the results based on the year they were published.
Display options: Users can choose the columns they want to see in the results.

Result page

The result page shows a table with one experiment per row. The default experimental details and other general information provided in the columns are described here:

Column/Property	Description
Entry	Unique IDs are given to each data point
Protein	Name of the amyloidogenic protein
Species	The source organism
Length	Peptide length
UniProt ID	UniProt ID of the source protein
Mutation(s)	Information on the mutations studied in the aggregation experiment
Temperature	Temperature in the aggregation experiment
pH	pH in the aggregation experiment
Protein concentration	Concentration of protein in the aggregation experiment
Measure	Measurement method in the aggregation experiment
PMID	PubMed ID
Datapoints	Number of time points at which intensity was measured

Visualization of aggregation kinetics

For the time kinetics (time-dependent aggregation intensities), users can visualize the aggregation kinetics as shown in the picture. The respective intensity and time-point values are given in the table (left-hand side). Please note that these values were extracted from graphs published in the literature and might contain marginal errors. The accuracy of the data depends on the quality of the graph and the number of data points it shows.
For other aggregation intensities (not time kinetics) and the aggregation rate dataset, aggregation graphs are not presented. However, other information can be obtained by clicking the respective row.
The detailed information for each experiment looks like the following image (example from time-dependent aggregation experiments):

The aggregation intensity is plotted with respect to time on the left-hand side. The numerical values are shown in the table on the right-hand side.

The detailed information for the kinetics dataset is explained in 4 parts:

(1) Protein Information: Similar to other datasets, it contains basic information about the protein.
(2) Experimental condition: Contains the experimental condition used in the aggregation experiment.
(3) Measurement: Contains the details on how the data was measured.
The time kinetics data has details of the measurement in the form of a graph and a table (presented at the top of the page).
The aggregation rate dataset has details of the aggregation rate in this section (since no graph is plotted). The measured aggregation rate k_agg is given for each protein, along with how it differs from the aggregation rate of the wild-type protein (if applicable).
The other aggregation intensity dataset is not uniform, so its entries are presented as is. Users can see each data point in the dataset as a single row.
(4) Literature: Contains the literature information on the experiment, so users can trace back to the original paper.

Structure Database of aggregation-related proteins and peptides

1. This dataset contains information about structures of aggregation-related proteins/mutants, fibril structures, and amyloidogenic protein complexes with ligands that inhibit or enhance the aggregation rate. Please read the structure description and other details carefully for each entry. Users can filter structures with the "Select a structure type (6 classification type)" option and/or use a sequence query as a filter.
2. All ligands inhibiting amyloid formation were placed in the non-amyloid class. However, inhibition by a ligand is concentration-dependent. Users should read the reference paper carefully before relying on the information provided.

The table shown in the structure database contains the following columns:

Column/Property	Description
Entry	Unique IDs are given to each data point
PDB ID	PDB IDs of the structures
Protein Name	Name of the protein
Species	The source organism
Length	Sequence length of the protein
Mutation(s)	Mutations in the protein structure
Class	Classification (amyloid or non-amyloid)
Type	Classification type. Protein/Peptide: wild-type proteins or mutants that aggregate. Fibril: structure of amyloid fibrils. Aggregating complex: protein-ligand complex that enhances the aggregation process. Inhibitor complex: protein-ligand complex that inhibits the aggregation process. Fibril complex: ligand bound to amyloid fibrils. Protein complex: amyloidogenic protein-protein complex.
Method	Experimental method used to determine the structure
Resolution (Å)	Resolution of the structure
PMID	PubMed ID of the literature

Users can get detailed information on each protein and visualize the structure by clicking the respective row:

The top-left column shows the structure of the protein and the top-right column shows the sequence.
The secondary structure information is provided using color coding in both structure and sequence.
Helix: Magenta
Strand: Yellow
Coil and Turn: White
The charged residues and proline residues are highlighted in blue in the sequence.

Users can also view the contact map (residue pairs at a distance <= 8 Å) for the PDB structure by clicking the "click to view the interactive contact network" link below the structure.

The chains are shown in different colors. We use 8 different colors, repeated as needed, to visualize the chains. Each node in the contact map represents a residue. The residue label format is [chain]-[residue name in 3-letter code][position]. The edges connect residues within a distance of 8 Å. The number on each edge shows the actual distance.
Please note that the length of an edge does not correspond to the actual distance between residues.
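
The contact map described above can be sketched as a list of labelled edges between residues that lie within the 8 Å cutoff. This is a minimal illustration, not the server's implementation: it assumes each residue is represented by a single coordinate (for example its Cα atom, which is not stated in this document), and the names `residue_label` and `contact_edges` are hypothetical.

```python
import math
from itertools import combinations


def residue_label(residue: tuple) -> str:
    """Format a node label as [chain]-[3-letter residue name][position]."""
    chain, name, position, _coords = residue
    return f"{chain}-{name}{position}"


def contact_edges(residues: list, cutoff: float = 8.0) -> list:
    """Return (label_a, label_b, distance) for residue pairs within the cutoff.

    Each residue is a tuple (chain, name, position, (x, y, z)); one point per
    residue is an assumption made for this sketch.
    """
    edges = []
    for a, b in combinations(residues, 2):
        distance = math.dist(a[3], b[3])
        if distance <= cutoff:
            edges.append((residue_label(a), residue_label(b), round(distance, 2)))
    return edges


if __name__ == "__main__":
    example = [
        ("A", "VAL", 12, (0.0, 0.0, 0.0)),
        ("A", "TYR", 13, (3.0, 4.0, 0.0)),
        ("B", "GLY", 5, (20.0, 0.0, 0.0)),
    ]
    print(contact_edges(example))
```

Because each edge carries its distance as a label, the drawn edge length is free to be chosen by the graph layout, consistent with the note above.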

The information on the structure is divided into 5 parts:
(1) Protein description: This contains a brief description about the protein taken from the respective literature.
(2) Protein information: This contains basic information about the protein.
(3) Structure information: This contains information related to the structure of the protein.
(4) Ligand information: This contains information related to the ligand present in the PDB structure.
(5) Literature: This contains information about the literature which solved the structure of the protein.

We also provide the option to download the complete details of the protein, the contact map (within 8 Å distance), and PDB files in various formats.

Tutorial