This page explains how to use the Protein-RNA Interface Database.
Below is a table of contents for the tutorial:
1. PRIDB information
The Protein-RNA Interface Database (PRIDB) is a comprehensive database of protein-RNA
interfaces extracted from protein complexes in the Protein Data Bank (PDB). PRIDB is designed to allow
detailed analysis of individual protein-RNA complexes and interfaces of interest, and rapid identification of interface residues in
protein-RNA complexes on a residue-by-residue basis. PRIDB provides atomic-level information regarding interfacial contacts and allows
visualization of motifs in both protein and RNA chains. In addition, PRIDB can be used to generate datasets of protein-RNA interfaces for
statistical analyses and machine learning applications. PRIDB provides output in machine-readable format.
2. Example: Search by PDB ID or protein name
It is simple to search for a protein-RNA complex by the four-character PDB ID of the protein. For example, in order to search for the
"tRNA synthetase" (PDB ID: 1ASY), go to the "Basic Search" link on the sidebar of PRIDB. This will bring up the simple search page,
in which users can search for a protein by PDB ID or protein name. Figure 1 illustrates the simple search page:
In order to search for the tRNA synthetase, either:
- Select the "PDB ID" radio button, and type "1ASY" in the search box. Then click "Submit Query".
- Select the "Protein name" radio button, and type in "tRNA synthetase". Then click "Submit Query".
NOTE: For best results, use full words in the
Figure 2 shows an example of a search by PDB ID and an example of a search by protein name:
After the user submits the query, PRIDB will return a results page displaying the top matches in the database, along with pictures (obtained
from PDB), basic information, and calculated RNA-binding residues. Figure 3 demonstrates some features of the search result page:
To show more results, click on the "Next" link at the bottom of the search results page, as demonstrated by Figure 4.
3. Example: Advanced Search
Users can search for results by name, variable distance cut-off, complex determination method, resolution, chain length, and protein and RNA motifs through the Advanced
Search page. To access the Advanced Search page, click on the "Advanced Search" link in the sidebar. Figure 5 illustrates the Advanced Search page:
4. Example: Submitting a custom .pdb file
If information for a desired complex is not yet available on PRIDB, the user can submit the .pdb file for any protein-RNA complex in
order to receive binding site information from PRIDB by e-mail. Figure 6 illustrates an example request:
The user will receive an e-mail containing the results in .csv format:
See the next section on interpreting the .csv file.
5. Example: Viewing results as a .csv file
The .csv file contains the information for the RNA binding sites in table format.
The file can be viewed in a text editor (Figure 7) but it is more convenient to use a spreadsheet program to view the file.
Google Docs can be used to view the .csv file. Since the file may be too big for a Google spreadsheet,
it may be necessary to manually select only part of the .csv file--for example, by keeping only the header and the entries for one complex
and deleting all the other entries. Alternatively, choose a specific complex of interest and obtain its individual .csv file by searching by its PDB ID. Figure 9 illustrates the appearance of the .csv file in spreadsheet form.
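Trimming a large .csv file down to one complex can also be done with a short script instead of by hand. The following is a minimal sketch in Python, not part of PRIDB itself. It assumes the file has a column holding the PDB ID; the column name `pdb_id` and the sample rows are made up for illustration, so check the header of the actual file and adjust `id_column` accordingly.

```python
import csv
import io


def keep_one_complex(csv_text, pdb_id, id_column="pdb_id"):
    """Return CSV text holding the header plus only the rows for one complex.

    `id_column` is a hypothetical column name; replace it with the name
    that appears in the header of the real PRIDB .csv file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Compare case-insensitively, since PDB IDs are often typed in lower case.
        if row[id_column].strip().upper() == pdb_id.upper():
            writer.writerow(row)
    return out.getvalue()


# Made-up rows, used only to show the filtering step.
sample = (
    "pdb_id,protein_residue,rna_residue\n"
    "1ASY,ARG 123,A 36\n"
    "1B23,LYS 45,U 8\n"
    "1ASY,GLU 188,C 74\n"
)

filtered = keep_one_complex(sample, "1asy")
print(filtered)
```

To use this on a downloaded file, read the file contents into a string, pass it to `keep_one_complex`, and write the returned text to a new, smaller .csv file.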
The table has one entry for each pair of interacting atoms in each complex (see Figure 10).
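Because each row describes a pair of atoms, the same amino acid usually appears in several rows. A residue-level list of interface residues can be obtained by collapsing the rows, as in the sketch below. The column names (`protein_chain`, `protein_resnum`, `protein_resname`) and the sample rows are assumptions made for illustration and may not match the real file header.

```python
import csv
import io


def interface_residues(csv_text,
                       chain_col="protein_chain",
                       resnum_col="protein_resnum",
                       resname_col="protein_resname"):
    """Collapse atom-pair rows into a sorted list of unique protein residues.

    All three column names are hypothetical; adapt them to the real header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    residues = set()
    for row in reader:
        # A set keeps one entry per residue no matter how many atom pairs it has.
        residues.add((row[chain_col], int(row[resnum_col]), row[resname_col]))
    return sorted(residues)


# Made-up atom-pair rows: residue 10 contacts the RNA through two atoms.
sample = (
    "protein_chain,protein_resnum,protein_resname,protein_atom,rna_chain,rna_resnum,rna_atom\n"
    "A,10,ARG,NH1,R,5,OP1\n"
    "A,10,ARG,NH2,R,5,OP2\n"
    "A,42,LYS,NZ,R,6,O2'\n"
)

residues = interface_residues(sample)
print(residues)
```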
- What distance cut-off is used to calculate interface residues? 5 Å (the distance is
calculated between any protein atom and any RNA atom)
- Which PDB version of data is being used? PDB as of May 1, 2010
- On average, how long does it take to get results for a user-submitted file in PDB format? This can take up to 24 hours.
- What are the datasets? Several pre-calculated benchmark datasets, which have been filtered to limit redundancy and to exclude
low-resolution structures, are provided for the user’s convenience. These include two previously published datasets, RB109 (Terribilini et al. RNA 2006)
and RB147 (Terribilini et al. NAR 2007), as well as a larger, more recently extracted dataset (RB199).
- What does non-redundant dataset mean?
- RB109: 109 protein sequences obtained from 56 protein-RNA complexes with resolution better than 3.5 Å in PDB and no more than 30% sequence identity. For this dataset, the ENTANGLE program was used
to identify amino acids in contact with RNA.
- RB147: 147 non-redundant protein chains with resolution better than 3.5 Å and no more than 30% sequence identity.
RNA-binding residues were identified according to a distance-based cutoff definition: an RNA-binding residue is an amino acid containing at least one atom within 5 Å of any atom in the bound RNA.
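The distance-based definition above can be illustrated with a few lines of Python. This is only a sketch of the idea, not the code used by PRIDB: the input format (residue identifiers paired with x, y, z coordinates) is an assumption, the coordinates are made up, and treating a distance of exactly 5 Å as a contact is a choice made here that the definition does not specify.

```python
import math

CUTOFF = 5.0  # distance cut-off in Å, as stated in the definition above


def rna_binding_residues(protein_atoms, rna_atoms, cutoff=CUTOFF):
    """Return the set of residues with at least one atom within `cutoff` of any RNA atom.

    protein_atoms: iterable of (residue_id, (x, y, z)), one entry per protein atom
    rna_atoms:     iterable of (x, y, z), one entry per RNA atom
    """
    rna_atoms = list(rna_atoms)
    binding = set()
    for residue_id, coord in protein_atoms:
        if residue_id in binding:
            continue  # one contacting atom is enough for this residue
        if any(math.dist(coord, rna_coord) <= cutoff for rna_coord in rna_atoms):
            binding.add(residue_id)
    return binding


# Made-up coordinates: residue ("A", 10) has an atom 4 Å from an RNA atom,
# residue ("A", 11) is at least 10 Å away from every RNA atom.
protein_atoms = [
    (("A", 10), (0.0, 0.0, 0.0)),
    (("A", 10), (1.5, 0.0, 0.0)),
    (("A", 11), (20.0, 0.0, 0.0)),
]
rna_atoms = [(4.0, 0.0, 0.0), (30.0, 0.0, 0.0)]

binding = rna_binding_residues(protein_atoms, rna_atoms)
print(binding)
```

The loop compares every protein atom with every RNA atom, which is simple but slow for large complexes; a spatial index would be a natural replacement if speed matters.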
Last updated October 22, 2010.