Iowa State University

Protein-RNA Interface Database (PRIDB) v2.0

Dobbs and Honavar Laboratories

PRIDB tutorial

This page explains how to use the Protein-RNA Interface Database (PRIDB).

Below is a table of contents for the tutorial:

1. PRIDB information

The Protein-RNA Interface Database (PRIDB) is a comprehensive database of protein-RNA interfaces extracted from protein complexes in the Protein Data Bank (PDB). PRIDB is designed to allow detailed analysis of individual protein-RNA complexes and interfaces of interest, and rapid identification of interface residues in protein-RNA complexes on a residue-by-residue basis. PRIDB provides atomic-level information on interfacial contacts and allows visualization of motifs in both protein and RNA chains. In addition, PRIDB can be used to generate datasets of protein-RNA interfaces for statistical analyses and machine learning applications. PRIDB provides output in machine-readable format.

2. Example: Search by PDB ID or protein name

A protein-RNA complex can be found by its four-character PDB ID. For example, to search for the "tRNA synthetase" (PDB ID: 1ASY), click the "Basic Search" link on the PRIDB sidebar. This brings up the simple search page, where users can search for a protein by PDB ID or protein name. Figure 1 illustrates the simple search page:

In order to search for the tRNA synthetase, either:

  • Select the "PDB ID" radio button, and type in "1ASY" in the search box. Then click "Submit Query".
  • Select the "Protein name" radio button, and type in "tRNA synthetase". Then click "Submit Query".
    NOTE: For best results, use full words in the query.

Figure 2 shows an example of a search by PDB ID and an example of a search by protein name:

After the user submits the query, PRIDB will return a results page displaying the top matches in the database, along with pictures (obtained from PDB), basic information, and calculated RNA-binding residues. Figure 3 demonstrates some features of the search result page:

To show more results, click the "Next" link at the bottom of the search results page, as shown in Figure 4.

3. Example: Advanced Search

Users can search for results by name, variable distance cut-off, complex determination method, resolution, chain length, and protein and RNA motifs through the Advanced Search page. To access the Advanced Search page, click on the "Advanced Search" link in the sidebar. Figure 5 illustrates the Advanced Search page:

4. Example: Submitting a custom .pdb file

If information for a desired complex is not yet available in PRIDB, the user can submit the .pdb file for any protein-RNA complex in order to receive binding site information from PRIDB by e-mail. Figure 6 illustrates an example request:

The user will receive an e-mail containing the results in .csv format:

See the next section on interpreting the .csv file.

5. Example: Viewing results as a .csv file

The .csv file contains the information for the RNA binding sites in table format. The file can be viewed in a text editor (Figure 7) but it is more convenient to use a spreadsheet program to view the file.

Google Docs can be used to view the .csv file. Because the file may be too large for a Google spreadsheet, it may be necessary to keep only part of it, for example the header and the entries for one complex, and delete all the other entries. Alternatively, choose a specific complex of interest and obtain its individual .csv file by searching by its PDB ID. Figure 9 illustrates the appearance of the .csv file in spreadsheet form.
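
The manual trimming described above can also be scripted. The following is a minimal Python sketch, not part of PRIDB itself. It assumes the file has a header row and a column that identifies the complex; the column name "pdb_id" and the sample rows are hypothetical, so check the header of the actual .csv file and adjust the name accordingly.

```python
import csv
import io


def keep_one_complex(csv_text, pdb_id, id_column="pdb_id"):
    """Return CSV text with the header and only the rows for one complex.

    id_column is a hypothetical column name; replace it with the name
    used in the header of the real PRIDB .csv file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Compare case-insensitively so "1asy" matches "1ASY".
        if row[id_column].strip().upper() == pdb_id.upper():
            writer.writerow(row)
    return out.getvalue()


# Made-up sample rows for illustration only.
sample = (
    "pdb_id,protein_residue,rna_residue\n"
    "1ASY,ARG 119,U 35\n"
    "XXXX,LYS 52,A 76\n"
    "1ASY,GLN 138,G 34\n"
)
trimmed = keep_one_complex(sample, "1asy")
print(trimmed)
```

For a file on disk, read its contents with open(path, newline="") and pass the text to keep_one_complex; the result should be small enough to load into a spreadsheet.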

The table has one entry for each pair of interacting atoms in each complex (see Figure 10).
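
Because each row describes one atom pair, a residue that contacts RNA through several atoms appears in several rows. The Python sketch below shows one way to collapse such rows into a list of unique interface residues per protein chain. All column names ("pdb_id", "protein_chain", "residue_number", "residue_name") and the sample rows are assumptions made for illustration; the actual header of the PRIDB .csv file may differ.

```python
import csv
import io
from collections import defaultdict


def interface_residues(csv_text):
    """Group atom-pair rows into unique residues per (complex, chain).

    The column names used here are hypothetical placeholders.
    """
    residues = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["pdb_id"], row["protein_chain"])
        # A set drops the repeats caused by multiple atom pairs
        # belonging to the same residue.
        residues[key].add((int(row["residue_number"]), row["residue_name"]))
    return {key: sorted(value) for key, value in residues.items()}


# Made-up sample rows: residue 119 contacts RNA through two atoms.
sample = (
    "pdb_id,protein_chain,residue_number,residue_name,protein_atom,rna_atom\n"
    "1ASY,A,119,ARG,NH1,O2P\n"
    "1ASY,A,119,ARG,NH2,O1P\n"
    "1ASY,A,138,GLN,NE2,N7\n"
)
result = interface_residues(sample)
print(result)
```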

FAQs

  • What distance cut-off is used to calculate interface residues? 5 Å. The distance is calculated between any protein atom and any RNA atom.
  • Which PDB version of data is being used? PDB as of May 1, 2010
  • How long does it take to get results for a user-submitted file in PDB format? This can take up to 24 hours.
  • What are the datasets? Several pre-calculated benchmark datasets, which have been filtered to limit redundancy and to exclude low-resolution structures, are provided for the user’s convenience. These include two previously published datasets, RB109 (Terribilini et al. RNA 2006) and RB147 (Terribilini et al. NAR 2007), as well as a larger, more recently extracted dataset (RB199).
  • What does non-redundant dataset mean?
    • RB109: 109 protein sequences obtained from 56 protein-RNA complexes with resolution better than 3.5 Å in PDB and no more than 30% sequence identity. For this dataset, the ENTANGLE program was used to identify amino acids in contact with RNA.
    • RB147: 147 non-redundant protein chains with resolution better than 3.5 Å and no more than 30% sequence identity. RNA-binding residues were identified according to a distance-based cutoff definition: an RNA-binding residue is an amino acid containing at least one atom within 5 Å of any atom in the bound RNA.
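
The distance-based definition above (an RNA-binding residue has at least one atom within 5 Å of any RNA atom) can be illustrated with a short Python sketch. This is not the PRIDB implementation; the input layout, the function name, and the choice to treat a distance of exactly 5 Å as a contact are assumptions, and the coordinates are made up.

```python
import math

CUTOFF_ANGSTROM = 5.0  # cut-off stated in the FAQ above


def rna_binding_residues(protein_atoms, rna_atoms, cutoff=CUTOFF_ANGSTROM):
    """Return residue ids with at least one atom within cutoff of any RNA atom.

    protein_atoms: list of (residue_id, (x, y, z)) tuples, one per atom.
    rna_atoms: list of (x, y, z) tuples, one per atom.
    Treating distance == cutoff as a contact is an assumption.
    """
    binding = set()
    for residue_id, coord in protein_atoms:
        if residue_id in binding:
            continue  # one contacting atom is enough for this residue
        if any(math.dist(coord, rna) <= cutoff for rna in rna_atoms):
            binding.add(residue_id)
    return sorted(binding)


# Made-up coordinates in angstroms.
protein_atoms = [
    ("ARG119", (0.0, 0.0, 0.0)),   # 3 Å from the first RNA atom
    ("ARG119", (10.0, 0.0, 0.0)),
    ("GLY200", (20.0, 0.0, 0.0)),  # far from every RNA atom
]
rna_atoms = [(3.0, 0.0, 0.0), (30.0, 30.0, 30.0)]

binding = rna_binding_residues(protein_atoms, rna_atoms)
print(binding)
```

This all-pairs comparison is simple but slow for large complexes; a spatial index would be a common optimization.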

Last updated October 22, 2010.