RiboVision 2 Data

RNA sequences and phylogeny (RiboVision)

A subset of 152 species from the DESIRE (Sparse and Efficient Representation of Extant Biology) database was organized into a phylogenetic browser using a tree topology from the Banfield lab.

Sequences were obtained from external databases and tagged with their origin: the header of each sequence includes its source. Sequence IDs that start with the prefix "URS" were obtained from RNAcentral; sequence IDs with any other prefix were obtained from RFAM.

Alignments

Each ribosomal RNA has an associated multiple sequence alignment (MSA). The alignments were generated according to the procedure described at https://doi.org/10.1093/molbev/msy101.

2D maps

Pregenerated 2D layouts of RNAs were exported into the RNA topology viewer using the EMBL-EBI PDBe API.

3D Structures

3D structures were fetched from the RCSB using the API of the RCSB coordinate server.

3D coloring

To facilitate coloring of the 3D structures, the RiboVision 2 color themes were created within PDBe Mol*, using the Proteopedia color wrapper (https://github.com/molstar/molstar/tree/master/src/examples/proteopedia-wrapper) as a template. The tailored Mol* code is available from the local GitHub repo.

Sequence and structure associated data

Protein contacts

RNA-protein contacts are computed after the user selects a specific RNA complex and specifies the main RNA chain. The contacts are computed by the NeighborSearch module of BioPython, which uses the KD tree algorithm, with a cutoff distance of 3.5 Å.
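The cutoff criterion can be illustrated with a minimal sketch. To stay self-contained it swaps in a brute-force pairwise search instead of BioPython's KD tree based NeighborSearch; both answer the same question (which atom pairs lie within 3.5 Å), only the search strategy differs. The atom tuples, residue labels, and the function name `find_contacts` are hypothetical and used for illustration only.

```python
import math

CUTOFF = 3.5  # contact cutoff in angstroms, as stated above


def find_contacts(rna_atoms, protein_atoms, cutoff=CUTOFF):
    """Return sorted (rna_residue, protein_residue) pairs in contact.

    Each atom is a hypothetical (residue_label, (x, y, z)) tuple. Two residues
    are in contact if any of their atoms lie within `cutoff` of each other.
    This brute-force loop is O(N*M); a KD tree avoids most comparisons.
    """
    contacts = set()
    for rna_res, rna_xyz in rna_atoms:
        for prot_res, prot_xyz in protein_atoms:
            if math.dist(rna_xyz, prot_xyz) <= cutoff:
                contacts.add((rna_res, prot_res))
    return sorted(contacts)


# Toy coordinates (made up): only G10 and LYS5 are within 3.5 angstroms.
rna = [("A:G10", (0.0, 0.0, 0.0)), ("A:C11", (10.0, 0.0, 0.0))]
protein = [("B:LYS5", (3.0, 0.0, 0.0)), ("B:ARG9", (20.0, 0.0, 0.0))]
contacts = find_contacts(rna, protein)
```

For whole ribosomes the brute-force loop becomes slow, which is the practical reason for using a KD tree as described above.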

Chemical modifications

Chemical modifications (if any) are extracted from the "_entity_poly.pdbx_seq_one_letter_code" field of the selected CIF file.
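As a rough illustration, the sketch below assumes the wwPDB convention in which non-standard residues in this field are written as chemical component codes in parentheses, for example "(PSU)" for pseudouridine. The function name `modified_residues` and the example sequence are hypothetical; this is not necessarily how RiboVision 2 parses the field.

```python
import re


def modified_residues(seq_one_letter_code):
    """Return (position, component_code) for each parenthesized residue.

    Positions are 1-based indices into the full polymer sequence. Line breaks
    and other whitespace inside the field value are ignored.
    """
    compact = "".join(seq_one_letter_code.split())
    # Each token is either a parenthesized code such as "(PSU)" or one letter.
    tokens = re.findall(r"\(([^)]+)\)|([A-Za-z])", compact)
    return [(i, code) for i, (code, _letter) in enumerate(tokens, start=1) if code]


# Made-up field value spanning two lines, with two modified residues.
example = "GGC(PSU)A\nG(5MC)U"
mods = modified_residues(example)
```

Because each parenthesized code counts as a single residue, the reported positions stay aligned with the full sequence used elsewhere for mapping.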

Available attributes for calculated mapping data:

Nucleotide frequencies

Nucleotide frequencies in each column of an MSA were adjusted for the presence of gaps. Gap frequencies were prorated and treated as a uniform distribution over the four possible nucleic acid characters, such that a single gap counts as 0.25 of each character, as described by Bernier et al.
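The gap adjustment can be sketched for a single alignment column as follows. The function name `gap_adjusted_frequencies`, the use of "-" as the gap character, and the choice to ignore ambiguity codes such as "N" are assumptions made for this illustration.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"


def gap_adjusted_frequencies(column):
    """Return gap-adjusted frequencies of A, C, G, U for one MSA column.

    Each gap ('-') contributes 0.25 to every nucleotide, i.e. it is treated
    as a uniform distribution over the four characters. Characters other
    than A, C, G, U (or T, read as U) and '-' are ignored.
    """
    counts = Counter(column.upper().replace("T", "U"))
    gaps = counts.get("-", 0)
    total = sum(counts[n] for n in NUCLEOTIDES) + gaps
    if total == 0:
        raise ValueError("column contains no nucleotides or gaps")
    return {n: (counts[n] + 0.25 * gaps) / total for n in NUCLEOTIDES}


# Column with two A, one G and one gap: A = 2.25/4, G = 1.25/4, C = U = 0.25/4.
freqs = gap_adjusted_frequencies("AAG-")
```

With this adjustment the four frequencies always sum to 1, so a column consisting only of gaps behaves like a uniformly random column.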

Shannon Entropy

The Shannon entropy (like all properties listed below) was computed from the gap-adjusted probabilities as:

\[ H(X) = - \sum_{i=1}^{n} P(x_i) \cdot \log_b(P(x_i)) \]
where $n=4$ and $b=2$, yielding a range of $H$ between $0$ (fully conserved) and $2.0$ (uniformly random).
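A minimal sketch of this calculation, using base-2 logarithms and the usual convention that zero-probability terms contribute nothing to the sum, reproduces the two ends of the range. The function name `shannon_entropy` is chosen for illustration.

```python
import math


def shannon_entropy(probabilities):
    """Base-2 Shannon entropy of a discrete distribution.

    Terms with zero probability are skipped, following the convention
    that p * log(p) tends to 0 as p tends to 0.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# A fully conserved column and a uniformly random column over A, C, G, U.
conserved = shannon_entropy([1.0, 0.0, 0.0, 0.0])
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
```

Here `conserved` evaluates to zero and `uniform` to 2, matching the stated bounds for four characters.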

Two group comparison (TwinCons)

When two groups are selected in the phylogeny browser, RiboVision 2 offers the additional option of computing TwinCons, a score developed in house (https://doi.org/10.1371/journal.pcbi.1009541). TwinCons is computed for a single position of the MSA and compares two pre-defined groups, each represented by a vector of gap-adjusted nucleotide frequencies, based on their similarity as defined by a pre-computed substitution matrix, blastn (https://github.com/LDWLab/TwinCons). The score represents the transformation price between the two column vectors, as related by the substitution matrix.
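One way to picture a matrix-weighted comparison of two frequency vectors is the bilinear form sketched below. This is only an illustration under stated assumptions: the exact TwinCons formula is defined in the publication and repository cited above, and the match and mismatch values used here are placeholder numbers, not necessarily those of the blastn matrix used by TwinCons. The name `column_similarity` is hypothetical.

```python
NUCLEOTIDES = "ACGU"

# Placeholder substitution scores (assumption for illustration only).
MATCH, MISMATCH = 5.0, -4.0


def column_similarity(p, q):
    """Sum of p[a] * S[a][b] * q[b] over all nucleotide pairs (a, b).

    `p` and `q` map each nucleotide to its gap-adjusted frequency in
    group 1 and group 2, respectively, for a single MSA column.
    """
    return sum(
        p[a] * q[b] * (MATCH if a == b else MISMATCH)
        for a in NUCLEOTIDES
        for b in NUCLEOTIDES
    )


all_a = {"A": 1.0, "C": 0.0, "G": 0.0, "U": 0.0}
all_g = {"A": 0.0, "C": 0.0, "G": 1.0, "U": 0.0}
same = column_similarity(all_a, all_a)       # both groups conserve A
different = column_similarity(all_a, all_g)  # groups conserve different bases
```

In this sketch, positions conserved identically in both groups score high and positions conserved differently score low, which is the kind of two-sided signal a diverging color scale is suited to display.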

Associated data

2D RNA topology viewer MolStar Viewer

RiboVision 2 offers visualization of structural and evolutionary data related to ribosomal RNAs, including helices, expansion segments and ancestral expansion segments. The definitions of helices were obtained from Petrov et al., as illustrated in the gallery of ribosomal structures; each nucleotide belongs to exactly one helix, and each helix contains contiguous base-paired or stacked bases. Definitions of ancestral expansion segments were taken from the evolutionary accretion model by Petrov et al. (PNAS 2014 and PNAS 2015). The evolutionary data were deposited into RiboVision's database for a limited number of anchor structures. When users select a ribosomal sequence, RiboVision maps the pre-existing evolutionary data onto the chosen sequence. The mapped data can then be visualized in both the 2D and 3D applets and explored interactively through tooltips by hovering the mouse over nucleotides in the 2D representation.

Color Schemes

Each calculated attribute is mapped onto a matplotlib color scheme. For single-continuum attributes (e.g. Shannon entropy, helical data, custom data), RiboVision 2 uses single-continuum colormaps (viridis or rainbow). For diverging attributes (e.g. TwinCons), RiboVision 2 uses diverging colormaps such as Blue-White-Red. All colormaps were generated with the Python matplotlib library and exported to JavaScript with the js-colormaps package. Further information about colormaps is available in the matplotlib documentation.

Treatment of Unresolved RNA regions

RiboVision 2 supports 2D visualization of regions that are unresolved in the 3D structure. In RiboVision mode and in the cif option of the User-upload mode, the full RNA sequence is automatically extracted from the cif file (_entity_poly.pdbx_seq_one_letter_code field) and used to generate secondary structure diagrams. Nucleotides unresolved in the 3D structure are marked as “Unobserved residues” in the 2D applet when the mouse hovers over them. In RiboVision mode, RiboVision 2 supports 2D mapping of Shannon Entropy, TwinCons, and Associated Data (Helices, Expansion Segments and Phases) onto unresolved regions.
In a few rare known instances, cif files deposited in the PDB do not contain the full genomic RNA sequence in the abovementioned field. For these cases users are advised to use the “pdb” option of the User-upload mode. This option requires uploading the full RNA sequence in addition to a 3D RNA structure in the PDB format. The supplied full sequence is then mapped onto a provided MSA and used to generate secondary structure diagrams, which ensures that unresolved regions are properly depicted.