ProteoVision Data

Phylogeny (SEREB)

The subset of 179 species from the SEREB (Sparse and Efficient Representation of Extant Biology) database was organized into a phylogenetic browser using a tree topology from NCBI.

Alignments

Each ribosomal protein has an associated multiple sequence alignment (MSA). First, a reference MSA was generated with MATRAS from multiple structure superimpositions. Then, amino acid sequences of species from the SEREB database were added to the reference alignment using MAFFT.

2D maps

Topologies of the protein secondary structures (Laskowski; 10.1093/nar/gkn860) were exported into the PDB topology viewer using the EMBL-EBI PDBe API.

3D Structures

3D structures were fetched from PDBe using the API of the EMBL-EBI coordinate server. The selection of residue ranges was implemented using the syntax of the LiteMol coordinate server.

Alignment associated data (Fold, Phase)

DESIRE holds annotations of domain architecture from ECOD (Cheng; 10.1371/journal.pcbi.1003926) and ribosomal phase definitions (Kovacs; 10.1093/molbev/msx086) at the residue level for one representative species (E. coli). Using the alignments, ProteoVision retrieves these annotations for each column of an alignment and displays them as a hovering pop-up next to each residue.
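The transfer of residue-level annotations onto alignment columns can be sketched as follows. This is a minimal illustration, not the ProteoVision implementation: the function name `annotate_columns`, the toy reference row, and the annotation labels are all hypothetical, and residue numbering is assumed to be 1-based and contiguous.

```python
def annotate_columns(aligned_reference, residue_annotations, gap_chars="-."):
    """Map per-residue annotations of a reference row onto alignment columns.

    aligned_reference: the reference sequence as it appears in the MSA (gaps included).
    residue_annotations: dict from 1-based residue index to an annotation label.
    Returns one entry per column; None where the reference has a gap
    or the residue carries no annotation.
    """
    column_annotations = []
    residue_index = 0
    for char in aligned_reference:
        if char in gap_chars:
            # A gap in the reference means no residue to look up in this column.
            column_annotations.append(None)
            continue
        residue_index += 1
        column_annotations.append(residue_annotations.get(residue_index))
    return column_annotations


# Toy reference row and annotations (hypothetical values).
aligned_ref = "MK-AL--G"
annotations = {1: "fold A", 2: "fold A", 3: "fold B", 5: "fold B"}
columns = annotate_columns(aligned_ref, annotations)
print(columns)
```

Every other sequence in the alignment can then reuse the per-column list, since columns are shared across all rows of the MSA.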

Available attributes for calculated mapping data:

Amino Acid frequencies

Amino acid frequencies in each column of an MSA were adjusted for the presence of gaps. Gap frequencies were prorated as a uniform distribution over all 20 possible amino acid characters, so that each gap character contributes 1/20 = 0.05 to the count of every amino acid, as described by Bernier et al.
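A minimal sketch of this gap adjustment is given below. It assumes the 20 standard amino acids, treats `-` and `.` as gap characters, and ignores any other symbols; the function name `gap_adjusted_frequencies` is hypothetical and the code is not taken from ProteoVision.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gap_adjusted_frequencies(column, gap_chars="-."):
    """Return amino acid frequencies for one MSA column, with gaps prorated.

    Each gap character adds 1/20 = 0.05 to the count of every amino acid,
    so the resulting frequencies still sum to 1.
    """
    counts = Counter(column.upper())
    n_gaps = sum(counts[g] for g in gap_chars)
    # Assumption: non-standard symbols (e.g. X) are ignored entirely.
    total = sum(counts[aa] for aa in AMINO_ACIDS) + n_gaps
    if total == 0:
        raise ValueError("column contains no amino acids or gaps")
    gap_share = n_gaps / len(AMINO_ACIDS)
    return {aa: (counts[aa] + gap_share) / total for aa in AMINO_ACIDS}


# Column with two alanines, one glycine and one gap.
freqs = gap_adjusted_frequencies("AAG-")
print(freqs["A"], freqs["G"], freqs["W"])
```

For the column `AAG-`, alanine receives (2 + 0.05) / 4 and every amino acid absent from the column receives 0.05 / 4.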

Shannon Entropy

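The Shannon entropy (as well as all properties listed below) was computed from the gap-adjusted probabilities as:

$$H = -\sum_{i=1}^{20} p_i \log p_i$$

where $p_i$ is the gap-adjusted frequency of amino acid $i$ in the column.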
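The entropy of a single column can be sketched in a few lines. The logarithm base is not stated here, so base 2 (entropy in bits) is an assumption, exposed as a parameter; the function name `shannon_entropy` is likewise illustrative.

```python
import math


def shannon_entropy(probabilities, base=2):
    """Shannon entropy of a probability vector; zero terms contribute nothing.

    Assumption: base 2, giving the entropy in bits.
    """
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


# A fully variable column: all 20 amino acids equally likely.
uniform = [1 / 20] * 20
# A fully conserved column: one amino acid with probability 1.
conserved = [1.0] + [0.0] * 19

h_uniform = shannon_entropy(uniform)
h_conserved = shannon_entropy(conserved)
print(h_uniform, h_conserved)
```

With base 2 the entropy ranges from 0 for a fully conserved column to log2(20) for a uniform one. Because of the gap adjustment, a column containing gaps never reaches exactly 0.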

Two group comparison (TwinCons)

When two groups are selected in the phylogeny browser, ProteoVision provides an additional option to compute TwinCons, a score developed in house. TwinCons is computed for a single position of the MSA and compares two predefined groups (each represented by a vector of gap-adjusted amino acid frequencies) based on their similarity as defined by a precomputed substitution matrix. TwinCons represents the transformation price between the two column vectors, related by the substitution matrix.
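One simple way to relate two frequency vectors through a substitution matrix is a bilinear form, in which every pair of residues is weighted by its matrix entry. The sketch below illustrates that idea only. It is an assumption about how such a score could be formed and is not taken from the TwinCons source; the three-letter alphabet and the matrix values are invented for the example.

```python
def bilinear_score(freqs_a, freqs_b, matrix):
    """Sum of freqs_a[i] * matrix[i][j] * freqs_b[j] over all residue pairs."""
    return sum(
        freqs_a[i] * matrix[i][j] * freqs_b[j]
        for i in range(len(freqs_a))
        for j in range(len(freqs_b))
    )


# Hypothetical 3-letter alphabet (A, G, W) and toy substitution scores.
toy_matrix = [
    [2, -1, -3],
    [-1, 2, -2],
    [-3, -2, 4],
]

same = bilinear_score([1, 0, 0], [1, 0, 0], toy_matrix)       # both groups all A
different = bilinear_score([1, 0, 0], [0, 0, 1], toy_matrix)  # all A vs. all W
mixed = bilinear_score([0.5, 0.5, 0], [1, 0, 0], toy_matrix)  # A/G mix vs. all A
print(same, different, mixed)
```

In this toy setting, positive values indicate that the two groups share similar residues at the position and negative values indicate a group-specific difference, which is why such a score suits a diverging color scale.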

Charge, hydropathy, hydrophobicity, polarity, mutability

The physico-chemical properties for each position within an MSA are computed as average properties for a given distribution of the amino acid frequencies. The tabulated values for each property were obtained from the available literature:
- charges
- hydropathy
- hydrophobicity
- polarity
- mutability
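The averaging step amounts to a frequency-weighted mean of the tabulated values. The sketch below uses a small, illustrative charge table (D and E as -1, K and R as +1, everything else 0), which is an assumption and not the table used by ProteoVision; the function name `average_property` is hypothetical.

```python
def average_property(frequencies, property_table):
    """Frequency-weighted average of a per-residue property for one column."""
    return sum(freq * property_table[aa] for aa, freq in frequencies.items())


# Illustrative charge values only; the real tabulated values come from the literature.
toy_charge = {aa: 0.0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
toy_charge.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})

# Toy column distribution: half lysine, a quarter aspartate, a quarter alanine.
column_freqs = {"K": 0.5, "D": 0.25, "A": 0.25}
mean_charge = average_property(column_freqs, toy_charge)
print(mean_charge)
```

The same function applies unchanged to hydropathy, hydrophobicity, polarity, or mutability once the corresponding table is supplied.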

Color Schemes

Each calculated attribute is mapped onto a matplotlib colormap. For sequential attributes (like Shannon entropy or polarity), ProteoVision uses sequential colormaps like plasma and viridis. For diverging attributes (like charge or TwinCons), ProteoVision uses diverging colormaps like Blue-White-Red or Green-White-Purple. All colormaps were generated with the Python matplotlib library and exported to JavaScript with the js-colormaps package. Further information about colormaps is available in the matplotlib documentation.
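To illustrate how a diverging attribute is mapped to a color, the sketch below hand-rolls a Blue-White-Red scale centered at zero using only the standard library. It is a stand-in for illustration and does not use matplotlib or js-colormaps; the function names and the linear interpolation are assumptions.

```python
BLUE = (0, 0, 255)
WHITE = (255, 255, 255)
RED = (255, 0, 0)


def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def diverging_color(value, vmax):
    """Map a value in [-vmax, vmax] to a hex color; zero maps to white.

    Values outside the range are clamped to the nearest end of the scale.
    """
    if vmax <= 0:
        raise ValueError("vmax must be positive")
    t = max(-1.0, min(1.0, value / vmax))
    if t < 0:
        rgb = lerp_color(WHITE, BLUE, -t)
    else:
        rgb = lerp_color(WHITE, RED, t)
    return "#{:02x}{:02x}{:02x}".format(*rgb)


neutral = diverging_color(0.0, 1.0)
positive = diverging_color(1.0, 1.0)
negative = diverging_color(-1.0, 1.0)
print(negative, neutral, positive)
```

Centering the scale at zero keeps neutral positions white, so the sign of an attribute such as charge can be read directly from the hue.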