Reactome: A Curated Pathway Database

Computational Inference

Using curated rice pathways as a reference, Plant Reactome now predicts pathways in other plant species on the basis of Compara and InParanoid super-cluster orthology.

We extracted gene identifiers and their UniProt counterparts from our curated rice pathways and downloaded Ensembl Plants (Gramene rel.48) orthology predictions for several plant species, using O. sativa as the reference. We also used InParanoid clustered orthology data for other species, including ancestral rice species. Using Compara’s reciprocal identity data, we limited the orthology predictions to those meeting a threshold of 40% reciprocal identity for monocots, 30% for dicots and Amborella, and 28% for lycopods, bryophytes, chlorophytes, and rhodophytes. Both high- and low-confidence Compara data were used.
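The clade-specific thresholds above can be illustrated with a minimal Python sketch. It is not the Plant Reactome pipeline code: the record layout, the field names (rice_identity, target_identity), and the gene identifiers are hypothetical, and "reciprocal identity" is assumed here to mean that the percent identity in both directions must reach the cutoff.

```python
# Thresholds quoted in the text (percent reciprocal identity per clade).
THRESHOLDS = {
    "monocot": 40.0,
    "dicot": 30.0,
    "amborella": 30.0,
    "lycopod": 28.0,
    "bryophyte": 28.0,
    "chlorophyte": 28.0,
    "rhodophyte": 28.0,
}


def passes_threshold(pair, clade):
    """Return True when both identity values reach the clade cutoff.

    Assumption: a pair is a dict holding percent identity in each direction.
    """
    cutoff = THRESHOLDS[clade]
    return pair["rice_identity"] >= cutoff and pair["target_identity"] >= cutoff


def filter_orthologs(pairs, clade):
    """Keep only the orthology predictions that meet the clade threshold."""
    return [p for p in pairs if passes_threshold(p, clade)]


# Hypothetical orthology predictions (identifiers are placeholders).
pairs = [
    {"rice_gene": "OsGeneA", "target_gene": "TargetGene1",
     "rice_identity": 62.0, "target_identity": 58.5},
    {"rice_gene": "OsGeneB", "target_gene": "TargetGene2",
     "rice_identity": 45.0, "target_identity": 35.0},
]

kept_as_monocot = filter_orthologs(pairs, "monocot")
kept_as_dicot = filter_orthologs(pairs, "dicot")
```

In this sketch the second pair is rejected under the monocot cutoff because one direction falls below 40%, but it would be kept under the 30% dicot cutoff.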

We then ran Reactome projection inference scripts on our curated rice pathway data to generate predicted pathways, reactions, orthologous gene products, and other required data instances. We accepted defined sets and complexes with a minimum of one orthologous member. Finally, we generated separate pathway diagrams for these projected pathways and deployed the results on this web site.

We use the set of manually curated reference rice reactions to electronically infer reactions in several evolutionarily divergent plant species for which high-quality whole-genome sequence data are available in the Gramene database and in a select set of published transcriptomes and non-reference plant genomes, and for which a comprehensive and high-quality set of protein predictions therefore exists. The estimated success rate of our orthology inference strategy can be stated as the percentage of eligible reactions (defined in step 2 below) in the current reference data set for which an event can be inferred in another species. By this measure, success rates vary from species to species, depending on the quality of the primary annotation and on the genes identified by the genome or transcriptome sequencing project.
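The success-rate measure described above amounts to a simple percentage. The following Python sketch shows the arithmetic only; the function name and the reaction identifiers are hypothetical and are not taken from the Reactome scripts.

```python
def success_rate(eligible_reactions, inferred_reactions):
    """Percentage of eligible reference reactions inferred in a target species.

    Inferred reactions that are not in the eligible set are ignored.
    """
    eligible = set(eligible_reactions)
    if not eligible:
        return 0.0  # no eligible reactions, so nothing could be inferred
    inferred = eligible & set(inferred_reactions)
    return 100.0 * len(inferred) / len(eligible)


# Hypothetical reaction identifiers for one target species.
eligible = ["R1", "R2", "R3", "R4"]
inferred = ["R1", "R3"]
rate = success_rate(eligible, inferred)
```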

Electronic inference proceeds in four steps.

1) Protein homology data were obtained from two sources. (a) Gramene’s Plant Compara. Briefly, this method is based on the construction of gene trees, using the longest protein translation for every Ensembl gene, for all species included in the Compara database. Homologues are deduced from these trees. The method is described in more detail in EnsemblCompara GeneTrees: Analysis of complete, duplication aware phylogenetic trees in vertebrates (Vilella et al., Genome Research, 2008). (b) A select set of species for which transcriptome and/or genome annotation data were published or shared with us in collaboration; these data were processed through InParanoid-based homology prediction. For the purpose of inferring homologous events in Reactome, we used both the Core Plant Compara data set and the InParanoid-based predictions to project the computationally inferred pathway events onto several plant genomes. More information about the analyses and the inclusion of the two types of homology data sets can be found here.

2) All reference reactions in the Plant Reactome knowledgebase involving one or more proteins are eligible for electronic inference. Eligible reactions are checked to determine whether each involved protein has at least one homologous protein (HP) in the selected plant genomes. If a reference reaction involves a complex, at least one of the accessioned protein components of the complex must have HPs in the selected species.

3) For each reaction that meets these criteria, an equivalent reaction is created for the selected species by replacing each reference protein with its organism-specific HP. If a reference protein corresponds to more than one HP from a species, a DefinedSet called ‘Homologues of …’ is created, with the model organism HPs as members. For reference proteins that lack a species-specific HP but that are included in inferred complexes, placeholder model organism entities (called ‘Ghost homologue of…’) are created.

4) If this analysis generates reactions in the selected species corresponding to any of the steps of a reference pathway, then the pathway event is also inferred for the selected species.
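Steps 2 to 4 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Reactome projection scripts: the Reaction class, the function names, the dictionary layout for a DefinedSet, and all protein identifiers are hypothetical. The sketch also assumes that proteins participating directly in a reaction must each have at least one HP, while a complex needs only one component with an HP.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    name: str
    proteins: list                                 # standalone reference proteins
    complexes: list = field(default_factory=list)  # each complex: list of protein IDs


def is_inferable(reaction, homologs):
    """Step 2: check that the reaction can be projected onto the target species."""
    if not reaction.proteins and not reaction.complexes:
        return False  # reactions without proteins are not eligible
    if any(not homologs.get(p) for p in reaction.proteins):
        return False  # a standalone protein has no HP
    # Each complex needs at least one component with an HP.
    return all(any(homologs.get(p) for p in cx) for cx in reaction.complexes)


def project_protein(protein, homologs, in_complex=False):
    """Step 3: replace a reference protein with its HP, a set, or a ghost."""
    hps = homologs.get(protein, [])
    if len(hps) == 1:
        return hps[0]
    if len(hps) > 1:
        return {"DefinedSet": f"Homologues of {protein}", "members": list(hps)}
    if in_complex:
        return f"Ghost homologue of {protein}"
    raise ValueError(f"no homologue for standalone protein {protein}")


def infer_reaction(reaction, homologs):
    """Return the projected reaction, or None when it cannot be inferred."""
    if not is_inferable(reaction, homologs):
        return None
    return {
        "name": reaction.name,
        "proteins": [project_protein(p, homologs) for p in reaction.proteins],
        "complexes": [
            [project_protein(p, homologs, in_complex=True) for p in cx]
            for cx in reaction.complexes
        ],
    }


def infer_pathway(reactions, homologs):
    """Step 4: a non-empty result means the pathway is inferred for the species."""
    projected = (infer_reaction(rx, homologs) for rx in reactions)
    return [rx for rx in projected if rx is not None]


# Hypothetical homology table: reference protein -> HPs in the target species.
homologs = {"OsP1": ["TgP1"], "OsP2": ["TgP2a", "TgP2b"], "OsP3": []}
rx1 = Reaction("rx1", proteins=["OsP1"], complexes=[["OsP2", "OsP3"]])
rx2 = Reaction("rx2", proteins=["OsP3"])
pathway = infer_pathway([rx1, rx2], homologs)
```

Here rx1 is projected with a DefinedSet for OsP2 and a ghost homologue for OsP3, rx2 is dropped because its only protein has no HP, and the pathway is still inferred because one of its steps was projected.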

These electronically inferred reactions are predictions based on a number of assumptions. Most basically, we assume that if we can find HPs corresponding to all proteins involved in a reference reaction, then those proteins mediate the same reaction in the projected species. This may not be true. Conversely, we may miss a truly homologous reaction in a given species, either because it is mediated by structurally divergent proteins that the Compara strategy failed to identify or because the gene was not identified by the genome annotation project. Similarly, complexes sharing only a subset of homologous subunits between species may nevertheless continue to perform the same function. The electronically inferred reactions presented in Plant Reactome are thus not data, but hypotheses useful to direct the design of confirmatory experiments.

If you are interested in looking at the pathway projection summary, please visit the Database Release Summary page.