Finding New Order in Biological Functions from the Network Structure of Gene Annotations
If one gene regulates another, those two genes are likely to be involved in many of the same biological functions. Conversely, shared biological function may be suggestive of the existence and nature of a regulatory interaction. With this in mind, we develop a measure of functional similarity between genes based on annotations made to the Gene Ontology in which the magnitude of their functional relationship is also indicative of a regulatory relationship. In contrast to other measures that have previously been used to quantify the functional similarity between genes, our measure scales the strength of any shared functional annotation by the frequency of that function's appearance across the entire set of annotations. We apply our method to both Escherichia coli and Saccharomyces cerevisiae gene annotations and find that the strength of our scaled similarity measure is more predictive of known regulatory interactions than previously published measures of functional similarity. In addition, we observe that the strength of the scaled similarity measure is correlated with the structural importance of links in the known regulatory network. By contrast, other measures of functional similarity are not indicative of any structural importance in the regulatory network. We therefore conclude that adequately adjusting for the frequency of shared biological functions is important in the construction of a functional similarity measure aimed at elucidating the existence and nature of regulatory interactions. We also compare the performance of the scaled similarity with a high-throughput method for determining regulatory interactions from gene expression data and observe that the ontology-based approach identifies a different subset of regulatory interactions compared with the gene expression approach. We show that combining predictions from the scaled similarity with those from the reconstruction algorithm leads to a significant improvement in the accuracy of the reconstructed network.