Software Suite for Gene and Protein Annotation Prediction and Similarity Search

In the computational biology community, machine learning algorithms are key instruments for many applications, including the prediction of gene functions based upon the available biomolecular annotations. They may also be employed to compute similarity between genes or proteins. Here, we describe and discuss a software suite we developed to implement and make publicly available some of these prediction methods and a computational technique based upon Latent Semantic Indexing (LSI…

Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework

A similarity z-score is introduced that takes into account the FS background distribution of each protein, and is implemented in Frela, a fast high-throughput public web server for protein FS calculation and interpretation.
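
As a rough illustration of the idea of a background-corrected similarity score, the sketch below standardizes a raw functional similarity (FS) value against a list of background FS values for one protein. This is only a generic z-score under stated assumptions: the function name `similarity_z_score`, the use of the population standard deviation, and the sample numbers are hypothetical, and the summary above does not say how Frela actually builds the background or combines the two proteins' distributions.

```python
from statistics import mean, pstdev


def similarity_z_score(score, background):
    """Standardize `score` against a protein's background similarity scores.

    Hypothetical helper: returns (score - mean) / standard deviation, where
    mean and standard deviation are taken over `background`.
    """
    if len(background) < 2:
        raise ValueError("need at least two background scores")
    mu = mean(background)
    sigma = pstdev(background)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (score - mu) / sigma


if __name__ == "__main__":
    # Made-up background FS values for one protein (illustration only).
    background = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(similarity_z_score(9.0, background))
```

A raw score that looks high in absolute terms can still have a modest z-score if the protein tends to score highly against everything, which is the motivation for taking the background distribution into account.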

Validation Pipeline for Computational Prediction of Genomics Annotations

This work proposes a validation procedure based upon three different sub-phases, which is able to assess the precision of any algorithm's predictions with a reliable degree of accuracy, and shows some validation results obtained for Gene Ontology annotations of Homo sapiens genes that demonstrate the effectiveness of this approach.

Novelty Indicator for Enhanced Prioritization of Predicted Gene Ontology Annotations

A novelty indicator is proposed that states the level of “originality” of the Gene Ontology (GO) annotations predicted for a specific gene. Combined with previously introduced prediction steps, it helps prioritize the most novel and interesting predicted annotations, improving the accuracy and relevance of an annotation prediction and prioritization pipeline.

Nine quick tips for pathway enrichment analysis

Nine quick tips are proposed to avoid common mistakes and to carry out a complete, sound, thorough pathway enrichment analysis (PEA) that can produce relevant and robust results.

Integration and Querying of Genomic and Proteomic Semantic Annotations for Biomedical Knowledge Extraction

A software architecture to create and maintain a Genomic and Proteomic Knowledge Base (GPKB), which integrates several of the most relevant sources of such dispersed information and uses a flexible, modular, and multilevel global data schema based on abstraction and generalization of integrated data features.

Biological and Medical Ontologies: GO and GOA

  • M. Masseroli
  • Biology
    Encyclopedia of Bioinformatics and Computational Biology
  • 2019

HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey

An updated version of the HESML Java software library especially designed for the biomedical domain is introduced, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for the approximation of Dijkstra’s algorithm for taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure.

Biological and Medical Ontologies: Protein Ontology (PRO)

  • D. Chicco, M. Masseroli
  • Computer Science, Biology
    Encyclopedia of Bioinformatics and Computational Biology
  • 2019



Semantically improved genome-wide prediction of Gene Ontology annotations

A novel prediction algorithm is presented that incorporates gene clustering based on gene functional similarity computed on Gene Ontology annotations; both prediction methods were tested by performing k-fold cross-validation on two organism genomes.

Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation

This paper investigates the use of ontological annotation to measure the similarities in knowledge content or 'semantic similarity' between entries in a data resource and shows a simple extension that enables a semantic search of the knowledge held within sequence databases.

Computational Approaches for Protein Function Prediction: A Survey

This survey aims to discuss a wide spectrum of approaches for protein function prediction by categorizing them in terms of the data type they use for predicting function, and thus identify the trends and needs of this very important field.

Integration of Bioinformatics Web Services through the Search Computing Technology MINOR RESEARCH REPORT

This work aimed at supporting the explorative search of heterogeneous distributed bio-data and the automatic integration and global ranking of their individual search results, also taking into account the partial rankings of individual searches.

A semantic analysis of the annotations of the human genome

The technique is able to identify missing and inaccurate annotations in existing annotation databases, and thus help improve their accuracy, and is used to analyze and improve the quality of the data of any public or private annotation database.

Semantic similarity analysis of protein data: assessment with biological features and issues

This work presents a systematic discussion and comparison of the main approaches for annotating existing protein data with biological information to enable the use of algorithms that use biological ontologies as a framework to mine annotated data.

Gene clustering by Latent Semantic Indexing of MEDLINE abstracts

It is demonstrated here that pairwise distances derived from the vector angles of gene abstract documents can be effectively used to functionally group genes by hierarchical clustering, and provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature.
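
The "pairwise distances derived from the vector angles" mentioned above can be sketched as follows. The code assumes each gene is already represented by a reduced vector, for example the output of an LSI decomposition that is not shown here. It computes only the angle between vectors, not LSI itself or the hierarchical clustering step, and the gene labels and vector values are made up for illustration.

```python
import math
from itertools import combinations


def angular_distance(u, v):
    """Return the angle in radians between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero vector has no defined angle")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def pairwise_angular_distances(vectors):
    """Map each unordered pair of labels to the angle between their vectors."""
    return {
        (a, b): angular_distance(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


if __name__ == "__main__":
    # Hypothetical reduced gene vectors (illustration only).
    genes = {
        "geneA": [1.0, 0.0],
        "geneB": [2.0, 0.0],
        "geneC": [0.0, 3.0],
    }
    for pair, angle in pairwise_angular_distances(genes).items():
        print(pair, round(angle, 4))
```

A distance table of this kind is the usual input to a hierarchical clustering routine; smaller angles indicate gene documents that point in similar directions in the reduced space.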

G-SESAME: web tools for GO-term-based gene similarity analysis and knowledge discovery

We have developed a set of online tools for measuring the semantic similarities of Gene Ontology (GO) terms and the functional similarities of gene products, and for further discovering biomedical

Towards the assessment of semantic similarity analysis of protein data: main approaches and issues

Bioinformatics approaches to the study of proteins yield to the introduction of different methodologies and related tools for the analysis of different types of data related to proteins, ranging from

Visual Composition of Complex Queries on an Integrative Genomic and Proteomic Data Warehouse

A Web application is developed to enable any user to easily compose queries, although complex, on all data integrated in the GPDW, a Genomic and Proteomic Data Warehouse that integrates data provided by some of the main bioinformatics databases.