Resource (TAIR)–database for the brassica family plant Arabidopsis thaliana, Rat Genome Database (RGD)–database for the rat Rattus norvegicus, and GeneDB protozoa–databases for Plasmodium falciparum, Leishmania major, Trypanosoma brucei, and several other protozoan

Abstract

The Gene Ontology is now widely used by researchers for biological data mining and information retrieval, for integrating biological databases, for finding genes, and for gene clustering that incorporates Gene Ontology knowledge. However, its growing size has made the Gene Ontology difficult to maintain and process. One way to keep it accessible is to cluster it into smaller groups of terms. Clustering the Gene Ontology is a difficult combinatorial problem that can be modeled as a graph partitioning problem. Moreover, the number k of clusters is not known in advance, and choosing it is itself a hard algorithmic problem. This paper therefore proposes an approach to automatically clustering the Gene Ontology that incorporates a cohesion-and-coupling metric into a hybrid algorithm combining a genetic algorithm with a split-and-merge algorithm. Experimental results and an example of the modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.

Keywords: Automatic clustering, Cohesion-and-coupling metric, Gene Ontology, Genetic algorithm, Split-and-merge algorithm
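The abstract does not spell out the metric or the hybrid search, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the ontology has already been reduced to an undirected graph whose terms are numbered 0..n-1 and whose edges are the term-to-term relations. As a stand-in for the paper's cohesion-and-coupling metric it uses the Modularization Quality (MQ) style of score known from the Bunch software-clustering tool: each cluster scores 2·intra / (2·intra + inter), rewarding internal edges (cohesion) and penalizing edges that cross cluster boundaries (coupling). The function names `mq`, `split`, `merge` and `hybrid_ga`, the label encoding, and all parameters are hypothetical. Because every offspring is refined by splitting clusters into connected components and then greedily merging adjacent clusters, the number of clusters k is never fixed up front.

```python
import random
from collections import defaultdict

# Hypothetical representation: undirected edges between term ids 0..n-1.
Edges = list[tuple[int, int]]


def mq(edges: Edges, labels: list[int]) -> float:
    """Bunch-style modularization quality (a stand-in, not the paper's metric).

    For each cluster c: CF_c = 2*intra_c / (2*intra_c + inter_c), where intra_c
    counts edges inside c (cohesion) and inter_c counts edges with exactly one
    endpoint in c (coupling). MQ is the sum of CF_c over all clusters.
    """
    intra: dict[int, int] = defaultdict(int)
    inter: dict[int, int] = defaultdict(int)
    for u, v in edges:
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
        else:
            inter[labels[u]] += 1
            inter[labels[v]] += 1
    total = 0.0
    for c in set(labels):
        denom = 2 * intra[c] + inter[c]
        if denom:  # an isolated singleton contributes 0
            total += 2 * intra[c] / denom
    return total


def split(edges: Edges, labels: list[int]) -> list[int]:
    """Split each cluster into its connected components (union-find on intra-cluster edges)."""
    parent = list(range(len(labels)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if labels[u] == labels[v]:
            parent[find(u)] = find(v)
    return [find(i) for i in range(len(labels))]


def merge(edges: Edges, labels: list[int]) -> list[int]:
    """Greedily merge pairs of adjacent clusters while MQ strictly improves."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        best = mq(edges, labels)
        pairs = sorted({tuple(sorted((labels[u], labels[v])))
                        for u, v in edges if labels[u] != labels[v]})
        for a, b in pairs:
            trial = [a if c == b else c for c in labels]
            if mq(edges, trial) > best:
                labels, improved = trial, True
                break  # recompute the adjacent cluster pairs after each merge
    return labels


def hybrid_ga(n: int, edges: Edges, pop_size: int = 20,
              generations: int = 50, seed: int = 0) -> list[int]:
    """Label-encoded GA whose offspring are refined by split-then-merge (k is not fixed)."""
    rng = random.Random(seed)

    def refine(ind: list[int]) -> list[int]:
        return merge(edges, split(edges, ind))

    pop = [refine([rng.randrange(n) for _ in range(n)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: mq(edges, ind), reverse=True)
        elite = pop[: max(2, pop_size // 2)]  # elitism: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # Uniform crossover on the label vectors.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child[rng.randrange(n)] = rng.randrange(n)  # mutation: relabel one term
            children.append(refine(child))
        pop = elite + children
    return max(pop, key=lambda ind: mq(edges, ind))
```

For instance, on two triangles 0-1-2 and 3-4-5 joined by the single edge 2-3, placing each triangle in its own cluster scores 6/7 + 6/7 = 12/7 ≈ 1.71, whereas one cluster containing everything scores only 1, so a search driven by this score prefers k = 2 without being told k. The design choice here is memetic: the genetic operators explore, while split-and-merge acts as a local repair step that adjusts k.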

12 Figures and Tables


@inproceedings{Othman2006ResourceF,
  title={Resource (TAIR)–database for the brassica family plant Arabidopsis thaliana, Rat Genome Database (RGD)–database for the rat Rattus norvegicus, and GeneDB protozoa–databases for Plasmodium falciparum, Leishmania major, Trypanosoma brucei, and several other protozoan},
  author={Razib M. Othman and Safaai Deris and Rosli Md. Illias and Zalmiyah Zakaria and Saberi M. Mohamad},
  year={2006}
}