Gene names and symbols are important biomedical entities, but are highly ambiguous. This ambiguity affects the performance of both information extraction and information retrieval systems in the biomedical domain. Existing knowledge sources contain different types of information about genes and could be used to disambiguate gene symbols. In this paper, we applied an information retrieval (IR) based method for human gene symbol disambiguation and studied different methods to combine various…
The combination method using coefficients obtained from a logistic regression model reached the highest precision of 92.2% on a testing set of ambiguous human gene symbols.
They also used a simple method to combine different types of information and reported that the highest precision, 93.9%, was reached on a testing set of mouse genes when multiple types of information were used.
Among the five different combination methods, CombLR achieved the highest mean precision of 0.922 for testing set 1. CombSum, which is a simple combination method, also had a good mean precision of 0.920 on testing set 1.
The highest precision of 0.906 was reached with the CombSum and CombMax methods.
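To make the two simple combination methods named above concrete, the following is a minimal sketch, assuming CombSum and CombMax follow their usual data-fusion definitions (sum and maximum of the individual similarities). The gene identifiers, the number of information types, and all similarity values are made up for illustration and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical input: for each candidate gene of one ambiguous symbol,
# one similarity score per information type (values are invented).
Scores = Dict[str, List[float]]


def comb_sum(scores: Scores) -> Dict[str, float]:
    """CombSum: add up the individual similarities of each candidate."""
    return {gene: sum(sims) for gene, sims in scores.items()}


def comb_max(scores: Scores) -> Dict[str, float]:
    """CombMax: keep only the largest individual similarity."""
    return {gene: max(sims) for gene, sims in scores.items()}


def pick_sense(combined: Dict[str, float]) -> str:
    """Return the candidate gene with the highest combined score."""
    return max(combined, key=combined.get)


# Two hypothetical candidate genes, three information types each.
example: Scores = {
    "gene_A": [0.40, 0.10, 0.30],
    "gene_B": [0.20, 0.50, 0.05],
}

sum_choice = pick_sense(comb_sum(example))
max_choice = pick_sense(comb_max(example))
```

On these invented scores the two methods disagree: CombSum favors the candidate with moderate support from every information type, while CombMax favors the candidate with one strong signal.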
For example, it is not straightforward to compare our precision result (92.2%) with that (92.7%) reported by Schijvenaars et al.
The combination method using coefficients obtained from a logistic regression model reached the highest precision of 92.2% on an automatically generated testing set of ambiguous human gene symbols.
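A coefficient-based combination of this kind can be sketched as a weighted sum of the individual similarities. This is only an illustration under stated assumptions: the coefficient values, gene identifiers, and similarity scores below are hypothetical, the coefficients are taken as already fitted elsewhere, and the excerpt does not say whether an intercept or a sigmoid is applied. Neither would change which candidate ranks first, since both are shared across candidates and monotonic.

```python
from typing import Dict, List, Sequence


def comb_weighted(
    scores: Dict[str, List[float]], coefs: Sequence[float]
) -> Dict[str, float]:
    """Combine per-type similarities as a weighted sum for each candidate."""
    combined = {}
    for gene, sims in scores.items():
        if len(sims) != len(coefs):
            raise ValueError("need exactly one coefficient per information type")
        combined[gene] = sum(w * s for w, s in zip(coefs, sims))
    return combined


# Hypothetical coefficients, standing in for values that would come from
# a logistic regression model fitted on training data.
HYPOTHETICAL_COEFS = [0.5, 2.0, 1.0]

# Invented similarities for two candidate genes of one ambiguous symbol.
example = {
    "gene_A": [0.40, 0.10, 0.30],
    "gene_B": [0.20, 0.50, 0.05],
}

weighted = comb_weighted(example, HYPOTHETICAL_COEFS)
best = max(weighted, key=weighted.get)
```

With these made-up weights, the second information type counts most, so the candidate that scores well on it is selected even though its unweighted sum is lower.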
II GN task, the combination method that performed summation of individual similarities reached the highest precision of 90.6%.