With the current explosion of data, retrieving and integrating information from various sources is a critical problem. Work in multidatabase systems has begun to address this problem, but it has primarily focused on methods for communicating between databases and requires significant effort for each new database added to the system. This paper describes a more…
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied, with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all…
Single nucleotide polymorphism (SNP) prioritization based on the phenotypic risk is essential for association studies. Assessment of the risk requires access to a variety of heterogeneous biological databases and analytical tools. FASTSNP (function analysis and selection tool for single nucleotide polymorphisms) is a web server that allows users to…
This paper presents a novel feature selection approach for backpropagation neural networks (NNs). Previously, a feature selection technique known as the wrapper model was shown to be effective for decision tree induction. However, it is prohibitively expensive when applied to real-world neural net training characterized by large volumes of data and many feature…
Integrating a large number of Web information sources may significantly increase the utility of the World Wide Web. A promising solution to the integration is through the use of a Web information mediator that provides seamless, transparent access for the clients. Information mediators need wrappers to access a Web source as a structured database, but…
A critical problem in building an information mediator is how to translate a domain-level query into an efficient query plan for accessing the required data. We have built a flexible and efficient information mediator, called SIMS. This system takes a domain-level query and dynamically selects the appropriate information sources based on their content and…
BACKGROUND: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name…
This paper explains why well-known discretization methods, such as entropy-based and ten-bin, work well for naive Bayesian classifiers with continuous variables, regardless of their complexities. These methods usually assume that discretized variables have Dirichlet priors. Since perfect aggregation holds for Dirichlets, we can show that, generally, a wide…
Semantic query optimization can dramatically speed up database query answering by knowledge-intensive reformulation. But the problem of how to learn the required semantic rules has not been previously solved. This chapter presents a learning approach to solving this problem. In our approach, the learning is triggered by user queries. Then the system uses an…