Post-genomic research deals with challenging problems in screening the genomes of organisms for particular functions, or for their potential as targets of genetic engineering aimed at desirable biological traits. 'Phenotyping' of wild-type and mutant organisms is a time-consuming and costly effort that involves many individuals. This article is a preliminary progress report in …
Entity matching (EM) has been a long-standing challenge in data management. Most current EM work focuses only on developing matching algorithms. We argue that far more effort should be devoted to building EM systems. We discuss the limitations of current EM systems, then present as a solution Magellan, a new kind of EM system. Magellan is novel in four …
Protein secondary structure detection is an intricate problem that depends on several parameters of a polypeptide chain and its environment, and it strongly affects the accurate determination of protein function in living organisms. Statistical learning approaches have been used extensively to tackle the problem, and many notable results have …
Entity matching (EM) has been a long-standing challenge in data management. Most current EM work, however, focuses only on developing matching algorithms. We argue that far more effort should be devoted to building EM systems. We discuss the limitations of current EM systems, then present Magellan, a new kind of EM system that addresses these limitations.
Modern epidemiology integrates knowledge from heterogeneous collections of numerical, descriptive, and imaging data. Large-scale epidemiological studies use sophisticated statistical analysis, mathematical models based on differential equations, and versatile analytic tools that handle numerical data. In contrast, knowledge extraction from images …
Tuberculosis is a treatable but severe disease caused by Mycobacterium tuberculosis (Mtb). Recent statistics from international health organizations estimate that over two billion individuals have been exposed to Mtb. Delayed diagnosis can be fatal, especially for at-risk populations such as individuals with compromised immune systems.
The ubiquity of cyber-infrastructures such as the WWW provides myriad opportunities for machine learning across a broad spectrum of application domains that take advantage of digital communication. Pattern classification and feature extraction are among the first applications of machine learning to have received extensive attention. The most …
Large-scale human-in-the-loop information extraction and integration, with applications in healthcare
• Hybrid machine-human clustering for attribute value normalization
• Highly scalable event extraction in the Twittersphere from legacy tweet stores
• Slot filling for TAC/MR-KBP using logistic regressors on large-scale data
Computational systems biology, …
Ant miner is a data mining algorithm based on Ant Colony Optimization (ACO). Ant miner algorithms are used mainly for rule discovery. The Ant miner+ algorithm uses the MAX-MIN Ant System to discover rules in a database. Soil classification deals with the systematic categorization of soils based on distinguishing characteristics and criteria. In this …
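The snippet above names the MAX-MIN Ant System (MMAS) as the search strategy behind Ant miner+. As a rough illustration only, the sketch below shows a generic MMAS pheromone update: every trail evaporates, only the best ant deposits pheromone in proportion to its solution quality, and all trails are clamped to a fixed range. The edge representation, the quality value, the function name `mmas_update`, and the parameter values (`rho`, `tau_min`, `tau_max`) are hypothetical choices for this sketch, not the exact settings used in Ant miner+ or in this paper.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical edge type: a transition between two terms in a rule-construction graph.
Edge = Tuple[Hashable, Hashable]


def mmas_update(
    pheromone: Dict[Edge, float],
    best_edges: Iterable[Edge],
    best_quality: float,
    rho: float = 0.1,      # evaporation rate (assumed value)
    tau_min: float = 0.01,  # lower trail bound (assumed value)
    tau_max: float = 1.0,   # upper trail bound (assumed value)
) -> Dict[Edge, float]:
    """Return a new pheromone table after one generic MMAS update step."""
    best = set(best_edges)
    updated: Dict[Edge, float] = {}
    for edge, tau in pheromone.items():
        # Evaporation is applied to every trail.
        tau = (1.0 - rho) * tau
        # Only the best ant's edges receive a deposit, proportional to its quality.
        if edge in best:
            tau += best_quality
        # MMAS keeps every trail within [tau_min, tau_max] to limit early stagnation.
        updated[edge] = min(tau_max, max(tau_min, tau))
    return updated


if __name__ == "__main__":
    trails = {("start", "A"): 0.5, ("start", "B"): 0.5, ("A", "end"): 0.5, ("B", "end"): 0.005}
    new_trails = mmas_update(trails, [("start", "A"), ("A", "end")], best_quality=0.8)
    print(new_trails[("start", "A")])  # → 1.0 (0.45 + 0.8, clamped to tau_max)
```

The function returns a fresh table instead of mutating its input, which makes it easy to compare trails before and after an iteration; the clamping to `[tau_min, tau_max]` is what distinguishes MMAS from the original Ant System update.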