We introduce novel profile-based string kernels for use with support vector machines (SVMs) for the problems of protein classification and remote homology detection. These kernels use probabilistic profiles, such as those produced by the PSI-BLAST algorithm, to define position-dependent mutation neighborhoods along protein sequences for inexact matching of…
Automated prediction of bacterial protein subcellular localization is an important tool for genome annotation and drug discovery. PSORT has been one of the most widely used computational methods for such bacterial protein analysis; however, it has not been updated since it was introduced in 1991. In addition, neither PSORT nor any of the other computational…
In traditional data clustering, similarity of a cluster of objects is measured by pairwise similarity of objects in that cluster. We argue that such measures are not appropriate for transactions that are sets of items. We propose the notion of large items, i.e., items contained in some minimum fraction of transactions in a cluster, to…
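The definition of large items given above can be illustrated with a minimal sketch. The function name `large_items`, the `min_support` parameter, and the sample cluster are assumptions made for illustration only; the abstract does not give the authors' implementation or say how the notion is used in their clustering criterion.

```python
from collections import Counter


def large_items(cluster, min_support):
    """Return the items found in at least `min_support` (a fraction in
    [0, 1]) of the transactions in `cluster`.

    `cluster` is a list of transactions, each an iterable of hashable items.
    Both names are hypothetical, chosen for this sketch.
    """
    if not cluster:
        return set()
    # Count each item at most once per transaction.
    counts = Counter(item for transaction in cluster for item in set(transaction))
    threshold = min_support * len(cluster)
    return {item for item, count in counts.items() if count >= threshold}


# Illustrative data: three transactions in one cluster.
cluster = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
result = large_items(cluster, 0.6)
```

With this sample cluster and a minimum fraction of 0.6, an item must occur in at least two of the three transactions to count as large.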
In this paper, we present a novel algorithm, Opportune Project, for mining the complete set of frequent item sets by projecting databases to grow a frequent item set tree. Our algorithm is fundamentally different from those proposed in the past in that it opportunistically chooses between two different structures, array-based or tree-based, to represent projected…
Many semistructured objects are similarly, though not identically, structured. We study the problem of discovering "typical" substructures of a collection of semistructured objects. The discovered structures can serve the following purposes: (a) the "table-of-contents" for gaining general information of a source, (b) a road map for browsing and querying…
Atrial fibrillation (AF) is the most common sustained arrhythmia. Previous studies have identified several genetic loci associated with typical AF. We sought to identify common genetic variants underlying lone AF. This condition affects a subset of individuals without overt heart disease and with an increased heritability of AF. We report a meta-analysis of…
Load curve data refers to the electric energy consumption recorded by meters at certain time intervals at delivery points or end user points, and contains vital information for day-to-day operations, system analysis, system visualization, system reliability performance, energy saving and adequacy in system planning. Unfortunately, it is unavoidable that…
As potential activators of brown adipose tissue (BAT), mild cold exposure and sympathomimetic drugs have been considered as treatments for obesity and diabetes, but whether they activate the same pathways is unknown. In 10 healthy human volunteers, we found that the sympathomimetic ephedrine raised blood pressure, heart rate, and energy expenditure, and…