Treatment of pediatric acute lymphoblastic leukemia (ALL) is based on the concept of tailoring the intensity of therapy to a patient's risk of relapse. To determine whether gene expression profiling could enhance risk assignment, we used oligonucleotide microarrays to analyze the pattern of genes expressed in leukemic blasts from 360 pediatric ALL patients. …
Feature selection plays an important role in classification. We present a comparative study of six feature selection heuristics by applying them to two data sets. The first data set consists of gene expression profiles from acute lymphoblastic leukemia (ALL) patients. The second consists of proteomic patterns from ovarian cancer patients. Based on …
Human ESCs (hESCs) respond to signals that determine their pluripotency, proliferation, survival, and differentiation status. In this report, we demonstrate that phosphatidylinositol 3-kinase (PI3K) antagonizes the ability of hESCs to differentiate in response to transforming growth factor beta family members such as Activin A and Nodal. Inhibition of PI3K …
Clear cell renal cell carcinoma (ccRCC) is the predominant RCC subtype, but even within this classification, the natural history is heterogeneous and difficult to predict. A sophisticated understanding of the molecular features most discriminatory for the underlying tumor heterogeneity should be predicated on identifiable and biologically meaningful …
Many semistructured objects are similarly, though not identically, structured. We study the problem of discovering "typical" substructures of a collection of semistructured objects. The discovered structures can serve the following purposes: (a) the "table-of-contents" for gaining general information about a source, (b) a road map for browsing and querying …
We introduce a new method, called CS4, to construct committees of decision trees for classification. The method considers different top-ranked features as the root nodes of member trees. This idea is particularly suitable for dealing with high-dimensional biomedical data, as top-ranked features in this type of data usually possess similar merits for …
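The idea of seeding each committee member with a different top-ranked root feature can be illustrated with a minimal sketch. This is not the CS4 implementation: the information-gain ranking, the depth limit, the unweighted majority vote, and every function name below (`best_split`, `grow`, `build_committee`, `committee_predict`) are assumptions made for illustration only.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(X, y, feature):
    """Return (gain, threshold) of the best binary split on one numeric feature."""
    base, best = entropy(y), (0.0, None)
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if base - weighted > best[0]:
            best = (base - weighted, t)
    return best

def grow(X, y, forced=None, depth=3):
    """Grow a small tree; if `forced` is set, the root must split on that feature."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    candidates = [forced] if forced is not None else range(len(X[0]))
    gain, t, f = max((best_split(X, y, c) + (c,) for c in candidates),
                     key=lambda s: s[0])
    if t is None:  # no split improves purity, so stop with a leaf
        return majority
    left = [(r, l) for r, l in zip(X, y) if r[f] <= t]
    right = [(r, l) for r, l in zip(X, y) if r[f] > t]
    return {"feature": f, "threshold": t,
            "left": grow(*zip(*left), depth=depth - 1),
            "right": grow(*zip(*right), depth=depth - 1)}

def predict(tree, row):
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def build_committee(X, y, k):
    """One tree per top-k feature (ranked by gain), each forced as a root."""
    ranked = sorted(range(len(X[0])),
                    key=lambda f: best_split(X, y, f)[0], reverse=True)
    return [grow(X, y, forced=f) for f in ranked[:k]]

def committee_predict(trees, row):
    # Assumption: plain majority vote, which may differ from the paper's scheme.
    return Counter(predict(t, row) for t in trees).most_common(1)[0][0]

# Toy data (made up): features 0 and 1 separate the classes, feature 2 is noise.
X = [[1.0, 5.0, 0.3], [1.2, 5.5, 0.9], [0.9, 4.8, 0.1],
     [3.0, 9.0, 0.5], [3.2, 8.5, 0.7], [2.9, 9.2, 0.2]]
y = ["A", "A", "A", "B", "B", "B"]
committee = build_committee(X, y, k=2)
print(committee_predict(committee, [1.1, 5.0, 0.5]))
```

Forcing distinct roots means the member trees disagree in structure even when several features are nearly equally informative, which is the situation the abstract describes for high-dimensional data.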
MicroRNAs regulate mRNA levels in a tissue-specific way, either by inducing degradation of the transcript or by inhibiting translation or transcription. Putative mRNA targets of microRNAs identified from seed sequence matches are available in many databases. However, such matches have a high false positive rate and cannot identify tissue specificity of …
We describe a methodology, as well as some related data mining tools, for analyzing sequence data. The methodology comprises three steps: (a) generating candidate features from the sequences, (b) selecting relevant features from the candidates, and (c) integrating the selected features to build a system to recognize specific properties in sequence data. We …
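The three steps above can be sketched as a small pipeline. The abstract does not say which features, selection criterion, or classifier the authors use, so everything here is an assumption chosen for brevity: k-mer frequencies as candidate features, difference of class means as the selection score, and a nearest-centroid rule as the integration step. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

def kmer_features(seq, k=2, alphabet="ACGT"):
    """Step (a): candidate features = relative frequency of every k-mer."""
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {"".join(p): counts["".join(p)] / windows
            for p in product(alphabet, repeat=k)}

def select_features(vectors, labels, n):
    """Step (b): keep the n features whose class means differ the most."""
    a, b = sorted(set(labels))  # assumes exactly two classes

    def mean(name, cls):
        vals = [v[name] for v, l in zip(vectors, labels) if l == cls]
        return sum(vals) / len(vals)

    scores = {name: abs(mean(name, a) - mean(name, b)) for name in vectors[0]}
    return sorted(scores, key=scores.get, reverse=True)[:n]

def train(vectors, labels, selected):
    """Step (c): integrate the selected features into per-class centroids."""
    centroids = {}
    for cls in set(labels):
        rows = [v for v, l in zip(vectors, labels) if l == cls]
        centroids[cls] = [sum(r[f] for r in rows) / len(rows) for f in selected]
    return centroids

def classify(seq, selected, centroids, k=2):
    """Assign the class whose centroid is closest in squared distance."""
    v = kmer_features(seq, k)
    point = [v[f] for f in selected]
    return min(centroids,
               key=lambda c: sum((p - q) ** 2
                                 for p, q in zip(point, centroids[c])))

# Toy sequences (made up): AT-rich versus GC-rich.
seqs = ["ATATATAT", "TATATATA", "ATATTATA", "GCGCGCGC", "CGCGCGCG", "GCGCCGCG"]
labels = ["at", "at", "at", "gc", "gc", "gc"]
vectors = [kmer_features(s) for s in seqs]
selected = select_features(vectors, labels, n=4)
centroids = train(vectors, labels, selected)
print(classify("ATATATTA", selected, centroids))
```

Keeping the three stages as separate functions mirrors the structure of the methodology: any one stage can be swapped for a different technique without touching the other two.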
MOTIVATIONS AND RESULTS: For classifying gene expression profiles or other types of medical data, simple rules are preferable to non-linear distance or kernel functions. This is because rules may help us understand more about the application in addition to performing an accurate classification. In this paper, we discover novel rules that describe the gene …