Qiang Lou

Graphs are an important data structure for modeling complex structural data, such as chemical compounds, proteins, and XML documents. Among the many graph-based applications, sub-graph search is a key problem: given a query Q, retrieve all graphs in the graph database that contain Q as a sub-graph. Most existing sub-graph search methods try…
The proposed feature selection method aims to find a minimum subset of the most informative variables for classification/regression by efficiently approximating the Markov blanket, a set of variables that can shield a certain variable from the target. Instead of relying on conditional independence tests or network structure learning, the new…
Genome-wide analysis of single nucleotide polymorphisms (SNPs) can potentially be helpful in exploring the role of genetic variability in drug therapy. However, two major problems with such an analysis are the need for a large number of interrogated genomes, and the resulting high-dimensional data where the number of SNPs used as features is much larger than…
Previous studies have suggested that murine T cells are tolerant to epitopes derived from germ line variable regions of immunoglobulin (Ig) heavy (VH) or light chains. This has led to the prediction that germ line VH-region epitopes found in neoplastic B cells cannot be used to provoke an antitumor immune response. To test these assumptions and address the…
Recent work has shown that the adversary's background knowledge is a very important factor in privacy-preserving data publishing. In this paper, we formalize background knowledge of the form "an individual X's sensitive value belongs to class C or range W". By analyzing the drawbacks of previous approaches in dealing with this form of background…
To predict an individual's health status more accurately, clinical applications often need to analyze high-dimensional gene expression data that varies with time. A major challenge in predicting from such temporal microarray data is that the number of biomarkers used as features is typically much larger than the number of…
Prediction models for multivariate spatio-temporal functions in geosciences are typically developed using supervised learning from attributes collected by remote sensing instruments collocated with the outcome variable provided at sparsely located sites. In such collocated data, there are often large temporal gaps due to missing attribute values at sites…
Identifying informative biomarkers from a large pool of candidates is the key step for accurate prediction of an individual's health status. In clinical applications, traditional static feature selection methods that flatten the temporal data cannot be directly applied, since the patient's observed clinical condition is a multivariate time series…