En-Shiun Annie Lee

Discovering patterns from sequence data has a significant impact on many aspects of science and society, especially in genomics and proteomics. Here we consider multiple strings as input sequence data and substrings as patterns. In practice, a large set of patterns can usually be discovered, yet many of them are redundant, thus degrading the output…
Discovering sequence patterns with variation can unveil functions of a protein family that are important for drug discovery. Exploring protein families using existing methods such as multiple sequence alignment is computationally expensive, so pattern search, called motif finding in bioinformatics, is used instead. However, at present, combinatorial algorithms…
Discovering sequence patterns with variations unveils significant functions of a protein family. Existing combinatorial methods of discovering patterns with variations are computationally expensive, and probabilistic methods require a more elaborate probabilistic representation of the amino acid associations. To overcome these shortcomings, this paper…
Location-based advertising has gained a lot of attention from advertisers in recent years because of the rapid increase in the number of smartphone users. Adversarial location-based advertising is a novel concept that stretches beyond conventional mobile advertising. Advertisers can promote their own products based on location and also compete for…
Similarities and differences in protein sequence patterns can be used to reveal essential and class-specific functionality of protein families. Traditional supervised learning methods require class labels for classifying sequences but cannot reveal embedded patterns related to inherent functionality and taxonomical variations. We develop an algorithm for…
A basic task in protein analysis is to discover a set of sequence patterns that reflect the function of a protein family. This set of sequence patterns contains non-exact significant residue associations. Currently, existing combinatorial methods are computationally expensive, and probabilistic methods require a richer representation of the amino acid…
Proteins can be represented in several ways, including primary protein sequence, where the protein is represented as a string of amino acids, and three-dimensional structure, where the sequence is folded into a structure. By analyzing proteins from the same protein family, we can find conserved protein regions that are common within that protein family,…