Learn More
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the(More)
Mining frequent patterns from data streams has drawn increasing attention in recent years. However, previous mining algorithms were all focused on a single data stream. In many emerging applications, it is of critical importance to combine multiple data streams for analysis. For example, in real-time news topic analysis, it is necessary to combine multiple(More)
—Ensemble learning is a commonly used tool for building prediction models from data streams, due to its intrinsic merits of handling large volumes stream data. Despite of its extraordinary successes in stream data mining, existing ensemble models, in stream data environments, mainly fall into the ensemble classifiers category, without realizing that(More)
Recently regular expression matching has become a research focus as a result of the urgent demand for Deep Packet Inspection (DPI) in many network security systems. Deterministic Finite Automaton (DFA), which recognizes a set of regular expressions, is usually adopted to cater to the need for real-time processing of network traffic. However, the huge memory(More)
Filtering plays an important role in the Internet security and information retrieval fields, and usually employs multiple-strings matching algorithm as its key part. All the classical matching algorithms, however , perform badly when the number of the keywords exceeds a critical point, which made large scale multiple-strings matching problem a great(More)
Missing data commonly occurs in many applications. While many data imputation methods exist to handle the missing data problem for large scale databases, when applied to concept drifting data streams, these methods face some common difficulties. First, due to large and continuous data volumes, we are unable to maintain all stream records to form a candidate(More)