Two classifiers, Support Vector Machine (SVM) and Conditional Random Fields (CRFs), are applied here to the recognition of biomedical named entities. Because the two classifiers have different characteristics, their results are merged to achieve better performance. We propose an automatic corpus expansion method for SVM and CRF to overcome the shortage of ...
In many applications, the data of interest comprises multiple sequences that evolve over time. Examples include currency exchange rates and network traffic data. We develop a fast method to analyze such co-evolving time sequences jointly to allow (a) estimation/forecasting of missing/delayed/future values, (b) quantitative data mining, and (c) outlier ...
Fast indexing in time sequence databases for similarity searching has attracted a lot of research recently. Most proposals, however, center on the Euclidean distance and its derivatives. We examine the problem of multi-modal similarity search, in which users can choose the best of multiple similarity models for their needs. In ...
SUMMARY: POSBIOTM-NER is a trainable biomedical named-entity recognition system. POSBIOTM-NER can be automatically trained and adapted to new datasets without performance degradation, using CRF (conditional random field) machine learning techniques and automatic linguistic feature analysis. Currently, we have trained our system on three different datasets.
With the advent of ubiquitous computing, we can easily collect large-scale trajectory data, for example from moving vehicles. This paper studies pattern-matching problems for trajectory data over road networks, which complements existing efforts focusing on (1) a spatiotemporal window query for location-based services or (2) Euclidean space with no restriction. In ...
As multimedia applications spread widely, it is crucial for programming and design support systems to handle "time" in multimedia documents effectively and flexibly. This paper presents a set of interactive system support tools for designing and maintaining the temporal behavior of multimedia documents. The tool set provides mechanisms for anomaly ...
Fast similarity searching in large time-sequence databases has attracted a lot of research interest [1, 5, 2, 6, 3, 10]. All of them use the Euclidean distance (L_2), or some variation of L_p metrics. L_p metrics lead to efficient indexing, thanks to feature extraction (e.g., by keeping the first few DFT coefficients) and subsequent use of fast spatial access ...
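The feature-extraction idea mentioned in this abstract (keeping the first few DFT coefficients) can be illustrated with a short sketch. It is not taken from the paper: the function names are hypothetical, NumPy is assumed, and the orthonormal DFT is used so that, by Parseval's theorem, the distance between truncated feature vectors never exceeds the true Euclidean distance. That lower-bounding property is what lets an index prune candidates without missing true matches.

```python
import numpy as np


def dft_features(seq, k):
    """Return the first k coefficients of the orthonormal DFT of seq.

    Hypothetical helper for illustration; k is the number of
    coefficients kept as the low-dimensional feature vector.
    """
    coeffs = np.fft.fft(np.asarray(seq, dtype=float), norm="ortho")
    return coeffs[:k]


def feature_distance(fa, fb):
    """Euclidean distance between two (complex) feature vectors."""
    return float(np.linalg.norm(fa - fb))


def euclidean(a, b):
    """Euclidean (L_2) distance between two equal-length sequences."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.linalg.norm(diff))


if __name__ == "__main__":
    # Two synthetic random-walk sequences (made-up data).
    rng = np.random.default_rng(0)
    x = np.cumsum(rng.standard_normal(128))
    y = np.cumsum(rng.standard_normal(128))

    d_true = euclidean(x, y)
    d_feat = feature_distance(dft_features(x, 4), dft_features(y, 4))

    # The feature-space distance lower-bounds the true distance.
    assert d_feat <= d_true + 1e-9
    print(f"feature distance {d_feat:.3f} <= true distance {d_true:.3f}")
```

In a range query with radius r, any sequence whose feature distance to the query exceeds r can be discarded safely. The remaining candidates still need to be checked against the full sequences.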