Truc-Vien T. Nguyen

This paper explores the use of innovative kernels based on syntactic and semantic structures for a target relation extraction task. Syntax is derived from constituent and dependency parse trees, whereas semantics concerns entity types and lexical sequences. We investigate the effectiveness of such representations in automated relation extraction from…
In this paper, we extend distant supervision (DS) based on Wikipedia for Relation Extraction (RE) by considering (i) relations defined in external repositories, e.g. YAGO, and (ii) any subset of Wikipedia documents. We show that training data constituted by sentences containing pairs of named entities in target relations is enough to produce reliable…
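The distant-supervision idea summarized above can be illustrated with a minimal sketch. It assumes a toy knowledge base mapping entity pairs to relations, and sentences whose named-entity mentions are already identified. The names `KB` and `label_sentences` and the sample facts are hypothetical and chosen only for illustration; they are not the paper's data or pipeline.

```python
from itertools import permutations

# Hypothetical knowledge-base facts: (subject, object) -> relation label.
KB = {
    ("Barack Obama", "Honolulu"): "bornIn",
    ("Honolulu", "Hawaii"): "locatedIn",
}


def label_sentences(sentences, kb):
    """Return (text, subject, object, relation) training examples.

    Each input item is a (text, entities) pair, where `entities` lists the
    named-entity mentions already found in the text (assumed to be given).
    """
    examples = []
    for text, entities in sentences:
        # Consider every ordered pair of mentions in the sentence.
        for subj, obj in permutations(entities, 2):
            relation = kb.get((subj, obj))
            if relation is not None:
                # The sentence mentions a pair that the knowledge base
                # relates, so it is kept as a (noisy) positive example.
                examples.append((text, subj, obj, relation))
    return examples


sentences = [
    ("Barack Obama was born in Honolulu.", ["Barack Obama", "Honolulu"]),
    ("Honolulu is the capital of Hawaii.", ["Honolulu", "Hawaii"]),
    ("Barack Obama visited Hawaii.", ["Barack Obama", "Hawaii"]),
]

training_data = label_sentences(sentences, KB)
```

Sentences whose entity pair has no fact in the knowledge base (the third one here) yield no example. A real system would also need entity linking and some handling of label noise, neither of which this sketch attempts.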
Supervised approaches to Relation Extraction (RE) are characterized by higher accuracy than unsupervised models. Unfortunately, their applicability is limited by the need for training data for each relation type. Automatic creation of such data using Distant Supervision (DS) provides a promising solution to the problem. In this paper, we study DS for…
The most fascinating advantage of the semantic web would be its ability to understand and process the contents of web pages automatically. Realizing the semantic web involves two main tasks: (1) representation and management of a large amount of data and metadata for web contents; (2) information extraction and annotation on web pages.
In this paper, we present methods for event clustering and classification as defined by MediaEval 2013. For event clustering, a watershed-based method with external data sources is used. Based on two main observations, all of the metadata is turned into a user-time (UT) image, so that each row of an image contains all records that belong to one user; and…
In this paper, a watershed-based method with support from external data sources is proposed to detect Social Events defined by MediaEval 2012. This method is based on two main observations: (1) people cannot be involved in more than one event at the same time, and (2) people tend to introduce similar annotations for all images associated with the same event.
The literature contains a large amount of work on entity recognition and semantic disambiguation in text, but very little on noisy text data. In this paper, we present an approach for recognizing and disambiguating entities in text based on the high coverage and rich structure of an online encyclopedia. This work was carried out on a collection of…
We present novel kernels based on structured and unstructured features for reranking the N-best hypotheses of conditional random fields (CRFs) applied to entity extraction. The former features are generated by a polynomial kernel encoding entity features whereas tree kernels are used to model dependencies amongst tagged candidate examples. The experiments…
In this contribution, we propose a watershed-based method with support from external data sources and visual information to detect social events in web multimedia. The idea is based on two main observations: (1) people cannot be involved in more than one event at the same time, and (2) people tend to introduce similar annotations for all images associated…
We describe experiments with two learning algorithms for Named Entity Recognition. One implements Conditional Random Fields (CRFs); the other uses Support Vector Machines (SVMs). Both are trained with a large number of features. While SVMs employ only input features, CRFs also exploit statistical aspects in terms of unigrams and bigrams of both…