Gaston L'Huillier

Plagiarism detection has been treated as a classification problem that can be approached with intrinsic strategies, which use information drawn from the document itself, and external strategies, which compare a suspicious document against different sources. In this work, both intrinsic and external approaches for plagiarism …
Web mining has traditionally been used in different application domains to enhance the content that Web users access. Likewise, Website administrators are interested in finding new approaches to improve their Website content according to their users’ preferences. Furthermore, the Semantic Web has been considered as an alternative to …
The study of extremist groups and their interactions is crucial for maintaining homeland security and peace. Tools such as social network analysis and text mining have contributed to understanding these groups and to developing counter-terrorism applications. This work addresses the topic-based community key-members extraction problem, for which our …
Phishing email fraud has been considered one of the main cyber-threats in recent years. Its development has been closely related to social engineering techniques, where different fraud strategies are used to deceive a naïve email user. In this work, a latent semantic analysis and text mining methodology is proposed for the characterisation of …
This paper addresses the problem of probability estimation in multiclass classification tasks by combining two well-known data mining techniques: support vector machines and neural networks. We present an algorithm that uses both techniques in a two-step procedure. The first step employs support vector machines within a one-vs-all reduction from multiclass to …
Plagiarism has become one of the main problems that the information technology revolution has brought to our society. Such practices could be identified using pattern matching algorithms and intelligent data analysis approaches. Furthermore, a fast document copy detection algorithm could be used in large-scale applications …
In adversarial systems, the performance of a classifier decreases after it is deployed, as the adversary learns to defeat it. Recently, adversarial data mining was introduced as a solution, in which the classification problem is viewed as a game between an adversary and an intelligent, adaptive classifier. In recent years, phishing …
Traditional Web search engines do not currently support retrieving similar documents from the Web using documents as input instead of key-term queries. One approach to the problem consists of fingerprinting the document's content into a set of queries that are submitted to a list of Web search engines. Afterward, results are merged, their …
http://dx.doi.org/10.1016/j.inffus.2014.01.006 1566-2535/© 2014 Elsevier B.V. All rights reserved. ⇑ Corresponding author. Tel.: +56 2 2978 4834; fax: +56 2 2689 7895. E-mail addresses: rduenas@ing.uchile.cl (R. Dueñas-Fernández), jvelasqu@dii.uchile.cl (J.D. Velásquez), gaston@groupon.com (G. L’Huillier). URL: http://wi.dii.uchile.cl/ (J.D. Velásquez). …