Twitter spam detection is a recent area of research in which most previous work has focused on identifying malicious user accounts and on honeypot-based approaches. In this paper, however, we present a methodology based on two new aspects: the detection of spam tweets in isolation, without prior information about the user, and the application of a …
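Classifying a tweet in isolation means deciding from its text alone, with no account history. The abstract does not give the concrete classifier, so the following is only a minimal sketch of the idea using a unigram Naive Bayes model; the training examples and labels are invented for illustration.

```python
import math
from collections import Counter

def train_nb(tweets):
    """Train a unigram Naive Bayes model: tweets is a list of (text, label)."""
    counts = {"spam": Counter(), "ham": Counter()}
    priors = Counter()
    for text, label in tweets:
        priors[label] += 1
        counts[label].update(text.lower().split())
    return counts, priors

def classify(text, counts, priors):
    """Return the label with the highest log-probability (add-one smoothing)."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    best, best_lp = None, float("-inf")
    for label in counts:
        total = sum(counts[label].values())
        lp = math.log(priors[label] / sum(priors.values()))
        for tok in text.lower().split():
            lp += math.log((counts[label][tok] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy training data, purely illustrative.
training = [
    ("free followers click here", "spam"),
    ("win money now click this link", "spam"),
    ("great talk at the conference today", "ham"),
    ("having coffee with friends", "ham"),
]
counts, priors = train_nb(training)
print(classify("click here for free money", counts, priors))  # prints "spam"
```

The point of the sketch is that every signal comes from the tweet text itself, which is what allows detection without any prior information about the posting account.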
This paper applies a language-model approach to different sources of information extracted from a Web page, in order to provide high-quality indicators for the detection of Web spam. Two pages linked by a hyperlink should be topically related, even if this contextual relation is weak. For this reason we have analysed different sources of information …
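A common way to compare two sources of text under a language-model approach is to build a smoothed unigram distribution for each and measure their Kullback-Leibler divergence: topically related texts diverge less. The abstract does not specify the exact formulation, so this is a sketch of that general technique, with invented example strings.

```python
import math
from collections import Counter

def unigram_lm(text, vocab, alpha=1.0):
    """Add-alpha smoothed unigram language model over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) in nats; both distributions share the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

anchor = "cheap flights hotel deals"
related_page = "book cheap flights and find hotel deals for your holiday"
unrelated_page = "the theory of complex networks and graph partitions"

vocab = set((anchor + " " + related_page + " " + unrelated_page).lower().split())
p = unigram_lm(anchor, vocab)
print(kl_divergence(p, unigram_lm(related_page, vocab)) <
      kl_divergence(p, unigram_lm(unrelated_page, vocab)))  # prints True
```

A low divergence between, say, the anchor text of an incoming link and the content of the linked page is evidence of a genuine topical relation; a high divergence is the kind of anomaly a spam detector can exploit.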
Keywords: Web technologies; Tourist information system; Ubiquitous computing; Mobile computing; Collaborative process integration; Semantic web services; Emerging technologies

Despite the recent advances in mobile tourism systems, most wayfinding applications still have to deal with several problems: a huge amount of tourist information to …
In this short note we present a recommendation system for the automatic retrieval of broken Web links, using an approach based on contextual information. We extract information from the context of a link, such as the anchor text, the content of the page containing the link, and, if it exists, the cached copy of the page in some search engine or web archive. …
The mesoscopic structure of complex networks has proven to be a powerful level of description for understanding the linchpins of the system represented by the network. Nevertheless, mapping a series of relationships between elements onto a graph is sometimes not straightforward. Given that all the information we would extract using complex network …
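One classic case where the mapping from relationships to a graph is not straightforward is bipartite data, e.g. elements grouped by shared memberships, which must first be projected onto a one-mode weighted graph. The abstract does not name the concrete construction, so the following is only an illustrative sketch of such a projection, with invented data.

```python
from collections import defaultdict
from itertools import combinations

def project_bipartite(memberships):
    """Project element-group memberships onto a weighted element-element graph:
    two elements gain one unit of link weight per group they share."""
    groups = defaultdict(set)
    for element, group in memberships:
        groups[group].add(element)
    weights = defaultdict(int)
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical author-paper relations projected to a co-authorship network.
data = [("ana", "p1"), ("bob", "p1"), ("ana", "p2"),
        ("carl", "p2"), ("bob", "p3"), ("ana", "p3")]
print(project_bipartite(data))  # {('ana', 'bob'): 2, ('ana', 'carl'): 1}
```

The choice of projection (here, simple co-occurrence counts) already shapes the mesoscopic structure one will later find, which is exactly why the mapping step deserves care.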
Web spam is a serious problem for search engines because the quality of their results can be severely degraded by the presence of such pages. In this paper we present an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones. These features are not only related to quantitative …
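Combining link-based and LM-based features amounts to assembling them into one feature vector and scoring it with a classifier. The paper's actual features and learned model are not given here, so the field names, weights, and linear scoring below are purely illustrative stand-ins.

```python
def spam_features(page):
    """Combine link-based and LM-based indicators into one feature vector.
    All field names are hypothetical, for illustration only."""
    return [
        page["out_links"] / max(page["in_links"], 1),  # link-based: out/in ratio
        page["anchor_kl"],   # LM-based: anchor text vs page content divergence
        page["title_kl"],    # LM-based: title vs page content divergence
    ]

def linear_score(features, weights, bias=0.0):
    """Simple linear classifier: a positive score flags the page as spam."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# Hand-set weights for illustration; a real system would learn them from data.
weights = [0.1, 1.0, 1.0]
spammy = {"out_links": 500, "in_links": 3, "anchor_kl": 4.2, "title_kl": 3.9}
print(linear_score(spam_features(spammy), weights, bias=-10.0) > 0)  # prints True
```

The design point is that the two feature families are complementary: link-based ratios catch artificial link structure, while LM divergences catch content that does not match its own context.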
Because of the sheer volume of information available in FLOSS repositories, simple analyses face the problem of filtering out the relevant information. Hence, it is essential to apply methodologies that highlight that information for a given aspect of the project. In this paper, some techniques from the social sciences have been used on data from …
In the web pages we access when navigating the Internet, or even in our own web pages, we sometimes find links which are no longer valid. Finding the right web pages corresponding to those links is often hard. In this work we have analysed different sources of information to automatically recover broken web links so that the user can be …
In this work we compare different techniques to automatically find candidate web pages to substitute broken links. We extract information from the anchor text, the content of the page containing the link, and the cached page in some digital library. The selected information is processed and submitted to a search engine. We have compared different information …
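The pipeline described, extract context terms, query a search engine, then pick among the results, ends with ranking candidates by similarity to the broken link's context. The abstract does not specify the ranking function, so this sketch uses bag-of-words cosine similarity; the URLs and page texts are invented stand-ins for search-engine results.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words term vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(context, candidates):
    """Rank candidate pages by similarity to the broken link's context
    (anchor text plus terms from the page containing the link)."""
    ctx = Counter(context.lower().split())
    scored = [(cosine(ctx, Counter(text.lower().split())), url)
              for url, text in candidates.items()]
    return [url for score, url in sorted(scored, reverse=True)]

context = "broken link recovery anchor text search engine"
candidates = {  # hypothetical results returned by a search engine
    "a.example": "recovering broken links using anchor text and search engines",
    "b.example": "recipes for pasta and other dishes",
}
print(rank_candidates(context, candidates))  # "a.example" ranked first
```

Comparing information sources then reduces to swapping different context strings (anchor text only, full containing page, cached copy) into the same ranking loop and measuring which one recovers the right page most often.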
Broken hypertext links are a frequent problem on the Web. Sometimes the page a link points to has disappeared forever, but in many other cases the page has simply been moved to another location in the same web site or to a different one. In some cases the page, besides being moved, is updated, becoming somewhat different from the original but rather …