Information filtering based on wiki index database

Alexander V. Smirnov and Andrew Krizhanovsky
In this paper we present a profile-based approach to information filtering that analyzes the content of text documents. A Wikipedia index database is created and used to generate the user profile automatically from the user's document collection. Problem-oriented Wikipedia subcorpora are then created (using knowledge extracted from the user profile), one for each topic of user interest. The index databases of these subcorpora are applied to filter an incoming information flow (e.g., mail, news). Thus…
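The filtering step in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes the profile is simply the most frequent terms of the user's documents and that an incoming message passes when its cosine similarity to the profile reaches a hypothetical `threshold`. The Wikipedia index database, lemmatization, and subcorpus construction are not reproduced.

```python
import math
import re
from collections import Counter

# Tiny stop list; a real system would use lemmatization and a fuller list.
STOP = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for"}

def tokenize(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]

def build_profile(user_docs, top_k=5):
    """Hypothetical profile: the top_k most frequent terms in the user's documents."""
    counts = Counter()
    for doc in user_docs:
        counts.update(tokenize(doc))
    return dict(counts.most_common(top_k))

def cosine(a, b):
    """Cosine similarity of two sparse term -> weight mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_stream(messages, profile, threshold=0.2):
    """Keep the incoming messages whose term vector is close enough to the profile."""
    return [m for m in messages if cosine(Counter(tokenize(m)), profile) >= threshold]
```

For example, a profile built from documents about wiki indexing keeps "new wiki database release" and drops "football match results tonight". In the paper, one such filter per topic would be backed by that topic's subcorpus index rather than by raw term counts.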

On the problem of Wiki texts indexing
A new method for indexing Wikipedia texts in three languages: Russian, English, and German, is proposed and implemented, and the architecture of the indexing system, including the software components GATE and Lemmatizer, is considered.
Index wiki database: design and experiments
The software architecture for indexing wiki texts in three languages (Russian, English, and German) and the interaction between its software components (GATE, Lemmatizer, and Synarcher) are described, and the design of the inverted-file index database, created with the visual tool DBDesigner, is presented.
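The core of an inverted-file index can be shown in a few lines. This sketch is only an in-memory stand-in for the database schema designed in DBDesigner, which is not given here; lowercasing substitutes for the Lemmatizer, and the function names are hypothetical.

```python
import re
from collections import defaultdict

def build_inverted_index(pages):
    """Map each term to {page_id: term frequency}, the core of an inverted file.

    pages: dict of page_id -> text. Lowercasing stands in for lemmatization.
    """
    index = defaultdict(dict)
    for page_id, text in pages.items():
        # \w matches Unicode letters in Python 3, so Cyrillic text is tokenized too.
        for term in re.findall(r"\w+", text.lower()):
            index[term][page_id] = index[term].get(page_id, 0) + 1
    return index

def pages_with_all(index, terms):
    """Return sorted ids of pages containing every query term (posting-list intersection)."""
    postings = [set(index.get(t.lower(), {})) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

In a relational version, the same mapping would typically become a term table and a term-page table holding the frequency, which is what makes the per-page counts cheap to query.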
Learning Explainable User Sentiment and Preferences for Information Filtering
A sentiment-aware neighborhood model, which integrates the sentiment of user comments with unary preferences through either fixed or learned mapping functions, is proposed, along with several content-based methods that rely on semantic similarities in the presence or absence of preferences.
Content-based Recommender Systems: State of the Art and Trends
The role of User Generated Content is described as a way of taking evolving vocabularies into account, together with the challenge of offering users serendipitous recommendations, that is, surprisingly interesting items they might not otherwise have discovered.
Combining content with user preferences for non-fiction multimedia recommendation: a study on TED lectures
A new dataset is introduced and several methods for the recommendation of non-fiction audio visual material, namely lectures from the TED website, are compared, using cross-validation to select the best parameters of keyword-based (TFIDF) and semantic vector space-based methods.
Discovery of usage based item similarities to support recommender systems in dealing with rarely used items
Recommender systems are already an integral part of the lives of most people who regularly use the internet. They get recommendations when they shop at, when they watch video clips on
On enhancing recommender systems by utilizing general social networks combined with users goals and contextual awareness
The proposed solutions cannot be extended directly to general-purpose social networks such as Facebook and Twitter. These are open social networks where users can perform a variety of actions that could be useful for recommendation, but because users cannot rate items there, this information cannot be used in recommender systems.


Integrating Semantic Knowledge into Text Similarity and Information Retrieval
It is found that integrating lexical semantic knowledge improves performance for both tasks: ad-hoc information retrieval and text similarity.
Mining Domain-Specific Thesauri from Wikipedia: A Case Study
It is shown how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia, and it is found that Wikipedia contains a substantial proportion of the domain-specific thesaurus's concepts and semantic relations.
Synonym search in Wikipedia: Synarcher
The paper presents an adapted HITS algorithm for synonym search, the program architecture, and an evaluation of the program on test examples.
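The adaptations made in the paper are not described here, so as a point of reference this is a minimal sketch of the standard HITS iteration (Kleinberg's hubs and authorities) on a hypothetical toy graph of article links. Reading the top authorities as candidate related terms is an assumption of this sketch, not a claim about Synarcher's exact procedure.

```python
import math

def hits(graph, iterations=50):
    """Standard HITS on a directed graph given as node -> list of out-links.

    Returns (hub, authority) score dicts, each L2-normalized.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking to the node.
        auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub update: sum of authority scores of the pages the node links to.
        hub = {n: sum(auth[v] for v in graph.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth
```

On a toy graph where "Car", "Truck", "Bus", and "Automobile" all link to "Vehicle", "Vehicle" ends up with the highest authority score, and pages with no in-links get an authority of zero.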
Exploiting Synergy Between Ontologies and Recommender Systems
This paper investigates the synergy between a web-based research paper recommender system and an ontology containing information automatically extracted from departmental databases available on the web, and shows how the combination addresses the ontology's interest-acquisition problem.
WikiRelate! Computing Semantic Relatedness Using Wikipedia
This work presents experiments on using Wikipedia for computing semantic relatedness and compares it to WordNet on various benchmarking datasets, and shows that Wikipedia outperforms WordNet when applied to the largest available dataset designed for that purpose.
Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis
This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia; this yields substantial improvements in the correlation of computed relatedness scores with human judgments.
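The ESA idea can be sketched on a toy scale: each word is mapped to a weighted vector of the concepts (articles) it appears in, a text's vector is the sum of its words' concept vectors, and relatedness is the cosine of two such vectors. The concept texts below are hypothetical stand-ins for Wikipedia articles, and the log-scaled TF-IDF weighting is one common choice, not necessarily the paper's exact scheme; pruning and other refinements are omitted.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_esa_index(concepts):
    """Inverted index word -> {concept: tf-idf weight} over concept articles.

    concepts: dict of concept title -> article text (a toy stand-in for Wikipedia).
    """
    n = len(concepts)
    tf = {c: Counter(tokenize(text)) for c, text in concepts.items()}
    df = Counter(w for counts in tf.values() for w in counts)
    index = defaultdict(dict)
    for c, counts in tf.items():
        for w, f in counts.items():
            idf = math.log(n / df[w])
            if idf > 0:  # words present in every concept carry no signal
                index[w][c] = (1 + math.log(f)) * idf
    return index

def interpret(text, index):
    """Map a text to a concept vector: the sum of its words' concept vectors."""
    vec = defaultdict(float)
    for w, f in Counter(tokenize(text)).items():
        for c, weight in index.get(w, {}).items():
            vec[c] += f * weight
    return vec

def relatedness(a, b, index):
    """Cosine similarity of the two texts' concept vectors."""
    va, vb = interpret(a, index), interpret(b, index)
    dot = sum(x * vb.get(c, 0.0) for c, x in va.items())
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

With concepts "Cat", "Dog", and "Stock market", the texts "feline whiskers" and "cat" share no words yet map to the same concept and score 1.0, while "feline" and "trading" score 0.0.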
An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet
This paper presents an adaptation of Lesk's dictionary-based word sense disambiguation algorithm that uses the lexical database WordNet as the source of glosses for this approach, and attains an overall accuracy of 32%.
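The gloss-overlap principle behind Lesk's algorithm can be shown with a minimal sketch of the simplified Lesk variant: choose the sense whose gloss shares the most content words with the context. This is not the adapted algorithm itself, which additionally draws on glosses of related WordNet synsets; the glosses and sense ids below are hypothetical, not WordNet data.

```python
import re

STOP = {"a", "an", "the", "of", "in", "or", "to", "and", "for", "that", "is"}

def content_words(text):
    """Set of lowercase words with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP}

def simplified_lesk(context, senses):
    """Pick the sense whose gloss shares the most content words with the context.

    senses: dict of sense id -> gloss text. Ties go to the sense listed first.
    """
    ctx = content_words(context)
    best, best_score = None, -1
    for sense, gloss in senses.items():
        score = len(ctx & content_words(gloss))
        if score > best_score:
            best, best_score = sense, score
    return best
```

Because overlap is exact word matching, "deposited" does not match "deposits"; stemming or lemmatization would widen the overlap, and richer glosses (as in the adapted version) reduce the number of zero-overlap ties.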
Automatic Assignment of Wikipedia Encyclopedic Entries to WordNet Synsets
An approach taken for automatically associating entries from an on-line encyclopedia with concepts in an ontology or a lexical semantic network is described, which will be applied to enriching ontologies with encyclopedic knowledge.
A Fuzzy Linguistic Multi-agent Model for Information Gathering on the Web Based on Collaborative Filtering Techniques
A fuzzy linguistic multi-agent model is described that incorporates information filtering techniques, namely a collaborative filtering agent, into its structure, so that the information filtering capabilities of multi-agent systems on the Web are increased and their retrieval results are improved.
Using WordNet to Improve User Modelling in a Web Document Recommender System