However, many text mining applications do not have adequate natural language processing ability beyond simple keyword indexing, and as a result, too many textual elements (words) are included in the analysis. We argue that noun phrases are better suited as textual elements for text mining and could provide more discriminating power than single words.
Effective and efficient searching and presentation of returned results are key to a search engine. Before downloading and examining the document text, users usually first judge the relevance of a returned hit to the query by looking at the document metadata presented in the returned result. However, the metadata coming with the returned hit is usually not…
Automated medical concept recognition is important for medical informatics tasks such as medical document retrieval and text mining research. In this paper, we present a software tool called the keyphrase identification program (KIP) for identifying topical concepts from medical documents. KIP combines two functions: noun phrase extraction and keyphrase…
In this paper, we propose the first real-time rumor debunking algorithm for Twitter. We use cues from the 'wisdom of the crowds', that is, the aggregate 'common sense' and investigative journalism of Twitter users. We concentrate on identification of a rumor as an event that may comprise one or more conflicting microblogs. We continue monitoring the rumor…
This paper presents a hybrid concept hierarchy development technique for Web-returned documents retrieved by a meta-search engine. The aim of the technique is to separate the initially retrieved documents into topically oriented categories prior to the actual concept hierarchy generation. The topical categories correspond to different semantic aspects of the…
We report a keyphrase identification program (KIP), which uses sample human keyphrases and then learns to identify new keyphrases. KIP first populates its database using manually identified keyphrases; each keyphrase is preprocessed and assigned an initial weight. It then extracts noun phrases from documents. All noun phrases will be assigned a…
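The pipeline outlined in this abstract (seed a database with weighted keyphrases, extract noun phrases, then score them) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration only: the seed phrases and weights, the toy tag-based chunker, and the averaging rule for scoring are hypothetical and are not taken from KIP itself, whose scoring details are cut off above.

```python
from collections import defaultdict

# Hypothetical seed keyphrases with hand-picked initial weights (illustrative only).
SEED_KEYPHRASES = {"text mining": 1.0, "noun phrase": 0.8, "search engine": 0.6}


def build_word_weights(seed):
    """Give each word the highest weight of any seed phrase containing it (assumed rule)."""
    weights = defaultdict(float)
    for phrase, weight in seed.items():
        for word in phrase.lower().split():
            weights[word] = max(weights[word], weight)
    return dict(weights)


def extract_noun_phrases(tagged_tokens):
    """Toy chunker: maximal runs of ADJ/NOUN tokens, trimmed so they end in a NOUN."""
    phrases, run = [], []
    # A sentinel token flushes the final run.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((word.lower(), tag))
            continue
        while run and run[-1][1] != "NOUN":
            run.pop()  # drop trailing adjectives
        if run:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases


def score_phrases(phrases, seed, word_weights):
    """Score by seed weight if known, else by the mean weight of the phrase's words."""
    scores = {}
    for phrase in phrases:
        if phrase in seed:
            scores[phrase] = seed[phrase]
        else:
            words = phrase.split()
            scores[phrase] = sum(word_weights.get(w, 0.0) for w in words) / len(words)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Tokens are pre-tagged by hand; a real system would use a part-of-speech tagger.
    tagged = [("Noun", "NOUN"), ("phrase", "NOUN"), ("extraction", "NOUN"),
              ("improves", "VERB"), ("text", "NOUN"), ("mining", "NOUN")]
    candidates = extract_noun_phrases(tagged)
    for phrase, score in score_phrases(candidates, SEED_KEYPHRASES,
                                       build_word_weights(SEED_KEYPHRASES)):
        print(f"{score:.2f}  {phrase}")
```

Candidate phrases that already appear in the seed database keep their seed weight, while unseen phrases inherit a score from the words they share with known keyphrases; this is one simple way to let a small manual list generalize, not necessarily the one KIP uses.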
The paper presents a hybrid technique for the classification of Web-returned hits into concept hierarchies. The technique involves a combination of manual and automatic classifiers. First, all Web-returned documents are assigned to human-defined categories using manual classifiers, and then automatic classifiers are used to generate a concept hierarchy…