Rohini K. Srihari

Like many purely data-driven machine learning methods, Support Vector Machine (SVM) classifiers are learned exclusively from the evidence presented in the training dataset; thus a larger training dataset is required for better performance. In some applications, there might be human knowledge available that, in principle, could compensate for the lack of…
A number of feature selection metrics have been explored in text categorization, among which information gain (IG), chi-square (CHI), correlation coefficient (CC) and odds ratio (OR) are considered most effective. CC and OR are one-sided metrics while IG and CHI are two-sided. Feature selection using one-sided metrics selects the features most indicative…
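The one-sided versus two-sided distinction can be illustrated with a minimal sketch. It assumes the common contingency-table formulation of CC and CHI from the text categorization literature, in which CHI equals the square of CC; the function names and the counts `a`, `b`, `c`, `d` are illustrative and are not taken from the paper itself.

```python
import math

def correlation_coefficient(a, b, c, d):
    """Signed (one-sided) association between a term and a class.

    Assumed contingency counts (illustrative naming):
      a: documents in the class that contain the term
      b: documents outside the class that contain the term
      c: documents in the class that lack the term
      d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no measurable association
    return math.sqrt(n) * (a * d - c * b) / math.sqrt(denom)

def chi_square(a, b, c, d):
    """Two-sided score: squaring CC discards the sign of the association."""
    return correlation_coefficient(a, b, c, d) ** 2

# A term that indicates membership and a term that indicates non-membership.
positive_term = (40, 10, 10, 40)
negative_term = (10, 40, 40, 10)
cc_pos = correlation_coefficient(*positive_term)
cc_neg = correlation_coefficient(*negative_term)
chi_pos = chi_square(*positive_term)
chi_neg = chi_square(*negative_term)
```

Under this formulation the two terms receive CC scores of opposite sign but identical CHI scores, so ranking by CC favors features indicative of membership while ranking by CHI treats both kinds alike.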
This paper presents experiments on subjectivity and polarity classifications of topic- and genre-independent blog posts, making novel use of a linguistic feature, verb class information, and of an online resource, the Wikipedia dictionary, for determining the polarity of adjectives. Each post from a blog is classified as objective, positive, or negative. Our…
Merchants selling products on the Web often ask their customers to share their opinions and hands-on experiences on products they have purchased. Unfortunately, reading through all customer reviews is difficult, especially for popular items, where the number of reviews can reach hundreds or even thousands. This makes it difficult for a potential customer to…
The color histogram is an important technique for color image database indexing and retrieval. In this paper, the traditional color histogram is modified to capture spatial layout information of each color, and three types of spatial color histograms are introduced: annular, angular and hybrid color histograms. Experiments show that with a proper trade-off between the…
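As a rough illustration of how a histogram can carry spatial layout, the sketch below builds an annular-style histogram under stated assumptions: the image is already quantized to color indices, and each color's pixels are binned into concentric rings around that color's centroid. The abstract is truncated before it gives the construction, so this is one plausible reading rather than the paper's exact definition; the function name and parameters are hypothetical.

```python
import math

def annular_histogram(image, num_colors, num_rings):
    """Count each color's pixels per concentric ring around its centroid.

    image: 2D list of quantized color indices in range(num_colors).
    Returns hist[color][ring]; each row sums to that color's plain
    histogram count. (Assumed construction, not the paper's definition.)
    """
    # Group pixel coordinates by quantized color.
    coords = [[] for _ in range(num_colors)]
    for y, row in enumerate(image):
        for x, color in enumerate(row):
            coords[color].append((x, y))

    hist = [[0] * num_rings for _ in range(num_colors)]
    for color, pts in enumerate(coords):
        if not pts:
            continue
        # Centroid of this color's pixels.
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        dists = [math.hypot(x - cx, y - cy) for x, y in pts]
        max_d = max(dists)
        for dist in dists:
            if max_d == 0:
                ring = 0  # all pixels sit on the centroid
            else:
                # Scale distance to a ring index; clamp the outermost pixels.
                ring = min(int(dist / max_d * num_rings), num_rings - 1)
            hist[color][ring] += 1
    return hist

# Tiny example: one pixel of color 1 surrounded by color 0.
example = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
hist = annular_histogram(example, num_colors=2, num_rings=2)
```

Two images with the same plain color histogram but different spatial arrangements of a color would generally yield different ring counts, which is the extra information a spatial histogram is meant to capture.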
Ambiguity is very high for location names. For example, there are 23 cities named ‘Buffalo’ in the U.S. Country names such as ‘Canada’, ‘Brazil’ and ‘China’ are also city names in the USA. Almost every city has a Main Street or Broadway. Such ambiguity needs to be handled before we can refer to location names for visualization of related extracted events. …
Ambiguity is very high for location names. For example, there are 23 cities named ‘Buffalo’ in the U.S. Based on our previous work, this paper presents a refined hybrid approach to geographic references using our information extraction engine InfoXtract. The InfoXtract location normalization module consists of local pattern matching and discourse…
We propose a new framework termed Keyblock for content-based image retrieval, which is a generalization of the text-based information retrieval technology in the image domain. In this framework, methods for extracting comprehensive image features are provided, which are based on the frequency of representative blocks, termed keyblocks, of the image…