Ana-Maria Popescu

Learn More
The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOWITALL’s novel architecture and design principles, emphasizing its distinctive ability to extract(More)
Manually querying search engines in order to accumulate a large bodyof factual information is a tedious, error-prone process of piecemealsearch. Search engines retrieve and rank potentially relevantdocuments for human perusal, but do not extract facts, assessconfidence, or fuse information from multiple documents. This paperintroduces KnowItAll, a system(More)
The need for Natural Language Interfaces (NLIs) to databases has become increasingly acute as more nontechnical people access information through their web browsers, PDAs and cell phones. Yet NLIs are only usable if they map natural language questions to SQL queries <i>correctly</i>. We introduce the <sc>Precise</sc> NLI [2], which reduces the semantic(More)
This paper addresses the task of user classification in social media, with an application to Twitter. We automatically infer the values of user attributes such as political orientation or ethnicity by leveraging observable information such as the user behavior, network structure and the linguistic content of the user’s Twitter feed. We employ a machine(More)
Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly scalable implementation based on distributional similarity, implemented in the MapReduce framework and deployed over a 200 billion word crawl of the Web. The pairwise similarity(More)
More and more technologies are taking advantage of the explosion of social media (Web search, content recommendation services, marketing, ad targeting, etc.). This paper focuses on the problem of automatically constructing user profiles, which can significantly benefit such technologies. We describe a general and robust machine learning framework for(More)
In this paper we address the curve completion problem, e.g., the geometric continuation of boundaries of objects which are temporarily interrupted by occlusion. Also known as the gap completion or shape completion problem, this problem is a significant element of perceptual grouping of edge elements and has been approached by using cubic splines or biarcs(More)
Given sensors to detect object use, commonsense priors of object usage in activities can reduce the need for labeled data in learning activity models. It is often useful, however, to understand how an object is being used, i.e., the action performed on it. We show how to add personal sensor data (e.g., accelerometers) to obtain this detail, with little(More)
Our KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an autonomous, domain-independent, and scalable manner. In its first major run, KNOWITALL extracted over 50,000 facts with high precision, but suggested a challenge: How can we improve KNOWITALL’s(More)