Online labor markets, such as Amazon's Mechanical Turk, have been used to crowdsource simple, short tasks like image labeling and transcription. However, expert knowledge is often lacking in such markets, making it impossible to complete certain classes of tasks. In this work we introduce an alternative mechanism for crowdsourcing tasks that require …
We analyze the "image" of a given query word in a given corpus of text news by producing a short list of other words with which this query is strongly associated. We use a number of feature selection schemes for text classification to help in this task. We apply these classification techniques using indicators of the query word's appearance in each document …
The cellular system is the world's largest network, providing service to over five billion people. Operators of these networks face fundamental trade-offs in coverage, capacity and operating power. These trade-offs, when coupled with the reality of infrastructure in poorer areas, mean that upwards of a billion people lack access to this fundamental service. …
News media plays a significant role in the course of events, political and otherwise. As the amount of news available grows, the task of understanding it grows more difficult for concerned citizens, media analysts, and decision makers alike. In this paper we adapt scalable and sparse statistical techniques to perform a new form of document summarization: …
In this paper we propose a general framework for topic-specific summarization of large text corpora and illustrate how it can be used for the analysis of news databases. Our framework, concise comparative summarization (CCS), is built on sparse classification methods. CCS is a lightweight and flexible tool that offers a compromise between simple word …
Low-rank matrix approximation can be used not just for greater computational efficiency or robustness, but also for increasing data interpretability. We propose using sparse principal component analysis (PCA) for summarizing large corpora of text documents. When made substantially sparse, i.e. with cardinalities of no more than ten features, the principal …
Cellular phones and related network equipment comprise the world's largest network, providing service to over five billion unique users. Operators of these networks face fundamental trade-offs in coverage, capacity and operating power. These trade-offs, when coupled with the reality of infrastructure in poorer areas, mean that upwards of a billion people …