Learning to Match using Local and Distributed Representations of Text for Web Search
- Bhaskar Mitra, Fernando Diaz, Nick Craswell
- Computer Science · The Web Conference
- 26 October 2016
This work proposes a novel document ranking model composed of two separate deep neural networks: one that matches the query and the document using a local representation, and another that matches them using learned distributed representations. Matching with distributed representations complements matching with traditional local representations.
UMass at TREC 2004: Novelty and HARD
- N. A. Jaleel, James Allan, C. Wade
- Computer Science · Text Retrieval Conference
- 2004
The primary findings for passage retrieval are that document retrieval methods performed better than passage retrieval methods on the passage evaluation metric of binary preference at 12,000 characters, and that clarification forms improved passage retrieval for every retrieval method explored.
Processing Social Media Messages in Mass Emergency
- Muhammad Imran, Carlos Castillo, Fernando Diaz, Sarah Vieweg
- Computer Science · ACM Computing Surveys
- 25 July 2014
This survey reviews the state of the art in computational methods for processing social media messages, highlighting both their contributions and their shortcomings, and methodically examines a series of key subproblems ranging from the detection of events to the creation of actionable and useful summaries.
Temporal profiles of queries
- R. Jones, Fernando Diaz
- Computer Science · TOIS
- 1 July 2007
The results show that meta-features associated with a query can be combined with text retrieval techniques to improve the understanding and treatment of text search on documents with timestamps.
CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises
- Alexandra Olteanu, Carlos Castillo, Fernando Diaz, Sarah Vieweg
- Computer Science · International Conference on Web and Social Media
- 16 May 2014
Using a crisis lexicon leads to substantial improvements in terms of recall when added to a set of crisis-specific keywords manually chosen by experts; it also helps to preserve the original distribution of message types.
Improving the estimation of relevance models using large external corpora
- Fernando Diaz, Donald Metzler
- Computer Science · Annual International ACM SIGIR Conference on…
- 6 August 2006
The results show that using a high-quality corpus comparable to the evaluation corpus can be as effective as, if not more effective than, using the web.
Sources of evidence for vertical selection
- Jaime Arguello, Fernando Diaz, Jamie Callan, J. Crespo
- Computer Science · Annual International ACM SIGIR Conference on…
- 19 July 2009
This work addresses the problem of vertical selection (predicting which verticals are relevant to a query issued to a search engine's main web search page), focusing on 18 different verticals that differ in semantics, media type, size, and level of query traffic.
Extracting Information Nuggets from Disaster-Related Messages in Social Media
- Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz, P. Meier
- Computer Science · International Conference on Information Systems…
- 2013
This paper focuses on extracting valuable "information nuggets" (brief, self-contained information items relevant to disaster response) from microblog posts, using automatic methods that apply machine learning to classify posts and to extract information from them.
Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries
- Alexandra Olteanu, Carlos Castillo, Fernando Diaz, Emre Kıcıman
- Political Science · Frontiers in Big Data
- 2 February 2018
A framework is presented for identifying a broad range of menaces in the research and practices around social data, including biases and inaccuracies that originate at the source of the data as well as those introduced during processing.
Query Expansion with Locally-Trained Word Embeddings
- Fernando Diaz, Bhaskar Mitra, Nick Craswell
- Computer Science · Annual Meeting of the Association for…
- 25 May 2016
It is demonstrated that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus- and query-specific embeddings for retrieval tasks, suggesting that other tasks benefiting from global embeddings may also benefit from local embeddings.
...
...