Alexandre Passos

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It …
There is rising interest in vector-space word embeddings and their use in NLP, especially given recent methods for their fast estimation at very large scale. Nearly all this work, however, assumes a single vector per word type, ignoring polysemy and thus jeopardizing their usefulness for downstream tasks. We present an extension to the Skip-gram model that …
Most state-of-the-art approaches for named-entity recognition (NER) use semi-supervised information in the form of word clusters and lexicons. Recently, neural network-based language models have been explored because, as a byproduct, they generate highly informative vector representations for words, known as word embeddings. In this paper we present two …
Accurately segmenting a citation string into fields for authors, titles, etc. is a challenging task because the output typically obeys various global constraints. Previous work has shown that modeling soft constraints, where the model is encouraged, but not required, to obey the constraints, can substantially improve segmentation performance. On the other …
Multitask learning algorithms are typically designed assuming some fixed, a priori known latent structure shared by all the tasks. However, it is usually unclear what type of latent task structure is the most appropriate for a given multitask learning problem. Ideally, the “right” latent task structure should be learned in a data-driven manner. We present a …
Inference of the document-specific topic distributions in latent Dirichlet allocation (LDA) [2] and decoding in compressed sensing [3] exhibit many similarities. Given a matrix and a noisy observed vector, the goal of both tasks is to recover a sparse vector that, when combined with the matrix, provides a good explanation of the noisy observed data. In the …
Behavioural responses of organisms are frequently affected by variation in resource availability. For eusocial insects, the nutritional status of the colony can modulate responses to chemical cues determining intra- and inter-colonial aggressiveness. Species co-occurrence in termites seems to be modulated by resource availability. Here, we tested the …