Michael Lamar

Motivated by an application to unsupervised part-of-speech tagging, we present an algorithm for the Euclidean embedding of large sets of categorical data based on co-occurrence statistics. We use the CODE model of Globerson et al. but constrain the embedding to lie on a high-dimensional unit sphere. This constraint allows for efficient optimization, even in…
We revisit the algorithm of Schütze (1995) for unsupervised part-of-speech tagging. The algorithm uses reduced-rank singular value decomposition followed by clustering to extract latent features from context distributions. As implemented here, it achieves state-of-the-art tagging accuracy at considerably less cost than more recent methods. It can also…
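As a rough illustration of the pipeline this abstract names (reduced-rank SVD of context distributions, then clustering), here is a minimal sketch. It is not the authors' implementation: the toy count matrix, the row normalization, the choice of k-means as the clustering step, the farthest-point initialization, and the function names `reduced_rank_features` and `kmeans` are all assumptions made for the example.

```python
import numpy as np


def reduced_rank_features(counts, rank):
    """Return rank-`rank` latent features for each row of a count matrix.

    Assumption: rows are word types, columns are context features,
    and every row has a nonzero sum.
    """
    counts = np.asarray(counts, dtype=float)
    # Turn raw counts into per-word context distributions.
    dist = counts / counts.sum(axis=1, keepdims=True)
    # Thin SVD; keep only the leading `rank` singular directions.
    u, s, _ = np.linalg.svd(dist, full_matrices=False)
    return u[:, :rank] * s[:rank]


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign every point to its nearest center.
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical toy data: 6 word types x 4 context features.
# Rows 0-2 favor contexts 0-1; rows 3-5 favor contexts 2-3.
counts = [
    [8, 6, 1, 0],
    [7, 7, 0, 1],
    [9, 5, 1, 1],
    [0, 1, 7, 8],
    [1, 0, 6, 9],
    [1, 1, 8, 6],
]
features = reduced_rank_features(counts, rank=2)
labels = kmeans(features, k=2)
print(labels)
```

On this toy matrix the two groups of rows should land in separate clusters. A real tagger would build the count matrix from corpus contexts and choose the rank and number of clusters empirically.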
We present a novel approach to distributional-only, fully unsupervised POS tagging, based on an adaptation of the EM algorithm for the estimation of a Gaussian mixture. In this approach, which we call Latent-Descriptor Clustering (LDC), word types are clustered using a series of progressively more informative descriptor vectors. These descriptors, which are…
An operational protocol, appropriate for a tier 1 or tier 2 relative risk evaluation of a site with soils impacted by polycyclic aromatic hydrocarbons (PAHs) or petroleum hydrocarbons, was developed to estimate the fraction of anthropogenic hydrophobic hydrocarbons that will be released rapidly from such soils. The development of this protocol used over…