Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical,(More)
We describe LASER, a scalable response prediction platform currently used as part of a social network advertising system. LASER enables the familiar logistic regression model to be applied to very large scale response prediction problems, including ones beyond advertising. Though the underlying model is well understood, we apply a whole-system approach to(More)
We study the computational complexity of the recently proposed nubots model of molecular-scale self-assembly. The model generalises asynchronous cellular automata to have non-local movement where large assemblies of molecules can be moved around, analogous to millions of molecular motors in animal muscle effecting the rapid movement of macroscale arms and(More)
The tendency of individuals to behave like those around them leads to a cascading phenomenon, in which an idea or behavior spreads quickly throughout a social network, being adopted by nearly all individuals in an area. We crawl the Twitter social graph and monitor users' posts, or 'tweets,' for several weeks, tracking the spread of keywords, or 'hashtags,'(More)
Reliable predictions on the risk and survival time of prostate cancer patients based on their clinical records can help guide their treatment and provide hints about the disease mechanism. The Cox regression is currently a commonly accepted approach for such tasks in clinical applications. More complex methods, like ensemble approaches, have the potential(More)
In recommender systems based on low-rank factorization of a partially observed user-item matrix, a common phenomenon that plagues many otherwise effective models is the interleaving of good and spurious recommendations in the top-K results. A single spurious recommendation can dramatically impact the perceived quality of a recommender system.(More)
Definition 11.1.2 (Volume of a version space). The volume of the version space is the total prior probability mass of its hypotheses. Thus, given a prior Pr[·] on H, we have $\mathrm{vol}(V) = \sum_{h \in V} \Pr[h]$ (11.1.2). Recall from last lecture that the goal of the myopic strategy is to maximize the expected shrinkage of the version space. Formally, this can be done by(More)
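To make Definition 11.1.2 concrete, here is a minimal Python sketch that computes the volume of a finite version space as the sum of prior mass over its hypotheses. The threshold-classifier hypothesis class, the uniform prior, and the names `version_space_volume` and `consistent` are illustrative assumptions, not part of the original notes. The sketch shows only how the volume shrinks after one labelled example; it does not implement the myopic strategy itself.

```python
from fractions import Fraction


def version_space_volume(version_space, prior):
    """Return vol(V): the total prior probability mass of the hypotheses in V."""
    return sum((prior[h] for h in version_space), Fraction(0))


# Hypothetical hypothesis class H: thresholds t, with h_t(x) = 1 if x >= t else 0.
thresholds = [1, 2, 3, 4]

# Assumed uniform prior Pr[.] on H (any prior summing to 1 would do).
prior = {t: Fraction(1, 4) for t in thresholds}


def consistent(t, x, y):
    """True if threshold hypothesis h_t labels the point x with y."""
    return (1 if x >= t else 0) == y


# Before any labels are seen, the version space is all of H.
version_space = set(thresholds)
vol_before = version_space_volume(version_space, prior)

# After observing the labelled example (x=2, y=1), only hypotheses that
# agree with it remain in the version space (here, t = 1 and t = 2).
version_space = {t for t in version_space if consistent(t, 2, 1)}
vol_after = version_space_volume(version_space, prior)

print(vol_before, vol_after)
```

Under these assumptions the volume falls from the full prior mass to the mass of the two surviving thresholds, which is the kind of shrinkage the definition is meant to measure.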