Doris Xin

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, …
We study the computational complexity of the recently proposed nubots model of molecular-scale self-assembly. The model generalises asynchronous cellular automata to have non-local movement where large assemblies of molecules can be moved around, analogous to millions of molecular motors in animal muscle effecting the rapid movement of macroscale arms and …
We describe LASER, a scalable response prediction platform currently used as part of a social network advertising system. LASER enables the familiar logistic regression model to be applied to very large scale response prediction problems, including ones beyond advertising. Though the underlying model is well understood, we apply a whole-system approach to …
The tendency of individuals to behave like those around them leads to cascading phenomena, in which an idea or behavior spreads quickly throughout a social network and is adopted by nearly all individuals in an area. We crawl the Twitter social graph and monitor users' posts, or 'tweets,' for several weeks, tracking the spread of keywords, or 'hashtags,' …