We present Darwin, an enabling technology for mobile phone sensing that combines collaborative sensing and classification techniques to reason about human behavior and context on mobile phones. Darwin advances mobile phone sensing through the deployment of efficient but sophisticated machine learning techniques specifically designed to run directly on sensor-enabled mobile phones (i.e., smartphones). Darwin tackles three key sensing and inference challenges that are barriers to mass-scale adoption of mobile phone sensing applications: (i) the human-burden of training classifiers, (ii) the ability to perform reliably in different environments (e.g., indoor, outdoor) and (iii) the ability to scale to a large number of phones without jeopardizing the "phone experience" (e.g., usability and battery lifetime). Darwin is a collaborative reasoning framework built on three concepts: classifier/model evolution, model pooling, and collaborative inference. To the best of our knowledge Darwin is the first system that applies distributed machine learning techniques and collaborative inference concepts to mobile phones. We implement the Darwin system on the Nokia N97 and Apple iPhone. While Darwin represents a general framework applicable to a wide variety of emerging mobile sensing applications, we implement a speaker recognition application and an augmented reality application to evaluate the benefits of Darwin. We show experimental results from eight individuals carrying Nokia N97s and demonstrate that Darwin improves the reliability and scalability of the proof-of-concept speaker recognition application without additional burden to users.