• Corpus ID: 61930801

Data Science from Scratch: First Principles with Python

  • Joel Grus
  • Published 30 April 2015
  • Computer Science
Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they're also a good way to dive into the discipline without actually understanding data science. In this book, you'll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science… 

Interrogating Data Science

This workshop gathers researchers and practitioners together to take a collective and critical look at data science work-practices, and at how those work-practices make crucial and often invisible impacts on the formal work of data science.

DSWorkFlow: A Framework for Capturing Data Scientists’ Workflows

DSWorkFlow is a data collection framework that provides researchers with the ability to observe and analyze data scientists’ cognitive workflows as they develop predictive models and test three machine learning models to inform the extraction algorithms.

An Empirical Approach to Understanding Data Science and Engineering Education

This working group report shows an empirical and data-driven view of the data-related education landscape, and includes several recommendations for both academia and industry that are based on this analysis.

Looking at Data Science through the Lens of Scheduling and Load Balancing

This chapter proposes an analysis of scheduling and load balancing from the perspective of data science scenarios, and presents concepts, environments, and tools to summarize the theoretical background required to define, assign, and execute data science workflows.

A Big Data Primer

The aim of this chapter is to describe the history of big data and its characteristics—variety, velocity, and volume—and to serve as a big data primer. Many organizations are using big data to

AI education matters: a first introduction to modeling and learning using the data science workflow

Traditionally, artificial intelligence (AI) and machine learning (ML) courses are taught at the senior and graduate level in higher-education computer science curricula following the mastery learning

Forgetting Practices in the Data Sciences

A taxonomy of data silences in data work is used to analyze how data workers forget, erase, and unknow aspects of data, and an analytic vocabulary is contributed for future work on remembering, forgetting, and erasing in HCI and the data sciences.

Changing the Nature of Quantitative Biology Education: Data Science as a Driver.

Development of open curricula is suggested that extend beyond the job certification rhetoric and combine data acumen with modeling, experimental, and computational methods through engaging projects, while also providing awareness and deep exploration of their societal implications.

An Intermediate Representation for Optimizing Machine Learning Pipelines

Lara is presented, a declarative domain-specific language for collections and matrices with intermediate representation (IR) that reflects on the complete program, i.e., UDFs, control flow, and both data types, to enable holistic optimization of ML training pipelines.

Clearing up uncertainties in graduate programs candidate selection using a data science approach

This research presents the construction of an artifact that clears up uncertainties in the selection of candidates for graduate programs, categorizing them according to the profiles of previous students, based on descriptive statistics and data analytics.