Now that answering complex and compelling questions with data can make the difference in an election or a business model, data science is an attractive discipline. But how can you learn this wide-ranging, interdisciplinary field? With this book, you'll get material from Columbia University's "Introduction to Data Science" class in an easy-to-follow format.

Each chapter-long lecture features a guest data scientist from a prominent company such as Google, Microsoft, or eBay teaching new algorithms, methods, or models by sharing case studies and actual code they use. You'll learn what's involved in the lives of data scientists and be able to use the techniques they present.

Guest lectures focus on topics such as:

  • Machine learning and data mining algorithms
  • Statistical models and methods
  • Prediction vs. description
  • Exploratory data analysis
  • Communication and visualization
  • Data processing
  • Big data
  • Programming
  • Ethics
  • Asking good questions

If you're familiar with linear algebra, probability, and statistics, and have some programming experience, this book will get you started with data science.

Doing Data Science is a collaboration between course instructor Rachel Schutt (also employed by Google) and data science consultant Cathy O'Neil (former quantitative analyst for D.E. Shaw), who attended and blogged about the course.

"I enjoyed Rachel and Cathy's book, it's readable, informative, and like no other book I've read on the topic of statistics or data science." --Andrew Gelman, Professor of statistics and political science, and director of the Applied Statistics Center at Columbia University

Rachel Schutt is a Senior Statistician at Google Research in the New York office and an adjunct assistant professor at Columbia University. She earned a PhD in statistics from Columbia University, and master's degrees in mathematics and operations research from the Courant Institute and Stanford University, respectively. Her statistical research interests include modeling and analyzing social networks, epidemiology, hierarchical modeling, and Bayesian statistics. Her education-related research interests include curriculum design.

Cathy O'Neil earned a PhD in math from Harvard, was a postdoc at the MIT math department, and was a professor at Barnard College, where she published a number of research papers in arithmetic algebraic geometry. She then chucked it and switched over to the private sector. She worked as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis, and then for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. She is currently a data scientist on the New York start-up scene, writes a blog, and is involved with Occupy Wall Street.