With Digital Download
"Modern Data Science with R is one of the first textbooks to provide a comprehensive introduction to data science for students at the undergraduate level (it is also suitable for graduate students and professionals in other fields). The authors follow the approach taken by Garrett Grolemund and Hadley Wickham in their book, R for Data Science, and David Robinson in Teach the Tidyverse to Beginners, which emphasizes the teaching of data visualization and the tidyverse (using dplyr and chained pipes) before covering base R, along with using real-world data and modern data science methods. The textbook includes end-of-chapter exercises (an instructor's solution manual is available), and a series of lab activities is also under development. The result is an excellent textbook that provides a solid foundation in data science for students and professionals alike... Modern Data Science with R is a breakthrough textbook." ~ ACM SIGACT News
"Only about 60 of the book's 551 pages address the questions of uncertainty and inference that constitute the core of the statistics tradition. The remaining pages attend the other components of working with data (the import, wrangling, tidying, visualization, and storage) that are often the more prominent barriers to understanding modern datasets... Modern Data Science with R is a landmark: the first full textbook in data science. (It can serve) as the backbone of a semester-long course targeted at students with little background in statistics or computing. It is rich with examples and is guided by a strong narrative voice. What's more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics... By using the tidyverse, the textbook authors are able to seamlessly interweave a conceptual framework for data science with the corresponding implementation in R code... Even though this book is heavily dependent on R, readers come away with a more general natural language with which to talk and think about data. Indeed, if R were to cease to exist tomorrow, these readers would still be well-situated to be data scientists. In a nutshell, that approach is what makes this such a successful textbook." ~ The American Statistician
"Baumer, Kaplan, and Horton have managed to write a book that will serve a huge variety of educators while being endlessly interesting and useful to students of a modern era. Modern Data Science in R is a compilation of ideas from both ends of the data science and statistics spectrum: tools for setting up databases and working with regular expressions are intermixed with fundamentals like regression analysis. Additionally, the authors pull together fantastic examples from the scientific community as well as the media at large. Their examples will engage today's students into understanding why data wrangling, reproducibility, and ethics are a fundamental part of any data analysis. Good visualization skills (Tukey) and ethical analyses (Huff, 'How to Lie with Statistics') are not new ideas. However, they have recently been lost in the drive for more sophisticated mathematical and computational methods for working with data. Baumer et al. modernize the need for good visualization and communication in ways that will resonate with today's practitioners. Like Wickham's 'ggplot2' and 'The Elements of Statistical Learning' by Hastie et al., 'Modern Data Science in R' promises to be a staple on every data analyst's bookshelf. Accessible to students and a valuable resource for those who have been in the field for many years, this book promises to be a treasure you will want to discover." ~ Jo Hardin, Pomona College
"This book would be an excellent textbook for an introductory data science course. Many academic institutions are now trying to open data science programs. But there is not a good textbook available for data science courses."
Benjamin S. Baumer is an assistant professor in the Statistical & Data Sciences program at Smith College. He has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets. Ben is a co-author of The Sabermetric Revolution and won the 2016 Contemporary Baseball Analysis Award from the Society for American Baseball Research.
Daniel T. Kaplan is the DeWitt Wallace professor of mathematics and computer science at Macalester College. He is the author of several textbooks on statistical modeling and statistical computing, and received the 2006 Macalester Excellence in Teaching award.
Nicholas J. Horton is a professor of statistics at Amherst College. He is a Fellow of the American Statistical Association (ASA), member of the NRC Committee on Applied and Theoretical Statistics, recipient of a number of national teaching awards, author of a series of books on statistical computing, and actively involved in curricular reform to help students "think with data."
This site includes additional resources: http://mdsr-book.github.io/
Introduction to Data Science
  Prologue: Why data science?
  Data visualization
  A grammar for graphics
  Data wrangling
  Tidy data and iteration
  Professional Ethics
Statistics and Modeling
  Statistical foundations
  Statistical learning and predictive analytics
  Unsupervised learning
  Simulation
Topics in Data Science
  Interactive data graphics
  Database querying using SQL
  Database administration
  Working with spatial data
  Text as data
  Network science
  Epilogue: Towards "big data"
Appendices
  Packages used in this book
  Introduction to R and RStudio
  Algorithmic thinking
  Reproducible analysis and workflow
  Regression modeling
  Setting up a database server