Unlock predictable bottom-line growth through tailored data and AI strategies. In The Data & AI Imperative: Designing Strategies for Exponential Growth, celebrated data-driven growth leader Lillian Pierson delivers a masterclass in developing cu...
Lillian Pierson, P.E., is a data scientist, professional environmental engineer, and leading data science consultant to global leaders in IT, major governmental and non-governmental entities, prestigious media corporations, and not-for-profit technology groups.
Foreword
Introduction
  About This Book
  Foolish Assumptions
  Icons Used in This Book
  Beyond the Book
  Where to Go from Here
Part 1: Getting Started with Data Science
  Chapter 1: Wrapping Your Head around Data Science
    Seeing Who Can Make Use of Data Science
    Analyzing the Pieces of the Data Science Puzzle
      Collecting, querying, and consuming data
      Applying mathematical modeling to data science tasks
      Deriving insights from statistical methods
      Coding, coding, coding - it's just part of the game
      Applying data science to a subject area
      Communicating data insights
    Exploring the Data Science Solution Alternatives
      Assembling your own in-house team
      Outsourcing requirements to private data science consultants
      Leveraging cloud-based platform solutions
    Letting Data Science Make You More Marketable
  Chapter 2: Exploring Data Engineering Pipelines and Infrastructure
    Defining Big Data by the Three Vs
      Grappling with data volume
      Handling data velocity
      Dealing with data variety
    Identifying Big Data Sources
    Grasping the Difference between Data Science and Data Engineering
      Defining data science
      Defining data engineering
      Comparing data scientists and data engineers
    Making Sense of Data in Hadoop
      Digging into MapReduce
      Stepping into real-time processing
      Storing data on the Hadoop distributed file system (HDFS)
      Putting it all together on the Hadoop platform
    Identifying Alternative Big Data Solutions
      Introducing massively parallel processing (MPP) platforms
      Introducing NoSQL databases
    Data Engineering in Action: A Case Study
      Identifying the business challenge
      Solving business problems with data engineering
      Boasting about benefits
  Chapter 3: Applying Data-Driven Insights to Business and Industry
    Benefiting from Business-Centric Data Science
    Converting Raw Data into Actionable Insights with Data Analytics
      Types of analytics
      Common challenges in analytics
      Data wrangling
    Taking Action on Business Insights
    Distinguishing between Business Intelligence and Data Science
      Business intelligence, defined
      The kinds of data used in business intelligence
      Technologies and skillsets that are useful in business intelligence
    Defining Business-Centric Data Science
      Kinds of data that are useful in business-centric data science
      Technologies and skillsets that are useful in business-centric data science
      Making business value from machine learning methods
    Differentiating between Business Intelligence and Business-Centric Data Science
    Knowing Whom to Call to Get the Job Done Right
    Exploring Data Science in Business: A Data-Driven Business Success Story
Part 2: Using Data Science to Extract Meaning from Your Data
  Chapter 4: Machine Learning: Learning from Data with Your Machine
    Defining Machine Learning and Its Processes
      Walking through the steps of the machine learning process
      Getting familiar with machine learning terms
    Considering Learning Styles
      Learning with supervised algorithms
      Learning with unsupervised algorithms
      Learning with reinforcement
    Seeing What You Can Do
      Selecting algorithms based on function
      Using Spark to generate real-time big data analytics
  Chapter 5: Math, Probability, and Statistical Modeling
    Exploring Probability and Inferential Statistics
      Probability distributions
      Conditional probability with Naive Bayes
    Quantifying Correlation
      Calculating correlation with Pearson's r
      Ranking variable-pairs using Spearman's rank correlation
    Reducing Data Dimensionality with Linear Algebra
      Decomposing data to reduce dimensionality
      Reducing dimensionality with factor analysis
      Decreasing dimensionality and removing outliers with PCA
    Modeling Decisions with Multi-Criteria