Denny Lee, author
Showing all books by the author Denny Lee. Shop with free shipping and fast delivery.
4 products
Paperback, English, 2020
583 SEK
Ships within 5-8 business days
Data is getting bigger, arriving faster, and coming in varied formats, and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads efficiently? Enter Apache Spark. Updated to emphasize new features in Spark 2.4, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through discourse, code snippets, and notebooks, you'll be able to:
- Learn the high-level Python, SQL, Scala, or Java APIs: DataFrames and Datasets
- Peek under the hood of the Spark SQL engine to understand Spark transformations and performance
- Inspect, tune, and debug your Spark operations with Spark configurations and the Spark UI
- Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
- Perform analytics on batch and streaming data using Structured Streaming
- Build reliable data pipelines with open source Delta Lake and Spark
- Develop machine learning pipelines with MLlib and productionize models using MLflow
- Use Koalas, the open source pandas-compatible framework, with Spark for data transformation and feature engineering
Paperback, English, 2024
585 SEK
Ships within 5-8 business days
Ready to simplify the process of building data lakehouses and data pipelines at scale? In this practical guide, learn how Delta Lake is helping data engineers, data scientists, and data analysts overcome key data reliability challenges with modern data engineering and management techniques.
Authors Denny Lee, Tristen Wentling, Scott Haines, and Prashanth Babu (with contributions from Delta Lake maintainer R. Tyler Croy) share expert insights on all things Delta Lake, including how to run batch and streaming jobs concurrently and accelerate the usability of your data. You'll also uncover how ACID transactions bring reliability to data lakehouses at scale.
This book helps you:
- Understand key data reliability challenges and how Delta Lake solves them
- Explain the critical role of Delta transaction logs as a single source of truth
- Learn the Delta Lake ecosystem with technologies like Apache Flink, Kafka, and Trino
- Architect data lakehouses with the medallion architecture
- Optimize Delta Lake performance with features like deletion vectors and liquid clustering
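The description above refers to the Delta transaction log as a table's single source of truth. As a rough illustration only, and not Delta Lake's actual implementation, the Python sketch below replays a hypothetical in-memory log of `add` and `remove` actions, in commit order, to rebuild the set of live data files at a given version. The commit contents, file names, and the `snapshot` helper are all made up for this example; the sketch also assumes a simplified action format and ignores checkpoints, `metaData`/`protocol` actions, and concurrency control.

```python
import json

# Hypothetical commits, loosely modeled on a Delta-style log: each commit is
# newline-delimited JSON actions, and list index N corresponds to table version N.
COMMITS = [
    '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-002.parquet"}}',
    '{"add": {"path": "part-003.parquet"}}',
]


def snapshot(commits, version=None):
    """Replay actions from version 0 through `version` (inclusive).

    Returns the sorted list of data files that are live at that version.
    If `version` is None, the latest version is used.
    """
    if version is None:
        version = len(commits) - 1
    live = set()
    for commit in commits[: version + 1]:
        for line in commit.splitlines():
            if not line.strip():
                continue  # skip blank lines in a commit
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


print(snapshot(COMMITS))             # latest state of the table
print(snapshot(COMMITS, version=0))  # "time travel" to version 0
```

Because every reader derives table state by replaying the same ordered log, any two readers asking for the same version see the same files. This is the intuition behind calling the log a single source of truth; a real implementation adds checkpoints so readers need not replay every commit.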
Paperback, English, 2017
628 SEK
Ships within 5-8 business days
Paperback, English, 2018
565 SEK
Ships within 5-8 business days