Dipankar Mazumdar - Books
Showing all books by the author Dipankar Mazumdar. Shop with free shipping and fast delivery.
2 products
Apache Iceberg: The Definitive Guide
Data Lakehouse Functionality, Performance, and Scalability on the Data Lake
Paperback, English, 2024
504 kr
Ships within 7-10 business days
Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool, a cost-prohibitive process for making warehouse features available to all of your data. This lack of flexibility forces you to adjust your workflow to the tool your data is locked in, which creates data silos and data drift. This book shows you a better way.
Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this lakehouse. Authors Tomer Shiran, Jason Hughes, Alex Merced, and Dipankar Mazumdar from Dremio guide you through the process.
With this book, you'll learn:
The architecture of Apache Iceberg tables
What happens under the hood when you perform operations on Iceberg tables
How to further optimize Apache Iceberg tables for maximum performance
How to use Apache Iceberg with popular data engines such as Apache Spark, Apache Flink, and Dremio Sonar
How Apache Iceberg can be used in streaming and batch ingestion
Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse.
Engineering Lakehouses with Open Table Formats
Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake
Paperback, English, 2025
557 kr
Ships within 5-8 business days
Jump-start your journey toward mastering open data architectural patterns by learning the fundamentals and applications of open table formats
Key Features
Build lakehouses with open table formats using compute engines such as Apache Spark, Flink, Trino, and Python
Optimize lakehouses with techniques such as pruning, partitioning, compaction, indexing, and clustering
Find out how to enable seamless integration, data management, and interoperability using Apache XTable
Purchase of the print or Kindle book includes a free PDF eBook
Book Description
Engineering Lakehouses with Open Table Formats provides detailed insights into lakehouse concepts, and dives deep into the practical implementation of open table formats such as Apache Iceberg, Apache Hudi, and Delta Lake.
You’ll explore the internals of a table format and learn in detail about the transactional capabilities of lakehouses. You’ll also get hands on with each table format with exercises using popular computing engines, such as Apache Spark, Flink, Trino, and Python-based tools. The book addresses advanced topics, including performance optimization techniques and interoperability among different formats, equipping you to build production-ready lakehouses. With step-by-step explanations, you’ll get to grips with the key components of lakehouse architecture and learn how to build, maintain, and optimize them.
By the end of this book, you’ll be proficient in evaluating and implementing open table formats, optimizing lakehouse performance, and applying these concepts to real-world scenarios, ensuring you make informed decisions in selecting the right architecture for your organization’s data needs.
What you will learn
Explore lakehouse fundamentals, such as table formats, file formats, compute engines, and catalogs
Gain a complete understanding of data lifecycle management in lakehouses
Learn how to systematically evaluate and choose the right lakehouse table format
Optimize performance with sorting, clustering, and indexing techniques
Use the open table format data with ML frameworks like TensorFlow and MLflow
Interoperate across different table formats with Apache XTable and UniForm
Secure your lakehouse with access controls and ensure regulatory compliance
Who this book is for
This book is for data engineers, software engineers, and data architects who want to deepen their understanding of open table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake, and see how they are used to build lakehouses. It is also valuable for professionals working with traditional data warehouses, relational databases, and data lakes who wish to transition to an open data architectural pattern. Basic knowledge of databases, Python, Apache Spark, Java, and SQL is recommended for a smooth learning experience.