SKU/Item: AMZ-B0G6XH233R

Feature Engineering with Spark ML Transformers: Building Scalable, Testable, and Maintainable Feature Engines for Big Data

Format:

Kindle

Paperback

Product details
Availability:
In stock
Packaged weight:
0.15 kg
Returns:
Condition
New
Product from:
Amazon
Ships from
USA

About this product
  • Most machine learning projects don’t fail because of bad models. They fail because feature engineering logic, often written as fragile scripts, breaks under scale, drifts between training and serving, or becomes impossible to test and maintain. This book addresses that failure point directly by treating features as software, not as disposable preprocessing code.

    Rather than focusing on machine learning theory, this book focuses on architecture, correctness, and operational discipline. It shows how to design feature pipelines using Apache Spark’s ML Transformer and Estimator abstractions so that the exact same logic used during training can be safely reused in batch jobs, streaming systems, and real-time inference.

    You’ll learn how to think in Spark’s native execution model (DAGs, lazy evaluation, and optimizer behavior) so your pipelines remain efficient at scale. You’ll build feature pipelines as composable, persistable artifacts using pyspark.ml, eliminating training-serving skew and reducing rewrite cycles between experimentation and production. Throughout the book, emphasis is placed on testability, schema safety, null handling, and metadata propagation, all critical requirements for enterprise systems that process billions of records.

    The book goes beyond basic Spark ML usage and dives into advanced, real-world concerns: extending Spark with custom Transformers, enforcing strict data contracts, designing pipelines that survive data drift, and validating feature behavior through unit tests and property-based testing. It also bridges the gap to operations by covering persistence, versioning, and deployment patterns that integrate cleanly with modern MLOps workflows and feature stores.

    This book is written for software engineers moving into machine learning systems, data engineers responsible for production pipelines, and MLOps engineers who need guarantees, not experiments. It assumes familiarity with Python and Spark, and it rewards readers who care about correctness, performance, and long-term maintainability.
