This document discusses engineering machine learning data pipelines and addresses five big challenges: 1) scattered and difficult to access data, 2) data cleansing at scale, 3) entity resolution, 4) tracking data lineage, and 5) ongoing real-time changed data capture and streaming. It presents DMX Change Data Capture as a solution to capture changes from various data sources and replicate them in real-time to targets like Kafka, HDFS, databases and data lakes to feed machine learning models. Case studies demonstrate how DMX-h has helped customers like a global hotel chain and insurance and healthcare companies build scalable data pipelines.