Introducing Cloudera DataFlow (CDF) 2.13.19

© Cloudera, Inc. All rights reserved.
INTRODUCING
CLOUDERA DATAFLOW (CDF)
Dinesh Chandrasekhar
Product Marketing Lead, Data-in-Motion BU
Cloudera
@AppInt4All
George Vetticaden
Product Management Lead, Data-in-Motion BU
Cloudera
@gvetticaden

MARKET OPPORTUNITIES
Cloud: ~$410B
Streaming: ~$1.65B
Data Science: ~$180B
Big Data: ~$210B
IoT: ~$1.2T

IOT MARKET
• 24.9B: By 2024, more than 24.9 billion IoT connections will be established
• $70B: An estimated $70 billion will be spent by global manufacturers on IoT solutions in 2020
• 646M: An estimated 646 million healthcare devices (excluding fitness trackers and wearable devices) will be connected by 2020
• 78%: An estimated 78% of cars shipped globally will be built with hardware that connects to the internet by 2020
• 50%: 50% of decision-makers in IT, services, utilities, and manufacturing have either deployed IoT or will deploy it in the next 12-24 months

KEY CUSTOMER CHALLENGES
Visibility: Lack of visibility into end-to-end streaming data flows; inability to troubleshoot bottlenecks, understand consumption patterns, etc.
Data Ingestion: High-volume streaming sources, multiple message formats, diverse protocols and multi-vendor devices create data ingestion challenges
Real-time Insights: Analyzing the continuous, rapid inflow (velocity) of streaming data at high volumes makes it hard to gain real-time insights

CLOUDERA DATAFLOW

WHAT IS CLOUDERA DATAFLOW (CDF)?
Cloudera DataFlow (CDF) is a scalable, real-time
streaming data platform that collects, curates, and
analyzes data so customers gain key insights for
immediate actionable intelligence.

HISTORY OF CDF
Mid-2000s: NiFi was developed and used at the NSA
2015: Onyara is acquired; HDF is born
2018: Strong streaming platform
- Support for Kafka 2.0
- SMM is introduced
2019: Cloudera merger; enable edge intelligence
Tomorrow: Edge-to-AI; bring this to the edge with connected platforms
Data-in-Motion:
• Comprehensive real-time streaming data platform
• Manage data-in-motion from edge to enterprise
• Power IoT-scale streaming architectures
• Enable the next-generation modern data architecture

COMMON USE CASES
Data Movement
Optimize resource utilization by moving data between data centers, or between on-premises infrastructure and cloud infrastructure
Optimize Log Collection & Analysis
Optimize log analytics solutions by using CDF as a single platform to collect and deliver data from multiple sources
Gain key insights with Streaming Analytics
Accelerate big data ROI by analyzing streaming data for patterns, comparing it with ML models and delivering actionable intelligence
Single view / 360° view of the customer
Ingest, transform and combine customer data from multiple sources into a single data view / lake
Stream Processing
Combine multiple streams of data in real time, enrich the data and route it to different endpoints based on rules
Capture IoT Data
Ingest sensor data from IoT devices and stream it for further processing and comprehensive analysis
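The Stream Processing use case (combine streams, enrich, route on rules) can be illustrated in a few lines. This is a minimal, framework-free sketch, not CDF code: the `Event` fields, the device-to-region lookup, and the rule list are hypothetical. Streams are merged by timestamp, each event is enriched with reference data, and the first matching rule picks the endpoint.

```python
import heapq
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    ts: int          # event time in seconds
    source: str      # name of the originating stream
    device_id: str
    value: float


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Combine already time-ordered streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda e: e.ts)


def enrich(event: Event, regions: Dict[str, str]) -> dict:
    """Attach reference data (a hypothetical device-to-region lookup)."""
    record = asdict(event)
    record["region"] = regions.get(event.device_id, "unknown")
    return record


Rule = Tuple[Callable[[dict], bool], str]


def route(records: Iterable[dict], rules: List[Rule],
          default: str = "archive") -> Dict[str, List[dict]]:
    """Send each record to the endpoint of the first matching rule, else `default`."""
    routed: Dict[str, List[dict]] = {}
    for record in records:
        endpoint = next((name for matches, name in rules if matches(record)), default)
        routed.setdefault(endpoint, []).append(record)
    return routed


# Hypothetical sample data: a sensor stream and a GPS stream from two trucks.
sensors = [Event(1, "sensors", "t1", 72.0), Event(4, "sensors", "t2", 95.5)]
gps = [Event(2, "gps", "t1", 0.0), Event(3, "gps", "t2", 0.0)]
regions = {"t1": "US West", "t2": "US East"}

rules: List[Rule] = [
    (lambda r: r["source"] == "sensors" and r["value"] > 90, "alerts"),
    (lambda r: r["source"] == "gps", "tracking"),
]
routed = route((enrich(e, regions) for e in merge_streams(sensors, gps)), rules)
```

Here the overheating reading from `t2` lands in `alerts` tagged with its region, GPS fixes go to `tracking`, and anything unmatched falls through to `archive`. First-match semantics keep rule order meaningful, much like ordered routing relationships in a flow.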

COMMON IOT USE CASES BY INDUSTRY
Industries: Public Sector, Transportation, Utilities, Healthcare, Manufacturing, Retail
Use cases: Fleet Management, Connected Cars, Smart Cities, Predictive Analytics, Inventory/Material Tracking
• IoT is a $1.13T market opportunity in 2021.
• Americas: $329B IoT spending. Manufacturing and Transportation are the top industries, accounting for 26% of total spending.
• APAC: $500B IoT spending. Manufacturing, Utilities and Transportation are the top industries.
• EMEA: $264B IoT spending. Manufacturing is the top industry, powered by Industry 4.0 initiatives.
• Worldwide IoT analytics and information management market: $573M
Top 5 use cases: Utility Monitoring, Predictive Maintenance, Patient Monitoring, Usage-based Insurance, Asset Tracking / Monitoring, Edge Data Collection

CUSTOMERS

Improving Healthcare with SMART data
CHALLENGE
Combine multi-format data streams, with hundreds of sources, into one platform
• Needed a platform that could combine multi-format data streams
• Data scarcity & latency problems
• Machine learning & data science
IMPACT
Lack of medical expertise around patient care, post surgery
• Patient Code Blue status
• Possible cardiac arrest 4–6 hours post surgery
SOLUTION
Cloud-based systems architected to deliver SMART data, using HDP and CDF
• First to deliver SMART real-time streaming data
• Clearsense’s Inception™ product enables fast decisions for clinicians
• Customers have access to all data sources with HDP & CDF
RESULT
Mission-critical data and relevant insight for 2,000 rural providers
• Mission-critical data is now available for doctors to make critical decisions
• Cost efficiencies led to access for 2,000 rural providers
• Real-time data helps prevent “Code Blue”

Positioning technology products & services empower companies worldwide
CHALLENGE
Provide accurate data for small carriers to improve business results
• 95% of small carriers (fewer than 50 trucks) have a deficit of available data
• Estimated data, price points and revenue base opportunity for controlling fuel cost
• Understanding of freight and lane movement
IMPACT
Continuing on the current path would slow organizational growth and impact customers
• Being unable to predict weather patterns would lead to delays and decreased product quality
• Operational inefficiencies and a lack of insights prevent reaching business revenue goals
• Loss of product during transportation
SOLUTION
Big Data in the Cloud with HDP, CDF, and Microsoft Azure
• Leveraging big data powering Blockchain, with machine learning, to revolutionize the transportation and logistics industries
• Analyzed fuel data; can consolidate the data set for small carriers to generate a community data lake
RESULT
Double-digit revenue increase, year over year
• Managing 4 million trucks daily
• $31 billion in freight movement guides customers to profitability
• Blockchain-driven architecture

PRODUCT OVERVIEW

CLOUDERA DATAFLOW

CLOUDERA DATAFLOW: DATA-IN-MOTION PLATFORM

EDGE DATA MANAGEMENT
• Edge data collection powered by Apache MiNiFi
• MiNiFi – smaller footprint than NiFi
• Guaranteed delivery
• Data buffering
• Prioritized queuing
• Flow-specific QoS
• Data provenance
• Designed for extension
• C++ / Java agents
• Designed for IoT
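To make guaranteed delivery, data buffering, and prioritized queuing concrete, here is a toy in-memory model. It is an assumption-laden sketch, not MiNiFi's implementation or API: the class and method names are hypothetical, and a real agent also persists queued data to disk so it survives restarts. Producers get back-pressure when the buffer is full, the most urgent data leaves first, and a payload stays "in flight" until acknowledged, so a failed send is retried rather than lost.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class EdgeQueue:
    """Hypothetical bounded priority buffer with acknowledgements.

    Lower priority numbers are delivered first; ties go in arrival order.
    Items stay in flight until acked (at-least-once delivery).
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._ready: List[Tuple[int, int, bytes]] = []      # (priority, seq, payload)
        self._in_flight: Dict[int, Tuple[int, bytes]] = {}  # seq -> (priority, payload)
        self._seq = itertools.count()

    def offer(self, payload: bytes, priority: int) -> bool:
        """Buffer a payload; return False (back-pressure) when full."""
        if len(self._ready) + len(self._in_flight) >= self.capacity:
            return False
        heapq.heappush(self._ready, (priority, next(self._seq), payload))
        return True

    def poll(self) -> Optional[Tuple[int, bytes]]:
        """Take the highest-priority payload and mark it in flight."""
        if not self._ready:
            return None
        priority, seq, payload = heapq.heappop(self._ready)
        self._in_flight[seq] = (priority, payload)
        return seq, payload

    def ack(self, seq: int) -> None:
        """Delivery confirmed: forget the payload."""
        del self._in_flight[seq]

    def nack(self, seq: int) -> None:
        """Delivery failed: requeue with its original priority and position."""
        priority, payload = self._in_flight.pop(seq)
        heapq.heappush(self._ready, (priority, seq, payload))


q = EdgeQueue(capacity=3)
q.offer(b"temp=21", priority=5)
q.offer(b"ALARM", priority=0)
q.offer(b"temp=22", priority=5)
accepted_fourth = q.offer(b"temp=23", priority=5)  # buffer full: back-pressure

seq, first = q.poll()   # the alarm jumps the queue
q.nack(seq)             # e.g. uplink down: keep it for retry
seq, again = q.poll()   # the same alarm is retried
q.ack(seq)
_, nxt = q.poll()       # then routine readings, oldest first
```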

FLOW MANAGEMENT
• Web-based user interface
• Highly configurable
• Out-of-the-box data provenance
• Designed for extensibility
• Secure
• NiFi Registry
• DevOps support
• FDLC
• Versioning
• Deployment
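Out-of-the-box data provenance means every step a piece of data goes through is recorded, so its lineage can be replayed later. The sketch below models that idea with a simplified append-only log. Event type names are modeled loosely on NiFi's provenance event types, the processor names are only illustrative labels, and the data layout (one event per child with parent links) is an assumption, not NiFi's actual repository format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ProvenanceEvent:
    event_id: int
    event_type: str             # e.g. "RECEIVE", "FORK", "SEND"
    flowfile_id: str
    component: str              # processor that emitted the event
    parents: Tuple[str, ...] = ()


class ProvenanceLog:
    """Hypothetical append-only event log with a simple lineage query."""

    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, event_type: str, flowfile_id: str, component: str,
               parents: Sequence[str] = ()) -> ProvenanceEvent:
        event = ProvenanceEvent(len(self._events), event_type, flowfile_id,
                                component, tuple(parents))
        self._events.append(event)
        return event

    def lineage(self, flowfile_id: str) -> List[ProvenanceEvent]:
        """Events for a FlowFile and all of its ancestors, in recorded order."""
        wanted = {flowfile_id}
        frontier = [flowfile_id]
        while frontier:
            current = frontier.pop()
            for event in self._events:
                if event.flowfile_id == current:
                    for parent in event.parents:
                        if parent not in wanted:
                            wanted.add(parent)
                            frontier.append(parent)
        return [e for e in self._events if e.flowfile_id in wanted]


log = ProvenanceLog()
log.record("RECEIVE", "ff-1", "ListenHTTP")
log.record("FORK", "ff-2", "SplitJson", parents=["ff-1"])
log.record("FORK", "ff-3", "SplitJson", parents=["ff-1"])
log.record("SEND", "ff-2", "PublishKafka")
log.record("SEND", "ff-3", "PutHDFS")

trail = [(e.event_type, e.component) for e in log.lineage("ff-2")]
```

Asking for the lineage of `ff-2` walks back to the original HTTP payload it was split from, while the sibling `ff-3` and its HDFS write are correctly excluded.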

280+ PROCESSORS FOR DEEPER ECOSYSTEM INTEGRATION
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
Convert
Split
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
Fetch
HTTP
Syslog
Email
HTML
Image
HL7
FTP
UDP
XML
SFTP
AMQP
WebSocket
All Apache project logos are trademarks of the ASF and the respective projects.
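Most of the processors listed above follow one shape: take a unit of data (content plus attributes), compute something, and either annotate it or pick an outgoing relationship. The sketch below imitates a hash step and a content-based route in plain Python. The `FlowFile` class, the `content.sha256` attribute name, and the relationship names are hypothetical stand-ins, not NiFi's processor API or property names.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowFile:
    """A unit of data: raw content plus key/value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def hash_content(ff: FlowFile, attribute: str = "content.sha256") -> FlowFile:
    """Store a SHA-256 digest of the content as a (hypothetical) attribute."""
    ff.attributes[attribute] = hashlib.sha256(ff.content).hexdigest()
    return ff


def route_on_content(ff: FlowFile, routes: Dict[str, re.Pattern]) -> str:
    """Return the first relationship whose regex matches the content."""
    text = ff.content.decode("utf-8", errors="replace")
    for relationship, pattern in routes.items():
        if pattern.search(text):
            return relationship
    return "unmatched"


routes = {
    "hl7": re.compile(r"^MSH\|"),      # HL7 v2 messages start with an MSH segment
    "syslog": re.compile(r"^<\d+>"),   # syslog lines start with a <PRI> field
}
files = [
    FlowFile(b"MSH|^~\\&|LAB|HOSP"),
    FlowFile(b"<34>Oct 11 22:14:15 host su: auth failure"),
    FlowFile(b'{"temp": 21}'),
]
relationships = [route_on_content(hash_content(f), routes) for f in files]
```

Every file gets a digest attribute, the HL7 and syslog payloads are routed by format, and the JSON reading falls through to `unmatched`, which a downstream step could convert or drop.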

Streaming Analytics Reference Architecture
Kafka is everywhere: a critical component of streaming architectures
• Kafka producers: US West, US Central and US East fleet truck sensors, each with a C++ agent
• Kafka topics
• Kafka consumers & producers: data flow apps powered by NiFi
• Kafka topics
• Kafka consumers: Analytics Apps 1–5
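In this architecture, each truck's readings have to stay in order even though three fleets publish into the same topics. Kafka achieves that by hashing a record's key to pick a partition, so all records with one key land in one partition log. The sketch below simulates this with an in-memory topic. The partition count and truck IDs are hypothetical, and CRC32 is a stand-in hash: Kafka's Java client uses murmur2 for keyed records, so real partition numbers would differ.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 6  # hypothetical partition count for a truck-sensor topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition (CRC32 stands in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records: List[Tuple[str, int]]) -> Dict[int, List[Tuple[str, int]]]:
    """Append each (key, value) record to its key's partition log."""
    topic: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in records:
        topic[partition_for(key)].append((key, value))
    return dict(topic)


def readings_for(topic: Dict[int, List[Tuple[str, int]]], key: str) -> List[int]:
    """Read one key's values back from the single partition that holds it."""
    return [v for k, v in topic.get(partition_for(key), []) if k == key]


# Interleaved speed readings from trucks in different fleets.
records = [("truck-17", 61), ("truck-42", 55), ("truck-17", 64),
           ("truck-08", 70), ("truck-42", 58)]
topic = produce(records)
```

Whatever partition a truck hashes to, its readings come back in the order they were produced, which is what lets downstream consumers reason about each vehicle's timeline.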

Cloudera Streams Messaging Manager (SMM)
What is SMM?
• Kafka management and monitoring tool
• Cure the “Kafka Blindness”
• Single monitoring dashboard for all your Kafka clusters across 4 entities
– Broker
– Producer
– Topic
– Consumer
• REST as a first-class citizen
• Alerting
• Schema management
• Integration with Schema Registry
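Consumer lag is one of the core numbers a Kafka monitoring dashboard surfaces. Per partition, it is the log-end offset minus the consumer group's committed offset. The sketch below computes it from a hypothetical offset snapshot and flags partitions above a threshold, in the spirit of the alerting bullet. It does not use SMM's REST API. In practice the offsets could come from `kafka-consumer-groups.sh --describe` or the Java `AdminClient.listConsumerGroupOffsets` call.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PartitionOffsets:
    topic: str
    partition: int
    log_end_offset: int    # offset the next produced record will receive
    committed_offset: int  # next offset the consumer group will read

    @property
    def lag(self) -> int:
        return max(self.log_end_offset - self.committed_offset, 0)


def total_lag(parts: List[PartitionOffsets]) -> int:
    """Sum of lag across all partitions the group consumes."""
    return sum(p.lag for p in parts)


def lag_alerts(parts: List[PartitionOffsets], threshold: int) -> List[Tuple[str, int, int]]:
    """(topic, partition, lag) for every partition whose lag exceeds threshold."""
    return [(p.topic, p.partition, p.lag) for p in parts if p.lag > threshold]


# Hypothetical snapshot for one consumer group.
snapshot = [
    PartitionOffsets("truck-sensors", 0, 1_500, 1_480),
    PartitionOffsets("truck-sensors", 1, 2_000, 1_200),
    PartitionOffsets("truck-sensors", 2, 900, 900),
]
lag = total_lag(snapshot)
alerts = lag_alerts(snapshot, threshold=100)
```

A group-wide total hides skew, which is why the per-partition view matters: here one hot partition accounts for almost all of the lag.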

STREAMING ANALYTICS
• Pattern matching
• Predictive and Prescriptive Analytics
• Complex Event Processing
• Continuous & Real-time Insights
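Pattern matching and complex event processing boil down to keeping a little state per key and firing when a condition over recent events holds. As a hypothetical example (the rule, tuple layout, and thresholds are assumptions, not from the deck, and no particular CDF engine is implied), this sketch flags a driver who records three speeding readings within 60 seconds, using a per-driver sliding time window.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def detect_bursts(events: Iterable[Tuple[str, int, float]], speed_limit: float,
                  count: int, window_s: int) -> List[Tuple[str, int]]:
    """Report (driver, ts) when a driver has `count` speeding readings within
    `window_s` seconds. Assumes events arrive ordered by timestamp."""
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    alerts: List[Tuple[str, int]] = []
    for driver, ts, speed in events:
        if speed <= speed_limit:
            continue
        window = recent[driver]
        window.append(ts)
        # Drop speeding readings that have fallen out of the time window.
        while ts - window[0] > window_s:
            window.popleft()
        if len(window) >= count:
            alerts.append((driver, ts))
            window.clear()  # one alert per burst
    return alerts


# (driver, timestamp in seconds, speed in mph)
events = [("d1", 0, 82), ("d2", 5, 70), ("d1", 20, 85),
          ("d1", 90, 88), ("d1", 100, 90), ("d1", 110, 91)]
alerts = detect_bursts(events, speed_limit=80, count=3, window_s=60)
```

Driver `d1` speeds five times, but the first two readings expire before a third arrives within 60 seconds, so only the later burst ending at t=110 fires.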

3 KAFKA ANALYTICS ACCESS PATTERNS
Streaming event storage substrate: Kafka topics (Topic A, Topic B, Topic C, Topic D, ..., Topic X)
• Streaming Access Pattern (New)
• SQL Access Pattern: KAFKA SQL (New)
• OLAP Access Pattern: KAFKA OLAP (New)
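The SQL and OLAP access patterns typically answer aggregate questions over topic data, such as "average speed per truck per minute". The sketch below shows what such a tumbling-window aggregation computes, written in plain Python rather than any Kafka SQL dialect (no query syntax is implied). The event layout and window size are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_avg(events: Iterable[Tuple[str, int, float]],
                 window_s: int) -> Dict[Tuple[str, int], float]:
    """Average value per (key, window start) over fixed, non-overlapping windows."""
    acc: Dict[Tuple[str, int], List[float]] = defaultdict(lambda: [0.0, 0.0])
    for key, ts, value in events:
        window_start = ts - ts % window_s   # align to the window boundary
        totals = acc[(key, window_start)]
        totals[0] += value                  # running sum
        totals[1] += 1                      # running count
    return {k: s / n for k, (s, n) in acc.items()}


# (truck, timestamp in seconds, speed)
events = [("t1", 3, 60.0), ("t1", 40, 70.0), ("t2", 59, 50.0), ("t1", 61, 80.0)]
averages = tumbling_avg(events, window_s=60)
```

Each event belongs to exactly one window, so `t1`'s readings at 3 s and 40 s average together while the reading at 61 s opens a new window. Sliding or session windows would change only how `window_start` is assigned.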

ENTERPRISE SERVICES
• Provisioning
• Management
• Monitoring
• Unified Security
• Single Sign-on
• Audit
• Compliance
• Edge-to-Enterprise Governance

KEY DIFFERENTIATORS
Comprehensive streaming platform: Only big data vendor to offer a comprehensive streaming platform spanning real-time data ingestion, transformation and routing through descriptive, prescriptive and predictive analytics.
100% open source technology: Only vendor with this strategy, which prevents vendor lock-in.
280+ pre-built processors: Only product to offer such comprehensive connectivity from edge to enterprise.
Built-in data provenance: Only product in the market to offer out-of-the-box data provenance on data-in-motion.
3 streaming analytics engines: Only vendor to offer customers a choice of three streaming analytics engines for all their streaming architecture needs.

DEMO

QUESTIONS?
