尊敬的 微信汇率:1円 ≈ 0.046166 元 支付宝汇率:1円 ≈ 0.046257元 [退出登录]
SlideShare a Scribd company logo
1
Stefan Richter

@stefanrrichter



29.10.2016
A look at Flink 1.2 and beyond
Agenda
▪ Flink 1.2 feature overview & walkthrough
▪ Taking a closer look at two features:
▪ Queryable state
▪ Dynamic scaling
2
Feature Overview
Flink Release 1.2
3
Flink 1.1+ ongoing development
4
Session Windows(Stream) SQL
Library

enhancements
Metric

System
Metrics &

Visualization
Dynamic Scaling
Savepoint

compatibility Checkpoints

to savepoints
Connectors in Flink
Stream SQL

Windows
Large state

Maintenance
Fine grained

recovery
Side in-/outputs
Window DSL
Security
Mesos &

others
Dynamic Resource

Management
Authentication
Queryable StateApache Bahir connectors
Operations
Ecosystem
Application

Features
Broader

Audience
Flink 1.1+ ongoing development
4
Session Windows(Stream) SQL
Library

enhancements
Metric

System
Metrics &

Visualization
Dynamic Scaling
Savepoint

compatibility Checkpoints

to savepoints
Connectors in Flink
Stream SQL

Windows
Large state

Maintenance
Fine grained

recovery
Side in-/outputs
Window DSL
Security
Mesos &

others
Dynamic Resource

Management
Authentication
Queryable StateApache Bahir connectors
Operations
Ecosystem
Application

Features
Broader

Audience
Flink 1.1+ ongoing development
4
Session Windows(Stream) SQL
Library

enhancements
Metric

System
Metrics &

Visualization
Dynamic Scaling
Savepoint

compatibility Checkpoints

to savepoints
Connectors in Flink
Stream SQL

Windows
Large state

Maintenance
Fine grained

recovery
Side in-/outputs
Window DSL
Security
Mesos &

others
Dynamic Resource

Management
Authentication
Queryable StateApache Bahir connectors
Operations
Ecosystem
Application

Features
Broader

Audience
Flink 1.1+ ongoing development
4
Session Windows(Stream) SQL
Library

enhancements
Metric

System
Metrics &

Visualization
Dynamic Scaling
Savepoint

compatibility Checkpoints

to savepoints
Connectors in Flink
Stream SQL

Windows
Large state

Maintenance
Fine grained

recovery
Side in-/outputs
Window DSL
Security
Mesos &

others
Dynamic Resource

Management
Authentication
Queryable StateApache Bahir connectors
Operations
Ecosystem
Application

Features
Broader

Audience
Flink 1.1+ ongoing development
4
Session Windows(Stream) SQL
Library

enhancements
Metric

System
Metrics &

Visualization
Dynamic Scaling
Savepoint

compatibility Checkpoints

to savepoints
Connectors in Flink
Stream SQL

Windows
Large state

Maintenance
Fine grained

recovery
Side in-/outputs
Window DSL
Security
Mesos &

others
Dynamic Resource

Management
Authentication
Queryable StateApache Bahir connectors
Operations
Ecosystem
Application

Features
Broader

Audience
Flink 1.2 Improvements
5
Session Windows(Stream) SQL
Library

enhancements
Metric

System
Operations
Ecosystem
Application

Features
Metrics &

Visualization
Dynamic Scaling
Savepoint

compatibility Checkpoints

to savepoints
Connectors in Flink
Stream SQL

Windows
Large state

Maintenance
Fine grained

recovery
Side in-/outputs
Window DSL
Broader

Audience
Security
Mesos &

others
Dynamic Resource

Management
Authentication
Queryable StateApache Bahir connectors
Security / Authentication - Flink 1.2
6
Authorized data access
Secured clusters with Kerberos-based authentication
• Kafka, ZooKeeper, HDFS, YARN, HBase, …
Encrypted traffic between Flink Processes
• RPC, Data Exchange, Web UI, … - „SSL for all connections“
Largely contributed by
Prevent malicious users to hook into Flink jobs
Cluster Management - Flink 1.1
7
Standalone
Flink on Yarn
Cluster Management - Flink 1.2
8Mesos integration contributed by
Standalone
Flink on Yarn
Flink on Mesos
Cluster Management - Beyond 1.2
9
Efforts to seamlessly interoperate with various
cluster managers.
Generalized abstraction (FLIP-6).
Driven by and
Cluster Management - Beyond (ct’d)
10
TaskManagerJobManager
(1) Register
(2) Deploy Tasks
ResourceManager
(1) Request
slots
TaskManager
JobManager
(2) Start
TaskManager
(3) Register
(4) Deploy Tasks
Dispatcher
(0) Start
JobManager
Cluster Management - Beyond (ct’d)
10
TaskManagerJobManager
(1) Register
(2) Deploy Tasks
ResourceManager
(1) Request
slots
TaskManager
JobManager
(2) Start
TaskManager
(3) Register
(4) Deploy Tasks
Dispatcher
(0) Start
JobManager
Metrics
11
Metrics
▪ Rates
11
Metrics
▪ Rates
▪ Latency (operator)
11
Metrics
▪ Rates
▪ Latency (operator)
▪ Visualization in WebUI
11
Savepoint / Checkpoint Robustness
12
Savepoint / Checkpoint Robustness
▪ Resume job from
checkpoints
12
C S
Savepoint / Checkpoint Robustness
▪ Resume job from
checkpoints
▪ Use older checkpoint
on failed recovery
12
C1 C2 C3
t
✘
Savepoint / Checkpoint Robustness
▪ Resume job from
checkpoints
▪ Use older checkpoint
on failed recovery
▪ Skip failed Checkpoints
12
C1 C2 C3
t
✘
Savepoint / Checkpoint Robustness
▪ Resume job from
checkpoints
▪ Use older checkpoint
on failed recovery
▪ Skip failed Checkpoints
▪ Backwards compatible
12
S
1.1 1.2
Processing Function
13
Stream SQL
Streaming API
Processing Function
Window
Operator
Timer
Handling
?
Problem: Implement custom windowing?
Processing Function
13
Stream SQL
Streaming API
Processing Function
Window
Operator
Timer
Handling
Interface ProcessingFunction:
void flatMap(I value, Context ctx, Collector<O> out) throws Exception;
void onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) throws Exception
Table API & Stream SQL
14
Example:
Table API & Stream SQL
▪ Group-windows
14
Example:
table
.groupBy('user')
.window(Session withGap
10.minutes on 'rowtime')
.select('uid', 'product.count')
Table API & Stream SQL
▪ Group-windows
▪ More SQL operations
14
Example:
EXISTS, VALUES, LIMIT
Table API & Stream SQL
▪ Group-windows
▪ More SQL operations
▪ More built-in scalar functions
14
Example:
CURRENT_DATE, INITCAP, NULLIF
Table API & Stream SQL
▪ Group-windows
▪ More SQL operations
▪ More built-in scalar functions
▪ More datatypes & better
integration
14
Example:
pojo.get('field')
pojo.flatten()
Table API & Stream SQL
▪ Group-windows
▪ More SQL operations
▪ More built-in scalar functions
▪ More datatypes & better
integration
▪ User-defined scalar functions
14
Example:
table.
select('uid',
parseName('userJson'))
Many more improvements…
15
Many more improvements…
▪ Kafka 0.10 (with watermarks)
15
Many more improvements…
▪ Kafka 0.10 (with watermarks)
▪ Bucketing Sink: divides output into different file w.r.t. user
logic
15
Many more improvements…
▪ Kafka 0.10 (with watermarks)
▪ Bucketing Sink: divides output into different file w.r.t. user
logic
▪ Detached execution: first step in programatically controlled
job
15
Many more improvements…
▪ Kafka 0.10 (with watermarks)
▪ Bucketing Sink: divides output into different file w.r.t. user
logic
▪ Detached execution: first step in programatically controlled
job
▪ Async IO operator: non-blocking queries to external systems
15
Many more improvements…
▪ Kafka 0.10 (with watermarks)
▪ Bucketing Sink: divides output into different file w.r.t. user
logic
▪ Detached execution: first step in programatically controlled
job
▪ Async IO operator: non-blocking queries to external systems
▪ Improved scalability, robustness + bugfixes
15
Queryable State
Flink 1.2
16
Queryable State - Motivation
17
Realtime
Queries
Periodically (every second)

flush new aggregates

to Redis
Queryable State - Motivation
18
Number of

Keys
Queryable State - Motivation
19
Realtime
QueriesWhere is the bottleneck?
Queryable State - Motivation
19
Writes to the key/value

store take too long
Realtime
QueriesWhere is the bottleneck?
Queryable State - Idea
20
Realtime
Queries
Archive
Database
Optional +
only at end of windows
“Streamprocessor
as a database“
Queryable State - Performance
21
Number of

Keys
Queryable State - Implementation
22
Query Client
State

Registry
window()
/
sum()
Job Manager Task Manager
ExecutionGraph
State Location Server
deploy
status
Query: /job/operation/state-name/key
State

Registry
Task Manager
(1) Get location of "key-partition"

for "operator" of" job"
(2) Look up

location
(3)

Respond location
(4) Query

state-name and key
local

state
register
window()
/
sum()
Queryable State Enablers
23
Queryable State Enablers
▪ Flink has state as a first class citizen
23
Queryable State Enablers
▪ Flink has state as a first class citizen
▪ State is fault tolerant (exactly once semantics)
23
Queryable State Enablers
▪ Flink has state as a first class citizen
▪ State is fault tolerant (exactly once semantics)
▪ State is partitioned (sharded) together with the
operators that create/update it
23
Queryable State Enablers
▪ Flink has state as a first class citizen
▪ State is fault tolerant (exactly once semantics)
▪ State is partitioned (sharded) together with the
operators that create/update it
▪ State is continuous (not mini batched)
23
Queryable State Enablers
▪ Flink has state as a first class citizen
▪ State is fault tolerant (exactly once semantics)
▪ State is partitioned (sharded) together with the
operators that create/update it
▪ State is continuous (not mini batched)
▪ State is scalable (e.g., embedded RocksDB state
backend)
23
Dynamic Scaling
Flink 1.2
24
Motivation - Changing Workloads
25
Motivation - Changing Workloads
25
Motivation - Changing Workloads
25
Motivation - Resource Adaption
26
time
Workload
Resources
Motivation - Resource Adaption
26
time
Workload
Resources
time
Workload
Resources
Motivation - Resource Adaption
26
+
time
Workload
Resources
time
Workload
Resources
Basic Idea
27
• Spread work across more workers to decrease workload
Scaling Stateless Jobs
28
Scale Up Scale Down
Source
Mapper
Sink
• Scale up: Deploy new tasks
• Scale down: Cancel running tasks
Scaling Stateful Jobs
29
?
• Problem 1: Which state to assign to new task?
• Problem 2: Read + filter whole state?
Non-keyed vs Keyed State
30
• State bound to an operator + key
• E.g. Keyed UDF and window state
• „SELECT count(*) FROM t GROUP BY t.key“
• State bound only to operator
• E.g. Source state
KeyedNon-keyed
Non-keyed vs Keyed State
30
• State bound to an operator + key
• E.g. Keyed UDF and window state
• „SELECT count(*) FROM t GROUP BY t.key“
• State bound only to operator
• E.g. Source state
KeyedNon-keyed
Repartitioning Non-keyed state
31
#1 #2
#3 #4
#1 #2
#3 #4
Flink 1.1:
T snapshot()
void restore(T)
Flink 1.2:
List<T> snapshot()
void restore(List<T>)
Idea: break up state into finer granules that can be redistributed independently
Example: Kafka Source Flink 1.1
32
partitionId: 1, offset: 42
partitionId: 3, offset: 10
partitionId: 6, offset: 27
?
Operator state is black box. How to repartition?
Example: Kafka Source Flink 1.2
33
partitionId: 1, offset: 42
partitionId: 3, offset: 10
partitionId: 6, offset: 27
?
?
?
Return a list of sub-states which can be freely repartitioned.
partitionId: 1, offset: 42
partitionId: 6, offset: 27
Example: Kafka Source Flink 1.2
34
partitionId: 3, offset: 10
Scale Out
partitionId: 1, offset: 42
partitionId: 6, offset: 27
Example: Kafka Source Flink 1.2
34
partitionId: 3, offset: 10
Scale Out
Example: Kafka Source Flink 1.2
35
partitionId: 1, offset: 42
partitionId: 6, offset: 27
partitionId: 3, offset: 10
Scale In
Example: Kafka Source Flink 1.2
35
partitionId: 1, offset: 42
partitionId: 6, offset: 27
partitionId: 3, offset: 10
Scale In
Non-keyed vs Keyed State
36
• State bound to an operator + key
• E.g. Keyed UDF and window state
• „SELECT count(*) FROM t GROUP BY t.key“
• State bound only to operator
• E.g. Source state
KeyedNon-keyed
Repartitioning Keyed State
▪ Split key space into
key groups
▪ Every key falls into
exactly one key group
▪ Assign key groups to
tasks
37
Key space
Key group #1 Key group #2
Key group #3Key group #4
One key
Repartitioning Keyed State (ct’d)
▪ Rescaling changes
key group assignment
▪ Maximum parallelism
defined by #key
groups
38
Current State in Flink 1.2
▪ Manual rescaling
1. Take savepoint
2. Restart job with adjusted parallelism and
savepoint
39
Next Steps beyond Flink 1.2
▪ Rescaling individual operators w/o restart
▪ Refactor Flink deployment and process
model (previously discussed)
▪ On-the-fly Scaling
40
Autoscaling Policies
41
• Latency
• Throughput
• Resource utilization
• Kubernetes on GCE, EC2 and Mesos (marathon-
autoscale) already support auto-scaling
Conclusion
42
Conclusion
▪ Many great features in Flink 1.2
▪ Walkthrough
▪ Queryable State & Dynamic Scaling
42
Conclusion
▪ Many great features in Flink 1.2
▪ Walkthrough
▪ Queryable State & Dynamic Scaling
▪ Glimpse beyond the 1.2 release
42
43
Thank you!
@stefanrrichter
@ApacheFlink
@dataArtisans
Questions?
44

More Related Content

What's hot

Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
Stephan Ewen
 
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Ververica
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Flink Forward
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Ververica
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4
Till Rohrmann
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestData
Gyula Fóra
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
Kostas Tzoumas
 
Flink 1.0-slides
Flink 1.0-slidesFlink 1.0-slides
Flink 1.0-slides
Jamie Grier
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Flink Forward
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
Kostas Tzoumas
 
Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
Flink Forward
 
Zurich Flink Meetup
Zurich Flink MeetupZurich Flink Meetup
Zurich Flink Meetup
Konstantinos Kloudas
 
Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud"
Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud" Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud"
Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud"
Flink Forward
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
Robert Metzger
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward
 
Big Data Warsaw
Big Data WarsawBig Data Warsaw
Big Data Warsaw
Maximilian Michels
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
Flink Forward
 

What's hot (20)

Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestData
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Flink 1.0-slides
Flink 1.0-slidesFlink 1.0-slides
Flink 1.0-slides
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
 
Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
 
Zurich Flink Meetup
Zurich Flink MeetupZurich Flink Meetup
Zurich Flink Meetup
 
Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud"
Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud" Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud"
Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud"
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
Big Data Warsaw
Big Data WarsawBig Data Warsaw
Big Data Warsaw
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 

Similar to A look at Flink 1.2

Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker series
Monal Daxini
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Building Stream Processing as a Service
Building Stream Processing as a ServiceBuilding Stream Processing as a Service
Building Stream Processing as a Service
Steven Wu
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
Monal Daxini
 
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
Quantyca - Data at Core
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Going Reactive with Relational Databases
Going Reactive with Relational DatabasesGoing Reactive with Relational Databases
Going Reactive with Relational Databases
Ivaylo Pashov
 
Circonus: Design failures - A Case Study
Circonus: Design failures - A Case StudyCirconus: Design failures - A Case Study
Circonus: Design failures - A Case Study
Heinrich Hartmann
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Jiangjie Qin
 
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
Frank Munz
 
Using Docker EE to Scale Operational Intelligence at Splunk
Using Docker EE to Scale Operational Intelligence at SplunkUsing Docker EE to Scale Operational Intelligence at Splunk
Using Docker EE to Scale Operational Intelligence at Splunk
Docker, Inc.
 
Autosys Trainer CV
Autosys Trainer CVAutosys Trainer CV
Autosys Trainer CV
DS gupta
 
Data Virtualization: Revolutionizing data cloning
Data Virtualization: Revolutionizing data cloningData Virtualization: Revolutionizing data cloning
Data Virtualization: Revolutionizing data cloning
Kyle Hailey
 
Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)
Yuval Itzchakov
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles Backups
ScyllaDB
 
Bots on guard of sdlc
Bots on guard of sdlcBots on guard of sdlc
Bots on guard of sdlc
Alexey Tokar
 
Building Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodeBuilding Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache Geode
PivotalOpenSourceHub
 
From Code to Kubernetes
From Code to KubernetesFrom Code to Kubernetes
From Code to Kubernetes
Daniel Oliveira Filho
 

Similar to A look at Flink 1.2 (20)

Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker series
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Building Stream Processing as a Service
Building Stream Processing as a ServiceBuilding Stream Processing as a Service
Building Stream Processing as a Service
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Going Reactive with Relational Databases
Going Reactive with Relational DatabasesGoing Reactive with Relational Databases
Going Reactive with Relational Databases
 
Circonus: Design failures - A Case Study
Circonus: Design failures - A Case StudyCirconus: Design failures - A Case Study
Circonus: Design failures - A Case Study
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
 
Using Docker EE to Scale Operational Intelligence at Splunk
Using Docker EE to Scale Operational Intelligence at SplunkUsing Docker EE to Scale Operational Intelligence at Splunk
Using Docker EE to Scale Operational Intelligence at Splunk
 
Autosys Trainer CV
Autosys Trainer CVAutosys Trainer CV
Autosys Trainer CV
 
Data Virtualization: Revolutionizing data cloning
Data Virtualization: Revolutionizing data cloningData Virtualization: Revolutionizing data cloning
Data Virtualization: Revolutionizing data cloning
 
Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles Backups
 
Bots on guard of sdlc
Bots on guard of sdlcBots on guard of sdlc
Bots on guard of sdlc
 
Building Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodeBuilding Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache Geode
 
From Code to Kubernetes
From Code to KubernetesFrom Code to Kubernetes
From Code to Kubernetes
 

Recently uploaded

So You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental DowntimeSo You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental Downtime
ScyllaDB
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
ThousandEyes
 
Communications Mining Series - Zero to Hero - Session 2
Communications Mining Series - Zero to Hero - Session 2Communications Mining Series - Zero to Hero - Session 2
Communications Mining Series - Zero to Hero - Session 2
DianaGray10
 
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
ThousandEyes
 
An All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS MarketAn All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS Market
ScyllaDB
 
DynamoDB to ScyllaDB: Technical Comparison and the Path to Success
DynamoDB to ScyllaDB: Technical Comparison and the Path to SuccessDynamoDB to ScyllaDB: Technical Comparison and the Path to Success
DynamoDB to ScyllaDB: Technical Comparison and the Path to Success
ScyllaDB
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0
Neeraj Kumar Singh
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB
 
ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes New Product Features and Release Highlights: June 2024ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes
 
New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024
ThousandEyes
 
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
dipikamodels1
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
manji sharman06
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Real-Time Persisted Events at Supercell
Real-Time Persisted Events at  SupercellReal-Time Persisted Events at  Supercell
Real-Time Persisted Events at Supercell
ScyllaDB
 
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - MydbopsMySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
Mydbops
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
Cynthia Thomas
 
Automation Student Developers Session 3: Introduction to UI Automation
Automation Student Developers Session 3: Introduction to UI AutomationAutomation Student Developers Session 3: Introduction to UI Automation
Automation Student Developers Session 3: Introduction to UI Automation
UiPathCommunity
 

Recently uploaded (20)

So You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental DowntimeSo You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental Downtime
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
 
Communications Mining Series - Zero to Hero - Session 2
Communications Mining Series - Zero to Hero - Session 2Communications Mining Series - Zero to Hero - Session 2
Communications Mining Series - Zero to Hero - Session 2
 
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
 
An All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS MarketAn All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS Market
 
DynamoDB to ScyllaDB: Technical Comparison and the Path to Success
DynamoDB to ScyllaDB: Technical Comparison and the Path to SuccessDynamoDB to ScyllaDB: Technical Comparison and the Path to Success
DynamoDB to ScyllaDB: Technical Comparison and the Path to Success
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
 
ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes New Product Features and Release Highlights: June 2024ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes New Product Features and Release Highlights: June 2024
 
New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024
 
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Real-Time Persisted Events at Supercell
Real-Time Persisted Events at  SupercellReal-Time Persisted Events at  Supercell
Real-Time Persisted Events at Supercell
 
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - MydbopsMySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
 
Automation Student Developers Session 3: Introduction to UI Automation
Automation Student Developers Session 3: Introduction to UI AutomationAutomation Student Developers Session 3: Introduction to UI Automation
Automation Student Developers Session 3: Introduction to UI Automation
 

A look at Flink 1.2

  • 2. Agenda ▪ Flink 1.2 feature overview & walkthrough ▪ Taking a closer look at two features: ▪ Queryable state ▪ Dynamic scaling 2
  • 4. Flink 1.1+ ongoing development 4 Session Windows(Stream) SQL Library
 enhancements Metric
 System Metrics &
 Visualization Dynamic Scaling Savepoint
 compatibility Checkpoints
 to savepoints Connectors in Flink Stream SQL
 Windows Large state
 Maintenance Fine grained
 recovery Side in-/outputs Window DSL Security Mesos &
 others Dynamic Resource
 Management Authentication Queryable StateApache Bahir connectors Operations Ecosystem Application
 Features Broader
 Audience
  • 5. Flink 1.1+ ongoing development 4 Session Windows(Stream) SQL Library
 enhancements Metric
 System Metrics &
 Visualization Dynamic Scaling Savepoint
 compatibility Checkpoints
 to savepoints Connectors in Flink Stream SQL
 Windows Large state
 Maintenance Fine grained
 recovery Side in-/outputs Window DSL Security Mesos &
 others Dynamic Resource
 Management Authentication Queryable StateApache Bahir connectors Operations Ecosystem Application
 Features Broader
 Audience
  • 6. Flink 1.1+ ongoing development 4 Session Windows(Stream) SQL Library
 enhancements Metric
 System Metrics &
 Visualization Dynamic Scaling Savepoint
 compatibility Checkpoints
 to savepoints Connectors in Flink Stream SQL
 Windows Large state
 Maintenance Fine grained
 recovery Side in-/outputs Window DSL Security Mesos &
 others Dynamic Resource
 Management Authentication Queryable StateApache Bahir connectors Operations Ecosystem Application
 Features Broader
 Audience
  • 7. Flink 1.1+ ongoing development 4 Session Windows(Stream) SQL Library
 enhancements Metric
 System Metrics &
 Visualization Dynamic Scaling Savepoint
 compatibility Checkpoints
 to savepoints Connectors in Flink Stream SQL
 Windows Large state
 Maintenance Fine grained
 recovery Side in-/outputs Window DSL Security Mesos &
 others Dynamic Resource
 Management Authentication Queryable StateApache Bahir connectors Operations Ecosystem Application
 Features Broader
 Audience
  • 8. Flink 1.1+ ongoing development 4 Session Windows(Stream) SQL Library
 enhancements Metric
 System Metrics &
 Visualization Dynamic Scaling Savepoint
 compatibility Checkpoints
 to savepoints Connectors in Flink Stream SQL
 Windows Large state
 Maintenance Fine grained
 recovery Side in-/outputs Window DSL Security Mesos &
 others Dynamic Resource
 Management Authentication Queryable StateApache Bahir connectors Operations Ecosystem Application
 Features Broader
 Audience
  • 9. Flink 1.2 Improvements 5 Session Windows(Stream) SQL Library
 enhancements Metric
 System Operations Ecosystem Application
 Features Metrics &
 Visualization Dynamic Scaling Savepoint
 compatibility Checkpoints
 to savepoints Connectors in Flink Stream SQL
 Windows Large state
 Maintenance Fine grained
 recovery Side in-/outputs Window DSL Broader
 Audience Security Mesos &
 others Dynamic Resource
 Management Authentication Queryable StateApache Bahir connectors
  • 10. Security / Authentication - Flink 1.2 6 Authorized data access Secured clusters with Kerberos-based authentication • Kafka, ZooKeeper, HDFS, YARN, HBase, … Encrypted traffic between Flink Processes • RPC, Data Exchange, Web UI, … - „SSL for all connections“ Largely contributed by Prevent malicious users to hook into Flink jobs
  • 11. Cluster Management - Flink 1.1 7 Standalone Flink on Yarn
  • 12. Cluster Management - Flink 1.2 8Mesos integration contributed by Standalone Flink on Yarn Flink on Mesos
  • 13. Cluster Management - Beyond 1.2 9 Efforts to seamlessly interoperate with various cluster managers. Generalized abstraction (FLIP-6). Driven by and
  • 14. Cluster Management - Beyond (ct’d) 10 TaskManagerJobManager (1) Register (2) Deploy Tasks ResourceManager (1) Request slots TaskManager JobManager (2) Start TaskManager (3) Register (4) Deploy Tasks Dispatcher (0) Start JobManager
  • 15. Cluster Management - Beyond (ct’d) 10 TaskManagerJobManager (1) Register (2) Deploy Tasks ResourceManager (1) Request slots TaskManager JobManager (2) Start TaskManager (3) Register (4) Deploy Tasks Dispatcher (0) Start JobManager
  • 19. Metrics ▪ Rates ▪ Latency (operator) ▪ Visualization in WebUI 11
  • 20. Savepoint / Checkpoint Robustness 12
  • 21. Savepoint / Checkpoint Robustness ▪ Resume job from checkpoints 12 C S
  • 22. Savepoint / Checkpoint Robustness ▪ Resume job from checkpoints ▪ Use older checkpoint on failed recovery 12 C1 C2 C3 t ✘
  • 23. Savepoint / Checkpoint Robustness ▪ Resume job from checkpoints ▪ Use older checkpoint on failed recovery ▪ Skip failed Checkpoints 12 C1 C2 C3 t ✘
  • 24. Savepoint / Checkpoint Robustness ▪ Resume job from checkpoints ▪ Use older checkpoint on failed recovery ▪ Skip failed Checkpoints ▪ Backwards compatible 12 S 1.1 1.2
  • 25. Processing Function 13 Stream SQL Streaming API Processing Function Window Operator Timer Handling ? Problem: Implement custom windowing?
  • 26. Processing Function 13 Stream SQL Streaming API Processing Function Window Operator Timer Handling Interface ProcessingFunction: void flatMap(I value, Context ctx, Collector<O> out) throws Exception; void onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) throws Exception
  • 27. Table API & Stream SQL 14 Example:
  • 28. Table API & Stream SQL ▪ Group-windows 14 Example: table .groupBy('user') .window(Session withGap 10.minutes on 'rowtime') .select('uid', 'product.count')
  • 29. Table API & Stream SQL ▪ Group-windows ▪ More SQL operations 14 Example: EXISTS, VALUES, LIMIT
  • 30. Table API & Stream SQL ▪ Group-windows ▪ More SQL operations ▪ More built-in scalar functions 14 Example: CURRENT_DATE, INITCAP, NULLIF
  • 31. Table API & Stream SQL ▪ Group-windows ▪ More SQL operations ▪ More built-in scalar functions ▪ More datatypes & better integration 14 Example: pojo.get('field') pojo.flatten()
  • 32. Table API & Stream SQL ▪ Group-windows ▪ More SQL operations ▪ More built-in scalar functions ▪ More datatypes & better integration ▪ User-defined scalar functions 14 Example: table. select('uid', parseName('userJson'))
  • 34. Many more improvements… ▪ Kafka 0.10 (with watermarks) 15
  • 35. Many more improvements… ▪ Kafka 0.10 (with watermarks) ▪ Bucketing Sink: divides output into different file w.r.t. user logic 15
  • 36. Many more improvements… ▪ Kafka 0.10 (with watermarks) ▪ Bucketing Sink: divides output into different file w.r.t. user logic ▪ Detached execution: first step in programatically controlled job 15
  • 37. Many more improvements… ▪ Kafka 0.10 (with watermarks) ▪ Bucketing Sink: divides output into different file w.r.t. user logic ▪ Detached execution: first step in programatically controlled job ▪ Async IO operator: non-blocking queries to external systems 15
  • 38. Many more improvements… ▪ Kafka 0.10 (with watermarks) ▪ Bucketing Sink: divides output into different file w.r.t. user logic ▪ Detached execution: first step in programatically controlled job ▪ Async IO operator: non-blocking queries to external systems ▪ Improved scalability, robustness + bugfixes 15
  • 40. Queryable State - Motivation 17 Realtime Queries Periodically (every second)
 flush new aggregates
 to Redis
  • 41. Queryable State - Motivation 18 Number of
 Keys
  • 42. Queryable State - Motivation 19 Realtime QueriesWhere is the bottleneck?
  • 43. Queryable State - Motivation 19 Writes to the key/value
 store take too long Realtime QueriesWhere is the bottleneck?
  • 44. Queryable State - Idea 20 Realtime Queries Archive Database Optional + only at end of windows “Streamprocessor as a database“
  • 45. Queryable State - Performance 21 Number of
 Keys
  • 46. Queryable State - Implementation 22 Query Client State
 Registry window() / sum() Job Manager Task Manager ExecutionGraph State Location Server deploy status Query: /job/operation/state-name/key State
 Registry Task Manager (1) Get location of "key-partition"
 for "operator" of" job" (2) Look up
 location (3)
 Respond location (4) Query
 state-name and key local
 state register window() / sum()
  • 48. Queryable State Enablers ▪ Flink has state as a first class citizen 23
  • 49. Queryable State Enablers ▪ Flink has state as a first class citizen ▪ State is fault tolerant (exactly once semantics) 23
  • 50. Queryable State Enablers ▪ Flink has state as a first class citizen ▪ State is fault tolerant (exactly once semantics) ▪ State is partitioned (sharded) together with the operators that create/update it 23
  • 51. Queryable State Enablers ▪ Flink has state as a first class citizen ▪ State is fault tolerant (exactly once semantics) ▪ State is partitioned (sharded) together with the operators that create/update it ▪ State is continuous (not mini batched) 23
  • 52. Queryable State Enablers ▪ Flink has state as a first class citizen ▪ State is fault tolerant (exactly once semantics) ▪ State is partitioned (sharded) together with the operators that create/update it ▪ State is continuous (not mini batched) ▪ State is scalable (e.g., embedded RocksDB state backend) 23
  • 54. Motivation - Changing Workloads 25
  • 55. Motivation - Changing Workloads 25
  • 56. Motivation - Changing Workloads 25
  • 57. Motivation - Resource Adaption 26 time Workload Resources
  • 58. Motivation - Resource Adaption 26 time Workload Resources time Workload Resources
  • 59. Motivation - Resource Adaption 26 + time Workload Resources time Workload Resources
  • 60. Basic Idea 27 • Spread work across more workers to decrease workload
  • 61. Scaling Stateless Jobs 28 Scale Up Scale Down Source Mapper Sink • Scale up: Deploy new tasks • Scale down: Cancel running tasks
  • 62. Scaling Stateful Jobs 29 ? • Problem 1: Which state to assign to new task? • Problem 2: Read + filter whole state?
  • 63. Non-keyed vs Keyed State 30 • State bound to an operator + key • E.g. Keyed UDF and window state • „SELECT count(*) FROM t GROUP BY t.key“ • State bound only to operator • E.g. Source state KeyedNon-keyed
  • 64. Non-keyed vs Keyed State 30 • State bound to an operator + key • E.g. Keyed UDF and window state • „SELECT count(*) FROM t GROUP BY t.key“ • State bound only to operator • E.g. Source state KeyedNon-keyed
  • 65. Repartitioning Non-keyed state 31 #1 #2 #3 #4 #1 #2 #3 #4 Flink 1.1: T snapshot() void restore(T) Flink 1.2: List<T> snapshot() void restore(List<T>) Idea: break up state into finer granules that can be redistributed independently
  • 66. Example: Kafka Source Flink 1.1 32 partitionId: 1, offset: 42 partitionId: 3, offset: 10 partitionId: 6, offset: 27 ? Operator state is black box. How to repartition?
  • 67. Example: Kafka Source Flink 1.2 33 partitionId: 1, offset: 42 partitionId: 3, offset: 10 partitionId: 6, offset: 27 ? ? ? Return a list of sub-states which can be freely repartitioned.
  • 68. partitionId: 1, offset: 42 partitionId: 6, offset: 27 Example: Kafka Source Flink 1.2 34 partitionId: 3, offset: 10 Scale Out
  • 69. partitionId: 1, offset: 42 partitionId: 6, offset: 27 Example: Kafka Source Flink 1.2 34 partitionId: 3, offset: 10 Scale Out
  • 70. Example: Kafka Source Flink 1.2 35 partitionId: 1, offset: 42 partitionId: 6, offset: 27 partitionId: 3, offset: 10 Scale In
  • 71. Example: Kafka Source Flink 1.2 35 partitionId: 1, offset: 42 partitionId: 6, offset: 27 partitionId: 3, offset: 10 Scale In
  • 72. Non-keyed vs Keyed State 36 • State bound to an operator + key • E.g. Keyed UDF and window state • „SELECT count(*) FROM t GROUP BY t.key“ • State bound only to operator • E.g. Source state KeyedNon-keyed
  • 73. Repartitioning Keyed State ▪ Split key space into key groups ▪ Every key falls into exactly one key group ▪ Assign key groups to tasks 37 Key space Key group #1 Key group #2 Key group #3Key group #4 One key
  • 74. Repartitioning Keyed State (ct’d) ▪ Rescaling changes key group assignment ▪ Maximum parallelism defined by #key groups 38
  • 75. Current State in Flink 1.2 ▪ Manual rescaling 1. Take savepoint 2. Restart job with adjusted parallelism and savepoint 39
  • 76. Next Steps beyond Flink 1.2 ▪ Rescaling individual operators w/o restart ▪ Refactor Flink deployment and process model (previously discussed) ▪ On-the-fly Scaling 40
  • 77. Autoscaling Policies 41 • Latency • Throughput • Resource utilization • Kubernetes on GCE, EC2 and Mesos (marathon- autoscale) already support auto-scaling
  • 79. Conclusion ▪ Many great features in Flink 1.2 ▪ Walkthrough ▪ Queryable State & Dynamic Scaling 42
  • 80. Conclusion ▪ Many great features in Flink 1.2 ▪ Walkthrough ▪ Queryable State & Dynamic Scaling ▪ Glimpse beyond the 1.2 release 42
  翻译: