尊敬的 微信汇率:1円 ≈ 0.046166 元 支付宝汇率:1円 ≈ 0.046257元 [退出登录]
SlideShare a Scribd company logo
An Architect's Guide to Building
Real Time Big Data Systems
Raja SP
10 July 2014, Singapore
Lead Architect & Head of Products
< Real Time >
Big Data
WHY WHAT HOW
< Real Time >
Big Data
WHY WHAT HOW
What is the right time to shoot
me ?
There is a rhythm in the universe
Telecom Marketing Scenario
Cell Utilisation is Low In a Geo-Fence High Balance Frequent Visitor High Data User in the
Past
What is out there?
Square Kilometers of Arrays Tens of Thousands of Antennae Terabits of Data
Security / Intelligence
< Real Time >
Big Data
WHY WHAT HOW
Partitioned Parallel Processing
TASK
TASK
TASK
DATAi
DATAj
DATAk
Pipelined Parallel Processing
DATA TASK i TASK j TASK k
TASKDATA
Hybrid Parallel Processing
DATA TASK i
TASKj
TASK mTASK k
TASK l
TASKDATA
Should Data go to Tasks?
Or
Tasks go to Data?
DATATASK TASK TASK TASK TASK TASK
Static Data / Data at Rest
DATA DATA DATA TASK DATA DATA DATA
Streaming Data / Data in Motion
Streaming Data / Data in Motion Analytics
The classic “Word Count” (Stream Computing Version)
Counter
Counter
Java Python
Lisp
Python Java
C++
Counter
Python Python Python 2
Token
Splitter
Sink
Stream Computing Programming Constructs
Stream Tuple
Operator / Bolt
Counter
Counter
Java Python
Lisp
Python Java
C++
Counter
Python Python Python 2
Token
Splitter
Sink
Operator
Source Operator
Sink Operator
IBM Infosphere Streams Apache Storms
Bolt
Spout
-------
Composite Topology
Composite WordCountApp {
Graph
Stream< rstring sentence > Sentence = FileSource() {}
Stream< rstring word > Word = Split( Sentence ) {}
Stream< rstring word, int count > Counts = Count( Word ) {}
}
Source Split Count
IBM Infosphere Streams
Sentence Word Counts
Apache Storms
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout( ”Source", new RandomSentenceSpout(), 5 );
builder.setBolt( ”Split", new SplitSentence(), 8).shuffleGrouping( "Source” );
builder.setBolt( ”Count", new WordCount(), 12).fieldsGrouping( ”Split", new Fields( "word” ));
Source Split Count
IBM Infosphere Streams – Some Operators
Functor Perform tuple-level manipulations (~250 functions)
Filter Remove some tuples from a stream
Aggregate Group and summarize incoming tuples
Sort Impose an order on incoming tuples in a stream
Join Correlate two streams
Punctor Insert window punctuation markers into a stream
IBM Infosphere Streams – Some Operators (continued)
Barrier Synchronize tuples from sequence-correlated streams
Pair Group tuples from multiple streams of same type
Split Forward tuples to output streams based on a predicate
ThreadedSplit Distribute tuples over output streams by availability
Union Construct an output tuple from each input tuple
DeDuplicate Suppress duplicate tuples seen within a given time period
DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA
Stream Window
Aggregate
Sort
Join
< Real Time >
Big Data
WHY WHAT HOW
Streams Application Development
Method
Apache Storms
RunTime
Components
IBM Infosphere Streams
Instance
Management
Host
Application Host
Nimbus ZooKeeper
Node 1
Node 2
Node 3
Cluste
r
Apache Storms
Application Deployment Units
Instance
Management
Host
Application Host 1
Processin
g Element
1
Processin
g Element
2
Cluste
r
Management Node
(Nimbus)
Node 1
Worker 1 Worker 2
Executor
IBM Infosphere Streams
Executo
r
Executo
r
ZooKeeper
Node
High Availability & Adaptability
Optimizing scheduler assigns jobs to nodes, and
continually manages resource allocation
Apache StormsIBM Infosphere Streams
High Availability & Adaptability
Apache StormsIBM Infosphere Streams
Dynamically add Nodes and Jobs
High Availability & Adaptability
Apache StormsIBM Infosphere Streams
Execution Units on Failed Nodes can be
moved automatically with communications re-
routed
Topic:
Organized by
UNICOM Trainings & Seminars Pvt. Ltd.
contact@unicomlearning.com
DEMO
Topic:
Organized by
UNICOM Trainings & Seminars Pvt. Ltd.
contact@unicomlearning.com
Speaker name: Raja SP
Email ID: raja@knowesis.com
Thank You

More Related Content

What's hot

Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
Databricks
 
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward
 
Spark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf JagermanSpark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf Jagerman
Spark Summit
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Databricks
 
Bridging the Gap Between Datasets and DataFrames
Bridging the Gap Between Datasets and DataFramesBridging the Gap Between Datasets and DataFrames
Bridging the Gap Between Datasets and DataFrames
Databricks
 
Analytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan FritzAnalytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan Fritz
Databricks
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16
PivotalOpenSourceHub
 
Time series database, InfluxDB & PHP
Time series database, InfluxDB & PHPTime series database, InfluxDB & PHP
Time series database, InfluxDB & PHP
Corley S.r.l.
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at Twitter
Prasad Wagle
 
DataFlow & Beam
DataFlow & BeamDataFlow & Beam
DataFlow & Beam
Gabriel Hamilton
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of Parameters
Jen Aman
 
Ray: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed PythonRay: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed Python
Databricks
 
Big Data Ecosystem - 1000 Simulated Drones
Big Data Ecosystem - 1000 Simulated DronesBig Data Ecosystem - 1000 Simulated Drones
Big Data Ecosystem - 1000 Simulated Drones
Espeo Software
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Databricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Spark Summit
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Vasia Kalavri
 
Big Data Pipeline and Analytics Platform
Big Data Pipeline and Analytics PlatformBig Data Pipeline and Analytics Platform
Big Data Pipeline and Analytics Platform
Sudhir Tonse
 
Structured streaming for machine learning
Structured streaming for machine learningStructured streaming for machine learning
Structured streaming for machine learning
Seth Hendrickson
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Flink Forward
 

What's hot (20)

Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
 
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
 
Spark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf JagermanSpark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf Jagerman
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
 
Bridging the Gap Between Datasets and DataFrames
Bridging the Gap Between Datasets and DataFramesBridging the Gap Between Datasets and DataFrames
Bridging the Gap Between Datasets and DataFrames
 
Analytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan FritzAnalytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan Fritz
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16
 
Time series database, InfluxDB & PHP
Time series database, InfluxDB & PHPTime series database, InfluxDB & PHP
Time series database, InfluxDB & PHP
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at Twitter
 
DataFlow & Beam
DataFlow & BeamDataFlow & Beam
DataFlow & Beam
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of Parameters
 
Ray: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed PythonRay: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed Python
 
Big Data Ecosystem - 1000 Simulated Drones
Big Data Ecosystem - 1000 Simulated DronesBig Data Ecosystem - 1000 Simulated Drones
Big Data Ecosystem - 1000 Simulated Drones
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
Big Data Pipeline and Analytics Platform
Big Data Pipeline and Analytics PlatformBig Data Pipeline and Analytics Platform
Big Data Pipeline and Analytics Platform
 
Structured streaming for machine learning
Structured streaming for machine learningStructured streaming for machine learning
Structured streaming for machine learning
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
 

Similar to An Architect's guide to real time big data systems

Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
Paulo Gutierrez
 
SQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesSQL Server 2008 Integration Services
SQL Server 2008 Integration Services
Eduardo Castro
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015
Databricks
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
Data Science Milan
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
Jonathan Holloway
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
Chicago Hadoop Users Group
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
Radu Tudoran
 
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Erwin de Kreuk
 
Java one 2010
Java one 2010Java one 2010
Java one 2010
scdn
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure Technologies
Koray Kocabas
 
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016 A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
Databricks
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
Jags Ramnarayan
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData
 
Spark Seattle meetup - Breaking ETL barrier with Spark Streaming
Spark Seattle meetup - Breaking ETL barrier with Spark StreamingSpark Seattle meetup - Breaking ETL barrier with Spark Streaming
Spark Seattle meetup - Breaking ETL barrier with Spark Streaming
Santosh Sahoo
 
Os Lonergan
Os LonerganOs Lonergan
Os Lonergan
oscon2007
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Helena Edelson
 
SQL Server 2008 for Developers
SQL Server 2008 for DevelopersSQL Server 2008 for Developers
SQL Server 2008 for Developers
ukdpe
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
Eva Tse
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
Amazon Web Services
 
Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & libraries
Walaa Hamdy Assy
 

Similar to An Architect's guide to real time big data systems (20)

Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
 
SQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesSQL Server 2008 Integration Services
SQL Server 2008 Integration Services
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
 
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
 
Java one 2010
Java one 2010Java one 2010
Java one 2010
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure Technologies
 
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016 A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
 
Spark Seattle meetup - Breaking ETL barrier with Spark Streaming
Spark Seattle meetup - Breaking ETL barrier with Spark StreamingSpark Seattle meetup - Breaking ETL barrier with Spark Streaming
Spark Seattle meetup - Breaking ETL barrier with Spark Streaming
 
Os Lonergan
Os LonerganOs Lonergan
Os Lonergan
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
SQL Server 2008 for Developers
SQL Server 2008 for DevelopersSQL Server 2008 for Developers
SQL Server 2008 for Developers
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & libraries
 

Recently uploaded

Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
ThousandEyes
 
Multivendor cloud production with VSF TR-11 - there and back again
Multivendor cloud production with VSF TR-11 - there and back againMultivendor cloud production with VSF TR-11 - there and back again
Multivendor cloud production with VSF TR-11 - there and back again
Kieran Kunhya
 
Building a Semantic Layer of your Data Platform
Building a Semantic Layer of your Data PlatformBuilding a Semantic Layer of your Data Platform
Building a Semantic Layer of your Data Platform
Enterprise Knowledge
 
ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDB
ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDBScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDB
ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDB
ScyllaDB
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
Ortus Solutions, Corp
 
intra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_Enintra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_En
NTTDATA INTRAMART
 
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudRadically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
ScyllaDB
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB
 
Discover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched ContentDiscover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched Content
ScyllaDB
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
manji sharman06
 
Fuxnet [EN] .pdf
Fuxnet [EN]                                   .pdfFuxnet [EN]                                   .pdf
Fuxnet [EN] .pdf
Overkill Security
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024
ThousandEyes
 
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
anilsa9823
 
Communications Mining Series - Zero to Hero - Session 2
Communications Mining Series - Zero to Hero - Session 2Communications Mining Series - Zero to Hero - Session 2
Communications Mining Series - Zero to Hero - Session 2
DianaGray10
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
dipikamodels1
 
Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0
Neeraj Kumar Singh
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 

Recently uploaded (20)

Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
 
Multivendor cloud production with VSF TR-11 - there and back again
Multivendor cloud production with VSF TR-11 - there and back againMultivendor cloud production with VSF TR-11 - there and back again
Multivendor cloud production with VSF TR-11 - there and back again
 
Building a Semantic Layer of your Data Platform
Building a Semantic Layer of your Data PlatformBuilding a Semantic Layer of your Data Platform
Building a Semantic Layer of your Data Platform
 
ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDB
ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDBScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDB
ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDB
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
 
intra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_Enintra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_En
 
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudRadically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
 
Discover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched ContentDiscover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched Content
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
 
Fuxnet [EN] .pdf
Fuxnet [EN]                                   .pdfFuxnet [EN]                                   .pdf
Fuxnet [EN] .pdf
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024
 
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
 
Communications Mining Series - Zero to Hero - Session 2
Communications Mining Series - Zero to Hero - Session 2Communications Mining Series - Zero to Hero - Session 2
Communications Mining Series - Zero to Hero - Session 2
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
 
Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 

An Architect's guide to real time big data systems

  • 1. An Architect's Guide to Building Real Time Big Data Systems Raja SP 10 July 2014, Singapore Lead Architect & Head of Products
  • 2. < Real Time > Big Data WHY WHAT HOW
  • 3. < Real Time > Big Data WHY WHAT HOW
  • 4. What is the right time to shoot me ?
  • 5. There is a rhythm in the universe
  • 6. Telecom Marketing Scenario Cell Utilisation is Low In a Geo-Fence High Balance Frequent Visitor High Data User in the Past
  • 7. What is out there? Square Kilometers of Arrays Tens of Thousands of Antennae Terabits of Data
  • 9. < Real Time > Big Data WHY WHAT HOW
  • 10. Partitioned Parallel Processing TASK TASK TASK DATAi DATAj DATAk Pipelined Parallel Processing DATA TASK i TASK j TASK k TASKDATA Hybrid Parallel Processing DATA TASK i TASKj TASK mTASK k TASK l
  • 11. TASKDATA Should Data go to Tasks? Or Tasks go to Data?
  • 12. DATATASK TASK TASK TASK TASK TASK Static Data / Data at Rest DATA DATA DATA TASK DATA DATA DATA Streaming Data / Data in Motion
  • 13. Streaming Data / Data in Motion Analytics
  • 14. The classic “Word Count” (Stream Computing Version) Counter Counter Java Python Lisp Python Java C++ Counter Python Python Python 2 Token Splitter Sink
  • 15. Stream Computing Programming Constructs Stream Tuple Operator / Bolt Counter Counter Java Python Lisp Python Java C++ Counter Python Python Python 2 Token Splitter Sink
  • 16. Operator Source Operator Sink Operator IBM Infosphere Streams Apache Storms Bolt Spout ------- Composite Topology
  • 17. Composite WordCountApp { Graph Stream< rstring sentence > Sentence = FileSource() {} Stream< rstring word > Word = Split( Sentence ) {} Stream< rstring word, int count > Counts = Count( Word ) {} } Source Split Count IBM Infosphere Streams Sentence Word Counts
  • 18. Apache Storms TopologyBuilder builder = new TopologyBuilder(); builder.setSpout( ”Source", new RandomSentenceSpout(), 5 ); builder.setBolt( ”Split", new SplitSentence(), 8).shuffleGrouping( "Source” ); builder.setBolt( ”Count", new WordCount(), 12).fieldsGrouping( ”Split", new Fields( "word” )); Source Split Count
  • 19. IBM Infosphere Streams – Some Operators Functor Perform tuple-level manipulations (~250 functions) Filter Remove some tuples from a stream Aggregate Group and summarize incoming tuples Sort Impose an order on incoming tuples in a stream Join Correlate two streams Punctor Insert window punctuation markers into a stream
  • 20. IBM Infosphere Streams – Some Operators (continued) Barrier Synchronize tuples from sequence-correlated streams Pair Group tuples from multiple streams of same type Split Forward tuples to output streams based on a predicate ThreadedSplit Distribute tuples over output streams by availability Union Construct an output tuple from each input tuple DeDuplicate Suppress duplicate tuples seen within a given time period
  • 21. DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA Stream Window Aggregate Sort Join
  • 22. < Real Time > Big Data WHY WHAT HOW
  • 24. Apache Storms RunTime Components IBM Infosphere Streams Instance Management Host Application Host Nimbus ZooKeeper Node 1 Node 2 Node 3 Cluste r
  • 25. Apache Storms Application Deployment Units Instance Management Host Application Host 1 Processin g Element 1 Processin g Element 2 Cluste r Management Node (Nimbus) Node 1 Worker 1 Worker 2 Executor IBM Infosphere Streams Executo r Executo r ZooKeeper Node
  • 26. High Availability & Adaptability Optimizing scheduler assigns jobs to nodes, and continually manages resource allocation Apache StormsIBM Infosphere Streams
  • 27. High Availability & Adaptability Apache StormsIBM Infosphere Streams Dynamically add Nodes and Jobs
  • 28. High Availability & Adaptability Apache StormsIBM Infosphere Streams Execution Units on Failed Nodes can be moved automatically with communications re- routed
  • 29. Topic: Organized by UNICOM Trainings & Seminars Pvt. Ltd. contact@unicomlearning.com DEMO
  • 30. Topic: Organized by UNICOM Trainings & Seminars Pvt. Ltd. contact@unicomlearning.com Speaker name: Raja SP Email ID: raja@knowesis.com Thank You

Editor's Notes

  1. Enough chaos What – architectural thinking, programming concepts. Stream, Storm – map/reduce idea comes from lisp (1958). The 80’s game Can’t roll sleeves and deploy a 1000 node system
  2. Option 1 – I am here you are pointing your gun to me. Will you pull the trigger right now? OR Option 2 – Wait until 3 hours after I left this place and THEN pull the trigger?
  3. Wife cooks rearely….. I Thank god for that….. ½ km Spin Speed 30KM orbit speed
  4. Radio Astronomy Tyco Brahe Uppsala University and the LOFAR Outrigger In Scandinavia (LOIS )
  5. NSA breakout – prism, snowden Torture the data and it will confess to anything. Fallacy – Endogeneity Big Data has arrvied but not big Analytics – Tim Harford – the undercover economist – Financial Times
  6. Singlish – sequential process – until cows come home oredy Shared nothing data Divide Data – Example – Calculating tax for all singaporean. Work hard and earn less group
  7. Hadoop – Map Reduce Stream Computing
  8. 13
  9. Compare with map reduce Splitter heuristics continuous running streams – transient counts… sorts aggregates… windows A man cannot take bath in the same river twice
  10. Tuple – composite of fields. Tuple Schema
  11. 2 Popular frameworks
  12. InputDeclarer
  13. Relational Operators
  14. Utility Operators
  15. Tumbling Windows Sliding Windows
  16. Describe the components Describe how they are deployed
  翻译: