尊敬的 微信汇率:1円 ≈ 0.046078 元 支付宝汇率:1円 ≈ 0.046168元 [退出登录]
SlideShare a Scribd company logo
Building Custom Big Data
Integrations
Pat Patterson
Community Champion
@metadaddy
pat@streamsets.com
Agenda
Ingest, Data Drift and StreamSets
Short Demo
Building a custom integration
Real-world integration: Salesforce Wave Analytics
Traditional and Big Data
Founders
Company Background
Top tier Investors
Momentum to Date
Strategic Partners
● Launched 2014; exited stealth 9/15
● ~30 employees
● Double-digit enterprise customers
● 10,000 downloads
Past ETL ETL
Emerging Ingest Analyze
Data Sources Data Stores Data Consumers
Market Trends
Problem: Data Drift
The unpredictable, unannounced and unending mutation of data characteristics caused
by the operation, maintenance and modernization of the systems that produce the data
Structure
Drift
Semantic
Drift
Infrastructure
Drift
Delayed and
False Insights
Solving Data Drift
Tools
Applications
Data Stores Data ConsumersData Sources
Poor Data QualityData Drift
Custom code
Fixed-schema
Trusted InsightsData KPIs
Solving Data Drift
Tools
Applications
Data Stores Data ConsumersData Sources
Data Drift
Intent-Driven
Drift-Handling
Demo
Let’s build a simple pipeline to answer a real
question:
What’s the biggest city lot in San Francisco?
Customizing StreamSets
Currently 25 standard StreamSets destinations,
covering a wide variety of target systems, from flat
files to S3 to Kafka
But… there’s always some system not on the list
Solution: DIY!
Create Your Own Destination
Five Step Process:
○ Create template from Maven archetype
○ Add logging
○ Create a record buffer
○ Add configuration parameters
○ Send data to external system
bit.ly/sdc-dest
Your
System
Here!
Create Template from Archetype
mvn archetype:generate
-DarchetypeGroupId=com.streamsets -DarchetypeArtifactId=streamsets-
datacollector-stage-lib-tutorial
-DarchetypeVersion=1.3.0.0
-DinteractiveMode=true
Add Logging
Not 100% necessary, but VERY helpful
StreamSets uses SLF4J
$ tail -f streamsets-datacollector-1.3.0.0/log/sdc.log
Create a Record Buffer
Leverage existing code where possible!
StreamSets includes generators for CSV, JSON, Avro,
Protocol Buffers etc
Configuration
Separate configuration and code
DON’T PUT CREDENTIALS IN CODE!!!
DON’T PUT CREDENTIALS IN CODE!!!
Make your users’ and your lives easier!
Send Data to the External System
Don’t forget security policy!
streamsets-datacollector/etc/sdc-security.policy
grant codebase "file://${sdc.dist.dir}/user-libs/sampletest/-" {
permission java.net.SocketPermission "requestb.in", "connect, resolve";
};
A Real Custom Destination
Salesforce Wave Analytics
● Adapt to batch processing model
○ Configure wait time before ‘closing’ a batch
● External Data API
○ Create new dataset
○ Write to dataset
○ Close dataset on timeout
○ Trigger dataflow execution
Conclusion
StreamSets Data Collector makes simple tasks easy,
complex tasks possible
Use ‘off the shelf’ stages for simple tasks
Leverage script processors (Jython, JavaScript, Groovy) for
more complex work
Build custom stages for ultimate performance, flexibility
Thank You!
Structure
Drift
Data structures and
formats evolve and
change unexpectedly
Implication:
Data Loss
Data Squandering
Delimited
Data
107.3.137.195 fe80::21b:21ff:fe83:90fa
Attribute Format
Changes
{
“first“: “jon”
“last“: “smith”
“email“: “jsmith@acme.com”
“add1“: “123 Washington”
“add2“: “”
“city“: “Tucson”
“state“: “AZ”
“zip“: “85756”
}
{
“first“: “jane”
“last“: “smith”
“email“: “jane@earth.net”
“add1“: “456 Fillmore”
“add2“: “Apt 120”
“city“: “Fairfield”
“state“: “VA”
“zip“: “24435-1001”
“phone”: “401-555-1212”
}
Data Structure Evolution
Structure Drift
Semantic
Drift
Data semantics change
with evolving
applications
Implication:
Data Corrosion
Data Loss
Semantic Drift
24122-52172 00-24122-52172
Account Number Expansion
M134: user {jsmith} read access granted {ac:24122-52172}
M134: user {jsmith} read access granted {ca.ac:24122-52172}
Namespace Qualification
……
…,3588310669797950,$91.41,jcb,K1088-W#9,…
…,6759006011936944,$155.04,switch,A6504-Y#9,…
…,6771111111151415,$37.78,laser,Q9936-T#9,…
…,3585905063294299,$164.48,jcb,S4643-H#9,…
…,5363527828638736,$117.52,mastercard,X3286-P#9,…
…,4903080150282806,$168.03,switch,I9133-W#3,…
……
Outlier / Anomaly Detection
Infrastructure
Drift
Physical and Logical
Infrastructure changes
rapidly
Implication:
Poor Agility
Operational Downtime
Data Center 1 Data Center 2 Data Center n
3rd
Party Service Provider
App a App k
App q
Cloud
Infrastructure
Infrastructure Drift

More Related Content

What's hot

Building a Federated Data Directory Platform for Public Health
Building a Federated Data Directory Platform for Public HealthBuilding a Federated Data Directory Platform for Public Health
Building a Federated Data Directory Platform for Public Health
Databricks
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
Tyler Mitchell
 
How to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the CloudHow to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the Cloud
VMware Tanzu
 
Optimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemOptimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystem
DataWorks Summit
 
Intuit Analytics Cloud 101
Intuit Analytics Cloud 101Intuit Analytics Cloud 101
Intuit Analytics Cloud 101
DataWorks Summit/Hadoop Summit
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Flink Forward
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
Sudheer Kondla
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Delta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache SparkDelta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache Spark
George Chow
 
About CDAP
About CDAPAbout CDAP
About CDAP
Cask Data
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
Databricks
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Paris Data Engineers !
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
DataWorks Summit
 
Building Sessionization Pipeline at Scale with Databricks Delta
Building Sessionization Pipeline at Scale with Databricks DeltaBuilding Sessionization Pipeline at Scale with Databricks Delta
Building Sessionization Pipeline at Scale with Databricks Delta
Databricks
 
Democratizing Data
Democratizing DataDemocratizing Data
Democratizing Data
Databricks
 
Phar Data Platform: From the Lakehouse Paradigm to the Reality
Phar Data Platform: From the Lakehouse Paradigm to the RealityPhar Data Platform: From the Lakehouse Paradigm to the Reality
Phar Data Platform: From the Lakehouse Paradigm to the Reality
Databricks
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
DataWorks Summit
 
What is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperWhat is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | Whitepaper
Vasu S
 
StreamSet ETL tool
StreamSet  ETL toolStreamSet  ETL tool
StreamSet ETL tool
SwapnilSHampi
 

What's hot (20)

Building a Federated Data Directory Platform for Public Health
Building a Federated Data Directory Platform for Public HealthBuilding a Federated Data Directory Platform for Public Health
Building a Federated Data Directory Platform for Public Health
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
 
How to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the CloudHow to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the Cloud
 
Optimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemOptimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystem
 
Intuit Analytics Cloud 101
Intuit Analytics Cloud 101Intuit Analytics Cloud 101
Intuit Analytics Cloud 101
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Delta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache SparkDelta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache Spark
 
About CDAP
About CDAPAbout CDAP
About CDAP
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
 
Building Sessionization Pipeline at Scale with Databricks Delta
Building Sessionization Pipeline at Scale with Databricks DeltaBuilding Sessionization Pipeline at Scale with Databricks Delta
Building Sessionization Pipeline at Scale with Databricks Delta
 
Democratizing Data
Democratizing DataDemocratizing Data
Democratizing Data
 
Phar Data Platform: From the Lakehouse Paradigm to the Reality
Phar Data Platform: From the Lakehouse Paradigm to the RealityPhar Data Platform: From the Lakehouse Paradigm to the Reality
Phar Data Platform: From the Lakehouse Paradigm to the Reality
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
 
What is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperWhat is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | Whitepaper
 
StreamSet ETL tool
StreamSet  ETL toolStreamSet  ETL tool
StreamSet ETL tool
 

Viewers also liked

Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
Pat Patterson
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
Pat Patterson
 
Provisioning IDaaS - Using SCIM to Enable Cloud Identity
Provisioning IDaaS - Using SCIM to Enable Cloud IdentityProvisioning IDaaS - Using SCIM to Enable Cloud Identity
Provisioning IDaaS - Using SCIM to Enable Cloud Identity
Pat Patterson
 
Adaptive Data Cleansing with StreamSets and Cassandra
Adaptive Data Cleansing with StreamSets and CassandraAdaptive Data Cleansing with StreamSets and Cassandra
Adaptive Data Cleansing with StreamSets and Cassandra
Pat Patterson
 
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
Pat Patterson
 
All Aboard the Boxcar! Going Beyond the Basics of REST
All Aboard the Boxcar! Going Beyond the Basics of RESTAll Aboard the Boxcar! Going Beyond the Basics of REST
All Aboard the Boxcar! Going Beyond the Basics of REST
Pat Patterson
 
Data Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache FlumeData Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache Flume
Arvind Prabhakar
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
Pat Patterson
 
Building Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion PipelinesBuilding Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion Pipelines
Arvind Prabhakar
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
Pat Patterson
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
Arvind Prabhakar
 

Viewers also liked (11)

Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 
Provisioning IDaaS - Using SCIM to Enable Cloud Identity
Provisioning IDaaS - Using SCIM to Enable Cloud IdentityProvisioning IDaaS - Using SCIM to Enable Cloud Identity
Provisioning IDaaS - Using SCIM to Enable Cloud Identity
 
Adaptive Data Cleansing with StreamSets and Cassandra
Adaptive Data Cleansing with StreamSets and CassandraAdaptive Data Cleansing with StreamSets and Cassandra
Adaptive Data Cleansing with StreamSets and Cassandra
 
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
 
All Aboard the Boxcar! Going Beyond the Basics of REST
All Aboard the Boxcar! Going Beyond the Basics of RESTAll Aboard the Boxcar! Going Beyond the Basics of REST
All Aboard the Boxcar! Going Beyond the Basics of REST
 
Data Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache FlumeData Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache Flume
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
 
Building Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion PipelinesBuilding Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion Pipelines
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
 

Similar to Building Custom Big Data Integrations

Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
Mark Kromer
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
Ruhani Arora
 
Building IoT and Big Data Solutions on Azure
Building IoT and Big Data Solutions on AzureBuilding IoT and Big Data Solutions on Azure
Building IoT and Big Data Solutions on Azure
Ido Flatow
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
Denodo
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
confluent
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
Denodo
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Denodo
 
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesLogical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Denodo
 
Data Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data WarehousingData Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data Warehousing
All Things Open
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Hortonworks
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
Jeffrey T. Pollock
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lon
Jeffrey T. Pollock
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
Mark Kromer
 
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
Denodo
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Databricks
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Denodo
 
Rev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRev_3 Components of a Data Warehouse
Rev_3 Components of a Data Warehouse
Ryan Andhavarapu
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
Denodo
 

Similar to Building Custom Big Data Integrations (20)

Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
Building IoT and Big Data Solutions on Azure
Building IoT and Big Data Solutions on AzureBuilding IoT and Big Data Solutions on Azure
Building IoT and Big Data Solutions on Azure
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
 
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesLogical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business Outcomes
 
Data Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data WarehousingData Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data Warehousing
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lon
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
 
Rev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRev_3 Components of a Data Warehouse
Rev_3 Components of a Data Warehouse
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 

More from Pat Patterson

DevOps from the Provider Perspective
DevOps from the Provider PerspectiveDevOps from the Provider Perspective
DevOps from the Provider Perspective
Pat Patterson
 
How Imprivata Combines External Data Sources for Business Insights
How Imprivata Combines External Data Sources for Business InsightsHow Imprivata Combines External Data Sources for Business Insights
How Imprivata Combines External Data Sources for Business Insights
Pat Patterson
 
Data Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, HowData Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, How
Pat Patterson
 
Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...
Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...
Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...
Pat Patterson
 
Integrating with Einstein Analytics
Integrating with Einstein AnalyticsIntegrating with Einstein Analytics
Integrating with Einstein Analytics
Pat Patterson
 
Efficient Schemas in Motion with Kafka and Schema Registry
Efficient Schemas in Motion with Kafka and Schema RegistryEfficient Schemas in Motion with Kafka and Schema Registry
Efficient Schemas in Motion with Kafka and Schema Registry
Pat Patterson
 
Enterprise IoT: Data in Context
Enterprise IoT: Data in ContextEnterprise IoT: Data in Context
Enterprise IoT: Data in Context
Pat Patterson
 
OData: A Standard API for Data Access
OData: A Standard API for Data AccessOData: A Standard API for Data Access
OData: A Standard API for Data Access
Pat Patterson
 
API-Driven Relationships: Building The Trans-Internet Express of the Future
API-Driven Relationships: Building The Trans-Internet Express of the FutureAPI-Driven Relationships: Building The Trans-Internet Express of the Future
API-Driven Relationships: Building The Trans-Internet Express of the Future
Pat Patterson
 
Using Salesforce to Manage Your Developer Community
Using Salesforce to Manage Your Developer CommunityUsing Salesforce to Manage Your Developer Community
Using Salesforce to Manage Your Developer Community
Pat Patterson
 
Identity in the Cloud
Identity in the CloudIdentity in the Cloud
Identity in the Cloud
Pat Patterson
 
OpenID Connect: An Overview
OpenID Connect: An OverviewOpenID Connect: An Overview
OpenID Connect: An Overview
Pat Patterson
 
How I Learned to Stop Worrying and Love Open Source Identity (Paris Edition)
How I Learned to Stop Worrying and Love Open Source Identity (Paris Edition)How I Learned to Stop Worrying and Love Open Source Identity (Paris Edition)
How I Learned to Stop Worrying and Love Open Source Identity (Paris Edition)
Pat Patterson
 
Salesforce Integration with Twilio
Salesforce Integration with TwilioSalesforce Integration with Twilio
Salesforce Integration with Twilio
Pat Patterson
 
SAML Smackdown
SAML SmackdownSAML Smackdown
SAML Smackdown
Pat Patterson
 
How I Learned to Stop Worrying and Love Open Source Identity
How I Learned to Stop Worrying and Love Open Source IdentityHow I Learned to Stop Worrying and Love Open Source Identity
How I Learned to Stop Worrying and Love Open Source Identity
Pat Patterson
 
Mobile Developer Week
Mobile Developer WeekMobile Developer Week
Mobile Developer Week
Pat Patterson
 
Taking Identity from the Enterprise to the Cloud
Taking Identity from the Enterprise to the CloudTaking Identity from the Enterprise to the Cloud
Taking Identity from the Enterprise to the Cloud
Pat Patterson
 
Adapting OAuth to the Enterprise
Adapting OAuth to the EnterpriseAdapting OAuth to the Enterprise
Adapting OAuth to the Enterprise
Pat Patterson
 
Force.com: A Walk on the Enterprise Side
Force.com: A Walk on the Enterprise SideForce.com: A Walk on the Enterprise Side
Force.com: A Walk on the Enterprise Side
Pat Patterson
 

More from Pat Patterson (20)

DevOps from the Provider Perspective
DevOps from the Provider PerspectiveDevOps from the Provider Perspective
DevOps from the Provider Perspective
 
How Imprivata Combines External Data Sources for Business Insights
How Imprivata Combines External Data Sources for Business InsightsHow Imprivata Combines External Data Sources for Business Insights
How Imprivata Combines External Data Sources for Business Insights
 
Data Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, HowData Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, How
 
Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...
Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...
Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...
 
Integrating with Einstein Analytics
Integrating with Einstein AnalyticsIntegrating with Einstein Analytics
Integrating with Einstein Analytics
 
Efficient Schemas in Motion with Kafka and Schema Registry
Efficient Schemas in Motion with Kafka and Schema RegistryEfficient Schemas in Motion with Kafka and Schema Registry
Efficient Schemas in Motion with Kafka and Schema Registry
 
Enterprise IoT: Data in Context
Enterprise IoT: Data in ContextEnterprise IoT: Data in Context
Enterprise IoT: Data in Context
 
OData: A Standard API for Data Access
OData: A Standard API for Data AccessOData: A Standard API for Data Access
OData: A Standard API for Data Access
 
API-Driven Relationships: Building The Trans-Internet Express of the Future
API-Driven Relationships: Building The Trans-Internet Express of the FutureAPI-Driven Relationships: Building The Trans-Internet Express of the Future
API-Driven Relationships: Building The Trans-Internet Express of the Future
 
Using Salesforce to Manage Your Developer Community
Using Salesforce to Manage Your Developer CommunityUsing Salesforce to Manage Your Developer Community
Using Salesforce to Manage Your Developer Community
 
Identity in the Cloud
Identity in the CloudIdentity in the Cloud
Identity in the Cloud
 
OpenID Connect: An Overview
OpenID Connect: An OverviewOpenID Connect: An Overview
OpenID Connect: An Overview
 
How I Learned to Stop Worrying and Love Open Source Identity (Paris Edition)
How I Learned to Stop Worrying and Love Open Source Identity (Paris Edition)How I Learned to Stop Worrying and Love Open Source Identity (Paris Edition)
How I Learned to Stop Worrying and Love Open Source Identity (Paris Edition)
 
Salesforce Integration with Twilio
Salesforce Integration with TwilioSalesforce Integration with Twilio
Salesforce Integration with Twilio
 
SAML Smackdown
SAML SmackdownSAML Smackdown
SAML Smackdown
 
How I Learned to Stop Worrying and Love Open Source Identity
How I Learned to Stop Worrying and Love Open Source IdentityHow I Learned to Stop Worrying and Love Open Source Identity
How I Learned to Stop Worrying and Love Open Source Identity
 
Mobile Developer Week
Mobile Developer WeekMobile Developer Week
Mobile Developer Week
 
Taking Identity from the Enterprise to the Cloud
Taking Identity from the Enterprise to the CloudTaking Identity from the Enterprise to the Cloud
Taking Identity from the Enterprise to the Cloud
 
Adapting OAuth to the Enterprise
Adapting OAuth to the EnterpriseAdapting OAuth to the Enterprise
Adapting OAuth to the Enterprise
 
Force.com: A Walk on the Enterprise Side
Force.com: A Walk on the Enterprise SideForce.com: A Walk on the Enterprise Side
Force.com: A Walk on the Enterprise Side
 

Recently uploaded

Solar Panel Service Provider annual maintenance contract.pdf
Solar Panel Service Provider annual maintenance contract.pdfSolar Panel Service Provider annual maintenance contract.pdf
Solar Panel Service Provider annual maintenance contract.pdf
SERVE WELL CRM NASHIK
 
Call Girls in Rajkot (7426014248) call me [🔝Rajkot🔝] Escort In Rajkot service...
Call Girls in Rajkot (7426014248) call me [🔝Rajkot🔝] Escort In Rajkot service...Call Girls in Rajkot (7426014248) call me [🔝Rajkot🔝] Escort In Rajkot service...
Call Girls in Rajkot (7426014248) call me [🔝Rajkot🔝] Escort In Rajkot service...
vickythakur209464
 
Erotic Call Girls Bangalore🫱9079923931🫲 High Quality Call Girl Service Right ...
Erotic Call Girls Bangalore🫱9079923931🫲 High Quality Call Girl Service Right ...Erotic Call Girls Bangalore🫱9079923931🫲 High Quality Call Girl Service Right ...
Erotic Call Girls Bangalore🫱9079923931🫲 High Quality Call Girl Service Right ...
meenusingh4354543
 
Call Girls Solapur ☎️ +91-7426014248 😍 Solapur Call Girl Beauty Girls Solapur...
Call Girls Solapur ☎️ +91-7426014248 😍 Solapur Call Girl Beauty Girls Solapur...Call Girls Solapur ☎️ +91-7426014248 😍 Solapur Call Girl Beauty Girls Solapur...
Call Girls Solapur ☎️ +91-7426014248 😍 Solapur Call Girl Beauty Girls Solapur...
anshsharma8761
 
European Standard S1000D, an Unnecessary Expense to OEM.pptx
European Standard S1000D, an Unnecessary Expense to OEM.pptxEuropean Standard S1000D, an Unnecessary Expense to OEM.pptx
European Standard S1000D, an Unnecessary Expense to OEM.pptx
Digital Teacher
 
Streamlining End-to-End Testing Automation
Streamlining End-to-End Testing AutomationStreamlining End-to-End Testing Automation
Streamlining End-to-End Testing Automation
Anand Bagmar
 
Female Bangalore Call Girls 👉 7023059433 👈 Vip Escorts Service Available
Female Bangalore Call Girls 👉 7023059433 👈 Vip Escorts Service AvailableFemale Bangalore Call Girls 👉 7023059433 👈 Vip Escorts Service Available
Female Bangalore Call Girls 👉 7023059433 👈 Vip Escorts Service Available
isha sharman06
 
CCTV & Security Systems annual maintenance contract.pdf
CCTV & Security Systems annual maintenance contract.pdfCCTV & Security Systems annual maintenance contract.pdf
CCTV & Security Systems annual maintenance contract.pdf
SERVE WELL CRM NASHIK
 
Stork Product Overview: An AI-Powered Autonomous Delivery Fleet
Stork Product Overview: An AI-Powered Autonomous Delivery FleetStork Product Overview: An AI-Powered Autonomous Delivery Fleet
Stork Product Overview: An AI-Powered Autonomous Delivery Fleet
Vince Scalabrino
 
Happy Birthday Kubernetes, 10th Birthday edition of Kubernetes Birthday in Au...
Happy Birthday Kubernetes, 10th Birthday edition of Kubernetes Birthday in Au...Happy Birthday Kubernetes, 10th Birthday edition of Kubernetes Birthday in Au...
Happy Birthday Kubernetes, 10th Birthday edition of Kubernetes Birthday in Au...
Chad Crowell
 
119321250-History-of-Computer-Programming.ppt
119321250-History-of-Computer-Programming.ppt119321250-History-of-Computer-Programming.ppt
119321250-History-of-Computer-Programming.ppt
lavesingh522
 
Call Girls Goa 💯Call Us 🔝 7426014248 🔝 Independent Goa Escorts Service Available
Call Girls Goa 💯Call Us 🔝 7426014248 🔝 Independent Goa Escorts Service AvailableCall Girls Goa 💯Call Us 🔝 7426014248 🔝 Independent Goa Escorts Service Available
Call Girls Goa 💯Call Us 🔝 7426014248 🔝 Independent Goa Escorts Service Available
sapnaanpad7
 
NLJUG speaker academy 2024 - session 1, June 2024
NLJUG speaker academy 2024 - session 1, June 2024NLJUG speaker academy 2024 - session 1, June 2024
NLJUG speaker academy 2024 - session 1, June 2024
Bert Jan Schrijver
 
DDD tales from ProductLand - NewCrafts Paris - May 2024
DDD tales from ProductLand - NewCrafts Paris - May 2024DDD tales from ProductLand - NewCrafts Paris - May 2024
DDD tales from ProductLand - NewCrafts Paris - May 2024
Alberto Brandolini
 
Hyperledger Besu 빨리 따라하기 (Private Networks)
Hyperledger Besu 빨리 따라하기 (Private Networks)Hyperledger Besu 빨리 따라하기 (Private Networks)
Hyperledger Besu 빨리 따라하기 (Private Networks)
wonyong hwang
 
1 Million Orange Stickies later - Devoxx Poland 2024
1 Million Orange Stickies later - Devoxx Poland 20241 Million Orange Stickies later - Devoxx Poland 2024
1 Million Orange Stickies later - Devoxx Poland 2024
Alberto Brandolini
 
Lightning Talk - Ephemeral Containers on Kubernetes in 10 MInutes.pdf
Lightning Talk -  Ephemeral Containers on Kubernetes in 10 MInutes.pdfLightning Talk -  Ephemeral Containers on Kubernetes in 10 MInutes.pdf
Lightning Talk - Ephemeral Containers on Kubernetes in 10 MInutes.pdf
Natan Yellin
 
What’s New in VictoriaLogs - Q2 2024 Update
What’s New in VictoriaLogs - Q2 2024 UpdateWhat’s New in VictoriaLogs - Q2 2024 Update
What’s New in VictoriaLogs - Q2 2024 Update
VictoriaMetrics
 
TheFutureIsDynamic-BoxLang-CFCamp2024.pdf
TheFutureIsDynamic-BoxLang-CFCamp2024.pdfTheFutureIsDynamic-BoxLang-CFCamp2024.pdf
TheFutureIsDynamic-BoxLang-CFCamp2024.pdf
Ortus Solutions, Corp
 
Independent Call Girls In Bangalore 💯Call Us 🔝 7426014248 🔝Independent Bangal...
Independent Call Girls In Bangalore 💯Call Us 🔝 7426014248 🔝Independent Bangal...Independent Call Girls In Bangalore 💯Call Us 🔝 7426014248 🔝Independent Bangal...
Independent Call Girls In Bangalore 💯Call Us 🔝 7426014248 🔝Independent Bangal...
sapnasaifi408
 

Recently uploaded (20)

Solar Panel Service Provider annual maintenance contract.pdf
Solar Panel Service Provider annual maintenance contract.pdfSolar Panel Service Provider annual maintenance contract.pdf
Solar Panel Service Provider annual maintenance contract.pdf
 
Call Girls in Rajkot (7426014248) call me [🔝Rajkot🔝] Escort In Rajkot service...
Call Girls in Rajkot (7426014248) call me [🔝Rajkot🔝] Escort In Rajkot service...Call Girls in Rajkot (7426014248) call me [🔝Rajkot🔝] Escort In Rajkot service...
Call Girls in Rajkot (7426014248) call me [🔝Rajkot🔝] Escort In Rajkot service...
 
Erotic Call Girls Bangalore🫱9079923931🫲 High Quality Call Girl Service Right ...
Erotic Call Girls Bangalore🫱9079923931🫲 High Quality Call Girl Service Right ...Erotic Call Girls Bangalore🫱9079923931🫲 High Quality Call Girl Service Right ...
Erotic Call Girls Bangalore🫱9079923931🫲 High Quality Call Girl Service Right ...
 
Call Girls Solapur ☎️ +91-7426014248 😍 Solapur Call Girl Beauty Girls Solapur...
Call Girls Solapur ☎️ +91-7426014248 😍 Solapur Call Girl Beauty Girls Solapur...Call Girls Solapur ☎️ +91-7426014248 😍 Solapur Call Girl Beauty Girls Solapur...
Call Girls Solapur ☎️ +91-7426014248 😍 Solapur Call Girl Beauty Girls Solapur...
 
European Standard S1000D, an Unnecessary Expense to OEM.pptx
European Standard S1000D, an Unnecessary Expense to OEM.pptxEuropean Standard S1000D, an Unnecessary Expense to OEM.pptx
European Standard S1000D, an Unnecessary Expense to OEM.pptx
 
Streamlining End-to-End Testing Automation
Streamlining End-to-End Testing AutomationStreamlining End-to-End Testing Automation
Streamlining End-to-End Testing Automation
 
Female Bangalore Call Girls 👉 7023059433 👈 Vip Escorts Service Available
Female Bangalore Call Girls 👉 7023059433 👈 Vip Escorts Service AvailableFemale Bangalore Call Girls 👉 7023059433 👈 Vip Escorts Service Available
Female Bangalore Call Girls 👉 7023059433 👈 Vip Escorts Service Available
 
CCTV & Security Systems annual maintenance contract.pdf
CCTV & Security Systems annual maintenance contract.pdfCCTV & Security Systems annual maintenance contract.pdf
CCTV & Security Systems annual maintenance contract.pdf
 
Stork Product Overview: An AI-Powered Autonomous Delivery Fleet
Stork Product Overview: An AI-Powered Autonomous Delivery FleetStork Product Overview: An AI-Powered Autonomous Delivery Fleet
Stork Product Overview: An AI-Powered Autonomous Delivery Fleet
 
Happy Birthday Kubernetes, 10th Birthday edition of Kubernetes Birthday in Au...
Happy Birthday Kubernetes, 10th Birthday edition of Kubernetes Birthday in Au...Happy Birthday Kubernetes, 10th Birthday edition of Kubernetes Birthday in Au...
Happy Birthday Kubernetes, 10th Birthday edition of Kubernetes Birthday in Au...
 
119321250-History-of-Computer-Programming.ppt
119321250-History-of-Computer-Programming.ppt119321250-History-of-Computer-Programming.ppt
119321250-History-of-Computer-Programming.ppt
 
Call Girls Goa 💯Call Us 🔝 7426014248 🔝 Independent Goa Escorts Service Available
Call Girls Goa 💯Call Us 🔝 7426014248 🔝 Independent Goa Escorts Service AvailableCall Girls Goa 💯Call Us 🔝 7426014248 🔝 Independent Goa Escorts Service Available
Call Girls Goa 💯Call Us 🔝 7426014248 🔝 Independent Goa Escorts Service Available
 
NLJUG speaker academy 2024 - session 1, June 2024
NLJUG speaker academy 2024 - session 1, June 2024NLJUG speaker academy 2024 - session 1, June 2024
NLJUG speaker academy 2024 - session 1, June 2024
 
DDD tales from ProductLand - NewCrafts Paris - May 2024
DDD tales from ProductLand - NewCrafts Paris - May 2024DDD tales from ProductLand - NewCrafts Paris - May 2024
DDD tales from ProductLand - NewCrafts Paris - May 2024
 
Hyperledger Besu 빨리 따라하기 (Private Networks)
Hyperledger Besu 빨리 따라하기 (Private Networks)Hyperledger Besu 빨리 따라하기 (Private Networks)
Hyperledger Besu 빨리 따라하기 (Private Networks)
 
1 Million Orange Stickies later - Devoxx Poland 2024
1 Million Orange Stickies later - Devoxx Poland 20241 Million Orange Stickies later - Devoxx Poland 2024
1 Million Orange Stickies later - Devoxx Poland 2024
 
Lightning Talk - Ephemeral Containers on Kubernetes in 10 MInutes.pdf
Lightning Talk -  Ephemeral Containers on Kubernetes in 10 MInutes.pdfLightning Talk -  Ephemeral Containers on Kubernetes in 10 MInutes.pdf
Lightning Talk - Ephemeral Containers on Kubernetes in 10 MInutes.pdf
 
What’s New in VictoriaLogs - Q2 2024 Update
What’s New in VictoriaLogs - Q2 2024 UpdateWhat’s New in VictoriaLogs - Q2 2024 Update
What’s New in VictoriaLogs - Q2 2024 Update
 
TheFutureIsDynamic-BoxLang-CFCamp2024.pdf
TheFutureIsDynamic-BoxLang-CFCamp2024.pdfTheFutureIsDynamic-BoxLang-CFCamp2024.pdf
TheFutureIsDynamic-BoxLang-CFCamp2024.pdf
 
Independent Call Girls In Bangalore 💯Call Us 🔝 7426014248 🔝Independent Bangal...
Independent Call Girls In Bangalore 💯Call Us 🔝 7426014248 🔝Independent Bangal...Independent Call Girls In Bangalore 💯Call Us 🔝 7426014248 🔝Independent Bangal...
Independent Call Girls In Bangalore 💯Call Us 🔝 7426014248 🔝Independent Bangal...
 

Building Custom Big Data Integrations

  • 1. Building Custom Big Data Integrations Pat Patterson Community Champion @metadaddy pat@streamsets.com
  • 2. Agenda Ingest, Data Drift and StreamSets Short Demo Building a custom integration Real-world integration: Salesforce Wave Analytics
  • 3. Traditional and Big Data Founders Company Background Top tier Investors Momentum to Date Strategic Partners ● Launched 2014; exited stealth 9/15 ● ~30 employees ● Double-digit enterprise customers ● 10,000 downloads
  • 4. Past ETL ETL Emerging Ingest Analyze Data Sources Data Stores Data Consumers Market Trends
  • 5. Problem: Data Drift The unpredictable, unannounced and unending mutation of data characteristics caused by the operation, maintenance and modernization of the systems that produce the data Structure Drift Semantic Drift Infrastructure Drift
  • 6. Delayed and False Insights Solving Data Drift Tools Applications Data Stores Data ConsumersData Sources Poor Data QualityData Drift Custom code Fixed-schema
  • 7. Trusted InsightsData KPIs Solving Data Drift Tools Applications Data Stores Data ConsumersData Sources Data Drift Intent-Driven Drift-Handling
  • 8. Demo Let’s build a simple pipeline to answer a real question: What’s the biggest city lot in San Francisco?
  • 9. Customizing StreamSets Currently 25 standard StreamSets destinations, covering a wide variety of target systems, from flat files to S3 to Kafka But… there’s always some system not on the list Solution: DIY!
  • 10. Create Your Own Destination Five Step Process: ○ Create template from Maven archetype ○ Add logging ○ Create a record buffer ○ Add configuration parameters ○ Send data to external system bit.ly/sdc-dest Your System Here!
  • 11. Create Template from Archetype mvn archetype:generate -DarchetypeGroupId=com.streamsets -DarchetypeArtifactId=streamsets- datacollector-stage-lib-tutorial -DarchetypeVersion=1.3.0.0 -DinteractiveMode=true
  • 12. Add Logging Not 100% necessary, but VERY helpful StreamSets uses SLF4J $ tail -f streamsets-datacollector-1.3.0.0/log/sdc.log
  • 13. Create a Record Buffer Leverage existing code where possible! StreamSets includes generators for CSV, JSON, Avro, Protocol Buffers etc
  • 14. Configuration Separate configuration and code DON’T PUT CREDENTIALS IN CODE!!! DON’T PUT CREDENTIALS IN CODE!!! Make your users’ and your lives easier!
  • 15. Send Data to the External System Don’t forget security policy! streamsets-datacollector/etc/sdc-security.policy grant codebase "file://${sdc.dist.dir}/user-libs/sampletest/-" { permission java.net.SocketPermission "requestb.in", "connect, resolve"; };
  • 16. A Real Custom Destination Salesforce Wave Analytics ● Adapt to batch processing model ○ Configure wait time before ‘closing’ a batch ● External Data API ○ Create new dataset ○ Write to dataset ○ Close dataset on timeout ○ Trigger dataflow execution
  • 17. Conclusion StreamSets Data Collector makes simple tasks easy, complex tasks possible Use ‘off the shelf’ stages for simple tasks Leverage script processors (Jython, JavaScript, Groovy) for more complex work Build custom stages for ultimate performance, flexibility
  • 19. Structure Drift Data structures and formats evolve and change unexpectedly Implication: Data Loss Data Squandering Delimited Data 107.3.137.195 fe80::21b:21ff:fe83:90fa Attribute Format Changes { “first“: “jon” “last“: “smith” “email“: “jsmith@acme.com” “add1“: “123 Washington” “add2“: “” “city“: “Tucson” “state“: “AZ” “zip“: “85756” } { “first“: “jane” “last“: “smith” “email“: “jane@earth.net” “add1“: “456 Fillmore” “add2“: “Apt 120” “city“: “Fairfield” “state“: “VA” “zip“: “24435-1001” “phone”: “401-555-1212” } Data Structure Evolution Structure Drift
  • 20. Semantic Drift Data semantics change with evolving applications Implication: Data Corrosion Data Loss Semantic Drift 24122-52172 00-24122-52172 Account Number Expansion M134: user {jsmith} read access granted {ac:24122-52172} M134: user {jsmith} read access granted {ca.ac:24122-52172} Namespace Qualification …… …,3588310669797950,$91.41,jcb,K1088-W#9,… …,6759006011936944,$155.04,switch,A6504-Y#9,… …,6771111111151415,$37.78,laser,Q9936-T#9,… …,3585905063294299,$164.48,jcb,S4643-H#9,… …,5363527828638736,$117.52,mastercard,X3286-P#9,… …,4903080150282806,$168.03,switch,I9133-W#3,… …… Outlier / Anomaly Detection
  • 21. Infrastructure Drift Physical and Logical Infrastructure changes rapidly Implication: Poor Agility Operational Downtime Data Center 1 Data Center 2 Data Center n 3rd Party Service Provider App a App k App q Cloud Infrastructure Infrastructure Drift
  翻译: