尊敬的 微信汇率:1円 ≈ 0.046078 元 支付宝汇率:1円 ≈ 0.046168元 [退出登录]
SlideShare a Scribd company logo
Matt VanLandeghem, Nielsen
How Nielsen Utilized
Databricks for Large-scale
Research and Development
#EUent4
About Nielsen
• Founded 1923
• Buy & Watch
– Buy: Market Research
– Watch: Audience Measurement
• Not just TV!
• Also Radio and Digital, including PC, Mobile, Connected Devices, Digital
Audio, Digital TV
• Digital Ad spending now meeting/exceeding TV Ad spending
2#EUent4
What is Nielsen Digital Ad Ratings (DAR)?
• Measurement of computer, mobile, and over-the-top device audience
– Comparable to TV ratings
– Who is behind the screen?
• Advertising campaigns
– Primary focus is age/gender demographic breaks
– On-Target Delivery (%) is a key metric
• Global product
– 25 countries
3#EUent4
How does DAR work?
4#EUent4
4.3
2.4
2.0
0
1
2
3
4
5
Third-party
Demographics
Report to ClientAd Impression
“Big Data”Mobile, computer, over-the-
top
Overnight daily reporting of:
Unique audience, Ad
impressions, On-Target %
Nielsen
Bias-Correction
Adjustment
Focus of today’s
presentation…
Nielsen Adjustments
• “Big Data” is not perfect
– Needs bias correction
– Where the value of Nielsen’s high-quality panels really
shines
– Nielsen’s panels provide a “truth set” that can be used to
develop models that adjust big data
• 3 sources of bias
– Misrepresentation
– Misattribution
– Non-coverage
• Nielsen adjustments are an active area of Research
and Development
5#EUent4
Nielsen Adjustments
• Metered home PC behavior
– Representative sample of U.S. homes
– “Medium” data
• Production impression data
– Big data
• What is the best way to create Nielsen adjustments
AND test them in a production environment?
• Foundation for Nielsen’s Databricks Use Case
6#EUent4
Nielsen Business Case
• Recently created new DAR adjustment
methodologies
– Small-scale testing showed the new methodologies are an
enhancement over current methodologies
• Business requirement: test new methodologies on a
large # of campaigns
– Need to understand client impact
– Large-scale testing could identify corner or edge cases where new
methodologies could break down and cause a data quality issue
– Small scale testing: ~20 campaigns
– Large scale testing: ~4000 campaigns
7#EUent4
Databricks
• Cluster management
• Provide a friendly interface to Spark for our
Data Scientists
– Multiple programming languages
– Create adjustment factors
• Uses an algorithm not available in SQL
– Link to production databases
– Apply adjustment factors to production-level data
– Analyze data with new adjustment factors applied
8#EUent4
Nielsen Business Case
9#EUent4
Aggregated panel data
Netezza
Cloud
-Combine small and large data
-Run all analyses in one place
using PySpark/Spark SQL
Data Lake
Oracle
Aggregated production data
10#EUent4
11#EUent4
Nielsen Business Case
• Performance gains:
– What would have taken 36 hours with standalone
Python only took 1.5 hours in Spark/Databricks
– Edge-cases identified
• Advantages of one methodology over another also
identified
– Short turn-around if any revisions to
methodology
12#EUent4
Nielsen Business Case
• Other benefits
– Reduced time from idea to deployment
– Enhanced support/investigation once deployed
• Client data inquiries and issues addressed quicker
– Collaboration
• Application Development teams
• International data science teams
• These new methodologies being tested in other
products
– Enhanced skillsets of data scientists
13#EUent4
Summary
• At the end of the day, the Databricks/Spark
technology allowed us to solve this business
use case
• The reduced R&D timeline plus extensive
testing will allow enhanced methodologies to be
available to our clients sooner
14#EUent4
Copyright © 2017 The Nielsen Company. Confidential and proprietary.
Special thanks: Mala Sivarajan, Anil Singh

More Related Content

What's hot

Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Spark Summit
 
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Spark Summit
 
Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaBig Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al Essa
Spark Summit
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Databricks
 
Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on...
Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on...Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on...
Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on...
Databricks
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...
Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...
Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...
Cloudera, Inc.
 
Using Azure Databricks, Structured Streaming, and Deep Learning Pipelines to ...
Using Azure Databricks, Structured Streaming, and Deep Learning Pipelines to ...Using Azure Databricks, Structured Streaming, and Deep Learning Pipelines to ...
Using Azure Databricks, Structured Streaming, and Deep Learning Pipelines to ...
Databricks
 
Data Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoData Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at Zalando
Databricks
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
Jen Aman
 
Building Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta LakeBuilding Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta Lake
Databricks
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Databricks
 
Spark at Airbnb
Spark at AirbnbSpark at Airbnb
Spark at Airbnb
Hao Wang
 
GPU Acceleration for Financial Services
GPU Acceleration for Financial ServicesGPU Acceleration for Financial Services
GPU Acceleration for Financial Services
Kinetica
 
Ray: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed PythonRay: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed Python
Databricks
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
Databricks
 
Semantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflowSemantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflow
Databricks
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML Inference
Databricks
 
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Databricks
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
Databricks
 

What's hot (20)

Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
 
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
 
Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaBig Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al Essa
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on...
Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on...Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on...
Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...
Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...
Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...
 
Using Azure Databricks, Structured Streaming, and Deep Learning Pipelines to ...
Using Azure Databricks, Structured Streaming, and Deep Learning Pipelines to ...Using Azure Databricks, Structured Streaming, and Deep Learning Pipelines to ...
Using Azure Databricks, Structured Streaming, and Deep Learning Pipelines to ...
 
Data Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoData Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at Zalando
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
 
Building Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta LakeBuilding Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta Lake
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
 
Spark at Airbnb
Spark at AirbnbSpark at Airbnb
Spark at Airbnb
 
GPU Acceleration for Financial Services
GPU Acceleration for Financial ServicesGPU Acceleration for Financial Services
GPU Acceleration for Financial Services
 
Ray: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed PythonRay: Enterprise-Grade, Distributed Python
Ray: Enterprise-Grade, Distributed Python
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Semantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflowSemantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflow
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML Inference
 
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
 

Similar to How Nielsen Utilized Databricks for Large-Scale Research and Development with Matt VanLandeghem

GraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in GraphdatenbankenGraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in Graphdatenbanken
Neo4j
 
Harnessing Big Data_UCLA
Harnessing Big Data_UCLAHarnessing Big Data_UCLA
Harnessing Big Data_UCLA
Paul Barsch
 
Lynn Dwyer: Smarter Working: What is all this digitalisation about?
Lynn Dwyer: Smarter Working: What is all this digitalisation about?Lynn Dwyer: Smarter Working: What is all this digitalisation about?
Lynn Dwyer: Smarter Working: What is all this digitalisation about?
Association for Project Management
 
Being a digital communication superstar
Being a digital communication superstarBeing a digital communication superstar
Being a digital communication superstar
Ron McFarland
 
Digital Strategy for future business
Digital Strategy for future businessDigital Strategy for future business
Digital Strategy for future business
Ashish Bhasin
 
FINAL_Autumn 2015 Global AR Council Member Meeting Presentation - Optimizing ...
FINAL_Autumn 2015 Global AR Council Member Meeting Presentation - Optimizing ...FINAL_Autumn 2015 Global AR Council Member Meeting Presentation - Optimizing ...
FINAL_Autumn 2015 Global AR Council Member Meeting Presentation - Optimizing ...
Larry Yokell
 
Big Data & IoT. Opportunities and challenges
Big Data & IoT. Opportunities and challengesBig Data & IoT. Opportunities and challenges
Big Data & IoT. Opportunities and challenges
MediaTek Labs
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Looker
 
final oracle presentation
final oracle presentationfinal oracle presentation
final oracle presentation
Priyesh Patel
 
Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce CostsUsing Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce Costs
Connotate
 
Big data session five ( a )f
Big data session five ( a )fBig data session five ( a )f
Big data session five ( a )f
marukanda
 
Enable Advanced Analytics with Hadoop and an Enterprise Data Hub
Enable Advanced Analytics with Hadoop and an Enterprise Data HubEnable Advanced Analytics with Hadoop and an Enterprise Data Hub
Enable Advanced Analytics with Hadoop and an Enterprise Data Hub
Cloudera, Inc.
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data asset
Bala Iyer
 
Chris Day VP IT Transformation and Office of the CIO at AstraZeneca
Chris Day VP IT Transformation and Office of the CIO at AstraZenecaChris Day VP IT Transformation and Office of the CIO at AstraZeneca
Chris Day VP IT Transformation and Office of the CIO at AstraZeneca
Steve Ashton
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017
Prashant Bhatmule
 
Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0
Peter Schleinitz
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
Data Science Milan
 
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Barcoding, Inc.
 
20/10 Vision: Building A 21st Century Market Research Organization
20/10 Vision: Building A 21st Century Market Research Organization20/10 Vision: Building A 21st Century Market Research Organization
20/10 Vision: Building A 21st Century Market Research Organization
Gregory Weiss
 
Michele Nati - Digital Catapult viewpoint on Industrie 4.0 - Digital Technolo...
Michele Nati - Digital Catapult viewpoint on Industrie 4.0 - Digital Technolo...Michele Nati - Digital Catapult viewpoint on Industrie 4.0 - Digital Technolo...
Michele Nati - Digital Catapult viewpoint on Industrie 4.0 - Digital Technolo...
MicheleNati
 

Similar to How Nielsen Utilized Databricks for Large-Scale Research and Development with Matt VanLandeghem (20)

GraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in GraphdatenbankenGraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in Graphdatenbanken
 
Harnessing Big Data_UCLA
Harnessing Big Data_UCLAHarnessing Big Data_UCLA
Harnessing Big Data_UCLA
 
Lynn Dwyer: Smarter Working: What is all this digitalisation about?
Lynn Dwyer: Smarter Working: What is all this digitalisation about?Lynn Dwyer: Smarter Working: What is all this digitalisation about?
Lynn Dwyer: Smarter Working: What is all this digitalisation about?
 
Being a digital communication superstar
Being a digital communication superstarBeing a digital communication superstar
Being a digital communication superstar
 
Digital Strategy for future business
Digital Strategy for future businessDigital Strategy for future business
Digital Strategy for future business
 
FINAL_Autumn 2015 Global AR Council Member Meeting Presentation - Optimizing ...
FINAL_Autumn 2015 Global AR Council Member Meeting Presentation - Optimizing ...FINAL_Autumn 2015 Global AR Council Member Meeting Presentation - Optimizing ...
FINAL_Autumn 2015 Global AR Council Member Meeting Presentation - Optimizing ...
 
Big Data & IoT. Opportunities and challenges
Big Data & IoT. Opportunities and challengesBig Data & IoT. Opportunities and challenges
Big Data & IoT. Opportunities and challenges
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
 
final oracle presentation
final oracle presentationfinal oracle presentation
final oracle presentation
 
Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce CostsUsing Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce Costs
 
Big data session five ( a )f
Big data session five ( a )fBig data session five ( a )f
Big data session five ( a )f
 
Enable Advanced Analytics with Hadoop and an Enterprise Data Hub
Enable Advanced Analytics with Hadoop and an Enterprise Data HubEnable Advanced Analytics with Hadoop and an Enterprise Data Hub
Enable Advanced Analytics with Hadoop and an Enterprise Data Hub
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data asset
 
Chris Day VP IT Transformation and Office of the CIO at AstraZeneca
Chris Day VP IT Transformation and Office of the CIO at AstraZenecaChris Day VP IT Transformation and Office of the CIO at AstraZeneca
Chris Day VP IT Transformation and Office of the CIO at AstraZeneca
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017
 
Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
 
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
 
20/10 Vision: Building A 21st Century Market Research Organization
20/10 Vision: Building A 21st Century Market Research Organization20/10 Vision: Building A 21st Century Market Research Organization
20/10 Vision: Building A 21st Century Market Research Organization
 
Michele Nati - Digital Catapult viewpoint on Industrie 4.0 - Digital Technolo...
Michele Nati - Digital Catapult viewpoint on Industrie 4.0 - Digital Technolo...Michele Nati - Digital Catapult viewpoint on Industrie 4.0 - Digital Technolo...
Michele Nati - Digital Catapult viewpoint on Industrie 4.0 - Digital Technolo...
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Spark Summit
 
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Spark Summit
 
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
 
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
 
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
 

Recently uploaded

A review of I_O behavior on Oracle database in ASM
A review of I_O behavior on Oracle database in ASMA review of I_O behavior on Oracle database in ASM
A review of I_O behavior on Oracle database in ASM
Alireza Kamrani
 
Delhi Call Girls Karol Bagh 👉 9711199012 👈 unlimited short high profile full ...
Delhi Call Girls Karol Bagh 👉 9711199012 👈 unlimited short high profile full ...Delhi Call Girls Karol Bagh 👉 9711199012 👈 unlimited short high profile full ...
Delhi Call Girls Karol Bagh 👉 9711199012 👈 unlimited short high profile full ...
mona lisa $A12
 
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call GirlCall Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
sapna sharmap11
 
machine learning notes by Andrew Ng and Tengyu Ma
machine learning notes by Andrew Ng and Tengyu Mamachine learning notes by Andrew Ng and Tengyu Ma
machine learning notes by Andrew Ng and Tengyu Ma
Vijayabaskar Uthirapathy
 
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
zoykygu
 
Direct Lake Deep Dive slides from Fabric Engineering Roadshow
Direct Lake Deep Dive slides from Fabric Engineering RoadshowDirect Lake Deep Dive slides from Fabric Engineering Roadshow
Direct Lake Deep Dive slides from Fabric Engineering Roadshow
Gabi Münster
 
IBM watsonx.data - Seller Enablement Deck.PPTX
IBM watsonx.data - Seller Enablement Deck.PPTXIBM watsonx.data - Seller Enablement Deck.PPTX
IBM watsonx.data - Seller Enablement Deck.PPTX
EbtsamRashed
 
CAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdfCAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdf
frp60658
 
❻❸❼⓿❽❻❷⓿⓿❼KALYAN MATKA CHART FINAL OPEN JODI PANNA FIXXX DPBOSS MATKA RESULT ...
❻❸❼⓿❽❻❷⓿⓿❼KALYAN MATKA CHART FINAL OPEN JODI PANNA FIXXX DPBOSS MATKA RESULT ...❻❸❼⓿❽❻❷⓿⓿❼KALYAN MATKA CHART FINAL OPEN JODI PANNA FIXXX DPBOSS MATKA RESULT ...
❻❸❼⓿❽❻❷⓿⓿❼KALYAN MATKA CHART FINAL OPEN JODI PANNA FIXXX DPBOSS MATKA RESULT ...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 
❣VIP Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Escorts S...
❣VIP Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Escorts S...❣VIP Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Escorts S...
❣VIP Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Escorts S...
jasodak99
 
Salesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - CanariasSalesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - Canarias
davidpietrzykowski1
 
202406 - Cape Town Snowflake User Group - LLM & RAG.pdf
202406 - Cape Town Snowflake User Group - LLM & RAG.pdf202406 - Cape Town Snowflake User Group - LLM & RAG.pdf
202406 - Cape Town Snowflake User Group - LLM & RAG.pdf
Douglas Day
 
Essential Skills for Family Assessment - Marital and Family Therapy and Couns...
Essential Skills for Family Assessment - Marital and Family Therapy and Couns...Essential Skills for Family Assessment - Marital and Family Therapy and Couns...
Essential Skills for Family Assessment - Marital and Family Therapy and Couns...
PsychoTech Services
 
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your DoorAhmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
Russian Escorts in Delhi 9711199171 with low rate Book online
 
Erotic Call Girls Hyderabad🫱9352988975🫲 High Quality Call Girl Service Right ...
Erotic Call Girls Hyderabad🫱9352988975🫲 High Quality Call Girl Service Right ...Erotic Call Girls Hyderabad🫱9352988975🫲 High Quality Call Girl Service Right ...
Erotic Call Girls Hyderabad🫱9352988975🫲 High Quality Call Girl Service Right ...
meenusingh4354543
 
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
PsychoTech Services
 
Call Girls Hyderabad ❤️ 7339748667 ❤️ With No Advance Payment
Call Girls Hyderabad ❤️ 7339748667 ❤️ With No Advance PaymentCall Girls Hyderabad ❤️ 7339748667 ❤️ With No Advance Payment
Call Girls Hyderabad ❤️ 7339748667 ❤️ With No Advance Payment
prijesh mathew
 
Difference in Differences - Does Strict Speed Limit Restrictions Reduce Road ...
Difference in Differences - Does Strict Speed Limit Restrictions Reduce Road ...Difference in Differences - Does Strict Speed Limit Restrictions Reduce Road ...
Difference in Differences - Does Strict Speed Limit Restrictions Reduce Road ...
ThinkInnovation
 
Optimizing Feldera: Integrating Advanced UDFs and Enhanced SQL Functionality ...
Optimizing Feldera: Integrating Advanced UDFs and Enhanced SQL Functionality ...Optimizing Feldera: Integrating Advanced UDFs and Enhanced SQL Functionality ...
Optimizing Feldera: Integrating Advanced UDFs and Enhanced SQL Functionality ...
mparmparousiskostas
 
🔥Mature Women / Aunty Call Girl Chennai 💯Call Us 🔝 8094342248 🔝💃Top Class Cal...
🔥Mature Women / Aunty Call Girl Chennai 💯Call Us 🔝 8094342248 🔝💃Top Class Cal...🔥Mature Women / Aunty Call Girl Chennai 💯Call Us 🔝 8094342248 🔝💃Top Class Cal...
🔥Mature Women / Aunty Call Girl Chennai 💯Call Us 🔝 8094342248 🔝💃Top Class Cal...
shivangimorya083
 

Recently uploaded (20)

A review of I_O behavior on Oracle database in ASM
A review of I_O behavior on Oracle database in ASMA review of I_O behavior on Oracle database in ASM
A review of I_O behavior on Oracle database in ASM
 
Delhi Call Girls Karol Bagh 👉 9711199012 👈 unlimited short high profile full ...
Delhi Call Girls Karol Bagh 👉 9711199012 👈 unlimited short high profile full ...Delhi Call Girls Karol Bagh 👉 9711199012 👈 unlimited short high profile full ...
Delhi Call Girls Karol Bagh 👉 9711199012 👈 unlimited short high profile full ...
 
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call GirlCall Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
 
machine learning notes by Andrew Ng and Tengyu Ma
machine learning notes by Andrew Ng and Tengyu Mamachine learning notes by Andrew Ng and Tengyu Ma
machine learning notes by Andrew Ng and Tengyu Ma
 
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
 
Direct Lake Deep Dive slides from Fabric Engineering Roadshow
Direct Lake Deep Dive slides from Fabric Engineering RoadshowDirect Lake Deep Dive slides from Fabric Engineering Roadshow
Direct Lake Deep Dive slides from Fabric Engineering Roadshow
 
IBM watsonx.data - Seller Enablement Deck.PPTX
IBM watsonx.data - Seller Enablement Deck.PPTXIBM watsonx.data - Seller Enablement Deck.PPTX
IBM watsonx.data - Seller Enablement Deck.PPTX
 
CAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdfCAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdf
 
❻❸❼⓿❽❻❷⓿⓿❼KALYAN MATKA CHART FINAL OPEN JODI PANNA FIXXX DPBOSS MATKA RESULT ...
❻❸❼⓿❽❻❷⓿⓿❼KALYAN MATKA CHART FINAL OPEN JODI PANNA FIXXX DPBOSS MATKA RESULT ...❻❸❼⓿❽❻❷⓿⓿❼KALYAN MATKA CHART FINAL OPEN JODI PANNA FIXXX DPBOSS MATKA RESULT ...
❻❸❼⓿❽❻❷⓿⓿❼KALYAN MATKA CHART FINAL OPEN JODI PANNA FIXXX DPBOSS MATKA RESULT ...
 
❣VIP Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Escorts S...
❣VIP Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Escorts S...❣VIP Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Escorts S...
❣VIP Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Escorts S...
 
Salesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - CanariasSalesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - Canarias
 
202406 - Cape Town Snowflake User Group - LLM & RAG.pdf
202406 - Cape Town Snowflake User Group - LLM & RAG.pdf202406 - Cape Town Snowflake User Group - LLM & RAG.pdf
202406 - Cape Town Snowflake User Group - LLM & RAG.pdf
 
Essential Skills for Family Assessment - Marital and Family Therapy and Couns...
Essential Skills for Family Assessment - Marital and Family Therapy and Couns...Essential Skills for Family Assessment - Marital and Family Therapy and Couns...
Essential Skills for Family Assessment - Marital and Family Therapy and Couns...
 
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your DoorAhmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
 
Erotic Call Girls Hyderabad🫱9352988975🫲 High Quality Call Girl Service Right ...
Erotic Call Girls Hyderabad🫱9352988975🫲 High Quality Call Girl Service Right ...Erotic Call Girls Hyderabad🫱9352988975🫲 High Quality Call Girl Service Right ...
Erotic Call Girls Hyderabad🫱9352988975🫲 High Quality Call Girl Service Right ...
 
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
 
Call Girls Hyderabad ❤️ 7339748667 ❤️ With No Advance Payment
Call Girls Hyderabad ❤️ 7339748667 ❤️ With No Advance PaymentCall Girls Hyderabad ❤️ 7339748667 ❤️ With No Advance Payment
Call Girls Hyderabad ❤️ 7339748667 ❤️ With No Advance Payment
 
Difference in Differences - Does Strict Speed Limit Restrictions Reduce Road ...
Difference in Differences - Does Strict Speed Limit Restrictions Reduce Road ...Difference in Differences - Does Strict Speed Limit Restrictions Reduce Road ...
Difference in Differences - Does Strict Speed Limit Restrictions Reduce Road ...
 
Optimizing Feldera: Integrating Advanced UDFs and Enhanced SQL Functionality ...
Optimizing Feldera: Integrating Advanced UDFs and Enhanced SQL Functionality ...Optimizing Feldera: Integrating Advanced UDFs and Enhanced SQL Functionality ...
Optimizing Feldera: Integrating Advanced UDFs and Enhanced SQL Functionality ...
 
🔥Mature Women / Aunty Call Girl Chennai 💯Call Us 🔝 8094342248 🔝💃Top Class Cal...
🔥Mature Women / Aunty Call Girl Chennai 💯Call Us 🔝 8094342248 🔝💃Top Class Cal...🔥Mature Women / Aunty Call Girl Chennai 💯Call Us 🔝 8094342248 🔝💃Top Class Cal...
🔥Mature Women / Aunty Call Girl Chennai 💯Call Us 🔝 8094342248 🔝💃Top Class Cal...
 

How Nielsen Utilized Databricks for Large-Scale Research and Development with Matt VanLandeghem

  • 1. Matt VanLandeghem, Nielsen How Nielsen Utilized Databricks for Large-scale Research and Development #EUent4
  • 2. About Nielsen • Founded 1923 • Buy & Watch – Buy: Market Research – Watch: Audience Measurement • Not just TV! • Also Radio and Digital, including PC, Mobile, Connected Devices, Digital Audio, Digital TV • Digital Ad spending now meeting/exceeding TV Ad spending 2#EUent4
  • 3. What is Nielsen Digital Ad Ratings (DAR)? • Measurement of computer, mobile, and over-the-top device audience – Comparable to TV ratings – Who is behind the screen? • Advertising campaigns – Primary focus is age/gender demographic breaks – On-Target Delivery (%) is a key metric • Global product – 25 countries 3#EUent4
  • 4. How does DAR work? 4#EUent4 4.3 2.4 2.0 0 1 2 3 4 5 Third-party Demographics Report to ClientAd Impression “Big Data”Mobile, computer, over-the- top Overnight daily reporting of: Unique audience, Ad impressions, On-Target % Nielsen Bias-Correction Adjustment Focus of today’s presentation…
  • 5. Nielsen Adjustments • “Big Data” is not perfect – Needs bias correction – Where the value of Nielsen’s high-quality panels really shines – Nielsen’s panels provide a “truth set” that can be used to develop models that adjust big data • 3 sources of bias – Misrepresentation – Misattribution – Non-coverage • Nielsen adjustments are an active area of Research and Development 5#EUent4
  • 6. Nielsen Adjustments • Metered home PC behavior – Representative sample of U.S. homes – “Medium” data • Production impression data – Big data • What is the best way to create Nielsen adjustments AND test them in a production environment? • Foundation for Nielsen’s Databricks Use Case 6#EUent4
  • 7. Nielsen Business Case • Recently created new DAR adjustment methodologies – Small-scale testing showed the new methodologies are an enhancement over current methodologies • Business requirement: test new methodologies on a large # of campaigns – Need to understand client impact – Large-scale testing could identify corner or edge cases where new methodologies could break down and cause a data quality issue – Small scale testing: ~20 campaigns – Large scale testing: ~4000 campaigns 7#EUent4
  • 8. Databricks • Cluster management • Provide a friendly interface to Spark for our Data Scientists – Multiple programming languages – Create adjustment factors • Uses an algorithm not available in SQL – Link to production databases – Apply adjustment factors to production-level data – Analyze data with new adjustment factors applied 8#EUent4
  • 9. Nielsen Business Case 9#EUent4 Aggregated panel data Netezza Cloud -Combine small and large data -Run all analyses in one place using PySpark/Spark SQL Data Lake Oracle Aggregated production data
  • 12. Nielsen Business Case • Performance gains: – What would have taken 36 hours with standalone Python only took 1.5 hours in Spark/Databricks – Edge-cases identified • Advantages of one methodology over another also identified – Short turn-around if any revisions to methodology 12#EUent4
  • 13. Nielsen Business Case • Other benefits – Reduced time from idea to deployment – Enhanced support/investigation once deployed • Client data inquiries and issues addressed quicker – Collaboration • Application Development teams • International data science teams • These new methodologies being tested in other products – Enhanced skillsets of data scientists 13#EUent4
  • 14. Summary • At the end of the day, the Databricks/Spark technology allowed us to solve this business use case • The reduced R&D timeline plus extensive testing will allow enhanced methodologies to be available to our clients sooner 14#EUent4
  • 15. Copyright © 2017 The Nielsen Company. Confidential and proprietary. Special thanks: Mala Sivarajan, Anil Singh
  翻译: