Image Recognition
on Streaming Data
Neil Dahlke, Senior Solutions Engineer
15 November 2017
Me at a Glance
AT MEMSQL
Senior Solutions Engineer, San Francisco
BEFORE MEMSQL
I worked on Globus, a high-performance data transfer tool for research scientists, out of the University of Chicago in coordination with Argonne National Lab.
PREVIOUS TALKS
Real Time, Geospatial, Maps (slides)
Streaming in the Enterprise (slides)
Real Time Analytics with Spark and MemSQL (slides)
Bold(ish) Claim
The future of computing is visual…
Mapping, social imagery, handwriting, and many more.
…and it is also mathematical.
But first, let’s create a shared vocabulary.
MemSQL at a Glance
Streaming Data Ingest
Easy to set up real-time data pipelines with exactly-once semantics
Live Data
Memory optimized tables for analyzing real-time events
Historical Data
Disk optimized tables with up to 10x compression and vectorized queries for fast analytics
Data Loading: FAST
Stream data
Real-time loading
Full data access
Query Latency: LOW
Vectorized queries
Real-time dashboards
Live data access
Concurrency: HIGH
Multi-threaded processing
Transactions and Analytics
Scalable performance
• Distributed, ANSI SQL, database
• Full ACID features
• Lock free, shared nothing
• Compiled queries
• Massively parallel
• Geospatial and JSON
• In-memory and on-disk
• MySQL protocol
• Streaming
• HTAP (rowstore and columnstore)
MemSQL in One Slide
Architecture: MemSQL Building Blocks
memsqld
Architecture: Aggregators and Leaves
Agg 1 Agg 2
Leaf 1 Leaf 2 Leaf 3 Leaf 4
Architecture: Aggregators Aggregate
Agg 1 Agg 2
Leaf 1 Leaf 2 Leaf 3 Leaf 4
Architecture: Leaves Hold Partitions
Agg 1 Agg 2
Leaf 1 Leaf 2 Leaf 3 Leaf 4
Architecture: It’s SQL All The Way Down
Agg 1 Agg 2
agg1> select avg(price) from orders;
leaf1> using memsql_demo_0 select count(1), sum(price) from orders;
leaf2> using memsql_demo_12 select count(1), sum(price) from orders;
...
Leaf 1 Leaf 2 Leaf 3 Leaf 4
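The queries on this slide suggest that the aggregator turns avg(price) into per-partition count and sum queries and then combines the partial results. A minimal Python sketch of that combine step, assuming each partition returns a (count, sum) pair; the function name and the sample numbers are made up for illustration and are not from the talk:

```python
def combine_avg(partials):
    """Combine per-partition (count, sum) pairs into one global average."""
    total_count = sum(count for count, _ in partials)
    total_sum = sum(part_sum for _, part_sum in partials)
    if total_count == 0:
        return None  # SQL AVG over zero rows is NULL
    return total_sum / total_count


# Hypothetical partial results from two partitions
partials = [(2, 30.0), (3, 45.0)]
print(combine_avg(partials))  # 15.0
```

Shipping count and sum rather than a per-partition average matters when partitions hold different numbers of rows: averaging the averages would weight small partitions too heavily.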
Architecture: High Availability
Leaf 1 Leaf 2 Leaf 4>_<
Agg 1 Agg 2
▪ Leaves are paired up
▪ Replicated async by default
▪ Automatically fails over
▪ Automatically re-attaches
Architecture: Scaling
memsqld
Architecture: Scaling
Agg 1 Agg 2
Leaf 1 Leaf 2 Leaf 3 Leaf 4
Architecture: Scaling
Agg 1 Agg 2
Leaf 1 Leaf 2 Leaf 3 Leaf 4 memsqld memsqld
Architecture: Scaling
Agg 1 Agg 2
Leaf 1 Leaf 2 Leaf 3 Leaf 4 Leaf 5 Leaf 6
agg1> add leaf ...
CREATE PIPELINE
MemSQL Streaming
Extract
Ingest from Apache Kafka, Amazon S3, Azure Blob Store, or remote file system.
Transform
Map and enrich data with user defined or Apache Spark transformations
Load
Guarantee message delivery with exactly-once semantics
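The slides do not say how exactly-once delivery is achieved. One common way to get that effect is to write a batch and the source offset in the same transaction, so a redelivered batch can be recognized and skipped. The sketch below shows that generic pattern with Python's built-in sqlite3 module; the offsets table, the load_batch helper, and the sample rows are all hypothetical and do not describe MemSQL's implementation:

```python
import sqlite3


def load_batch(conn, source, batch, start_offset):
    """Insert a batch and advance the stored offset in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        (committed,) = conn.execute(
            "SELECT next_offset FROM offsets WHERE source = ?", (source,)
        ).fetchone()
        if start_offset != committed:
            return False  # batch was already loaded (or is out of order)
        conn.executemany("INSERT INTO tweets (id, tweet) VALUES (?, ?)", batch)
        conn.execute(
            "UPDATE offsets SET next_offset = ? WHERE source = ?",
            (start_offset + len(batch), source),
        )
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER, tweet TEXT)")
conn.execute("CREATE TABLE offsets (source TEXT PRIMARY KEY, next_offset INTEGER)")
conn.execute("INSERT INTO offsets VALUES ('tweets-json', 0)")
conn.commit()

batch = [(1, "first"), (2, "second")]
first = load_batch(conn, "tweets-json", batch, 0)
replayed = load_batch(conn, "tweets-json", batch, 0)  # same batch delivered again
row_count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(first, replayed, row_count)  # True False 2
```

Because the rows and the offset commit together, a crash between the two cannot leave the table ahead of or behind the recorded position.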
memsql> CREATE PIPELINE twitter_pipeline AS
-> LOAD DATA KAFKA "public-kafka.memcompute.com:9092/tweets-json"
-> INTO TABLE tweets
-> (id, tweet);
Query OK, (0.89 sec)
memsql> START PIPELINE twitter_pipeline;
Query OK, (0.01 sec)
memsql> SELECT text FROM tweets ORDER BY id DESC LIMIT 5\G
Simple Streaming Setup with CREATE PIPELINE
MemSQL Pipelines Sequence
Data Sources MemSQL
1. Extract 2. Transform extracted data 3. Load into Database tables
Pipelines
MemSQL Pipelines Architecture: Kafka
[Diagram: three Kafka brokers, each feeding Pipelines on a MemSQL leaf that runs 1. Extract, 2. Transform, 3. Load; Pipelines on the MemSQL aggregator issue the metadata query; a data reshuffle step connects the leaves.]
Use Cases
Bold Claim
DIGITAL DEFENDERS
OF CHILDREN
100k
escort ads posted
every day in this
country
40M images in the
database
How does it work?
Real-Time Image Recognition Workflow
▪ Train a model with Spark and TensorFlow
▪ Use the Model to extract feature vectors from images
• Model + Image => FV
▪ You can store every feature vector in a MemSQL table
CREATE TABLE features (
  id bigint(11) NOT NULL,
  image binary(4096) DEFAULT NULL,
  KEY id (id) USING CLUSTERED COLUMNSTORE
);
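A binary(4096) column has room for 1,024 32-bit floats. The sketch below shows one way a normalized feature vector could be turned into such a value, assuming the vector is stored as packed little-endian float32 numbers. That encoding, the 1,024-element length, and the helper names are assumptions for illustration; the slides do not state the byte layout.

```python
import math
import struct

DIM = 1024  # assumption: 4096 bytes / 4 bytes per float32


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def pack_vector(vec):
    """Pack floats as little-endian float32 bytes."""
    return struct.pack("<%df" % len(vec), *vec)


def unpack_vector(blob):
    """Reverse of pack_vector: bytes back to a list of floats."""
    return list(struct.unpack("<%df" % (len(blob) // 4), blob))


# Made-up feature values, only to exercise the helpers
vec = normalize([float(i % 7 + 1) for i in range(DIM)])
blob = pack_vector(vec)
print(len(blob))  # 4096
```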
4,996 POINTS
▪ CLASSIFICATION
▪ DE-DUPLICATION
▪ MATCHING
[Slide graphic: a feature vector printed as index:value pairs, e.g. 949:0.026740,961:0.011758,962:0.01 ...]
Working with Feature Vectors
For every image we store an ID and a normalized feature vector in a MemSQL table called features.
ID | Feature Vector
x | 4KB
To find similar images using cosine similarity, we use this SQL query (0xDEADBEEF is a placeholder for the query image's feature vector):
SELECT
  id
FROM
  features
WHERE
  DOT_PRODUCT(image, 0xDEADBEEF) > 0.9
Understanding Dot Product
▪ Dot product is an algebraic operation
• X = (x1, …, xN), Y = (y1, …, yN)
• X · Y = SUM(xi * yi) for i = 1, …, N
▪ With this specific model and normalized feature vectors, the dot product gives a similarity score.
• The closer the score is to 1, the more similar the images are
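For unit-length vectors the dot product equals the cosine of the angle between them, which is why it can serve as a similarity score. A small Python sketch of the definition above, using made-up three-element vectors rather than real image features:

```python
import math


def dot(x, y):
    """Dot product: sum of element-wise products."""
    return sum(xi * yi for xi, yi in zip(x, y))


def normalize(x):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]


a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 4.0, 6.0])   # same direction as a
c = normalize([3.0, -1.0, 0.0])  # points in a different direction

sim_ab = dot(a, b)  # close to 1: treated as a match
sim_ac = dot(a, c)  # well below a 0.9 threshold: not a match
print(sim_ab > 0.9, sim_ac > 0.9)  # True False
```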
Understanding SIMD
▪ Intel AVX-2
▪ 256-bit registers
▪ Pack multiple values per register
▪ Special instructions for SIMD register operations
▪ Arithmetic, logic, load, store etc.
▪ Allows multiple operations in 1 instruction
Example: (1 2 3 4) + (1 1 1 1) = (2 3 4 5)
Understanding Query Vectorization
Not Vectorized
Single row, Single instruction
CPU constrained
10,000 rows / sec / core
Vectorized
Multiple rows, Single instruction
CPU optimized
1,000,000,000 rows / sec / core
Performance expectations
▪ Memory speed: ~50GB/sec
▪ Vector size: 4KB
▪ 12.5 million images a second per node
▪ 1 billion images a second on a 100-node cluster
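These figures follow from dividing memory bandwidth by vector size. A back-of-the-envelope check in Python, reading "50GB" as 50 × 10^9 bytes and "4KB" as 4,000 bytes (both readings are assumptions; with 4,096-byte vectors the per-node figure comes out closer to 12.2 million):

```python
memory_bandwidth = 50e9  # bytes per second, "~50GB/sec" from the slide
vector_size = 4e3        # bytes per feature vector, "4KB" read as 4,000 bytes
nodes = 100

per_node = memory_bandwidth / vector_size  # vectors scanned per second per node
cluster = per_node * nodes                 # assumes perfectly linear scale-out

print(per_node)  # 12500000.0
print(cluster)   # 1250000000.0
```

The cluster figure is 1.25 billion under these assumptions, which the slide rounds to 1 billion.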
100s of millions of
images to match
Demo
Performance Enhancing Techniques
Achieving best-in-class dot product implementation
▪ SIMD-powered vectorized execution
▪ Data compression
▪ Query parallelism
▪ Scale out
▪ Result: Processing at Memory Bandwidth Speed
Streaming Real-Time Image Recognition Workflow
[Diagram labels: New Image Stream, TensorFlow, Real-Time Processing, Reference Image Store, Real-Time Application]
MemSQL gives us…
▪ Performance
▪ Scalability
▪ High concurrency
▪ Real-time (operational)
▪ Compatibility (BI, Spark, Kafka, ETL, etc)
▪ Hybrid deployment
▪ Robustness, durability, security
Q&A
Thank you
@neildahlke

The Real-Time CDO and the Cloud-Forward Path to Predictive Analytics
SingleStore
 




Image Recognition on Streaming Data

Editor's Notes

  1. Google and Facebook leverage ML to detect violating content. Expensify reads receipts. Other uses include identifying what objects are in social media posts and detecting divergence of maps for use in the intelligence communities.
  2. You may remember this slide.
  3. You can detect who is at your front door. You can detect what animal your phone is pointed at. You can point your phone at a building and learn attributes about it. All of this is possible with MemSQL. Once you have the feature vectors stored in your database, you can process them and identify those that are closest to your selected image.
  4. Facial recognition is a subject of ongoing research into efficiently extracting feature vectors from images using deep learning. For the purpose of this talk, we will assume that this is a somewhat solved problem and that we can efficiently extract feature vectors from any incoming image. Once those feature vectors are produced, all you need to do is insert them into a MemSQL table with the following simple schema. Once you have the model produced, you need the tools to process this data at scale. I’m not going to go into exactly how this is done, as there are tons of resources online. I’m going to talk about what happens once this lands in the database.
  5. There are two frequently used approaches to measuring the similarity between vectors: cosine similarity (cosine of the angle between the vectors) and Euclidean distance. Cosine similarity is defined as the dot product of the vectors, divided by the product of the vector norms (length of the vectors). If the vectors are normalized, the cosine similarity is simply the dot product of the vectors (since the product of the norms is 1). In this scenario, we choose the approach of normalizing each feature vector by dividing each element in the vector by the length of the vector, such that the scalar length is one. CALL OUT THAT THIS IS A FULL TABLE SCAN
  6. The dot product is an algebraic operation that takes two equal-length sequences of numbers (usually coordinate vectors) and returns a single number. In Euclidean geometry, the dot product of the Cartesian coordinates of two vectors is widely used and often called the inner product (or, rarely, the projection product). Algebraically, the dot product is the sum of the products of the corresponding entries of the two sequences of numbers. Angles between non-unit vectors (vectors with lengths not equal to 1.0) can be calculated either by first normalizing the vectors, or by dividing the dot product of the non-unit vectors by the length of each vector. Taking the dot product of a vector against itself gives the square of its length, which is 1 for a normalized vector. The similarity is higher if the dot product of the two vectors is close to one. On the previous slide, we chose a constant of 0.9 for highly similar.
  7. People usually try to process this type of information using GPUs, but in this particular use case, the bottleneck is actually memory bandwidth. Memory bandwidth is roughly 48 GB/s, but I’m going to give it the benefit of the doubt and round up to 50 GB/s.
  8. How can MemSQL run this faster than memory bandwidth? The answer is compression of columnstore tables. Because the random vectors were normalized, they could be compressed from 50 GB down to a size that can be read from memory in less than 0.25 seconds. Because you can perform image recognition at in-memory speed, your bottleneck for similarity computation is not necessarily compute. We realize that there are other algorithms that gain efficiency by avoiding the full table scan and only lose a small amount of accuracy. However, you can achieve good practical results with a very straightforward implementation.
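
The similarity search described in notes 5 and 6 can be sketched in a few lines of plain Python. This is an illustration of the math only, not the MemSQL implementation: the function names, the sample vectors, and the image ids are all hypothetical, and treating "highly similar" as a dot product strictly greater than 0.9 is an assumption based on the constant mentioned in note 6.

```python
import math


def normalize(vec):
    """Divide each element by the vector's length so the result has length 1."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return [x / length for x in vec]


def dot(a, b):
    """Sum of the products of corresponding entries of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def find_similar(query, rows, threshold=0.9):
    """Full scan: compare the query against every stored (id, unit vector) row.

    Because both sides are normalized, the dot product is the cosine similarity.
    The 0.9 default mirrors the constant in the notes; '>' is an assumption.
    """
    q = normalize(query)
    return [row_id for row_id, vec in rows if dot(q, vec) > threshold]


# Hypothetical stored feature vectors, normalized once at insert time.
stored = [
    ("img_a", normalize([1.0, 0.0, 0.0])),
    ("img_b", normalize([0.9, 0.1, 0.0])),
    ("img_c", normalize([0.0, 1.0, 0.0])),
]

matches = find_similar([1.0, 0.05, 0.0], stored)
print(matches)
```

Normalizing at insert time means the scan itself only needs multiplications and additions per row, which is consistent with the point in note 7 that the work is bounded by how fast the vectors can be read rather than by compute.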