Short and Long-Tail RDF Analytics for
Massive Webs of Data
Marcin Wylot, Jigé Pont, Mariusz Wiśniewski,
and Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg
Switzerland
International Semantic Web Conference
26th October 2011, Bonn, Germany
Motivation

● increasingly large semantic/LOD data sets
● increasingly complex queries
○ real-time analytic queries
■ such as “return the professor who supervises the most students”

urgent need for a more efficient and scalable
solution for RDF data management
3 recipes to speed up

○ collocation
○ collocation
○ collocation
Why collocation?
Because by collocating data we
can reduce I/O operations, which are
one of the biggest bottlenecks in
database systems.
Outline
● architecture
● main idea
● data structures
● basic operations (inserts, queries)
● evaluation & results
● future work
System Architecture
Main Idea - Hybrid Storage
Main Idea - data structures
Declarative Templates
Template Matching
Molecule Clusters
● extremely compact sub-graphs
● precomputed joins
List of Literals
● extremely compact list of sorted values
Hash Table
● lexicographic tree to encode URIs
● template-based indexing
● extremely compact lists of homologous nodes
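
The lexicographic tree mentioned above can be illustrated with a small character trie that assigns each distinct URI an integer key. This is only a sketch under assumptions: the class name `UriTrie`, its methods, and the dense 0, 1, 2, ... key numbering are hypothetical and are not taken from the slides.

```python
class UriTrie:
    """Character trie mapping each distinct URI to a small integer key.

    Hypothetical illustration; not the system's actual encoder.
    """

    def __init__(self):
        self._root = {}
        self._uris = []  # key -> URI, used for decoding

    def encode(self, uri):
        """Return the key for `uri`, assigning a new one on first sight."""
        node = self._root
        for ch in uri:
            node = node.setdefault(ch, {})
        # None marks "a URI ends here"; it cannot collide with a character edge.
        if None not in node:
            node[None] = len(self._uris)
            self._uris.append(uri)
        return node[None]

    def decode(self, key):
        """Return the URI that was assigned `key`."""
        return self._uris[key]


trie = UriTrie()
student_key = trie.encode("http://example.org/Student0")
course_key = trie.encode("http://example.org/Course0")
```

URIs that share a long prefix share a path in the trie, so the common prefix is stored once and the rest of the system can work with fixed-size integer keys.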
Basic operations - inserts
n-pass algorithm
Basic operations - queries - triple patterns
?x type Student.
?x takesCourse Course0.

?x type Student.
?x takesCourse Course0.
?x takesCourse Course1.

=> intersection of sorted lists
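
A minimal sketch of the sorted-list intersection named above, assuming each triple pattern has already been resolved to a sorted list of integer subject keys. The function name and the example key values are hypothetical.

```python
def intersect_sorted(a, b):
    """Intersect two ascending lists in O(len(a) + len(b)) with a two-pointer merge."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical key lists, one per triple pattern.
students = [1, 2, 4, 7, 9]       # ?x type Student
takes_course0 = [2, 4, 5, 9]     # ?x takesCourse Course0
takes_course1 = [4, 8, 9]        # ?x takesCourse Course1

result = intersect_sorted(intersect_sorted(students, takes_course0), takes_course1)
```

Because every input list is sorted, no hashing or re-sorting is needed, and adding a further triple pattern only adds one more linear merge.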
Basic operations - queries - molecule queries

?a name 'Student1'.
?a ?b ?c.
?c ?d ?e.
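
The molecule query above asks for everything within two hops of the node named 'Student1'. The sketch below shows only the shape of that result by walking a plain in-memory dictionary at query time; the slides describe molecules as precomputed clusters, so this is not how the system evaluates the query. The graph layout, function name and sample data are all assumptions.

```python
def molecule(graph, root, scope):
    """Collect all triples reachable from `root` within `scope` hops.

    `graph` maps a subject to a list of (predicate, object) pairs.
    """
    triples = []
    frontier = [root]
    seen = {root}
    for _ in range(scope):
        next_frontier = []
        for subject in frontier:
            for predicate, obj in graph.get(subject, []):
                triples.append((subject, predicate, obj))
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return triples


# Hypothetical sample data.
graph = {
    "s1": [("name", "Student1"), ("advisor", "p1")],
    "p1": [("name", "Professor1"), ("worksFor", "d1")],
    "d1": [("name", "Department1")],
}

# ?a name 'Student1'  -> find the root(s); then expand two hops.
roots = [s for s, edges in graph.items() if ("name", "Student1") in edges]
found = [molecule(graph, root, scope=2) for root in roots]
```

With scope 2 the triples of "s1" and "p1" are returned, while the name of "d1" lies three hops away and is left out.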
Basic operations - queries
aggregates and analytics
?x type Student.
?x age ?y
filter (?y < 21)
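
A filter such as (?y < 21) can be answered with a binary search when the literal values are kept in a sorted list. The sketch below assumes the list stores (value, subject) pairs in ascending order; that layout, the function name and the sample data are hypothetical.

```python
from bisect import bisect_left

# Hypothetical list of literals for the "age" predicate, sorted by value.
ages = [(18, "s3"), (19, "s1"), (21, "s4"), (23, "s2")]


def subjects_below(sorted_pairs, limit):
    """Return the subjects whose value is strictly less than `limit`."""
    # (limit,) sorts before every (limit, subject) pair, so the cut point
    # is the index of the first entry whose value is >= limit.
    cut = bisect_left(sorted_pairs, (limit,))
    return [subject for _, subject in sorted_pairs[:cut]]


young = subjects_below(ages, 21)
```

The search costs O(log n) comparisons and the matching values form one contiguous slice, which is the benefit of keeping the literals sorted.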
Performance Evaluation
We used the Lehigh University Benchmark (LUBM).
We generated two datasets, for 10 and 100 universities.
● 10 universities: 1 272 814 distinct triples and 315 003 distinct strings
● 100 universities: 13 876 209 distinct triples and 3 301 868 distinct strings

We compared the execution time for 14 LUBM queries
and 3 analytic queries inspired by BowlognaBench.
● returning the professor who supervises the most students
● returning a big molecule containing everything around
Student0 within scope 2
● returning names for all graduate students
Results - LUBM - 10 Universities
Results - LUBM - 100 Universities
Results - analytic - 10 Universities
Results - analytic - 100 Universities
Future work
● open source
○ cleaning code
○ extending code
● parallelising operations
○ multi-core architecture
○ cloud
● automated database design
Conclusions
● advanced data collocation
○ molecules, RDF sub-graphs
○ lists of literals, compact sorted list of values
○ hash table indexed by templates
● slower inserts and updates
○ compact ordered structures
○ data redundancy
● 30 times faster on LUBM queries
● 350 times faster on analytic queries
Thank you for
your attention
Update Manager - lazy updates
Transitivity

● Inheritance Manager
○ typeX subClassOf typeY

● Query
○ ?z type typeY
■ ?z type typeY
■ ?z type typeX

● subClassOf
● subPropertyOf
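
The query rewriting shown on this slide can be sketched as follows: a pattern on a type is expanded into one pattern per subclass, following subClassOf transitively. The function name and the representation of the hierarchy as (subclass, superclass) pairs are assumptions.

```python
def expand_type(target, subclass_of):
    """Return `target` followed by all of its direct and transitive subclasses.

    `subclass_of` is an iterable of (subclass, superclass) pairs.
    """
    children = {}
    for sub, sup in subclass_of:
        children.setdefault(sup, []).append(sub)

    result = [target]
    seen = {target}
    stack = [target]
    while stack:
        for sub in children.get(stack.pop(), []):
            if sub not in seen:  # also guards against cycles
                seen.add(sub)
                result.append(sub)
                stack.append(sub)
    return result


# typeX subClassOf typeY, so "?z type typeY" expands to both types.
patterns = [f"?z type {t}" for t in expand_type("typeY", [("typeX", "typeY")])]
```

The same expansion would apply to subPropertyOf by swapping types for predicates.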
Serialising Molecules

#TEMPLATES * TEMPLATE_SIZE + #TRIPLES * KEY_SIZE
#TEMPLATES - the number of templates in the molecule
TEMPLATE_SIZE - the size of a template in bytes
#TRIPLES - the number of triples in the molecule
KEY_SIZE - the size of a key in bytes, for example 8 in our case (Intel 64, Linux)
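
As a worked example of the formula above, the sketch below computes the serialised size for given counts. KEY_SIZE defaults to the 8 bytes quoted on the slide. The slide gives no value for TEMPLATE_SIZE, so the 2 bytes used in the example call is purely an assumption, as are the counts.

```python
def molecule_size(num_templates, num_triples, template_size, key_size=8):
    """Serialised molecule size in bytes:
    #TEMPLATES * TEMPLATE_SIZE + #TRIPLES * KEY_SIZE
    """
    return num_templates * template_size + num_triples * key_size


# Hypothetical molecule: 3 templates, 10 triples, assumed 2-byte templates.
size = molecule_size(num_templates=3, num_triples=10, template_size=2)
# 3 * 2 + 10 * 8 = 86 bytes
```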

Analysis of Polygenic Traits (GPB-602)
Analysis of Polygenic Traits (GPB-602)Analysis of Polygenic Traits (GPB-602)
Analysis of Polygenic Traits (GPB-602)
Ā 
GBSN - Biochemistry (Unit 12) Hormones
GBSN - Biochemistry (Unit 12) HormonesGBSN - Biochemistry (Unit 12) Hormones
GBSN - Biochemistry (Unit 12) Hormones
Ā 
seed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdfseed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdf
Ā 
(Shilpa) āž¤ Call Girls Lucknow šŸ”„ 9352988975 šŸ”„ Real Fun With Sexual Girl Availa...
(Shilpa) āž¤ Call Girls Lucknow šŸ”„ 9352988975 šŸ”„ Real Fun With Sexual Girl Availa...(Shilpa) āž¤ Call Girls Lucknow šŸ”„ 9352988975 šŸ”„ Real Fun With Sexual Girl Availa...
(Shilpa) āž¤ Call Girls Lucknow šŸ”„ 9352988975 šŸ”„ Real Fun With Sexual Girl Availa...
Ā 
23PH301 - Optics - Unit 2 - Interference
23PH301 - Optics - Unit 2 - Interference23PH301 - Optics - Unit 2 - Interference
23PH301 - Optics - Unit 2 - Interference
Ā 
Module_1.In autotrophic nutrition ORGANISM
Module_1.In autotrophic nutrition ORGANISMModule_1.In autotrophic nutrition ORGANISM
Module_1.In autotrophic nutrition ORGANISM
Ā 

dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of Data

  • 1. Short and Long-Tail RDF Analytics for Massive Webs of Data. Marcin Wylot, Jigé Pont, Mariusz Wiśniewski, and Philippe Cudré-Mauroux. eXascale Infolab, University of Fribourg, Switzerland. International Semantic Web Conference, 26th October 2011, Bonn, Germany
  • 2. Motivation ● increasingly large semantic/LoD data sets ● increasingly complex queries ○ real-time analytic queries ■ like “returning the professor who supervises the most students”. There is an urgent need for a more efficient and scalable solution for RDF data management.
  • 3. 3 recipes to speed-up
  • 4. 3 recipes to speed-up ○ collocation
  • 5. 3 recipes to speed-up ○ collocation ○ collocation
  • 6. 3 recipes to speed-up ○ collocation ○ collocation ○ collocation
  • 7. Why collocation?? Because by collocating data together we can reduce IO operations, which are one of the biggest bottlenecks in database systems.
  • 8. Outline ● architecture ● main idea ● data structures ● basic operations (inserts, queries) ● evaluation & results ● future work
  • 10. Main Idea - Hybrid Storage
  • 11. Main Idea - data structures
  • 14. Molecule Clusters ● extremely compact sub-graphs ● precomputed joins
  • 15. List of Literals ● extremely compact list of sorted values
  • 16. Hash Table ● lexicographic tree to encode URIs ● template-based indexing ● extremely compact lists of homologous nodes
  • 17. Basic operations - inserts n-pass algorithm
  • 18. Basic operations - queries - triple patterns ?x type Student. ?x takesCourse Course0. ?x type Student. ?x takesCourse Course0. ?x takesCourse Course1. => intersection of sorted lists
  • 19. Basic operations - queries - molecule queries ?a name 'Student1'. ?a ?b ?c. ?c ?d ?e.
  • 20. Basic operations - queries - aggregates and analytics ?x type Student. ?x age ?y. filter (?y < 21)
  • 21. Performance Evaluation We used the Lehigh University Benchmark (LUBM). We generated two datasets, for 10 and 100 Universities. ● 1 272 814 distinct triples and 315 003 distinct strings ● 13 876 209 distinct triples and 3 301 868 distinct strings We compared the execution time for 14 LUBM queries and 3 analytic queries inspired by BowlognaBench. ● returning the professor who supervises the most students ● returning a big molecule containing everything around Student0 within scope 2 ● returning names for all graduate students
  • 22. Results - LUBM - 10 Universities
  • 23. Results - LUBM - 100 Universities
  • 24. Results - analytic 10 Universities
  • 25. Results - analytic 100 Universities
  • 26. Future work ● open source ○ cleaning code ○ extending code ● parallelising operations ○ multi-core architecture ○ cloud ● automated database design
  • 27. Conclusions ● advanced data collocation ○ molecules, RDF sub-graphs ○ lists of literals, compact sorted list of values ○ hash table indexed by templates ● slower inserts and updates ○ compact ordered structures ○ data redundancy ● 30 times faster on LUBM queries ● 350 times faster on analytic queries
  • 28. Thank you for your attention
  • 29. Update Manager - lazy updates
  • 30. Transitivity ● Inheritance Manager ○ typeX subClassOf typeY ● Query ○ ?z type typeY is expanded to ■ ?z type typeY ■ ?z type typeX ● supported relations: subClassOf, subPropertyOf
  • 31. Serialising Molecules ● size: #TEMPLATES * TEMPLATE_SIZE + #TRIPLES * KEY_SIZE ● #TEMPLATES - the number of templates in the molecule ● TEMPLATE_SIZE - the size of a key in bytes ● #TRIPLES - the number of triples in the molecule ● KEY_SIZE - the size of a key in bytes, for example 8 in our case (Intel 64, Linux)
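
The size expression on slide 31 can be checked with a few lines of Python. This is only an illustrative sketch: the function name `molecule_size` and the sample counts are hypothetical, and the default template size of 8 bytes assumes templates are stored at key width, as the slide's wording suggests.

```python
KEY_SIZE = 8       # bytes per key, as stated on the slide (Intel 64, Linux)
TEMPLATE_SIZE = 8  # assumption: template entries are stored at key width


def molecule_size(num_templates: int, num_triples: int,
                  template_size: int = TEMPLATE_SIZE,
                  key_size: int = KEY_SIZE) -> int:
    """Return #TEMPLATES * TEMPLATE_SIZE + #TRIPLES * KEY_SIZE in bytes."""
    return num_templates * template_size + num_triples * key_size


# Hypothetical molecule with 3 templates and 10 triples.
print(molecule_size(3, 10))  # 3 * 8 + 10 * 8 = 104
```

Under these assumptions the serialised size grows linearly with both the number of templates and the number of triples in the molecule.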
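
Slide 18 states that triple-pattern queries reduce to an intersection of sorted lists. The sketch below shows one common way to do that, a linear two-pointer merge; it is not taken from the dipLODocus[RDF] code, and the function name and the integer node identifiers are made up for illustration.

```python
def intersect_sorted(a, b):
    """Intersect two ascending lists of node IDs in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too small to match anything left in b
        else:
            j += 1  # b[j] is too small to match anything left in a
    return out


# Hypothetical sorted ID lists, one per triple pattern:
students = [1, 4, 7, 9, 12]      # ?x type Student
takes_course0 = [2, 4, 9, 15]    # ?x takesCourse Course0
takes_course1 = [4, 8, 9, 12]    # ?x takesCourse Course1

result = intersect_sorted(intersect_sorted(students, takes_course0), takes_course1)
print(result)  # [4, 9]
```

Because every list is already sorted, each additional triple pattern costs one more linear pass and no hashing or re-sorting is needed.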