尊敬的 微信汇率:1円 ≈ 0.046166 元 支付宝汇率:1円 ≈ 0.046257元 [退出登录]
SlideShare a Scribd company logo
Query Processing
Dr. Shafiq
The merge
 For query (T2 AND T3)
 (Step1): Maintain pointers of posting lists
and walk through the posting lists
simultaneously
 (Step2): At each step compare the DocID of
both pointers 2
34
128
2 4 8 16 32 64
1 2 3 5 8 13 21
128
34
2 4 8 16 32 64
1 2 3 5 8 13 21
T2
T3
Sec. 1.3
The merge
 For query (T2 AND T3)
 (Step2b): The DocID of posting list T3 is
small, the pointer of T3 posting list is
advanced
 (Step2a): if both DocIDs are same, then
put them in merge list, and advance both
pointers
3
34
128
2 4 8 16 32 64
1 2 3 5 8 13 21
128
34
2 4 8 16 32 64
1 2 3 5 8 13 21
T2
T3
Sec. 1.3
2
The merge
 For query (T2 AND T3)
 What will be the next step?
4
34
128
2 4 8 16 32 64
1 2 3 5 8 13 21
128
34
2 4 8 16 32 64
1 2 3 5 8 13 21
T2
T3
2
The merge
 Walk through the two postings
simultaneously, in time linear in the total
number of postings entries
 For query (T2 AND T3)
5
34
128
2 4 8 16 32 64
1 2 3 5 8 13 21
128
34
2 4 8 16 32 64
1 2 3 5 8 13 21
T2
T3
2 8
Merging Posting Lists
 If posting list lengths are x and y, merge
takes O(x+y) operations
 the complexity of querying is O(n)
 What is n here?
 Crucial: postings sorted by docID.
Query optimization
 What is the best order for query
processing?
 Consider a query that is an AND of n
terms.
 For each of the n terms, get its postings,
then AND them together.
T2
T3
T4
1 2 3 5 8 16 21 34
2 4 8 16 32 64 128
13 16
Query: T3 AND T2 AND T4 8
Query optimization example
 Process in order of increasing freq:
 start with smallest set, then keep cutting
further.
9
Execute the query as (T4 AND T2) AND T3.
T2
T3
T4
1 2 3 5 8 16 21 34
2 4 8 16 32 64 128
13 16 21
More general optimization
10
Exercise
 Recommend a query
processing order for
Term Freq
eyes 213312
kaleidoscope 87009
marmalade 107913
skies 271658
tangerine 46653
trees 316812
11
(tangerine OR trees) AND
(marmalade OR skies) AND
(kaleidoscope OR eyes)
FASTER POSTINGS
MERGES:
SKIP POINTERS/SKIP
LISTS
12
Recall basic merge
 Walk through the two postings
simultaneously, in time linear in the total
number of postings entries
128
31
2 4 8 41 48 64
1 2 3 8 11 17 21
T2
T3
2 8
If the list lengths are x and y, the merge takes O(x+y)
operations.
Can we do better?
Yes (if index isn’t changing too fast). 13
Augment postings with skip pointers
(at indexing time)
 Why?
 To skip postings that will not figure in the
search results.
 How?
 Where do we place skip pointers?
128
2 4 8 41 48 64
31
1 2 3 8 11 17 21
31
11
41 128
14
Query processing with skip pointers
128
2 4 8 41 48 64
31
1 2 3 8 11 17 21
31
11
41 128
Suppose we’ve stepped through the lists until we process 8
on each list. We match it and advance.
We then have 41 and 11 on the lower. 11 is smaller.
15
Where do we place skips?
 Tradeoff:
 More skips  shorter skip spans  more
likely to skip. But lots of comparisons to skip
pointers.
 Fewer skips  few pointer comparison, but
then long skip spans  few successful skips.
16
A simple heuristic for placing skips, which
has been found to work well in practice,
is that for a postings list of length p, use
√p evenly-spaced skip pointers.
Where do we place skips?
Quiz
 We have a two-word query.
 t1=[4,6,10,12,14,16,18,20,22,32,47,81,120,1
22,157,180]
 t2= [47].
 Work out how many comparisons would be done to
intersect the two postings lists with the following two
strategies. Briefly justify your answers:
 Using standard postings lists
 Using postings lists stored with skip pointers, with a skip length
18
Solution
 a) The number of comparisons would be
11 as shown in the following.
 (4,47), (6,47), (10,47), (12,47), (14,47), (16,47), (18,47), (20,47),
(22,47), (32,47),(47,47)
 b) Since the skip length of 16 = 4, the postings lists with
skip pointers would be 4 → 14, 14 → 22, 22 → 120, 120
→ 180. the number of comparisons would be 6;
 (4,47), (14,47), (22,47), (120,47), (32,47),(47,47).
19

More Related Content

What's hot

The impact of web on ir
The impact of web on irThe impact of web on ir
The impact of web on ir
Primya Tamil
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
BAIRAVI T
 
Vector space model of information retrieval
Vector space model of information retrievalVector space model of information retrieval
Vector space model of information retrieval
Nanthini Dominique
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
nimmyjans4
 
Automatic indexing
Automatic indexingAutomatic indexing
Automatic indexing
dhatchayaninandu
 
IoT, M2M and IoT System Management
IoT, M2M and IoT System ManagementIoT, M2M and IoT System Management
IoT, M2M and IoT System Management
Vikram Nandini
 
Signature files
Signature filesSignature files
Signature files
Deepali Raikar
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
Selman Bozkır
 
Boolean Retrieval
Boolean RetrievalBoolean Retrieval
Boolean Retrieval
mghgk
 
Informatio retrival evaluation
Informatio retrival evaluationInformatio retrival evaluation
Informatio retrival evaluation
NidhirBiswas
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
9866825059
 
6&7-Query Languages & Operations.ppt
6&7-Query Languages & Operations.ppt6&7-Query Languages & Operations.ppt
6&7-Query Languages & Operations.ppt
BereketAraya
 
Understanding isolation levels
Understanding isolation levelsUnderstanding isolation levels
Understanding isolation levels
Hieu Nguyen Trung
 
Information retrieval 7 boolean model
Information retrieval 7 boolean modelInformation retrieval 7 boolean model
Information retrieval 7 boolean model
Vaibhav Khanna
 
Information retrieval 8 term weighting
Information retrieval 8 term weightingInformation retrieval 8 term weighting
Information retrieval 8 term weighting
Vaibhav Khanna
 
Chap 1 general introduction of information retrieval
Chap 1  general introduction of information retrievalChap 1  general introduction of information retrieval
Chap 1 general introduction of information retrieval
Malobe Lottin Cyrille Marcel
 
key word indexing and their types with example
key word indexing and their types with example key word indexing and their types with example
key word indexing and their types with example
Sourav Sarkar
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 words
ananth
 
Information retrieval s
Information retrieval sInformation retrieval s
Information retrieval s
silambu111
 
Distributed transaction
Distributed transactionDistributed transaction
Distributed transaction
MohitKothari26
 

What's hot (20)

The impact of web on ir
The impact of web on irThe impact of web on ir
The impact of web on ir
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
 
Vector space model of information retrieval
Vector space model of information retrievalVector space model of information retrieval
Vector space model of information retrieval
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
 
Automatic indexing
Automatic indexingAutomatic indexing
Automatic indexing
 
IoT, M2M and IoT System Management
IoT, M2M and IoT System ManagementIoT, M2M and IoT System Management
IoT, M2M and IoT System Management
 
Signature files
Signature filesSignature files
Signature files
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
 
Boolean Retrieval
Boolean RetrievalBoolean Retrieval
Boolean Retrieval
 
Informatio retrival evaluation
Informatio retrival evaluationInformatio retrival evaluation
Informatio retrival evaluation
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
 
6&7-Query Languages & Operations.ppt
6&7-Query Languages & Operations.ppt6&7-Query Languages & Operations.ppt
6&7-Query Languages & Operations.ppt
 
Understanding isolation levels
Understanding isolation levelsUnderstanding isolation levels
Understanding isolation levels
 
Information retrieval 7 boolean model
Information retrieval 7 boolean modelInformation retrieval 7 boolean model
Information retrieval 7 boolean model
 
Information retrieval 8 term weighting
Information retrieval 8 term weightingInformation retrieval 8 term weighting
Information retrieval 8 term weighting
 
Chap 1 general introduction of information retrieval
Chap 1  general introduction of information retrievalChap 1  general introduction of information retrieval
Chap 1 general introduction of information retrieval
 
key word indexing and their types with example
key word indexing and their types with example key word indexing and their types with example
key word indexing and their types with example
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 words
 
Information retrieval s
Information retrieval sInformation retrieval s
Information retrieval s
 
Distributed transaction
Distributed transactionDistributed transaction
Distributed transaction
 

Similar to Query Processing in IR

sorting_part1.ppt
sorting_part1.pptsorting_part1.ppt
sorting_part1.ppt
ReehaamMalikArain
 
l1.ppt
l1.pptl1.ppt
l1.ppt
ImXaib
 
l1.ppt
l1.pptl1.ppt
l1.ppt
ssuser1a62e1
 
Welcome to Introduction to Algorithms, Spring 2004
Welcome to Introduction to Algorithms, Spring 2004Welcome to Introduction to Algorithms, Spring 2004
Welcome to Introduction to Algorithms, Spring 2004
jeronimored
 
Introduction of Algorithm.pdf
Introduction of Algorithm.pdfIntroduction of Algorithm.pdf
Introduction of Algorithm.pdf
LaxmiMobile1
 
Algorithim lec1.pptx
Algorithim lec1.pptxAlgorithim lec1.pptx
Algorithim lec1.pptx
rediet43
 
l1.ppt
l1.pptl1.ppt
Is sort andy-le
Is sort andy-leIs sort andy-le
Is sort andy-le
Sumedha
 
Pipelining cache
Pipelining cachePipelining cache
Pipelining cache
Ranaghat College
 
Pipelining Cache
Pipelining CachePipelining Cache
Pipelining Cache
Ranaghat College
 
Sorting Techniques
Sorting TechniquesSorting Techniques
Sorting Techniques
Rafay Farooq
 
Data Encryption standard in cryptography
Data Encryption standard in cryptographyData Encryption standard in cryptography
Data Encryption standard in cryptography
NithyasriA2
 
sorting
sortingsorting
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Nikit
NikitNikit
Mergesort
MergesortMergesort
Mergesort
SimoniShah6
 
Merge sort analysis and its real time applications
Merge sort analysis and its real time applicationsMerge sort analysis and its real time applications
Merge sort analysis and its real time applications
yazad dumasia
 
pradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqpradeepbishtLecture13 div conq
pradeepbishtLecture13 div conq
Pradeep Bisht
 
Sortsearch
SortsearchSortsearch
Cryptographic algorithms
Cryptographic algorithmsCryptographic algorithms
Cryptographic algorithms
Anamika Singh
 

Similar to Query Processing in IR (20)

sorting_part1.ppt
sorting_part1.pptsorting_part1.ppt
sorting_part1.ppt
 
l1.ppt
l1.pptl1.ppt
l1.ppt
 
l1.ppt
l1.pptl1.ppt
l1.ppt
 
Welcome to Introduction to Algorithms, Spring 2004
Welcome to Introduction to Algorithms, Spring 2004Welcome to Introduction to Algorithms, Spring 2004
Welcome to Introduction to Algorithms, Spring 2004
 
Introduction of Algorithm.pdf
Introduction of Algorithm.pdfIntroduction of Algorithm.pdf
Introduction of Algorithm.pdf
 
Algorithim lec1.pptx
Algorithim lec1.pptxAlgorithim lec1.pptx
Algorithim lec1.pptx
 
l1.ppt
l1.pptl1.ppt
l1.ppt
 
Is sort andy-le
Is sort andy-leIs sort andy-le
Is sort andy-le
 
Pipelining cache
Pipelining cachePipelining cache
Pipelining cache
 
Pipelining Cache
Pipelining CachePipelining Cache
Pipelining Cache
 
Sorting Techniques
Sorting TechniquesSorting Techniques
Sorting Techniques
 
Data Encryption standard in cryptography
Data Encryption standard in cryptographyData Encryption standard in cryptography
Data Encryption standard in cryptography
 
sorting
sortingsorting
sorting
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Nikit
NikitNikit
Nikit
 
Mergesort
MergesortMergesort
Mergesort
 
Merge sort analysis and its real time applications
Merge sort analysis and its real time applicationsMerge sort analysis and its real time applications
Merge sort analysis and its real time applications
 
pradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqpradeepbishtLecture13 div conq
pradeepbishtLecture13 div conq
 
Sortsearch
SortsearchSortsearch
Sortsearch
 
Cryptographic algorithms
Cryptographic algorithmsCryptographic algorithms
Cryptographic algorithms
 

Recently uploaded

LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024
ThousandEyes
 
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State StoreElasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
ScyllaDB
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
Cynthia Thomas
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
UiPathCommunity
 
Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0
Neeraj Kumar Singh
 
An All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS MarketAn All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS Market
ScyllaDB
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
ThousandEyes
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to SuccessMongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
ScyllaDB
 
intra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_Enintra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_En
NTTDATA INTRAMART
 
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc
 
So You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental DowntimeSo You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental Downtime
ScyllaDB
 
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
dipikamodels1
 
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
ThousandEyes
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
Ortus Solutions, Corp
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLMongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
ScyllaDB
 

Recently uploaded (20)

LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024
 
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State StoreElasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
 
Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0
 
An All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS MarketAn All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS Market
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to SuccessMongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
 
intra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_Enintra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_En
 
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
 
So You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental DowntimeSo You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental Downtime
 
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
 
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLMongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
 

Query Processing in IR

  • 2. The merge  For query (T2 AND T3)  (Step1): Maintain pointers of posting lists and walk through the posting lists simultaneously  (Step2): At each step compare the DocID of both pointers 2 34 128 2 4 8 16 32 64 1 2 3 5 8 13 21 128 34 2 4 8 16 32 64 1 2 3 5 8 13 21 T2 T3 Sec. 1.3
  • 3. The merge  For query (T2 AND T3)  (Step2b): The DocID of posting list T3 is small, the pointer of T3 posting list is advanced  (Step2a): if both DocIDs are same, then put them in merge list, and advance both pointers 3 34 128 2 4 8 16 32 64 1 2 3 5 8 13 21 128 34 2 4 8 16 32 64 1 2 3 5 8 13 21 T2 T3 Sec. 1.3 2
  • 4. The merge  For query (T2 AND T3)  What will be the next step? 4 34 128 2 4 8 16 32 64 1 2 3 5 8 13 21 128 34 2 4 8 16 32 64 1 2 3 5 8 13 21 T2 T3 2
  • 5. The merge  Walk through the two postings simultaneously, in time linear in the total number of postings entries  For query (T2 AND T3) 5 34 128 2 4 8 16 32 64 1 2 3 5 8 13 21 128 34 2 4 8 16 32 64 1 2 3 5 8 13 21 T2 T3 2 8
  • 6.
  • 7. Merging Posting Lists  If posting list lengths are x and y, merge takes O(x+y) operations  the complexity of querying is O(n)  What is n here?  Crucial: postings sorted by docID.
  • 8. Query optimization  What is the best order for query processing?  Consider a query that is an AND of n terms.  For each of the n terms, get its postings, then AND them together. T2 T3 T4 1 2 3 5 8 16 21 34 2 4 8 16 32 64 128 13 16 Query: T3 AND T2 AND T4 8
  • 9. Query optimization example  Process in order of increasing freq:  start with smallest set, then keep cutting further. 9 Execute the query as (T4 AND T2) AND T3. T2 T3 T4 1 2 3 5 8 16 21 34 2 4 8 16 32 64 128 13 16 21
  • 11. Exercise  Recommend a query processing order for Term Freq eyes 213312 kaleidoscope 87009 marmalade 107913 skies 271658 tangerine 46653 trees 316812 11 (tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes)
  • 13. Recall basic merge  Walk through the two postings simultaneously, in time linear in the total number of postings entries 128 31 2 4 8 41 48 64 1 2 3 8 11 17 21 T2 T3 2 8 If the list lengths are x and y, the merge takes O(x+y) operations. Can we do better? Yes (if index isn’t changing too fast). 13
  • 14. Augment postings with skip pointers (at indexing time)  Why?  To skip postings that will not figure in the search results.  How?  Where do we place skip pointers? 128 2 4 8 41 48 64 31 1 2 3 8 11 17 21 31 11 41 128 14
  • 15. Query processing with skip pointers 128 2 4 8 41 48 64 31 1 2 3 8 11 17 21 31 11 41 128 Suppose we’ve stepped through the lists until we process 8 on each list. We match it and advance. We then have 41 and 11 on the lower. 11 is smaller. 15
  • 16. Where do we place skips?  Tradeoff:  More skips  shorter skip spans  more likely to skip. But lots of comparisons to skip pointers.  Fewer skips  few pointer comparison, but then long skip spans  few successful skips. 16
  • 17. A simple heuristic for placing skips, which has been found to work well in practice, is that for a postings list of length p, use √p evenly-spaced skip pointers. Where do we place skips?
  • 18. Quiz  We have a two-word query.  t1=[4,6,10,12,14,16,18,20,22,32,47,81,120,1 22,157,180]  t2= [47].  Work out how many comparisons would be done to intersect the two postings lists with the following two strategies. Briefly justify your answers:  Using standard postings lists  Using postings lists stored with skip pointers, with a skip length 18
  • 19. Solution  a) The number of comparisons would be 11 as shown in the following.  (4,47), (6,47), (10,47), (12,47), (14,47), (16,47), (18,47), (20,47), (22,47), (32,47),(47,47)  b) Since the skip length of 16 = 4, the postings lists with skip pointers would be 4 → 14, 14 → 22, 22 → 120, 120 → 180. the number of comparisons would be 6;  (4,47), (14,47), (22,47), (120,47), (32,47),(47,47). 19
  翻译: