Boolean, Vector Space Retrieval Models
Ms. T. Primya
Assistant Professor
Department of Computer Science and Engineering
Dr. N. G. P. Institute of Technology
Coimbatore
• A retrieval model can describe either the computational process or the human process of retrieval:
  ◦ the process of choosing documents for retrieval;
  ◦ the process by which information needs are first articulated and then refined.
• Boolean models
• Vector space models
• Probabilistic models
• Models based on belief nets
• Models based on language models
• A document is represented as a set of keywords.
• Index terms are considered to be either present or absent in a document, and every term provides equal evidence with respect to the information need.
• Queries are Boolean expressions of keywords connected by AND, OR and NOT, with brackets to indicate scope. For example, with & for AND, | for OR and ! for NOT:
  [[Rio & Brazil] | [Hilo & Hawaii]] & hotel & !Hilton
• Output: a document is either relevant or not. There are no partial matches and no ranking.
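The Boolean matching rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slides' own code: the four-document collection and the helper names `has`, `AND`, `OR`, `NOT` and `boolean_retrieve` are hypothetical, and each document is reduced to a set of lower-cased keywords.

```python
# Hypothetical toy collection: each document reduced to a set of lower-cased keywords.
DOCS = {
    "d1": {"rio", "brazil", "hotel", "beach"},
    "d2": {"hilo", "hawaii", "hotel", "hilton"},
    "d3": {"hilo", "hawaii", "hotel", "volcano"},
    "d4": {"rio", "brazil", "carnival"},
}


def has(term):
    """Predicate: true when the document's keyword set contains `term`."""
    return lambda doc: term in doc


def AND(*preds):
    return lambda doc: all(p(doc) for p in preds)


def OR(*preds):
    return lambda doc: any(p(doc) for p in preds)


def NOT(pred):
    return lambda doc: not pred(doc)


def boolean_retrieve(query, docs):
    """Return the ids of all documents that make the query true (unranked)."""
    return sorted(doc_id for doc_id, terms in docs.items() if query(terms))


# [[Rio & Brazil] | [Hilo & Hawaii]] & hotel & !Hilton
query = AND(
    OR(AND(has("rio"), has("brazil")), AND(has("hilo"), has("hawaii"))),
    has("hotel"),
    NOT(has("hilton")),
)
print(boolean_retrieve(query, DOCS))  # ['d1', 'd3']
```

Every matching document is returned with equal status; nothing in the result says which one is better, which is the "no ranking" property of the Boolean model.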
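• User need: "I'm interested in learning about vitamins other than vitamin E that are antioxidants."
• User's Boolean query: antioxidant AND vitamin AND NOT vitamin E
• The NOT clause also rejects relevant documents that discuss other antioxidant vitamins but happen to mention vitamin E.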
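A small sketch of the vitamin example shows how the NOT clause behaves. It assumes a hypothetical index in which "vitamin c" and "vitamin e" are stored as single phrase terms; the three documents and the `matches` helper are illustrative only.

```python
# Hypothetical index in which "vitamin c" and "vitamin e" are single phrase terms.
DOCS = {
    "d1": {"antioxidant", "vitamin", "vitamin c"},
    "d2": {"antioxidant", "vitamin", "vitamin c", "vitamin e"},  # also covers vitamin C
    "d3": {"antioxidant", "vitamin", "vitamin e"},
}


def matches(terms):
    """antioxidant AND vitamin AND NOT vitamin e"""
    return "antioxidant" in terms and "vitamin" in terms and "vitamin e" not in terms


hits = sorted(doc_id for doc_id, terms in DOCS.items() if matches(terms))
print(hits)  # ['d1']: d2 is rejected although it also discusses vitamin C
```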
• Each retrieval model has three explicit components:
  ◦ a document representation d
  ◦ a query q
  ◦ a ranking function R(d, q)
• An IR strategy is a technique by which a relevance measure is obtained between a query and a document.
• The Boolean strategy retrieves the documents that make the query true.
• Boolean model: documents either match or don't.
• It is good for expert users who have a precise understanding of their needs and of the collection.
• It is also good for applications, which can easily consume thousands of results.
• It is not good for the majority of users; this is particularly true of web search.
• Boolean queries often return either too few or too many results:
  Query 1: standard AND user AND dlink AND 650 → 200,000 hits (feast!)
  Query 2: standard AND user AND dlink AND 650 AND no AND card AND found → 0 hits (famine!)
• In Boolean retrieval, it takes a lot of skill to write a query that produces a manageable number of hits.
• In ranked retrieval, "feast or famine" is less of a problem, because the user only needs to look at the top results.
• Condition: results that are more relevant are ranked higher than results that are less relevant (i.e., the ranking algorithm works).
• The Jaccard coefficient is a commonly used measure of the overlap of two sets.
• Let A and B be two sets. Then:
  jaccard(A, B) = |A ∩ B| / |A ∪ B|
  ◦ jaccard(A, A) = 1
  ◦ jaccard(A, B) = 0 if A ∩ B = ∅
• A and B don't have to be the same size. The coefficient always lies between 0 and 1.
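The Jaccard coefficient translates directly into set operations. A minimal Python sketch; the function name `jaccard` and the example sets are illustrative, and returning 0.0 for two empty sets is a convention chosen here, since the ratio is undefined in that case.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    if not union:
        return 0.0  # both sets empty: ratio undefined; 0.0 is a convention chosen here
    return len(a & b) / len(union)


A = {"rio", "brazil", "hotel"}
B = {"hotel", "beach"}
print(jaccard(A, A))            # 1.0
print(jaccard(A, {"volcano"}))  # 0.0  (no overlap)
print(jaccard(A, B))            # 0.25 (1 shared term out of 4 distinct terms)
```

The two sets need not have the same size, and the result always stays in [0, 1].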
What query-document match score does the Jaccard coefficient compute for:
• Query: "ides of March"
• Document: "Caesar died in March"
jaccard(q, d) = 1/6
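The 1/6 result can be checked step by step, assuming each word is one lower-cased term:

```latex
\begin{aligned}
q &= \{\text{ides}, \text{of}, \text{march}\}, \qquad
d = \{\text{caesar}, \text{died}, \text{in}, \text{march}\} \\
q \cap d &= \{\text{march}\}, \qquad
q \cup d = \{\text{ides}, \text{of}, \text{march}, \text{caesar}, \text{died}, \text{in}\} \\
\text{jaccard}(q, d) &= \frac{|q \cap d|}{|q \cup d|} = \frac{1}{6}
\end{aligned}
```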
• Jaccard doesn't consider term frequency (how many times a term occurs in a document).
• Rare terms are more informative than frequent terms, but Jaccard does not use this information either.
Advantages
• Can express very restrictive searches
• Makes experienced users happy
• Clear formalism
• Simplicity
• Still used in small-scale searches, such as searching e-mails or files on a local hard drive
Disadvantages
• Simple queries do not work well.
• The query language is complex and confusing to end users.
• It is difficult to control the number of documents retrieved.
  ◦ All matched documents will be returned.
• It is difficult to rank the output.
  ◦ All matched documents logically satisfy the query.
• It is difficult to perform relevance feedback.
  ◦ If the user marks a document as relevant or irrelevant, how should the query be modified?
• The vector space model (or term vector model) is an algebraic model for representing text documents (and, in general, any objects) as vectors of identifiers, such as index terms.
• It is used in information filtering, information retrieval, indexing and relevance ranking.
The basis vectors correspond to the dimensions, or directions, of the vector space.
A vector is a point in a vector space and has a length (from the origin to the point) and a direction.
• A 2-dimensional vector can be written as [x, y].
• A 3-dimensional vector can be written as [x, y, z].
• Let V denote the size of the indexed vocabulary.
• Any arbitrary span of text (e.g., a document or a query) can be represented as a vector in V-dimensional space.
• For example, assume three index terms: dog, bite, man (i.e., V = 3). In a binary representation, each component is:
  1 = the term appears at least once
  0 = the term does not appear
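A sketch of the binary representation for the vocabulary [dog, bite, man]. The example texts and whitespace tokenization are assumptions for illustration, and `binary_vector` is a hypothetical helper name.

```python
VOCAB = ["dog", "bite", "man"]  # V = 3; fixed order defines the vector dimensions


def binary_vector(text, vocab=VOCAB):
    """Component i is 1 if vocab[i] appears at least once in the text, else 0."""
    tokens = set(text.lower().split())  # whitespace tokenization (assumption)
    return [1 if term in tokens else 0 for term in vocab]


print(binary_vector("dog bite man"))  # [1, 1, 1]
print(binary_vector("man bite dog"))  # [1, 1, 1]
print(binary_vector("bite man man"))  # [0, 1, 1]
print(binary_vector("dog"))           # [1, 0, 0]
```

"dog bite man" and "man bite dog" map to the same vector: the representation records which terms occur, not their order or how often they occur.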
A query is also a vector in V-dimensional space, where V is the number of terms in the vocabulary.
• The vector space model ranks documents by the vector-space similarity between the query vector and the document vector.
• There are many ways to compute the similarity between two vectors.
• One way is to compute the inner product:
Multiply corresponding components, then sum those products.
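The inner product as a Python function, applied to hypothetical binary query and document vectors over [dog, bite, man]. With binary vectors, the score is simply the number of query terms the document contains. The name `inner_product` is illustrative.

```python
def inner_product(x, y):
    """Multiply corresponding components, then sum the products."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension V")
    return sum(xi * yi for xi, yi in zip(x, y))


# Binary vectors over [dog, bite, man] (hypothetical query and documents)
query = [1, 1, 0]  # "dog bite"
doc_a = [1, 1, 1]  # "dog bite man"
doc_b = [0, 0, 1]  # "man"
print(inner_product(query, doc_a))  # 2
print(inner_product(query, doc_b))  # 0
```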
Pros and Cons
• The inner product doesn't account for the fact that documents have widely varying lengths.
• All things being equal, longer documents are more likely to contain the query terms.
• So, the inner product favours long documents.
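To see the length bias concretely, the sketch below assumes raw term-count vectors (rather than binary ones) over [dog, bite, man]; the two documents are hypothetical.

```python
def inner_product(x, y):
    """Sum of the products of corresponding components."""
    return sum(xi * yi for xi, yi in zip(x, y))


# Raw term counts over [dog, bite, man] (hypothetical documents)
query = [1, 1, 0]      # "dog bite"
short_doc = [1, 1, 0]  # a short text containing exactly the query terms
long_doc = [5, 3, 20]  # a long text that is mostly about "man"
print(inner_product(query, short_doc))  # 2
print(inner_product(query, long_doc))   # 8
```

The long document scores four times higher only because it is longer and repeats the query terms, even though most of it is about something else.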
• Document represented as a vector: d = ⟨d1, d2, …, dn⟩
• Query represented as a vector: q = ⟨q1, q2, …, qn⟩
• Ranking function (retrieval status value): a similarity score between d and q, such as the cosine similarity described below.
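• The cosine similarity between two vectors (or two documents in the vector space) is a measure that calculates the cosine of the angle θ between them.
• The cosine similarity equation is obtained by solving the dot-product equation d · q = |d| |q| cos θ for cos θ, giving cos θ = (d · q) / (|d| |q|):
  ◦ The numerator is the inner product.
  ◦ The denominator is the product of the two vector lengths.
• For vectors with non-negative components (such as term weights), it ranges from 0 to 1 and equals 1 when the two vectors point in the same direction (e.g., when they are identical). In general, the cosine lies between -1 and 1.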
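Written out in terms of the components d_i and q_i of the document and query vectors, the cosine ranking function is:

```latex
\cos\theta
  = \frac{d \cdot q}{\lVert d \rVert \, \lVert q \rVert}
  = \frac{\sum_{i=1}^{n} d_i q_i}
         {\sqrt{\sum_{i=1}^{n} d_i^{2}} \; \sqrt{\sum_{i=1}^{n} q_i^{2}}}
```

Dividing by the two lengths removes the long-document advantage of the plain inner product: only the direction of each vector matters.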
• a = [1, 2, 3]
• b = [4, -5, 6]
Dot product of a with b: dpab = 1*4 + 2*(-5) + 3*6 = 12
Dot product of a with itself: dpaa = 1*1 + 2*2 + 3*3 = 14
Dot product of b with itself: dpbb = 4*4 + (-5)*(-5) + 6*6 = 77
la = (dpaa)^½ = (14)^½ ≈ 3.74, i.e., the length of a.
lb = (dpbb)^½ = (77)^½ ≈ 8.77, i.e., the length of b.
la * lb = (dpaa)^½ * (dpbb)^½ ≈ 32.83, i.e., the length product (lpab) of a and b.
The dot product / length product ratio is dpab / lpab = 12 / 32.83 ≈ 0.37, which is the cosine similarity of a and b.
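The arithmetic of the a/b example can be reproduced in Python. `dot` and `cosine` are illustrative helper names; the zero-length check is an added safeguard, since the cosine is undefined for a zero vector.

```python
import math


def dot(x, y):
    """Inner product: multiply corresponding components and sum."""
    return sum(xi * yi for xi, yi in zip(x, y))


def cosine(x, y):
    """Dot product divided by the product of the two vector lengths."""
    length_product = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))
    if length_product == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot(x, y) / length_product


a = [1, 2, 3]
b = [4, -5, 6]
print(dot(a, b))                       # 12
print(round(math.sqrt(dot(a, a)), 2))  # 3.74
print(round(math.sqrt(dot(b, b)), 2))  # 8.77
print(round(cosine(a, b), 4))          # 0.3655
```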
• The vector space model procedure can be divided into three stages:
  ◦ The first stage is document indexing, where content-bearing terms are extracted from the document text.
  ◦ The second stage is weighting the indexed terms to enhance the retrieval of documents relevant to the user.
  ◦ The last stage ranks the documents with respect to the query according to a similarity measure.
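The three stages can be put together in a small end-to-end sketch. The slides do not fix a tokenizer or weighting scheme, so this assumes whitespace tokenization with a tiny stop list for indexing, one common tf-idf variant (raw tf × log(N/df)) for weighting, and cosine similarity for ranking; the documents and all function names are hypothetical.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "in", "and"}  # tiny illustrative stop list (assumption)


def index_terms(text):
    """Stage 1 (indexing): lower-case, split on whitespace, drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Stage 2 (weighting): weight = raw tf * log(N / df), one common tf-idf variant."""
    tokenized = [index_terms(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def query_vector(query, idf):
    """Weight the query with the collection's idf values; unseen terms are ignored."""
    tf = Counter(t for t in index_terms(query) if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Stage 3 (ranking): return (score, doc_index) pairs, best first."""
    vectors, idf = tfidf_vectors(docs)
    q = query_vector(query, idf)
    scores = [(cosine(q, d), i) for i, d in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


docs = ["dog bite man", "man walk dog in the park", "cat sat on the mat"]
for score, i in rank("dog bite", docs):
    print(i, round(score, 3))
# 0 0.945
# 1 0.085
# 2 0.0
```

Document 0 shares both query terms and ranks first; document 1 shares only "dog", which occurs in two of the three documents and therefore carries a low idf weight; document 2 shares nothing and scores 0.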
