INFORMATION RETRIEVAL
Information Retrieval
• Information retrieval is the task of finding documents that are
relevant to a user’s need for information.
• The best-known examples of information retrieval systems are search
engines on the World Wide Web.
• A Web user can type a query such as [AI book] into a search engine
and see a list of relevant pages.
• In this section, we will see how such systems are built.
An IR system can be characterized by:
1. A corpus of documents. Each system must decide what it wants to treat
as a document: a paragraph, a page, or a multipage text.
2. Queries posed in a query language. A query specifies what the user
wants to know.
3. A result set. This is the subset of documents that the IR system judges to
be relevant to the query.
4. A presentation of the result set. This can be as simple as a ranked list of
document titles.
The earliest IR systems worked on a Boolean keyword model.
• This model has the advantage of being simple to explain and implement.
• Disadvantages:
• First, the degree of relevance of a document is a single bit, so there is
no guidance as to how to order the relevant documents for
presentation.
• Second, Boolean expressions are unfamiliar to users who are not
programmers or logicians.
• Third, it can be hard to formulate an appropriate query, even for a
skilled user.
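The single-bit relevance of the Boolean keyword model can be illustrated with a small Python sketch. It assumes an inverted index built by whitespace tokenization and a query that is a pure AND of its words; the function names and the three toy documents are hypothetical, not taken from the slides.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def boolean_and(index, words):
    """Return the ids of documents that contain every query word."""
    postings = [index.get(w.lower(), set()) for w in words]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical toy corpus.
docs = {
    1: "farming in Kansas",
    2: "wheat farming methods",
    3: "travel in Kansas",
}
index = build_index(docs)
result = boolean_and(index, ["farming", "Kansas"])
print(sorted(result))  # [1]
```

The result is an unordered set: a document either matches or it does not, so nothing in the output says which match to present first.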
IR scoring functions
• Most IR systems have abandoned the Boolean model and use models
based on the statistics of word counts.
• We describe the BM25 scoring function, which comes from the Okapi
project of Stephen Robertson and Karen Spärck Jones at City University,
London, and has been used in search engines such as the open-source
Lucene project.
• A scoring function takes a document and a query and returns a numeric
score; the most relevant documents have the highest scores.
• In the BM25 function, the score is a linear weighted combination of scores
for each of the words that make up the query.
Three factors affect the weight of a query term:
• First, the frequency with which a query term appears in a document:
• For the query [farming in Kansas], documents that mention “farming”
frequently will have higher scores.
• Second, the inverse document frequency of the term, or IDF.
• The word “in” appears in almost every document, so it has a high document
frequency, and thus a low inverse document frequency, and thus it is not as
important to the query as “farming” or “Kansas.”
• Third, the length of the document.
• A million-word document will probably mention all the query words, but may
not actually be about the query. A short document that mentions all the
words is a much better candidate.
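The BM25 function takes all three of these into account. Given a document
dj and a query consisting of the words q1:N, we define:
• TF(qi, dj), the number of times word qi appears in document dj.
• DF(qi), the number of documents that contain the word qi.
The BM25 score is then
$$BM25(d_j, q_{1:N}) = \sum_{i=1}^{N} IDF(q_i) \cdot \frac{TF(q_i, d_j) \cdot (k + 1)}{TF(q_i, d_j) + k \cdot \left(1 - b + b \cdot \frac{|d_j|}{L}\right)}$$
where
• |dj| is the length of document dj in words
• L is the average document length in the corpus
• k and b are two parameters that can be tuned by cross-validation; typical
values are k = 2.0 and b = 0.75
• IDF(qi) is the inverse document frequency of word qi, given by
$$IDF(q_i) = \log \frac{N - DF(q_i) + 0.5}{DF(q_i) + 0.5}$$
(in this last formula, N denotes the total number of documents in the corpus,
not the number of query words)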
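A minimal Python sketch of BM25 scoring follows. It assumes the commonly cited form, score = Σ IDF(qi) · TF · (k + 1) / (TF + k · (1 − b + b · |dj| / L)), with IDF(qi) = log((n − DF(qi) + 0.5) / (DF(qi) + 0.5)), where n is the number of documents in the corpus. The whitespace tokenizer, the function name `bm25_scores`, and the five-document corpus are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k=2.0, b=0.75):
    """Score every document against the query with the assumed BM25 form."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs  # L in the text

    # DF: number of documents that contain each word at least once.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)  # TF: occurrences of each word in this document
        score = 0.0
        for q in query.lower().split():
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5))
            length_norm = k * (1 - b + b * len(d) / avg_len)
            score += idf * tf[q] * (k + 1) / (tf[q] + length_norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus.
docs = [
    "farming in Kansas is mostly wheat farming",
    "the weather in Ohio is mild",
    "the museum in Paris is large",
    "a train in Tokyo is fast",
    "Kansas is flat",
]
scores = bm25_scores(docs, "farming in Kansas")
best = max(range(len(docs)), key=scores.__getitem__)
print(best)  # 0
```

With this IDF form, a word that occurs in more than half of the documents (here "in") gets a negative weight, so documents that match only that word score below zero in this toy corpus.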
IR system evaluation
• How do we know whether an IR system is performing well?
• We undertake an experiment in which the system is given a set of
queries and the result sets are scored with respect to human
relevance judgments.
• Traditionally, there have been two measures used in the scoring:
• recall
• precision.
• For example, imagine that an IR system has returned a result set for a
single query, for which we know which documents are and are not
relevant, out of a corpus of 100 documents. The document counts in
each category are given in the following table:

                 In result set   Not in result set
Relevant              30                20
Not relevant          10                40
• Precision measures the proportion of documents in the
result set that are actually relevant.
• In our example, the precision is 30/(30 + 10)=.75.
• The proportion of false positives in the result set is 1 − .75 = .25.
• Recall measures the proportion of all the relevant
documents in the collection that are in the result set.
• In our example, recall is 30/(30 + 20)=.60.
• The false negative rate is 1 − .60=.40.
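The two measures can be checked with a few lines of Python. The sketch uses the counts from the example (30 relevant and 10 irrelevant documents in the result set, 20 relevant documents left out); the function and parameter names are illustrative.

```python
def precision_recall(relevant_in, irrelevant_in, relevant_out):
    """Compute precision and recall from result-set counts.

    relevant_in:   relevant documents that are in the result set
    irrelevant_in: irrelevant documents that are in the result set
    relevant_out:  relevant documents missing from the result set
    """
    precision = relevant_in / (relevant_in + irrelevant_in)
    recall = relevant_in / (relevant_in + relevant_out)
    return precision, recall


precision, recall = precision_recall(30, 10, 20)
print(precision, recall)  # 0.75 0.6
```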
IR refinements
• There are many possible refinements to the system described here,
and indeed Web search engines are continually updating their
algorithms as they discover new approaches and as the Web grows
and changes.
• One common refinement is a better model of the effect of
document length on relevance.
The PageRank algorithm
• PageRank was one of the two original ideas that set Google’s search apart from
other Web search engines when it was introduced in 1997.
• PageRank was invented to solve the problem of the tyranny of TF scores: if the
query is [IBM], how do we make sure that IBM’s home page, ibm.com, is the first
result, even if another page mentions the term “IBM” more frequently?
• The idea is that ibm.com has many in-links (links to the page), so it should be
ranked higher: each in-link acts as a vote for the quality of the linked-to page.
• The PageRank algorithm is designed to weight links from high-quality sites more
heavily. What is a high-quality site? One that is linked to by other high-quality
sites.
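The circular definition (a high-quality site is one linked to by other high-quality sites) can be resolved by iterating until the scores settle. The Python sketch below assumes the commonly cited update PR(p) = (1 − d)/N + d · Σ PR(in_i)/C(in_i), where C(x) is the number of out-links of page x and d is a damping factor of 0.85. The formula, the damping value, and the four-page link graph are assumptions for illustration and do not come from the slides. The sketch also assumes every page has at least one out-link.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively compute PageRank for a dict mapping page -> list of out-links.

    Assumes every page appears as a key and has at least one out-link.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs)  # each out-link passes on an equal share
            for q in outs:
                new_rank[q] += d * share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to ibm.com.
links = {
    "ibm.com": ["news"],
    "news": ["ibm.com"],
    "blog": ["ibm.com", "news"],
    "fan": ["ibm.com"],
}
rank = pagerank(links)
print(max(rank, key=rank.get))  # ibm.com
```

In this toy graph "news" ends up a close second even though only two pages link to it, because one of those links comes from the highly ranked ibm.com.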
The HITS algorithm
• The Hyperlink-Induced Topic Search algorithm, also known as “Hubs
and Authorities” or HITS, is another influential link-analysis algorithm.
• HITS differs from PageRank in several ways.
• First, it is a query-dependent measure: it rates pages with respect to a
query.
• Given a query, HITS first finds a set of pages that are relevant to the
query.
• It does that by intersecting hit lists of query words, and then adding
pages in the link neighborhood of these pages—pages that link to or
are linked from one of the pages in the original relevant set.
• Each page in this set is considered an authority on the query to the
degree that other hub pages in the relevant set point to it.
• A page is considered a hub to the degree that it points to other
authoritative pages in the relevant set.
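The mutual definition of hubs and authorities can be sketched as an iteration in Python. The sketch assumes the standard formulation: a page's authority score is the sum of the hub scores of the pages that link to it, its hub score is the sum of the authority scores of the pages it links to, and both vectors are rescaled to unit length after each round. The update rule, the iteration count, and the small link graph are assumptions, and the step that retrieves the relevant set for a query is omitted.

```python
import math


def hits(links, iterations=20):
    """Compute hub and authority scores for a dict mapping page -> out-links."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that point to this page.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        # Hub: sum of authority scores of the pages this page points to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Rescale both score vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical relevant set with its links.
links = {
    "portal": ["siteA", "siteB"],
    "directory": ["siteA", "siteB", "siteC"],
    "siteA": ["siteB"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # siteB directory
```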
Question answering
• Information retrieval is the task of finding documents that are
relevant to a query, where the query may be a question, or just a
topic area or concept.
• Question answering is a somewhat different task, in which the query
really is a question, and the answer is not a ranked list of documents
but rather a short response—a sentence, or even just a phrase.
• There have been question-answering NLP (natural language
processing) systems since the 1960s, but only since 2001 have such
systems used Web information retrieval to radically increase their
breadth of coverage.

  • 3. An IR system can be characterized by 1. A corpus of documents. Each system must decide what it wants to treat as a document: a paragraph, a page, or a multipage text. 2. Queries posed in a query language. A query specifies what the user wants to know. 3. A result set. This is the subset of documents that the IR system judges to be relevant to the query. 4. A presentation of the result set. This can be as simple as a ranked list of document titles. The earliest IR systems worked on a Boolean keyword model.
  • 4. Cont., • This model has the advantage of being • simple to explain and implement. • Disadvantage: • First, the degree of relevance of a document is a single bit, so there is no guidance as to how to order the relevant documents for presentation. • Second, Boolean expressions are unfamiliar to users who are not programmers or logicians. • Third, it can be hard to formulate an appropriate query, even for a skilled user.
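As a rough illustration of the Boolean keyword model and its single-bit notion of relevance, the following minimal sketch builds an inverted index and answers an AND query. The toy documents and the helper names `build_index` and `boolean_and` are assumptions made for illustration; they do not come from the slides.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def boolean_and(index, *words):
    """Return the ids of documents containing every word (a Boolean AND query)."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()

# Hypothetical toy corpus.
docs = {
    1: "farming in Kansas is a major industry",
    2: "Kansas weather report",
    3: "organic farming methods in Kansas and Iowa",
}
index = build_index(docs)

# Each document either matches or does not; nothing says which match is better.
result = boolean_and(index, "farming", "Kansas")
print(sorted(result))  # [1, 3]
```

The result is an unordered set, which is exactly the first disadvantage listed above: the model gives no guidance on how to rank documents 1 and 3.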
  • 5. IR scoring functions • Most IR systems have abandoned the Boolean model and use models based on the statistics of word counts. • We describe the BM25 scoring function, which comes from the Okapi project of Stephen Robertson and Karen Sparck Jones at London’s City College, and has been used in search engines such as the open-source Lucene project. • A scoring function takes a document and a query and returns a numeric score; the most relevant documents have the highest scores. • In the BM25 function, the score is a linear weighted combination of scores for each of the words that make up the query.
  • 6. Three factors affect the weight of a query term: • First, the frequency with which a query term appears in a document: • For the query [farming in Kansas], documents that mention “farming” frequently will have higher scores. • Second, the inverse document frequency of the term, or IDF. • The word “in” appears in almost every document, so it has a high document frequency, and thus a low inverse document frequency, and thus it is not as important to the query as “farming” or “Kansas.” • Third, the length of the document. • A million-word document will probably mention all the query words, but may not actually be about the query. A short document that mentions all the words is a much better candidate.
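The three factors can be computed directly from word counts. The sketch below uses a hypothetical three-document corpus and naive whitespace tokenization (both assumptions for illustration) to show term frequency, document frequency, and document length.

```python
from collections import Counter

# Hypothetical toy corpus; a real system would use a proper tokenizer.
docs = [
    "farming in Kansas depends on wheat farming",
    "a report on rainfall in Kansas",
    "jobs in the city",
]
tokenized = [d.lower().split() for d in docs]

def tf(term, tokens):
    """Term frequency: how many times `term` occurs in one document."""
    return Counter(tokens)[term]

def df(term, corpus):
    """Document frequency: how many documents contain `term` at least once."""
    return sum(1 for tokens in corpus if term in tokens)

lengths = [len(tokens) for tokens in tokenized]

print(tf("farming", tokenized[0]))  # 2
print(df("in", tokenized))          # 3: appears everywhere, so it carries little information
print(df("farming", tokenized))     # 1: rare, so it is a stronger signal
print(lengths)                      # [7, 6, 4]
```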
  • 7. Cont., The BM25 function takes all three of these into account. Given a document dj and a query consisting of the words q1:N, we have TF(qi, dj), the number of times word qi appears in document dj, and DF(qi), the number of documents that contain the word qi. We have two parameters, k and b, that can be tuned by cross-validation; typical values are k = 2.0 and b = 0.75.
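The scoring formula itself appears to have been an image on the slide and did not survive in this transcript. In its commonly stated form, written with the symbols used in this deck (TF, IDF, k, b, the document length |dj| and the average document length L), it reads:

```latex
\mathrm{BM25}(d_j, q_{1:N}) \;=\; \sum_{i=1}^{N} \mathrm{IDF}(q_i)\,
  \frac{\mathrm{TF}(q_i, d_j)\,(k + 1)}
       {\mathrm{TF}(q_i, d_j) + k\left(1 - b + b\,\dfrac{|d_j|}{L}\right)}
```

The parameter b controls how strongly long documents are penalized (b = 0 disables length normalization), and k controls how quickly repeated occurrences of a term stop adding to the score.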
  • 8. Cont., • L is the average document length • |dj | is the length of document dj in words • IDF(qi) is the inverse document frequency of word qi, given by
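The IDF formula is missing from this transcript. The form usually given alongside this version of BM25 is IDF(qi) = log((M − DF(qi) + 0.5) / (DF(qi) + 0.5)), where M is the total number of documents in the corpus (written M here to avoid confusion with the query length N). The sketch below combines it with the BM25 sum under that assumption; the function names and the toy corpus are hypothetical.

```python
import math

def idf(term, corpus):
    """Inverse document frequency, assuming the log((M - DF + 0.5) / (DF + 0.5)) form."""
    m = len(corpus)
    df = sum(1 for tokens in corpus if term in tokens)
    return math.log((m - df + 0.5) / (df + 0.5))

def bm25(query, doc, corpus, k=2.0, b=0.75):
    """Score one tokenized document against a list of query terms."""
    avg_len = sum(len(tokens) for tokens in corpus) / len(corpus)  # L
    score = 0.0
    for term in query:
        tf = doc.count(term)
        # Length normalization: longer-than-average documents get a larger denominator.
        denom = tf + k * (1 - b + b * len(doc) / avg_len)
        score += idf(term, corpus) * tf * (k + 1) / denom
    return score

# Hypothetical toy corpus, already lowercased and tokenized.
corpus = [s.split() for s in [
    "farming in kansas",
    "farming in kansas is hard work for many families across the state",
    "weather in the city",
    "jobs in the city",
    "music in the park",
    "news in the morning",
]]
query = ["farming", "kansas"]
scores = [bm25(query, doc, corpus) for doc in corpus]
```

With this IDF form, a term that occurs in more than half of the documents gets a negative weight, which is why the example query leaves out "in". Some implementations clamp or shift the IDF to avoid this.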
  • 9. IR system evaluation • How do we know whether an IR system is performing well? • We undertake an experiment in which the system is given a set of queries and the result sets are scored with respect to human relevance judgments. • Traditionally, there have been two measures used in the scoring: • recall • precision.
  • 10. Cont., • For example, imagine that an IR system has returned a result set for a single query, for which we know which documents are and are not relevant, out of a corpus of 100 documents. The document counts in each category are: relevant and in the result set, 30; relevant but not in the result set, 20; not relevant but in the result set, 10; not relevant and not in the result set, 40.
  • 11. Cont., • Precision measures the proportion of documents in the result set that are actually relevant. • In our example, the precision is 30/(30 + 10) = .75. • The proportion of the result set that is not relevant (the false discovery rate) is 1 − .75 = .25. • Recall measures the proportion of all the relevant documents in the collection that are in the result set. • In our example, recall is 30/(30 + 20) = .60. • The false negative rate is 1 − .60 = .40.
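The arithmetic above can be checked with a few lines of code. This is a minimal sketch; the function name and argument names are chosen here for illustration, and the counts are the ones from the worked example (30 relevant documents returned, 10 irrelevant documents returned, 20 relevant documents missed).

```python
def precision_recall(relevant_returned, irrelevant_returned, relevant_missed):
    """Return (precision, recall) from the counts of a single query's result set."""
    precision = relevant_returned / (relevant_returned + irrelevant_returned)
    recall = relevant_returned / (relevant_returned + relevant_missed)
    return precision, recall

precision, recall = precision_recall(30, 10, 20)
print(round(precision, 2), round(recall, 2))  # 0.75 0.6
```

Note that neither measure uses the count of irrelevant documents that were correctly left out, which is why both remain meaningful when the corpus is very large.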
  • 12. IR refinements • There are many possible refinements to the system described here, and indeed Web search engines are continually updating their algorithms as they discover new approaches and as the Web grows and changes. • One common refinement is a better model of the effect of document length on relevance.
  • 13. The PageRank algorithm • PageRank was one of the two original ideas that set Google’s search apart from other Web search engines when it was introduced in 1997. • PageRank was invented to solve the problem of the tyranny of TF scores: if the query is [IBM], how do we make sure that IBM’s home page, ibm.com, is the first result, even if another page mentions the term “IBM” more frequently? • The idea is that ibm.com has many in-links (links to the page), so it should be ranked higher. • The PageRank algorithm is designed to weight links from high-quality sites more heavily. What is a high-quality site? One that is linked to by other high-quality sites.
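The recursive definition (a page is high quality if high-quality pages link to it) can be resolved by iterating until the scores settle. The sketch below is one common formulation, not necessarily the one Google used: it assumes a damping factor of 0.85, spreads the rank of pages with no out-links evenly, and runs a fixed number of iterations. The link graph and every host name other than ibm.com are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively compute PageRank for a dict mapping page -> list of out-links.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            if outs:
                # A page divides its rank equally among the pages it links to.
                for target in outs:
                    new_rank[target] += damping * rank[page] / len(outs)
            else:
                # A page with no out-links spreads its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three pages link to ibm.com.
links = {
    "ibm.com": ["news.example"],
    "news.example": ["ibm.com"],
    "blog.example": ["ibm.com"],
    "fan.example": ["ibm.com", "news.example"],
}
rank = pagerank(links)
best = max(rank, key=rank.get)
print(best)  # ibm.com
```

Here ibm.com ends up on top because it has the most in-links, and news.example ranks second largely because the highly ranked ibm.com links to it.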
  • 14. The HITS algorithm • The Hyperlink-Induced Topic Search algorithm, also known as “Hubs and Authorities” or HITS, is another influential link-analysis algorithm. • HITS differs from PageRank in several ways. • First, it is a query-dependent measure: it rates pages with respect to a query. • Given a query, HITS first finds a set of pages that are relevant to the query.
  • 15. Cont., • It does that by intersecting hit lists of query words, and then adding pages in the link neighborhood of these pages: pages that link to or are linked from one of the pages in the original relevant set. • Each page in this set is considered an authority on the query to the degree that other hub pages in the relevant set point to it. • A page is considered a hub to the degree that it points to other authoritative pages in the relevant set.
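The mutual definition of hubs and authorities can be computed by alternating two updates. The following minimal sketch assumes the query-relevant set of pages and their links have already been collected; the page names, the fixed iteration count, and the Euclidean normalization step are assumptions made for illustration.

```python
import math

def hits(links, iterations=50):
    """Return (hub, authority) scores for a dict mapping page -> list of out-links."""
    pages = set(links) | {t for outs in links.values() for t in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages that point to it.
        auth = {p: 0.0 for p in pages}
        for src, outs in links.items():
            for target in outs:
                auth[target] += hub[src]
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical relevant set: two hub-like pages pointing at two content pages.
links = {
    "hub1": ["a", "b"],
    "hub2": ["a"],
    "a": [],
    "b": [],
}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # a hub1
```

Unlike PageRank, these scores would be recomputed for each query, since the set of pages in `links` depends on the query.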
  • 16. Question answering • Information retrieval is the task of finding documents that are relevant to a query, where the query may be a question, or just a topic area or concept. • Question answering is a somewhat different task, in which the query really is a question, and the answer is not a ranked list of documents but rather a short response—a sentence, or even just a phrase. • There have been question-answering NLP (natural language processing) systems since the 1960s, but only since 2001 have such systems used Web information retrieval to radically increase their breadth of coverage.