Submitted by: Iqra Tamseela
Roll no: 20 21
Topic
Normalization
Presentation
Information Retrieval
Technique
Our Content
• IR tool
Why use information retrieval tools
Retrieval tools are systems created for the retrieval of information. They are
essential building blocks for any system that organizes the recorded
information collected by libraries, archives, museums, etc.
What is Normalization
Normalization is the process of organizing data to minimize
redundancy. It usually involves dividing a database into
two or more tables and defining relationships between the tables.
For example, in an employee database, the birthdate field would be
stored in only one table. Normalization organizes data into
related tables; it also eliminates redundancy and increases
integrity, which can improve query performance.
Types of Normalization
• Normalization avoids:
• Duplication of data - The same data is listed in multiple rows of the database.
• Insert anomaly - A record about an entity cannot be inserted into the table without first
inserting information about another entity, e.g. a customer cannot be entered without a sales order.
• Delete anomaly - A record cannot be deleted without also deleting a record about a related entity,
e.g. a sales order cannot be deleted without deleting all of the customer's information.
• Update anomaly - Information cannot be updated without changing it in many places,
e.g. customer information must be updated for each sales order the customer has placed.
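The anomalies above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `sales_order` tables, their columns, and the sample rows are hypothetical and do not come from the slides; they only show how splitting the data into two related tables lets each operation touch a single place.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
""")

# Insert: a customer can be stored before any sales order exists.
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Lahore')")
conn.executemany(
    "INSERT INTO sales_order VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 75.5)],
)

# Update: the city is stored once, so exactly one row changes.
updated = conn.execute(
    "UPDATE customer SET city = 'Karachi' WHERE customer_id = 1"
).rowcount

# Delete: removing an order leaves the customer record intact.
conn.execute("DELETE FROM sales_order WHERE order_id = 10")
remaining_customers = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
remaining_orders = conn.execute("SELECT COUNT(*) FROM sales_order").fetchone()[0]
```

In a single flat table that repeated the customer's details on every order row, the same update would have to touch one row per order, and deleting the last order would also delete the customer's details.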
• Normalization ensures that the database is structured in the best possible way.
• To achieve control over data redundancy: there should be no unnecessary
duplication of data in different tables.
• To ensure tables are flexible.
• Searching, sorting, and creating indexes are faster, since tables are narrower and
more rows fit on a data page.
• You usually have more tables.
• Index searching is often faster.
• A common misunderstanding concerns the term "frequency". To some, it seems to
be the count of objects. But usually, frequency is a relative value. TF-IDF
usually involves a two-fold normalization. First, each document is normalized to
length 1, so there is no bias toward longer or shorter documents.
• Formula
• $\mathrm{tf}_i' = \mathrm{tf}_i / \mathrm{tf}_{\max}$, where $\mathrm{tf}_{\max}$ is taken to be the largest term frequency in the same document.
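A minimal sketch of the formula above, assuming that tf_max means the highest raw count of any term within the same document (the slide does not define it). The function name `max_tf_normalize` and the sample document are illustrative only.

```python
from collections import Counter


def max_tf_normalize(tokens):
    """Return {term: tf / tf_max} for the tokens of one document."""
    counts = Counter(tokens)
    if not counts:
        return {}
    # Assumption: tf_max is the largest raw count within this document.
    tf_max = max(counts.values())
    return {term: count / tf_max for term, count in counts.items()}


doc = "data base data index data index query".split()
weights = max_tf_normalize(doc)
```

Here the most frequent term, `data`, gets weight 1.0 and every other term gets a value between 0 and 1, so the weights are relative values rather than raw counts.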
• More complicated SQL is required for multi-table subqueries and
joins.
• Extra work for the DBMS can mean slower applications.
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Fourth normal form (4NF)
• Fifth normal form (5NF)
Types of Normalization
• Document length normalization adjusts the term frequency or the
relevance score in order to normalize the effect of document length on the
document ranking.
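The slide does not name a specific scheme, so the sketch below assumes one common choice, cosine (L2) normalization, in which each document's term weights are divided by the Euclidean length of its weight vector. The function name `l2_normalize` and the two sample documents are hypothetical.

```python
import math


def l2_normalize(weights):
    """Scale a {term: weight} vector to Euclidean length 1."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {term: w / norm for term, w in weights.items()}


# Hypothetical documents: the long one repeats the short one ten times over.
short_doc = {"normalization": 3.0, "index": 4.0}
long_doc = {"normalization": 30.0, "index": 40.0}

short_unit = l2_normalize(short_doc)
long_unit = l2_normalize(long_doc)
```

After normalization both documents have the same weights, so the longer document no longer scores higher merely because it contains more occurrences of each term.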
• We may need to “normalize” words in the indexed text as well as query words into the same
form.
• For example, we want to match U.S.A and USA.
• Tokens are transformed into terms, which are then entered into the index.
• A term is a (normalized) word type, which is an entry in our IR system dictionary.
• We most commonly define equivalence classes of terms implicitly, e.g. by
deleting periods to form a term:
U.S.A, USA → USA
• or by deleting hyphens to form a term:
anti-discriminatory, antidiscriminatory → antidiscriminatory
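The two rules above can be sketched as follows. The function name `normalize_token`, the tiny document collection, and the whitespace tokenizer are assumptions made for illustration; the key point is that the same normalization is applied to indexed text and to query words.

```python
def normalize_token(token):
    """Map a token to its term by deleting periods and hyphens."""
    return token.replace(".", "").replace("-", "")


# Hypothetical two-document collection, tokenized on whitespace.
docs = {
    1: "visa rules for U.S.A citizens",
    2: "USA travel guide",
}

# Build the index from normalized terms to the documents that contain them.
index = {}
for doc_id, text in docs.items():
    for token in text.split():
        index.setdefault(normalize_token(token), set()).add(doc_id)

# The query word goes through the same normalization before lookup.
hits = index.get(normalize_token("U.S.A"), set())
```

Because `U.S.A` and `USA` fall into the same equivalence class, the query finds both documents.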
• Accents: e.g., French résumé vs. resume.
• A simple remedy is to remove accents, but this works poorly when the accented and
unaccented forms are different words, as with résumé and resume.
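A minimal sketch of the simple remedy, using Python's standard `unicodedata` module: the text is decomposed so that each accent becomes a separate combining mark, and those marks are then dropped. The function name `strip_accents` is illustrative.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


noun = strip_accents("résumé")  # the document sense
verb = strip_accents("resume")  # the "continue" sense
collapsed = noun == verb
```

Both words end up as the same term, which shows the drawback noted above: after accent removal the index can no longer tell résumé from resume.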
Thanks for paying attention

information retrieval Techniques and normalization
