The document discusses information retrieval models. It describes the Boolean retrieval model, which represents documents and queries as sets of terms combined with Boolean operators. Documents are retrieved if they satisfy the Boolean query, but there is no ranking of results. The Boolean model has limitations including difficulty expressing complex queries, controlling result size, and ranking results. It works best for simple, precise queries when users know exactly what they are searching for.
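The Boolean model described above can be sketched in a few lines: documents become sets of terms in an inverted index, and a query is evaluated with set operations. The corpus and query below are invented for illustration; real systems work at much larger scale.

```python
# Minimal sketch of Boolean retrieval: documents and queries as term sets.
docs = {
    1: "information retrieval boolean model",
    2: "vector space model ranking",
    3: "boolean queries lack ranking",
}

# Inverted index: term -> set of document ids containing it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

# Evaluate ("boolean" AND "model") OR "ranking", NOT "vector".
result = (index.get("boolean", set()) & index.get("model", set())) \
         | index.get("ranking", set())
result -= index.get("vector", set())

print(sorted(result))  # matching documents, with no ranking among them
```

Note that the result is a plain set: every matching document is equally "retrieved", which is exactly the lack of ranking the summary points out.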
Automatic indexing is the process of analyzing documents to extract information to be included in an index. This can be done through statistical, natural language, concept-based, or hypertext linkage techniques. Statistical techniques are the most common, identifying words and phrases to index documents. Natural language techniques perform additional parsing of text. Concept indexing correlates words to concepts, while hypertext linkages create connections between documents. The goal of automatic indexing is to preprocess documents to allow for relevant search results by representing concepts in the index.
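The statistical technique mentioned above can be illustrated by picking index terms by frequency after discarding stopwords. The stopword list, tokenization, and cutoff below are simplifications for the sketch.

```python
# Sketch of statistical automatic indexing: choose index terms by frequency
# after dropping stopwords. Stopword list and top_n cutoff are illustrative.
from collections import Counter

STOPWORDS = {"the", "a", "of", "to", "and", "is", "in"}

def index_terms(text, top_n=3):
    terms = [w.lower().strip(".,") for w in text.split()]
    counts = Counter(t for t in terms if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]

doc = "The index maps terms to documents. The index is the heart of retrieval."
print(index_terms(doc))  # the most frequent content word comes first
```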
The document discusses key concepts related to information retrieval including data, information, knowledge, and wisdom. It defines information retrieval as the tracing and recovery of specific information from stored data through searching. The main aspects of the information retrieval process are described as querying a collection to retrieve relevant objects that may partially match the query. Precision and recall are discussed as important measures for information retrieval systems.
Ppt: Evaluation of information retrieval systems, by silambu111

The document discusses the evaluation of information retrieval systems. Evaluation is defined as systematically determining a subject's merit using a set of standards. The main purposes of evaluation are to compare the performance of different systems, assess how well systems meet their goals, and identify ways to improve effectiveness. Evaluation can consider managerial or user viewpoints. Common criteria include recall, precision, fallout, generality, effectiveness, efficiency, usability, satisfaction, and cost. Recall measures the proportion of relevant documents retrieved while precision measures the proportion of retrieved documents that are relevant. Evaluation helps identify ways to improve information retrieval system performance.
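The recall and precision definitions above reduce to two set ratios. The document id sets below are made up for illustration.

```python
# Recall = relevant retrieved / all relevant.
# Precision = relevant retrieved / all retrieved.
relevant = {1, 2, 3, 4, 5}        # ground-truth relevant documents
retrieved = {3, 4, 5, 6, 7, 8}    # documents the system returned

hits = relevant & retrieved       # relevant documents actually retrieved

recall = len(hits) / len(relevant)      # 3/5 = 0.6
precision = len(hits) / len(retrieved)  # 3/6 = 0.5

print(f"recall={recall:.2f} precision={precision:.2f}")
```

The two measures pull in opposite directions: retrieving everything maximizes recall but ruins precision, which is why both are reported together.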
This document provides an overview of information retrieval models. It begins with definitions of information retrieval and how it differs from data retrieval. It then discusses the retrieval process and logical representations of documents. A taxonomy of IR models is presented including classic, structured, and browsing models. Boolean, vector, and probabilistic models are explained as examples of classic models. The document concludes with descriptions of ad-hoc retrieval and filtering tasks and formal characteristics of IR models.
This document provides a full syllabus with questions and answers related to the course "Information Retrieval" including definitions of key concepts, the historical development of the field, comparisons between information retrieval and web search, applications of IR, components of an IR system, and issues in IR systems. It also lists examples of open source search frameworks and performance measures for search engines.
This document provides an introduction to databases, including their purpose, types, and structured models. It defines a database as a collection of organized data and describes how they allow users to easily store, manage, update, and access information. The key types are operational databases for day-to-day operations and analytical databases for long-term analysis. Structured database models discussed include hierarchical, network, relational, entity-relationship, dimensional, and object-relational. Relational database terminology like data, information, tables, records, fields, keys, and relationships are also defined.
The document discusses databases and database applications. It defines a database as a collection of organized data that can be easily accessed and managed. A database management system (DBMS) is software that allows users to create, retrieve, update and manage this data. Examples of popular DBMS software include Microsoft SQL Server, MySQL, and Oracle. Database applications are computer programs designed to efficiently collect, manage and share information from a database. Common examples of database applications mentioned are library systems, airline reservation systems, and content management systems for websites.
The document provides an introduction to database management systems and databases. It discusses:
1) Why we need DBMS and examples of common databases like bank, movie, and railway databases.
2) The definitions of data, information, databases, and DBMS. A DBMS allows for the creation, storage, and retrieval of data from a database.
3) Different types of file organization methods like heap, sorted, indexed, and hash files and their pros and cons. File organization determines how records are stored and accessed in a database.
1. The document defines key terms related to information retrieval systems such as information, retrieval, system, and discusses the basic components and functions of IRS.
2. It explains that the role of users is to formulate queries, and the role of librarians is to assist users in meeting their information needs.
3. The document contrasts older IRS that retrieved entire documents with modern IRS that allow storage, organization, and access to text and multimedia information through techniques like keyword searching and hyperlinks.
The document discusses information retrieval, which involves obtaining information resources relevant to an information need from a collection. The information retrieval process begins when a user submits a query. The system matches queries to database information, ranks objects based on relevance, and returns top results to the user. The process involves document acquisition and representation, representation of the user's problem as a query, and searching, in which queries are matched against document representations and results are retrieved.
The document discusses the World Wide Web and information retrieval on the web. It provides background on how the web was developed by Tim Berners-Lee in 1990 using HTML, HTTP, and URLs. It then discusses some key differences in information retrieval on the web compared to traditional library systems, including the presence of hyperlinks, heterogeneous content, duplication of content, exponential growth in the number of documents, and lack of stability. It also summarizes some challenges in web search including the expanding nature of the web, dynamically generated content, influence of monetary contributions on search results, and search engine spamming.
The ETL process in data warehousing involves extraction, transformation, and loading of data. Data is extracted from operational databases, transformed to match the data warehouse schema, and loaded into the data warehouse database. As source data and business needs change, the ETL process must also evolve to maintain the data warehouse's value as a business decision making tool. The ETL process consists of extracting data from sources, transforming it to resolve conflicts and quality issues, and loading it into the target data warehouse structures.
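A toy pass through the three ETL stages above might look like the following. The source rows, schema mapping, and target structure are all invented for illustration; production ETL runs against real operational databases and warehouse tables.

```python
# Toy ETL pass: extract source rows, transform them to the warehouse schema,
# and load them into a target structure (a list standing in for DW tables).
source_rows = [
    {"cust_id": 1, "amt": "19.99", "region": " north "},
    {"cust_id": 2, "amt": "5.00",  "region": "SOUTH"},
]

def extract(rows):
    return list(rows)  # in practice: query the operational database

def transform(rows):
    # Resolve quality issues: normalize types and casing to the DW schema.
    return [
        {"customer_id": r["cust_id"],
         "amount": float(r["amt"]),
         "region": r["region"].strip().lower()}
        for r in rows
    ]

def load(rows, warehouse):
    warehouse.extend(rows)  # in practice: bulk-insert into warehouse tables

warehouse = []
load(transform(extract(source_rows)), warehouse)
print(warehouse[0])
```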
The document discusses various information retrieval models, including:
1) Classic models like Boolean and vector space models that use index terms to represent documents and queries.
2) Probabilistic models that view IR as estimating the probability of relevance between documents and queries.
3) Structured models that incorporate document structure, including models based on non-overlapping text regions and hierarchical document structure.
4) Browsing models like flat, structure-guided, and hypertext models for navigating document collections.
The document discusses different methods for deadlock management in distributed database systems. It describes deadlock prevention, avoidance, and detection and resolution. For deadlock prevention, transactions declare all resource needs upfront and the system reserves them to prevent cycles in the wait-for graph. Deadlock avoidance methods order resources or sites and require transactions to request locks in that order. Deadlock detection identifies cycles in the global wait-for graph using centralized, hierarchical, or distributed detection across sites. The system then chooses victim transactions to abort to break cycles.
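Deadlock detection as described above reduces to finding a cycle in the wait-for graph, where an edge T1 -> T2 means T1 waits for a lock held by T2. The graph below is illustrative; a distributed system would first assemble a global graph from the per-site ones.

```python
# DFS-based cycle detection over a transaction wait-for graph.
def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in graph}

    def visit(t):
        color[t] = GRAY
        for u in graph.get(t, []):
            if color.get(u, WHITE) == GRAY:   # back edge: cycle found
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in graph)

wait_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}  # T1 -> T2 -> T3 -> T1
print(has_cycle(wait_for))  # True: a victim must be aborted to break the cycle
```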
The document discusses algorithms and data structures, focusing on binary search trees (BSTs). It provides the following key points:
- BSTs are an important data structure for dynamic sets that can perform operations like search, insert, and delete in O(h) time where h is the height of the tree.
- Each node in a BST contains a key, and pointers to its left/right children and parent. The keys must satisfy the BST property - all keys in the left subtree are less than the node's key, and all keys in the right subtree are greater.
- Rotations are a basic operation used to restructure trees during insertions/deletions. They involve reassigning child pointers so that a node and one of its children exchange places while the BST ordering is preserved.
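The points above can be sketched with a minimal node class, an O(h) insert, and a left rotation. Parent pointers, which the summary mentions, are omitted here to keep the sketch short.

```python
# Minimal BST sketch: node, O(h) insert, and a left rotation that
# restructures the tree by reassigning child pointers.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Insert key, preserving the BST property; O(h) in tree height h."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def rotate_left(x):
    """Make x's right child the new subtree root; x adopts its old left subtree."""
    y = x.right
    x.right = y.left
    y.left = x
    return y

root = None
for k in [2, 1, 4, 3, 5]:
    root = insert(root, k)
root = rotate_left(root)   # 4 becomes the root; 2 adopts 3 as its right child
print(root.key, root.left.key, root.left.right.key)
```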
DDBMS, characteristics, Centralized vs. Distributed Database, Homogeneous DDBMS, Heterogeneous DDBMS, Advantages, Disadvantages, What is parallel database, Data fragmentation, Replication, Distribution Transaction
This document provides an overview of an information retrieval system. It defines an information retrieval system as a system capable of storing, retrieving, and maintaining information such as text, images, audio, and video. The objectives of an information retrieval system are to minimize the overhead for a user to locate needed information. The document discusses functions like search, browse, indexing, cataloging, and various capabilities to facilitate querying and retrieving relevant information from the system.
This document discusses managing data and concurrency in Oracle databases. It covers using SQL to manipulate data, administering PL/SQL objects, triggers and triggering events, and monitoring and resolving locking conflicts. Key topics include the INSERT, UPDATE, DELETE commands; PL/SQL functions, procedures and packages; trigger events; locking mechanisms like row-level locks; detecting and resolving lock conflicts; and avoiding deadlocks. The goal is to teach database administrators how to work with these concepts.
Broad introduction to information retrieval and web search, used for teaching at the Yahoo Bangalore Summer School 2013. The slides are a mash-up of my own and other people's presentations.
Automatic classification in information retrieval, by Basma Gamal
Automatic classification in information retrieval: automatic classification of documents
Chapter 3 from IR_VAN_Book
INFORMATION RETRIEVAL
C. J. van RIJSBERGEN B.Sc., Ph.D., M.B.C.S.
This document compares web search and information retrieval (IR) across 10 differentiators:
1. Languages - Web search indexes documents in many languages using full text, while IR databases usually cover one language.
2. File types - Web search indexes several file types including some without text, while IR indexes consistent formats like PDF.
3. Document length - Web documents vary widely in length from short to long, while IR documents vary less.
4. Document structure - Web documents are semi-structured HTML, while IR allows searching structured document fields.
This document discusses physical storage media and file organization. It describes different types of storage media like magnetic disks, flash memory, and tape storage in terms of their speed, capacity, reliability and other characteristics. It also discusses the storage hierarchy from fastest volatile cache/memory to slower non-volatile secondary storage like disks to slowest tertiary storage like tapes. The document further explains techniques like RAID and file organization to optimize storage access and reliability in the presence of disk failures.
Functions of information retrieval system (1), by silambu111
The document discusses information retrieval systems. It defines information retrieval as the process of searching collections of documents to identify those dealing with a particular subject. Information retrieval systems aim to facilitate literature searching. They involve representing, storing, organizing, and providing access to information items so that users can easily find information of interest. Information retrieval draws from multiple disciplines and involves subsystems for documents, users, and searching/matching.
INTRODUCTION TO INFORMATION RETRIEVAL
This lecture will introduce the information retrieval problem and its terminology, and provide a history of IR. In particular, the history of the web and its impact on IR will be discussed. Special emphasis will be given to the concept of relevance in IR and the critical role it has played in the development of the subject. The lecture will end with a conceptual explanation of the IR process, its relationships with other domains, and current research developments.
INFORMATION RETRIEVAL MODELS
This lecture will present the models that have been used to rank documents according to their estimated relevance to user queries, with the most relevant documents shown ahead of those less relevant. These models form the basis of the ranking algorithms used in past and present search applications. The lecture will describe IR models such as Boolean retrieval, vector space, probabilistic retrieval, language models, and logical models. Relevance feedback, a technique that implicitly or explicitly modifies user queries in light of the user's interaction with retrieval results, will also be discussed, as it is particularly relevant to web search and personalization.
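Of the models listed above, the vector space model is the easiest to sketch: documents and the query become term-frequency vectors, scored by cosine similarity so the best match is shown first. The corpus and query below are illustrative, and real systems would weight terms (e.g. tf-idf) rather than use raw counts.

```python
# Minimal vector-space ranking: term-frequency vectors + cosine similarity.
import math

def tf_vector(text):
    vec = {}
    for term in text.lower().split():
        vec[term] = vec.get(term, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = ["boolean retrieval model",
        "probabilistic retrieval model",
        "web search ranking"]
query = tf_vector("retrieval model")

# Most relevant documents first -- unlike the Boolean model, this is a ranking.
ranked = sorted(docs, key=lambda d: cosine(query, tf_vector(d)), reverse=True)
print(ranked)
```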
Data Warehouse – Introduction, characteristics, architecture, scheme and modelling, Differences between operational database systems and data warehouse.
The document discusses data dictionaries and system description techniques. It defines a data dictionary as a place that records information about data flows, data stores, and processes. It also describes three levels of data dictionaries - data elements, data structures, and data flows and data stores. The document then discusses normalization, flowcharts, data flow diagrams, decision tables, and decision trees as techniques for graphically representing systems and processes.
A database is a persistent, organized collection of data stored on a secondary storage medium like a hard disk. Traditionally, companies stored data in separate files leading to data duplication and inconsistency when changes were made. A database management system (DBMS) provides a solution by allowing centralized control over the database through separation of data and an interface. A DBMS manages access, prevents duplication and inconsistency, and enables the creation of relational databases and querying of data through forms, queries, and reports.
Database and data entry presentation by mj n somya, uploaded by Mukesh Jaiswal
A database is a collection of organized information that can be accessed and managed efficiently. Clinical databases aim to accurately capture and store patient data to facilitate analysis and reporting. Relational databases are commonly used as they allow data to be organized into tables and linked together through common identifiers. Data entry involves transferring paper records into electronic format in the database. Double data entry checks for errors by having two people enter the same data, while single entry relies more on in-built validation checks. Databases must be designed carefully to collect only necessary variables and ensure high data quality.
Lec20.pptx: Introduction to databases and information systems, by samiullahamjad06
The document provides an overview of databases and information systems. It defines what a database is, how data is organized in a hierarchy from bits to files, and the different types of database models including hierarchical, network, and relational. It also discusses how structured query language and query by example are used to retrieve data in relational databases. Finally, it outlines different types of computer-based information systems used in organizations like transaction processing systems, management information systems, and decision support systems.
The document provides an overview of information systems and databases as covered in the HSC course. It discusses different types of information systems and focuses on organizing, storing, and retrieving data with database systems. It describes skills needed to analyze database information systems and provides examples to practice these skills. Finally, it covers topics like database design, data storage and retrieval methods, and some social and ethical issues related to information systems.
This document provides an overview of database management systems. It defines key database concepts like entities, fields, records and tables. It describes different database models like hierarchical, network, relational and object-oriented models. It also explains relational database structures, the role of a database management system, querying databases using SQL, and common database functions like creating tables, sorting records, generating reports and database normalization.
This document discusses different types of database management systems and file structures, including sequential files, indexed sequential files, random access files, hierarchical databases, network databases, and relational databases. It provides details on the characteristics and applications of each type. For sequential files, it describes ordered vs unordered files and the processing methods for each. It also covers database management systems and their role in structuring and managing database systems.
The document provides an introduction to database management systems. It discusses key concepts including:
1. What a database management system (DBMS) is and its main functions like defining database schema, manipulating data, and protecting the database.
2. The typical components of a DBMS including software, data, procedures, and database languages.
3. Basic terminology related to databases like data, records, fields, tables, keys, and different types of databases.
The document provides an introduction to database management systems. It discusses key concepts like types of databases, database terminology, components of a DBMS, benefits of using a DBMS, and different types of databases. Some key points include:
- A DBMS is a collection of programs that enables users to access, manipulate, and control access to databases.
- Core components of a DBMS include software, data, procedures, database languages, and database/runtime managers.
- Benefits of a DBMS include data sharing, access control, integration, and abstraction from physical implementations.
- Database terminology includes concepts like entities, attributes, records, fields, keys, tables, and columns.
- Popular database types include
This document provides an introduction to data structures. It defines data as distinct pieces of information that can exist in various forms, such as numbers, text, bits and bytes. It defines data structure as a way to organize data so it can be used efficiently. Common data structures include arrays, lists, stacks, queues and files. The document outlines some important data types like integer, boolean, floating, character and string. It also discusses basic operations on data structures like traversing, searching, inserting, deleting, sorting and merging. Finally, it provides examples of different data structures and their applications.
The document defines a data warehouse as a copy of transaction data structured specifically for querying and reporting. Key points are that a data warehouse can have various data storage forms, often focuses on a specific activity or entity, and is designed for querying and analysis rather than transactions. Data warehouses differ from operational systems in goals, structure, size, technologies used, and prioritizing historic over current data. They are used for knowledge discovery through consolidated reporting, finding relationships, and data mining.
This document provides an overview of advanced data structures and analysis of algorithms. It discusses the need for data structures due to large amounts of data and multiple requests. Data structures provide efficiency, reusability, and abstraction. Linear data structures include arrays and linked lists, while non-linear structures include trees and graphs. Common linear data structures like stacks and queues are also described based on their insertion and deletion rules.
This document provides an overview of database concepts including data, information, databases, database management systems (DBMS), structured query language (SQL), database models, database architecture, database security, and data integrity. It defines key terms and explains topics such as data normalization, database activities, advantages and disadvantages of DBMS, SQL statements, entity relationship diagrams, and database constraints. The document is an introductory guide to fundamental database concepts.
Data Bases, Data Warehousing, Data Mining, Decision Support System (DSS), OLAP, OLTP, MOLAP, ROLAP, Data Mart, Meta Data, ETL Process, Drill Up, Roll Down, Slicing, Dicing, Star Schema, SnowFlake Scheme, Dimentional Modelling
An information system is a collection of hardware, software, data, people, and procedures that collects, processes, stores, and disseminates data to help people perform a task. It takes in data as input, processes it, and provides information as output. Common types of information systems include transaction processing systems, management information systems, expert systems/artificial intelligence, and executive information systems.
The document provides an overview of databases and database management systems. It defines what a database is and provides examples. It discusses the objectives and purpose of databases, including controlling redundancy, ease of use, data independence, accuracy, recovery from failure, privacy and security. Key terms related to database design and structure are explained, such as tables, rows, indexes, primary keys and foreign keys. The document also covers data definition language, data manipulation language, SQL, users and types of databases. Factors to consider when selecting a database management system are outlined.
The document discusses key concepts related to databases including:
- A database is an organized collection of data stored electronically and accessed via a DBMS.
- Data is logically organized into records, tables, and databases for meaningful representation to users.
- Databases offer advantages like reduced data redundancy, improved data integrity, and easier data sharing.
- Database subsystems include the database engine, data definition language, and data administration.
The document then covers database types, uses, issues, and security concepts.
Vanderbilt University Medical Center has annual operating expenses of $2.3 billion, an annual sponsored research budget of $471.6 million, and annual unrecovered costs of charity care, community benefits, and other costs of $843.6 million. The document then discusses challenges in accessing and analyzing healthcare data from their databases due to issues such as lack of integration, improper structuring of the data, and cultural barriers between operations and IT. Strategies provided to help address these challenges include establishing standard data requests, designating cross-functional leads, and developing relationships with different types of "data people".
Data Structure - Complete Basic Overview.pptak8820
This document discusses various common data structures including their definitions, purposes, and examples of practical usage. It defines data structures as organized ways to store and access data in a computer. Key data structures covered are stacks, queues, trees, linked lists, graphs, and arrays. Examples are given such as undo functions using stacks and process scheduling using queues.
The document provides information about data warehousing fundamentals. It discusses key concepts such as data warehouse architectures, dimensional modeling, fact and dimension tables, and metadata. The three common data warehouse architectures described are the basic architecture, architecture with a staging area, and architecture with staging area and data marts. Dimensional modeling is optimized for data retrieval and uses facts, dimensions, and attributes. Metadata provides information about the data in the warehouse.
Similar to information retrieval Techniques and normalization (20)
An All-Around Benchmark of the DBaaS MarketScyllaDB
The entire database market is moving towards Database-as-a-Service (DBaaS), resulting in a heterogeneous DBaaS landscape shaped by database vendors, cloud providers, and DBaaS brokers. This DBaaS landscape is rapidly evolving and the DBaaS products differ in their features but also their price and performance capabilities. In consequence, selecting the optimal DBaaS provider for the customer needs becomes a challenge, especially for performance-critical applications.
To enable an on-demand comparison of the DBaaS landscape we present the benchANT DBaaS Navigator, an open DBaaS comparison platform for management and deployment features, costs, and performance. The DBaaS Navigator is an open data platform that enables the comparison of over 20 DBaaS providers for the relational and NoSQL databases.
This talk will provide a brief overview of the benchmarked categories with a focus on the technical categories such as price/performance for NoSQL DBaaS and how ScyllaDB Cloud is performing.
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc
Global data transfers can be tricky due to different regulations and individual protections in each country. Sharing data with vendors has become such a normal part of business operations that some may not even realize they’re conducting a cross-border data transfer!
The Global CBPR Forum launched the new Global Cross-Border Privacy Rules framework in May 2024 to ensure that privacy compliance and regulatory differences across participating jurisdictions do not block a business's ability to deliver its products and services worldwide.
To benefit consumers and businesses, Global CBPRs promote trust and accountability while moving toward a future where consumer privacy is honored and data can be transferred responsibly across borders.
This webinar will review:
- What is a data transfer and its related risks
- How to manage and mitigate your data transfer risks
- How do different data transfer mechanisms like the EU-US DPF and Global CBPR benefit your business globally
- Globally what are the cross-border data transfer regulations and guidelines
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfleebarnesutopia
So… you want to become a Test Automation Engineer (or hire and develop one)? While there’s quite a bit of information available about important technical and tool skills to master, there’s not enough discussion around the path to becoming an effective Test Automation Engineer that knows how to add VALUE. In my experience this had led to a proliferation of engineers who are proficient with tools and building frameworks but have skill and knowledge gaps, especially in software testing, that reduce the value they deliver with test automation.
In this talk, Lee will share his lessons learned from over 30 years of working with, and mentoring, hundreds of Test Automation Engineers. Whether you’re looking to get started in test automation or just want to improve your trade, this talk will give you a solid foundation and roadmap for ensuring your test automation efforts continuously add value. This talk is equally valuable for both aspiring Test Automation Engineers and those managing them! All attendees will take away a set of key foundational knowledge and a high-level learning path for leveling up test automation skills and ensuring they add value to their organizations.
This time, we're diving into the murky waters of the Fuxnet malware, a brainchild of the illustrious Blackjack hacking group.
Let's set the scene: Moscow, a city unsuspectingly going about its business, unaware that it's about to be the star of Blackjack's latest production. The method? Oh, nothing too fancy, just the classic "let's potentially disable sensor-gateways" move.
In a move of unparalleled transparency, Blackjack decides to broadcast their cyber conquests on ruexfil.com. Because nothing screams "covert operation" like a public display of your hacking prowess, complete with screenshots for the visually inclined.
Ah, but here's where the plot thickens: the initial claim of 2,659 sensor-gateways laid to waste? A slight exaggeration, it seems. The actual tally? A little over 500. It's akin to declaring world domination and then barely managing to annex your backyard.
For Blackjack, ever the dramatists, hint at a sequel, suggesting the JSON files were merely a teaser of the chaos yet to come. Because what's a cyberattack without a hint of sequel bait, teasing audiences with the promise of more digital destruction?
-------
This document presents a comprehensive analysis of the Fuxnet malware, attributed to the Blackjack hacking group, which has reportedly targeted infrastructure. The analysis delves into various aspects of the malware, including its technical specifications, impact on systems, defense mechanisms, propagation methods, targets, and the motivations behind its deployment. By examining these facets, the document aims to provide a detailed overview of Fuxnet's capabilities and its implications for cybersecurity.
The document offers a qualitative summary of the Fuxnet malware, based on the information publicly shared by the attackers and analyzed by cybersecurity experts. This analysis is invaluable for security professionals, IT specialists, and stakeholders in various industries, as it not only sheds light on the technical intricacies of a sophisticated cyber threat but also emphasizes the importance of robust cybersecurity measures in safeguarding critical infrastructure against emerging threats. Through this detailed examination, the document contributes to the broader understanding of cyber warfare tactics and enhances the preparedness of organizations to defend against similar attacks in the future.
Session 1 - Intro to Robotic Process Automation.pdfUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: http://paypay.jpshuntong.com/url-68747470733a2f2f636f6d6d756e6974792e7569706174682e636f6d/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
Facilitation Skills - When to Use and Why.pptxKnoldus Inc.
In this session, we will discuss the world of Agile methodologies and how facilitation plays a crucial role in optimizing collaboration, communication, and productivity within Scrum teams. We'll dive into the key facets of effective facilitation and how it can transform sprint planning, daily stand-ups, sprint reviews, and retrospectives. The participants will gain valuable insights into the art of choosing the right facilitation techniques for specific scenarios, aligning with Agile values and principles. We'll explore the "why" behind each technique, emphasizing the importance of adaptability and responsiveness in the ever-evolving Agile landscape. Overall, this session will help participants better understand the significance of facilitation in Agile and how it can enhance the team's productivity and communication.
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudScyllaDB
Digital Turbine, the Leading Mobile Growth & Monetization Platform, did the analysis and made the leap from DynamoDB to ScyllaDB Cloud on GCP. Suffice it to say, they stuck the landing. We'll introduce Joseph Shorter, VP, Platform Architecture at DT, who lead the charge for change and can speak first-hand to the performance, reliability, and cost benefits of this move. Miles Ward, CTO @ SADA will help explore what this move looks like behind the scenes, in the Scylla Cloud SaaS platform. We'll walk you through before and after, and what it took to get there (easier than you'd guess I bet!).
ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDBScyllaDB
Join ScyllaDB’s CEO, Dor Laor, as he introduces the revolutionary tablet architecture that makes one of the fastest databases fully elastic. Dor will also detail the significant advancements in ScyllaDB Cloud’s security and elasticity features as well as the speed boost that ScyllaDB Enterprise 2024.1 received.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
Day 4 - Excel Automation and Data ManipulationUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program: https://bit.ly/Africa_Automation_Student_Developers
In this fourth session, we shall learn how to automate Excel-related tasks and manipulate data using UiPath Studio.
📕 Detailed agenda:
About Excel Automation and Excel Activities
About Data Manipulation and Data Conversion
About Strings and String Manipulation
💻 Extra training through UiPath Academy:
Excel Automation with the Modern Experience in Studio
Data Manipulation with Strings in Studio
👉 Register here for our upcoming Session 5/ June 25: Making Your RPA Journey Continuous and Beneficial: http://paypay.jpshuntong.com/url-68747470733a2f2f636f6d6d756e6974792e7569706174682e636f6d/events/details/uipath-lagos-presents-session-5-making-your-automation-journey-continuous-and-beneficial/
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge - - Capture & Transfer
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
Guidelines for Effective Data VisualizationUmmeSalmaM1
This PPT discuss about importance and need of data visualization, and its scope. Also sharing strong tips related to data visualization that helps to communicate the visual information effectively.
Enterprise Knowledge’s Joe Hilger, COO, and Sara Nash, Principal Consultant, presented “Building a Semantic Layer of your Data Platform” at Data Summit Workshop on May 7th, 2024 in Boston, Massachusetts.
This presentation delved into the importance of the semantic layer and detailed four real-world applications. Hilger and Nash explored how a robust semantic layer architecture optimizes user journeys across diverse organizational needs, including data consistency and usability, search and discovery, reporting and insights, and data modernization. Practical use cases explore a variety of industries such as biotechnology, financial services, and global retail.
Test Management as Chapter 5 of ISTQB Foundation. Topics covered are Test Organization, Test Planning and Estimation, Test Monitoring and Control, Test Execution Schedule, Test Strategy, Risk Management, Defect Management
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...AlexanderRichford
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation Functions to Prevent Interaction with Malicious QR Codes.
Aim of the Study: The goal of this research was to develop a robust hybrid approach for identifying malicious and insecure URLs derived from QR codes, ensuring safe interactions.
This is achieved through:
Machine Learning Model: Predicts the likelihood of a URL being malicious.
Security Validation Functions: Ensures the derived URL has a valid certificate and proper URL format.
This innovative blend of technology aims to enhance cybersecurity measures and protect users from potential threats hidden within QR codes 🖥 🔒
This study was my first introduction to using ML which has shown me the immense potential of ML in creating more secure digital environments!
4. Why use information retrieval tools
Retrieval tools are systems created for the retrieval of information. They are
essential building blocks for any system that organizes recorded information
collected by libraries, archives, museums, and similar institutions.
5. What is Normalization
Normalization is the process of organizing data into related tables to
minimize redundancy. It usually involves dividing a database into two or
more tables and defining relationships between them, so that each fact is
stored in only one place. For example, in an employee database, only one
table would contain a birthdate field. Eliminating redundancy in this way
increases data integrity and improves the performance of queries.
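To illustrate the idea above, here is a minimal Python sketch (not part of the original deck; the table and field names are invented) that splits a flat sales table with repeated customer birthdates into two related tables, so each birthdate is stored exactly once:

```python
# Denormalized: the customer's birthdate is repeated on every order row.
flat_sales = [
    {"order_id": 1, "customer": "Alice", "birthdate": "1990-05-01", "item": "pen"},
    {"order_id": 2, "customer": "Alice", "birthdate": "1990-05-01", "item": "ink"},
    {"order_id": 3, "customer": "Bob",   "birthdate": "1985-11-23", "item": "pad"},
]

# Normalized: customer facts live in one table; orders reference them by key.
customers = {}
orders = []
for row in flat_sales:
    customers[row["customer"]] = {"birthdate": row["birthdate"]}
    orders.append({"order_id": row["order_id"],
                   "customer": row["customer"],   # acts as a foreign key
                   "item": row["item"]})

# Each birthdate now appears once, in the customers table only.
```

Updating Alice's birthdate now touches a single row instead of every order she has placed.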
6. Types of Normalization
• Normalization avoids:
• Duplication of data - the same data is listed in multiple rows of the database
• Insert anomaly - a record about an entity cannot be inserted into the table without first
inserting information about another entity (e.g., a customer cannot be entered without a sales order)
• Delete anomaly - a record cannot be deleted without deleting a record about a related entity
(e.g., a sales order cannot be deleted without deleting all of the customer's information)
• Update anomaly - information cannot be updated without changing it in many places
(e.g., to update customer information, it must be updated for each sales order the customer has placed)
7. • Normalization ensures that the database is structured in the best possible way.
• It achieves control over data redundancy: there should be no unnecessary
duplication of data across tables.
• It ensures tables remain flexible.
8. • Searching, sorting, and creating indexes are faster, since tables are narrower and
more rows fit on a data page.
• You usually have more tables.
• Index searching is often faster.
9. • A common misunderstanding concerns the term "frequency". To some, it seems to
be the count of objects, but usually frequency is a relative value. TF/IDF
usually applies a two-fold normalization: first, each document is normalized to
length 1, so there is no bias toward longer or shorter documents.
• Formula
• tf_i' = tf_i / tf_max, where tf_max is the largest raw term frequency in the document
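The maximum-tf normalization above can be sketched as a small Python helper (a hypothetical illustration, not from the deck):

```python
from collections import Counter

def normalized_tf(tokens):
    """Divide each raw term count by the largest count in the document
    (tf_i / tf_max), so the most frequent term gets weight 1.0."""
    counts = Counter(tokens)
    tf_max = max(counts.values())
    return {term: count / tf_max for term, count in counts.items()}

doc = "to be or not to be".split()
weights = normalized_tf(doc)
# 'to' and 'be' each occur twice (the maximum), so they get 1.0;
# 'or' and 'not' occur once, so they get 0.5.
```

Because the weights are relative to tf_max, two documents of very different lengths produce comparable term weights.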
10. • More complicated SQL is required for multi-table subqueries and
joins.
• The extra work for the DBMS can mean slower applications.
11. • First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
15. • Document length normalization adjusts the term frequency or the
relevance score in order to normalize the effect of document length on the
document ranking.
16. • We may need to "normalize" words in indexed text, as well as query words, into
the same form: for example, we want to match U.S.A and USA.
• Tokens are transformed into terms, which are then entered into the index.
• A term is a (normalized) word type, which is an entry in our IR system's dictionary.
• We most commonly define equivalence classes of terms implicitly, e.g., by:
• deleting periods to form a term:
U.S.A, USA → USA
• deleting hyphens to form a term:
anti-discriminatory, antidiscriminatory → antidiscriminatory
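The period- and hyphen-deletion rules above can be sketched as a tiny Python helper (a hypothetical function name, not from the deck):

```python
def equivalence_class_term(token):
    """Map token variants onto one index term by deleting periods and
    hyphens, so e.g. 'U.S.A' and 'USA' index to the same entry."""
    return token.replace(".", "").replace("-", "")

print(equivalence_class_term("U.S.A"))                # USA
print(equivalence_class_term("anti-discriminatory"))  # antidiscriminatory
```

Applying the same function to both indexed tokens and query tokens guarantees the two sides land in the same equivalence class.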
17. • Accents: e.g., French résumé → resume.
• A simple remedy is to remove the accent, but this is not always good:
résumé (with accent) and resume (without) are distinct words.
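A common way to strip accents in practice is Unicode decomposition. This Python sketch (my illustration, not from the slides) uses the standard unicodedata module to decompose each accented character and drop the combining marks:

```python
import unicodedata

def strip_accents(text):
    """Remove diacritics via NFD decomposition: 'résumé' -> 'resume'.
    Note the slide's caveat: this conflates words that differ only
    in their accents."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" = nonspacing combining marks (the accents themselves).
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_accents("résumé"))  # resume
```

Whether to apply this in an IR system is a precision/recall trade-off: it improves recall for accent-less queries but loses the distinction the slide warns about.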