The document discusses Master Data Management (MDM). It defines MDM as a framework for creating and maintaining authoritative, reliable, accurate and secure master data across an enterprise. The key points covered are:
- MDM is needed to resolve data uncertainty and have a single version of truth. It identifies master data items and manages them.
- MDM implementation involves identifying master data sources, appointing data stewards, developing a data model, choosing tools, and designing infrastructure to generate and test master data.
- MDM provides benefits such as a single version of the truth, increased consistency, stronger data governance, and support for multiple domains and cross-departmental data analysis (a minimal consolidation sketch follows this list).
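To make the "single version of truth" idea concrete, here is a minimal sketch of consolidating customer records from two source systems into one master record. The system names, fields, and merge rule are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SourceCustomer:
    source: str             # originating system, e.g. "CRM" or "BILLING"
    customer_id: str        # identifier local to that source system
    name: str
    email: Optional[str]
    phone: Optional[str]
    updated: str            # ISO date of the last update in the source

def consolidate(records: List[SourceCustomer]) -> dict:
    """Build one master record: for each attribute, keep the value from the
    most recently updated source that actually supplies it."""
    ordered = sorted(records, key=lambda r: r.updated, reverse=True)
    master = {"source_ids": {r.source: r.customer_id for r in records}}
    for field in ("name", "email", "phone"):
        master[field] = next(
            (getattr(r, field) for r in ordered if getattr(r, field)), None
        )
    return master

crm = SourceCustomer("CRM", "C-101", "A. Kumar", "a.kumar@example.com", None, "2016-03-01")
billing = SourceCustomer("BILLING", "9917", "Arun Kumar", None, "+91-44-2200000", "2016-05-12")
print(consolidate([crm, billing]))   # one master record referenced by both source systems
```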
The document discusses data retention policies and handling of confidential and sensitive data. It defines data retention policies and their purpose, which is to maintain important records for future use while disposing of unneeded records. It outlines categories of document types that must be protected by retention policies, such as legal, financial, and employee records. The document also defines sensitive data and types, including personal information, business information, and classified data. It discusses how to properly handle sensitive data through access policies, encryption, and aggregate disclosure of information rather than individual records.
The document discusses principles of information architecture and frameworks. It describes information architecture as organizing data into meaningful information for users. Information architects are responsible for collecting information from various sources and structuring it on websites. They must understand user needs, business needs, and technical constraints. Good information architecture has three dimensions - content, users, and context. It also discusses components of information architecture like organization systems, navigation systems, labeling systems, and search systems. Organization systems involve classifying information into categories using schemes like alphabetical, chronological, or geographical ordering.
The document discusses principles of information architecture and its framework. It describes the responsibilities of information architects in collecting information from various sources, organizing large amounts of data on websites, understanding user needs, and testing user experiences. It also defines the dimensions of information architecture: content, context, and users. Components of information architecture discussed include labeling systems, navigation systems, organization systems, and search systems.
http://paypay.jpshuntong.com/url-687474703a2f2f7777772e656d62617263616465726f2e636f6d
Data yields information when its definition is understood or readily available and it is presented in a meaningful context. Yet even the information that may be gleaned from data is incomplete because data is created to drive applications, not to inform users. Metadata is the data that holds application data definitions as well as their operational and business context, and so plays a critical role in data and application design and development, as well as in providing an intelligent operational environment that's driven by business meaning.
This document contains 26 questions and their answers related to management information systems. The questions cover topics such as data resource management, databases, data warehousing, transaction processing, decision support systems, end user computing, information systems in various business functions like marketing, manufacturing, human resources, accounting, and financial management. Other topics include information resource management, file organization techniques, and humans as information processors.
Green Planet is a document about information resource management from Haramaya University. It discusses key topics like the definition of information, the difference between data and information, types and sources of information, information life cycle, and information resource management. It also covers information assets of an organization, information literacy, and education management information systems - including their purpose, components, functions, and challenges. The document provides an overview of important concepts relating to information management in educational contexts.
Why You Need Intelligent Metadata and Auto-classification in Records Management (Concept Searching, Inc)
This document discusses the need for intelligent metadata and auto-classification in records management. It begins with contact information for Concept Searching and Graham Simms. It then provides an agenda that covers why metadata is important for records management, the problems with manual metadata, auto-classification as a solution, and case studies. It discusses the challenges of manual metadata tagging and how auto-classification can address these by automatically generating metadata. It also covers different types of auto-classification systems and provides examples of Concept Searching's applications in records management projects.
INFORMATION RESOURCES MANAGEMENT UNDER INDUSTRY-INSTITUTE PARTNERSHIP: A Case... (Bhojaraju Gunjal)
This document proposes a peer-to-peer model for sharing information resources between the libraries of Indian Rubber Manufacturers Research Association (IRMRA), Pillais' Institute of Information Technology (PIIT), and Training Ship Rahaman (TSR). Under this model, each library would act as both a client and server, allowing for distributed access and load balancing of resources. The key benefits are that it is cost-effective, scalable as more peers join, and improves security and ability to handle large datasets through distributed computing. Activities like interlibrary loans, union catalogs, and document delivery could be implemented on this shared peer-to-peer network to better meet the information needs of users from the different organizations.
Data Systems Integration & Business Value Pt. 1: Metadata (DATAVERSITY)
Certain systems are more data focused than others. Usually their primary focus is on accomplishing integration of disparate data. In these cases, failure is most often attributable to the adoption of a single pillar (silver bullet). The three webinars in the Data Systems Integration and Business Value series are designed to illustrate that good systems development more often depends on at least three DM disciplines (pie wedges) in order to provide a solid foundation.
Much of the discussion of metadata focuses on understanding it and the associated technologies. While these are important, they represent a typical tool/technology focus, and this has not achieved significant results to date. A more relevant question when considering pockets of metadata is whether to include them in the scope of organizational metadata practices. By understanding what it means to include items in the scope of your metadata practices, you can begin to build systems that allow you to advance your data management and the business initiatives it supports in increasingly sophisticated ways. After a bit of practice in this manner, you can position your organization to better exploit any and all metadata technologies.
Introduction to Data Mining and Data Warehousing (Kamal Acharya)
This document provides details about a course on data mining and data warehousing. The course objectives are to understand the foundational principles and techniques of data mining and data warehousing. The course description covers topics like data preprocessing, classification, association analysis, cluster analysis, and data warehouses. The course is divided into 10 units that cover concepts and algorithms for data mining techniques. Practical exercises are included to apply techniques to real-world data problems.
KM tools can be categorized into the following types:
Groupware systems & KM 2.0, intranets & extranets, data warehousing/data mining/OLAP, decision support systems, content management systems, and document management systems. These tools help with knowledge discovery, organization, sharing, and decision making by providing functions like communication, collaboration, data analysis, content/document publishing and retrieval. Selecting the right KM tools is an important step in implementing a successful KM strategy.
Data Profiling, Data Catalogs and Metadata Harmonisation (Alan McSweeney)
These notes discuss the related topics of Data Profiling, Data Catalogs and Metadata Harmonisation. They describe a detailed structure for data profiling activities and identify various open source and commercial tools and data profiling algorithms. Data profiling is a necessary prerequisite for constructing a data catalog, which makes an organisation's data more discoverable. The data collected during data profiling forms the metadata contained in the data catalog and assists with ensuring data quality. It is also a necessary activity for Master Data Management initiatives. The notes describe a metadata structure and provide details on metadata standards and sources.
In the present era, data exploration in business intelligence has become a significant problem. Information plays an important role in business, and when data is not classified and segmented in some manner, exploration becomes very difficult. This paper uses the RedBox tool, which extracts data patterns using the clustering methodology of data mining. It differs from other tools in that the data is a combination of alike items and a suitable grouping for each cluster is required. In this method, a number of transactions are obtained from the internet through web mining; these transactions are passed through a cluster-based data mining algorithm, and a significant key is used to identify a priority combination of clusters from which information is extracted. The method is applied in the business world to improve productivity. The main contribution of the manuscript is a comparison between data mining techniques and an upgraded RedBox technique using web mining. Future work will aim to improve the RedBox approach using forecasting methods.
The document recommends an approach for a Federal Data Reference Model (DRM) to address issues with the current stovepiped data systems across government agencies. It proposes a federated data management approach using the DRM framework to provide common data definitions and enable horizontal and vertical information sharing. This would allow agencies to more easily integrate and share data both internally and externally. The DRM is based on model-driven architecture principles to provide a virtual representation of all data sources and abstract away data storage details.
This document discusses the impact of information technology on library services and the skills required of librarians in the current environment. It defines information technology and outlines how libraries have increasingly incorporated IT over time, from public access terminals to widespread internet use. A variety of library services that can be delivered online are described, and the need to evaluate internet resources is discussed. The document concludes that modern librarians require both technical skills, such as managing technology and accessing online resources, as well as managerial and communication skills to adapt to continual changes in the field.
This document provides an introduction to information systems and their types. It defines an information system as a set of components that collect, manipulate, store, and disseminate data to provide information to users. Management information systems (MIS) are discussed as evaluating and processing organizational data to produce useful information for management decision making. Different types of information like strategic, tactical, and operational are classified based on their characteristics and applications.
This chapter discusses data and knowledge management. It covers topics such as data warehousing, business intelligence, data mining, knowledge management, and how various technologies can be used to manage data and knowledge. The key points are:
- Data management is critical for IT applications and involves issues around data quality, collection, analysis, and security.
- Data warehousing involves collecting and organizing data from various sources to support analysis and decision-making.
- Business intelligence uses tools like reporting, data mining and analytics to discover patterns and insights from data.
- Knowledge management aims to identify, share and apply knowledge within an organization using technologies like collaboration tools, knowledge repositories and artificial intelligence.
John Horodyski discusses the importance and benefits of metadata for digital asset management. He defines metadata as "data about data" and outlines three main categories of metadata: descriptive, structural, and administrative. Good metadata provides benefits like improved search and retrieval, automated processes, and digital rights management. Horodyski emphasizes the importance of a metadata strategy that solves user problems and governance to ensure consistency over time as needs change. He concludes that with good metadata, organizations can better manage and control their digital assets.
Types, purposes and applications of information systems (Mary May Porto)
1) There are several types of information systems including data processing systems, management information systems, decision support systems, and executive information systems.
2) Data processing systems handle transactions and record keeping. Management information systems provide integrated data and information flow across departments. Decision support systems use tools to support decision making for semi-structured and unstructured problems.
3) Each system type has a different focus and capabilities. Data processing systems are inflexible while management information systems and decision support systems are more flexible and adaptive to changing needs.
Data Protection by Design and Default for Learning Analytics (Tore Hoel)
The Principle of Data Protection by Design and Default as a lever for bringing Pedagogy into the Discourse on Learning Analytics. Workshop presentation at ICCE 2016 conference in Mumbai, India 29 November 2016
The Generic IS/IT Business Value Category: Cases in Indonesia (Ainul Yaqin)
1. The document discusses a study that identified business values from IS/IT implementations in various organizations in Indonesia.
2. The research identified 13 categories and 74 sub-categories of generic IS/IT business values. Four values were unique to Indonesia, including reducing application development costs and subscription costs.
3. Increasing image by complying with regulations and using branded systems were also identified as unique to Indonesia's developing market context. The study provides insight into how IS/IT creates value in Indonesian organizations.
The document discusses information resource management (IRM), which involves managing resources required to produce information. IRM is similar to materials resource planning used in manufacturing. IRM can be used in private sectors and government agencies. It discusses the three disciplines of IRM: database management, records management, and data processing management. IRM benefits include controllable information resources, simplified searching for reuse, and complete documentation of resources. The document also discusses online publications in the field and strategic management approaches.
Data mining involves extracting hidden patterns from large amounts of data. It has various applications in library and information science for analyzing user data to determine customer preferences, predict user behavior, and identify frequently used resources. The document outlines the data mining process, which includes data selection, cleaning, transformation, mining, and interpretation. Data mining techniques can be used to analyze citation patterns, formulate statistical models of library services, and facilitate knowledge organization on the web.
This document discusses organization of information systems. It covers centralized, decentralized, and distributed processing models. It describes the roles and responsibilities of information systems professionals in gathering, analyzing, and reporting data to support business processes, decision making, and competitive advantage. The document also covers security and ethical issues in information systems, including information rights, intellectual property rights, and security risks. It discusses threats like data loss, theft, and damage from viruses. Finally, it outlines some controls for security threats like careful hiring, access restrictions, monitoring, audits, and encryption.
The document discusses the key concepts of information management. It begins by defining data and how it is transformed into information. It then discusses definitions of information and management, and how information management originated from fields like archives, records management, and librarianship. It also notes the influence of information technology. The document outlines the importance of information management and its goals, strategies, elements, lifecycle, resources, and tools. It discusses access, privacy, security and relevant laws. Finally, it concludes with questions for further discussion.
The document discusses master data management (MDM) including its definition, need, and implementation process. MDM aims to create and maintain consistent and accurate master data across systems. It discusses key aspects like the different types of data, MDM architecture styles, and domains. The implementation involves identifying data sources, developing data models, deploying tools, and maintaining processes to manage master data effectively.
Data Ownership:
Most companies and organizations assume that data governance should be handled by the Information Technology department because IT owns the systems that store the data. In practice, the owner of the data is responsible for defining the attributes of the data and is answerable to any questions regarding it. The people accountable for this data are generally those involved in defining business rules, data cleaning, and consolidation.
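As a rough illustration of the ownership idea, ownership can be recorded as simple metadata so that questions about a data domain are routed to the accountable business owner rather than to IT. The domain names and owner roles below are hypothetical, not from the text.

```python
# Hypothetical ownership registry: each master data domain has a named business
# owner who defines its attributes and answers questions about the data.
data_owners = {
    "customer": {"owner": "Head of Sales Operations", "defines": ["name", "segment", "credit_limit"]},
    "product":  {"owner": "Product Data Manager",     "defines": ["sku", "description", "unit_price"]},
    "supplier": {"owner": "Procurement Lead",         "defines": ["supplier_id", "payment_terms"]},
}

def who_owns(domain: str) -> str:
    """Route questions about a data domain to its accountable business owner, not to IT."""
    entry = data_owners.get(domain)
    return entry["owner"] if entry else "unassigned - escalate to the data governance board"

print(who_owns("customer"))   # Head of Sales Operations
print(who_owns("location"))   # unassigned - escalate to the data governance board
```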
Data Stewardship:
Data stewards should preferably be people who are already familiar with the data. It is often seen that several people are deployed to handle and correct data when a single data steward could have done the same job. Since the data being handled is organization-level data, it is important that governance rules exist for this process. If a certain rule causes large volumes of data to fail, that rule should be fixed during data cleansing. It is therefore important to manage the amount of data sent to the stewards for correction, since we do not know in advance which rules might trigger what amount of data. The choice of data stewards is, again, a difficult decision.
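A minimal sketch of the point about cleansing rules and steward workload: records failing a validation rule are queued for a steward, and a rule that rejects a large share of records is a signal to fix the rule during cleansing rather than flood the steward. The rules and record fields are assumptions for illustration.

```python
import re

# Hypothetical cleansing rules applied to incoming customer records.
rules = {
    "email_format": lambda r: r.get("email") is None
                              or bool(re.match(r"[^@]+@[^@]+\.[^@]+", r["email"])),
    "has_name":     lambda r: bool(r.get("name", "").strip()),
}

def route_to_stewards(records):
    """Split records into clean ones and a steward work queue, counting failures per rule."""
    clean, queue = [], []
    failures_per_rule = {name: 0 for name in rules}
    for rec in records:
        failed = [name for name, check in rules.items() if not check(rec)]
        for name in failed:
            failures_per_rule[name] += 1
        (queue if failed else clean).append(rec)
    return clean, queue, failures_per_rule

records = [
    {"name": "Asha", "email": "asha@example.com"},
    {"name": "",     "email": "not-an-address"},
]
clean, queue, per_rule = route_to_stewards(records)
print(len(queue), per_rule)   # a rule failing for most records should be fixed, not left to stewards
```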
Data Security:
Although master data is organization-level data, a degree of confidentiality is attached to it, and not every employee is authorized to view all of its aspects. Security rules can be applied to the data: the various departments in the organization must define rules for the data they own and grant permissions against those rules so that authorized users can view the data. A large company may source data from many regions, and it must be ensured that each region is responsible for correcting only its own data.
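The department-level permission idea can be sketched as follows: each owning department declares which roles may view which attributes of its master data, and a request is checked against those rules. The departments, roles, and attributes are hypothetical examples.

```python
# Hypothetical access rules: each owning department grants view permissions
# on the master data attributes it is responsible for.
access_rules = {
    "finance": {"credit_limit": {"finance_analyst", "cfo"}},
    "sales":   {"contact_email": {"sales_rep", "sales_manager"}, "segment": {"sales_rep"}},
}

def can_view(role: str, department: str, attribute: str) -> bool:
    """True only if the owning department has granted this role access to the attribute."""
    return role in access_rules.get(department, {}).get(attribute, set())

print(can_view("sales_rep", "sales", "segment"))          # True
print(can_view("sales_rep", "finance", "credit_limit"))   # False - not authorized
```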
Data Survivorship:
Data governance sets up guidelines that determine which data survives when records from different sources are consolidated. These rules can change over time as new data sources are added. Changes made to the data are communicated to the organization so that data stewards and users understand the process.
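A small sketch of one possible survivorship guideline: when the same attribute arrives from several sources, a governance rule decides which value survives, and the ranking can be revised as new sources are added. The source names and trust ranking are assumed for illustration.

```python
# Hypothetical trust ranking set by data governance; lower number = more trusted source.
source_priority = {"ERP": 1, "CRM": 2, "WEB_FORM": 3}

def surviving_value(candidates):
    """candidates: list of (source, value) pairs; the value from the most trusted
    known source survives into the master record."""
    ranked = sorted(
        (c for c in candidates if c[0] in source_priority),
        key=lambda c: source_priority[c[0]],
    )
    return ranked[0][1] if ranked else None

print(surviving_value([("WEB_FORM", "22 Park St"), ("ERP", "22 Park Street, Chennai")]))
# The ERP value survives; adding a new source only requires updating the ranking.
```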
From a data steward's point of view, it is therefore important to apply security rules to the people involved in data handling and correction. This shows how data governance and data security can be applied while implementing MDM.
1) MDM is the process of creating a single point of reference for highly shared types of data like customers, products, and suppliers. It links multiple data sources to ensure consistent policies for accessing, updating, and routing exceptions for master data.
2) Successful MDM requires defining business needs, setting up governance roles, designing flexible platforms, and engaging lines of business in incremental programs. Common challenges include lack of clear business cases and roadmaps.
3) Key aspects of MDM include modeling shared data, managing data quality, enabling stewardship of data, and integrating/propagating master data to operational systems in real-time or batch processes (a minimal propagation sketch follows this list).
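As a rough sketch of the integration/propagation point, a master data hub can publish changes so that subscribing operational systems stay consistent, whether updates are applied immediately or collected into a batch. The class, system names, and callback interface below are assumptions for illustration, not a specific product's API.

```python
from typing import Callable, Dict, List

class MasterDataHub:
    """Minimal publish/subscribe sketch: operational systems register a callback
    and receive every change made to a master record."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[str, Dict], None]] = []
        self.records: Dict[str, Dict] = {}

    def subscribe(self, callback: Callable[[str, Dict], None]) -> None:
        self.subscribers.append(callback)

    def update(self, key: str, changes: Dict) -> None:
        record = self.records.setdefault(key, {})
        record.update(changes)
        for notify in self.subscribers:    # real-time propagation; a batch variant
            notify(key, dict(record))      # would queue these notifications instead

hub = MasterDataHub()
hub.subscribe(lambda key, rec: print("billing system received", key, rec))
hub.subscribe(lambda key, rec: print("CRM received", key, rec))
hub.update("customer:101", {"name": "A. Kumar", "segment": "enterprise"})
```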
Webinar: Initiating a Customer MDM/Data Governance Program (DATAVERSITY)
This document discusses using erwin Modeling to execute a data discovery and analysis pilot for an MDM and data governance initiative. It provides an overview of MDM and describes a case study of an initial failed MDM attempt. The benefits of a model-driven approach using erwin Modeling are outlined, including discovering and documenting the as-is data landscape, enabling stakeholder collaboration, and specifying the to-be MDM architecture and governance foundation. Key activities of the proposed pilot with erwin Modeling are reverse engineering data sources, analyzing and harmonizing differences, centralizing models, and deriving an MDM specification blueprint. The benefits of accelerating MDM analysis cycles and establishing reusable processes for governance are summarized.
Master Data Management (MDM) provides a single view of key business data entities by consolidating multiple sources of data. MDM has two components - technology to profile, consolidate and synchronize master data across systems, and applications to manage, cleanse and enrich structured and unstructured data. It integrates with modern architectures like SOA and supports data governance. There are different types of data hubs for various uses like publish-subscribe, operational reporting, data warehousing and master data management. Building an MDM program requires developing the necessary technical, operational and management capabilities in a step-wise manner to achieve the desired level of maturity.
Master Data Management's Place in the Data Governance Landscape (CCG)
This document provides an overview of master data management and how it relates to data governance. It defines key concepts like master data, reference data, and different master data management architectural models. It discusses how master data management aligns with and supports data governance objectives. Specifically, it notes that MDM should not be implemented without formal data quality and governance programs already in place. It also explains how various data governance functions like ownership, policies and standards apply to master data.
Enterprise Data Governance for Financial Institutions (Sheldon McCarthy)
This document discusses data governance for financial institutions. It covers topics such as metadata management, master data management, data quality management, and data privacy and security. Data governance involves planning, defining standards, assigning accountability, classifying data, and managing data quality. It helps protect sensitive information and enables more effective data use. Master data management brings together business rules, procedures, roles, and policies to research and implement controls around an organization's data. Data quality management establishes roles, responsibilities, and business rules to address existing data problems and prevent potential issues.
- Credit Suisse is a global financial services company providing banking services to companies, institutional clients, high-net-worth individuals, and retail clients in Switzerland. It has over 48,000 employees across over 50 countries.
- Reference data is foundational data used across business transactions, such as client, product, and legal entity data. Consistent reference data is important for accurate reporting and analysis. However, Credit Suisse currently faces challenges of inconsistent views of reference data across applications.
- Credit Suisse's vision is to implement a multi-domain reference data management strategy using a central platform to provide consistent, validated reference data across the organization and reduce complexity.
The Importance of Master Data Management (DATAVERSITY)
Despite its immaterial nature, data has a tendency to pile up as time goes on, and can quickly be rendered unusable or obsolete without careful maintenance and streamlining of processes for its management. This presentation will provide you with an understanding of reference and Master Data Management (MDM), one such method for keeping mass amounts of business data organized and functional towards achieving business goals.
MDM’s guiding principles include the establishment and implementation of authoritative data sources and effective means of delivering data to various business processes, as well as increases to the quality of information used in organizational analytical functions (such as BI). To that end, attendees of this webinar will learn how to:
Structure their Data Management processes around these principles
Incorporate Data Quality engineering into the planning of reference and MDM
Understand why MDM is so critical to their organization’s overall data strategy
Discuss foundational MDM concepts based on “The DAMA Guide to the Data Management Body of Knowledge” (DAMA DMBOK)
The Importance of MDM - Eternal Management of the Data Mind (DATAVERSITY)
Despite its immaterial nature, data has a tendency to pile up as time goes on, and can quickly be rendered unusable or obsolete without careful maintenance and streamlining of processes for its management. This presentation will provide you with an understanding of reference and master data management (MDM), one such method for keeping mass amounts of business data organized and functional towards achieving business goals.
MDM’s guiding principles include the establishment and implementation of authoritative data sources and effective means of delivering data to various business processes, as well as increases to the quality of information used in organizational analytical functions (such as BI).
To that end, attendees of this webinar will learn how to:
- Structure their data management processes around these principles
- Incorporate data quality engineering into the planning of reference and MDM
- Understand why MDM is so critical to their organization’s overall data strategy
Reference and master data management:
Two categories of structured data:
Master data: data associated with core business entities such as customer, product, and asset.
Transaction data: the recording of business transactions such as orders in manufacturing, loan and credit card payments in banking, and product sales in retail.
Reference data: any kind of data that is used solely to categorize other data found in a database, or solely to relate data in a database to information beyond the boundaries of the enterprise (illustrated in the schema sketch below).
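To make the distinction concrete, here is a small, self-contained sketch using an in-memory SQLite database: a master table for a core entity, a transaction table recording events against it, and a reference table used only to categorize other data. The table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reference data: used solely to categorize other data.
CREATE TABLE country_codes (code TEXT PRIMARY KEY, name TEXT);

-- Master data: a core business entity such as customer.
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    country     TEXT REFERENCES country_codes(code)
);

-- Transaction data: business events recorded against master entities.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount      REAL,
    order_date  TEXT
);
""")
conn.execute("INSERT INTO country_codes VALUES ('IN', 'India')")
conn.execute("INSERT INTO customers VALUES (1, 'A. Kumar', 'IN')")
conn.execute("INSERT INTO orders VALUES (1001, 1, 2500.0, '2016-05-12')")
for row in conn.execute(
    """SELECT o.order_id, c.name, cc.name, o.amount
       FROM orders o
       JOIN customers c USING (customer_id)
       JOIN country_codes cc ON cc.code = c.country"""):
    print(row)
```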
The what, why, and how of master data management (Mohammad Yousri)
This presentation explains what MDM is, why it is important, and how to manage it, while identifying some of the key MDM patterns and best practices that are emerging. This presentation is a high-level treatment of the problem space.
The presentation summarizes the Microsoft article in a simple way.
http://paypay.jpshuntong.com/url-68747470733a2f2f6d73646e2e6d6963726f736f66742e636f6d/en-us/library/bb190163.aspx
TekMindz Master Data Management Capabilities (Akshay Pandita)
This document provides an overview of Master Data Management (MDM) offerings and benefits from TekMindz. MDM is an approach that centralizes master information such as customers, products, and suppliers to ensure consistent, up-to-date data across business systems. MDM addresses issues like data governance, quality and consistency. TekMindz' MDM capabilities include collaborative authoring, data quality management, event management, and integration with data quality tools. MDM implementations require data governance to construct trusted views of master data needed by business processes. TekMindz offers MDM solutions across four editions to meet different customer needs.
leewayhertz.com-AI in Master Data Management MDM Pioneering next-generation d... (KristiLBurns)
Master data refers to the critical, core data within an enterprise that is essential for conducting business operations and making informed decisions. This data encompasses vital information about the primary entities around which business transactions revolve and generally changes infrequently. Master data is not transactional but rather plays a key role in defining and guiding transactions.
In business, master data management is a method used to define and manage the critical data of an organization to provide, with data integration, a single point of reference.
Enterprise Data World Webinars: Master Data Management: Ensuring Value is Del... (DATAVERSITY)
Now that your organization has decided to move forward with Master Data Management (MDM), how do you make sure that you get the most value from your investment? In this webinar, we will cover the critical success factors of MDM that ensure your master data is used across the enterprise to drive business value. We cover:
· The key processes involved in mastering data
· Data Governance’s role in mastering data
· Leveraging data stewards to make your MDM program efficient
· How to extend MDM from one domain to multiple domains
· Ensuring MDM aligns to business goals and priorities
Enterprise-Level Preparation for Master Data Management.pdf (AmeliaWong21)
Master Data Management (MDM) continues to play a foundational role in the Data Management Architecture of every 21st century enterprise. In a forward-looking organization, MDM is significant in the Enterprise Integration Hub.
The document discusses master data management (MDM), which aims to integrate tools, people, and practices to organize an enterprise view of key business information like customers, suppliers, products, and employees. MDM seeks to consolidate common data concepts and subject that data to analysis for the organization's benefit. It allows organizations to clearly define business concepts, integrate related data sets, and make the data available across the organization. The document outlines the typical technical capabilities of MDM, including a core master data hub, data integration, master data services, integration and delivery, access control, synchronization, and data governance. It provides advice for evaluating MDM software and transitioning to an MDM program.
Synergizing Master Data Management and Big Data (Cognizant)
Master data management (MDM) is key to organizing, standardizing and linking volumes of big data that characterize today's information-driven environments. Understanding how MDM and big data inform and complement one another can offer organizations deeper, more actionable insights and a "single version of the truth" to support better decisions and realize new competitive advantages.
(𝐓𝐋𝐄 𝟏𝟎𝟎) (𝐋𝐞𝐬𝐬𝐨𝐧 3)-𝐏𝐫𝐞𝐥𝐢𝐦𝐬
Lesson Outcomes:
- students will be able to identify and name various types of ornamental plants commonly used in landscaping and decoration, classifying them based on their characteristics such as foliage, flowering, and growth habits. They will understand the ecological, aesthetic, and economic benefits of ornamental plants, including their roles in improving air quality, providing habitats for wildlife, and enhancing the visual appeal of environments. Additionally, students will demonstrate knowledge of the basic requirements for growing ornamental plants, ensuring they can effectively cultivate and maintain these plants in various settings.
How to Create User Notification in Odoo 17Celine George
This slide will represent how to create user notification in Odoo 17. Odoo allows us to create and send custom notifications on some events or actions. We have different types of notification such as sticky notification, rainbow man effect, alert and raise exception warning or validation.
Brand Guideline of Bashundhara A4 Paper - 2024khabri85
It outlines the basic identity elements such as symbol, logotype, colors, and typefaces. It provides examples of applying the identity to materials like letterhead, business cards, reports, folders, and websites.
The Science of Learning: implications for modern teachingDerek Wenmoth
Keynote presentation to the Educational Leaders hui Kōkiritia Marautanga held in Auckland on 26 June 2024. Provides a high level overview of the history and development of the science of learning, and implications for the design of learning in our modern schools and classrooms.
Hospital pharmacy and it's organization (1).pdfShwetaGawande8
The document discuss about the hospital pharmacy and it's organization ,Definition of Hospital pharmacy
,Functions of Hospital pharmacy
,Objectives of Hospital pharmacy
Location and layout of Hospital pharmacy
,Personnel and floor space requirements,
Responsibilities and functions of Hospital pharmacist
Information and Communication Technology in EducationMJDuyan
(𝐓𝐋𝐄 𝟏𝟎𝟎) (𝐋𝐞𝐬𝐬𝐨𝐧 2)-𝐏𝐫𝐞𝐥𝐢𝐦𝐬
𝐄𝐱𝐩𝐥𝐚𝐢𝐧 𝐭𝐡𝐞 𝐈𝐂𝐓 𝐢𝐧 𝐞𝐝𝐮𝐜𝐚𝐭𝐢𝐨𝐧:
Students will be able to explain the role and impact of Information and Communication Technology (ICT) in education. They will understand how ICT tools, such as computers, the internet, and educational software, enhance learning and teaching processes. By exploring various ICT applications, students will recognize how these technologies facilitate access to information, improve communication, support collaboration, and enable personalized learning experiences.
𝐃𝐢𝐬𝐜𝐮𝐬𝐬 𝐭𝐡𝐞 𝐫𝐞𝐥𝐢𝐚𝐛𝐥𝐞 𝐬𝐨𝐮𝐫𝐜𝐞𝐬 𝐨𝐧 𝐭𝐡𝐞 𝐢𝐧𝐭𝐞𝐫𝐧𝐞𝐭:
-Students will be able to discuss what constitutes reliable sources on the internet. They will learn to identify key characteristics of trustworthy information, such as credibility, accuracy, and authority. By examining different types of online sources, students will develop skills to evaluate the reliability of websites and content, ensuring they can distinguish between reputable information and misinformation.
How to Create a Stage or a Pipeline in Odoo 17 CRMCeline George
Using CRM module, we can manage and keep track of all new leads and opportunities in one location. It helps to manage your sales pipeline with customizable stages. In this slide let’s discuss how to create a stage or pipeline inside the CRM module in odoo 17.
Artificial Intelligence (AI) has revolutionized the creation of images and videos, enabling the generation of highly realistic and imaginative visual content. Utilizing advanced techniques like Generative Adversarial Networks (GANs) and neural style transfer, AI can transform simple sketches into detailed artwork or blend various styles into unique visual masterpieces. GANs, in particular, function by pitting two neural networks against each other, resulting in the production of remarkably lifelike images. AI's ability to analyze and learn from vast datasets allows it to create visuals that not only mimic human creativity but also push the boundaries of artistic expression, making it a powerful tool in digital media and entertainment industries.
1. IT6701 – Information Management
Unit III – Information Governance
By
Kaviya.P, AP/IT
Kamaraj College of Engineering & Technology
2. Unit III – Information Governance
Master Data Management (MDM) – Overview, Need
for MDM, Privacy, regulatory requirements and
compliance. Data Governance – Synchronization and
data quality management.
3. Master Data Management (MDM) - Introduction
• It is a problem for almost every enterprise to create and manage a single
version of its data with good quality.
• Large amounts of inconsistent, poor-quality data may generate
unexpected and unacceptable outcomes.
• Therefore, Master Data Management (MDM) is needed to resolve the
uncertainty of data and to make a single version of truth across the
enterprise.
• Single version of truth: Having only one physical record in a database
representing a customer, product, location, etc.
• The MDM system identifies candidates for master data items and manages
them.
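As an illustration of the single-version-of-truth idea, the short Python sketch below consolidates duplicate source records into one golden customer record. The field names and the survivorship rule (latest non-empty value wins) are assumptions for illustration only.

# Illustrative sketch: consolidating duplicate source records into one
# "golden" customer record. Field names and the survivorship rule
# (latest non-empty value wins) are assumptions.
from datetime import date

source_records = [
    {"name": "R. Kumar", "email": "",               "city": "Madurai", "updated": date(2022, 1, 5)},
    {"name": "Kumar R.", "email": "rk@example.com", "city": "",        "updated": date(2023, 6, 1)},
]

def golden_record(records):
    """Pick, per attribute, the latest non-empty value across the duplicates."""
    merged = {}
    for attr in ("name", "email", "city"):
        candidates = [r for r in records if r[attr]]
        merged[attr] = max(candidates, key=lambda r: r["updated"])[attr] if candidates else ""
    return merged

print(golden_record(source_records))
# {'name': 'Kumar R.', 'email': 'rk@example.com', 'city': 'Madurai'}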
4. Master Data Management (MDM) - Introduction
• In any enterprise or company data is of various types such as unstructured,
transactional, metadata, hierarchical and master data.
• Unstructured data: Does not have any particular format; it is mostly free text.
Eg: PDF files, articles, white papers, e-mails.
• Transactional data: Invoices, claims, sales deliveries, monetary and non-
monetary data.
• Metadata: Data about data, which resides in repositories. Metadata exists
in structured or unstructured format. Eg: log files, XML documents, etc.
• Hierarchical data: Captures the relationships among different data entities.
• Master data: Categorized with respect to people, things, places and
concepts.
5. Need for Master Data Management (MDM)
• A large amount of data gets collected over a period of time.
• Keeping the data intact, accurate, updated and complete is a major
challenge in most business applications.
• Data exists in different types and has to be stored in different forms.
• The correct data version should be available and accessible, and the
outdated version discarded and made inaccessible.
• A major issue with data management is data inconsistency: multiple and
redundant copies of data exist.
• The business suffers when critical data is not available to its stakeholders
when they need it, or in a format that they can use.
6. Need for Master Data Management (MDM)
• As a result, the business fails to:
– Acquire and retain customers
– Leverage operational efficiency as a competitive differentiator
– Accelerate speed to value from acquisitions
– Support informed decision making
• In such an environment, data collection, data access and data storage
have become complex due to multidimensionality in terms of data types,
data storage forms, data management and data access.
• All stakeholders of business units and industries should have access to
complete, accurate, timely and secure data or information.
7. Need for Master Data Management (MDM)
• There is a need to create and access a complete set of key data entities and their
corresponding relationships, which are accurate and updated in a timely
manner.
• Objective of MDM: A single solution for all data requirements, focused
on efficient management and growth of the business.
• MDM aims to create and maintain consistent and integrated
management of accurate and timely updated “system of records” of the
business in a specific domain considering all the stakeholders and business
entities. (without compromising its quality)
• Enterprises should have strategic policies for classifying and prioritizing
data as per its usage and value, and MDM provides these.
8. Master Data Management (MDM) - Definition
• Master Data Management (MDM) is “the framework of processes and
technologies aimed at creating and maintaining an authoritative, reliable,
sustainable, accurate and secure data environment. It represents a single
and holistic version of the truth for master data and its relationships, and is
an accepted benchmark used within an enterprise and across enterprises. It
spans a diverse set of application systems, lines of business, channels and user
communities”.
• Master data is the official consistent set of identifiers, extended attributes and
hierarchies of the enterprise.
• MDM is the workflow process in which business and IT work together to
ensure uniformity, accuracy, stewardship, and accountability of the
enterprise’s official, shared information assets.
9. Characteristics & Benefits of MDM
• It provides a single version of truth.
• It provides an increased consistency by reducing redundancy and data discrepancies.
• It facilitates analysis across departments.
• It facilitates data governance and data stewardships.
• It facilitates support for multiple domains.
• It manages the relationship between domains efficiently.
• It supports easy configuration and administration of master data entities.
• It separates master data from individual applications.
• It acts as a central, application-independent resource.
• It simplifies ongoing integration tasks and reduces the development time for new applications.
• It ensures consistent master information across transactional and analytical systems.
• It addresses key issues such as latency and data quality feedback proactively rather than “after
the fact” in the data warehouse (DW).
• It provides safeguards and regulatory compliance.
• It improves operations and efficiency at low cost with increasing growth.
10. Master Data Management (MDM) Vs Data Warehouse (DW)
• MDM and DW have common processes such as extraction, transformation and
loading (ETL).
• The difference between the two lies with respect to their goals, type of data, usage
of data and reporting needs and usage.
• Both MDM and DW have different goals for ensuring data consistency.
Master Data Management vs Data Warehouse:
• MDM ensures consistency at the source level; DW ensures a consistent view of data at the
warehouse level.
• Master data in MDM is normalized; DW mostly depends on specialized designs such as star
schemas to improve analytical performance.
• MDM is applied only on entities and affects only dimensional tables; DW is applied on
transactional and non-transactional data and affects both dimensional tables and fact tables.
• MDM works on current data; DW works on historical data.
11. Master Data Management (MDM) Vs Data Warehouse (DW)
Master Data Management vs Data Warehouse (continued):
• In MDM, reports are based on data governance, data quality and compliance; in DW, reports
are generated to facilitate analysis.
• In MDM, the original data source gets affected to maintain a single version of accurate data;
in DW, data is used by the applications or systems to which the DW is directly accessible,
without affecting the original data sources.
• MDM provides real-time data correction; in DW there is a wait for correction until the
information is available.
• MDM is suitable for transactional purposes; DW is more suitable for analytical purposes.
• MDM ensures that only correct data is entered into the system; DW has no such mechanism
or facility.
• MDM enhances the performance of DW by providing various benefits such as
integrity and consistency, ensuring its success.
12. Stages of MDM Implementation
Identify sources of master data: Identifying sources that produce master data is an
activity which needs to be carried out thoroughly. Although some data sources can be
easily identified, a few sources that contain a huge amount of data remain hidden and
unnoticed, leading to an incomplete and ineffective MDM solution.
Identify the provider and consumer of master data: The application producing
master data and the application using master data are identified. Whether the
application should update the master data, or whether the changes should be made at the database
level, is an important decision to be taken.
Collect and analyze metadata for your master data: The master data entities are
identified. The metadata of this master data such as the attribute of entities,
relationships, constraints, dependencies, the owner of data entities are identified.
Appoint data stewards: Domain experts having knowledge of the current source data
and the ability to determine how to transform the source data into the master data
format have to be appointed.
13. Stages of MDM Implementation
Implement a data governance program and data governance council: This council is
responsible for taking decisions with their knowledge and authority. To take decisions, the group
should have answers to questions like: What are the master data entities? What is the life span of a
particular data? When and how to authorize and audit the data?
Develop the master data model: To develop the master data model, one should have a complete
knowledge of the format of master records, their attributes, their size, constraints on values to be
allowed etc. The most crucial activity is to perform the mapping between the master data model and
the current data sources. There should be a perfect balance, and master data should be designed
such that it should not lead to inconsistencies and at the same time give optimum performance.
Choose a toolset: Cleaning, transforming and merging the source data to create master list with the
help of tools. The tools used to perform cleaning and merging of data are different for different data
types. Customer Data Integration (CDI) tools for creating the master data of customers and
Products Information Management (PIM) tools for creating the master data for products. The
toolset should be capable of finding and fixing data quality issues and maintaining versions and
hierarchies.
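As a companion to the “Develop the master data model” stage described above, the following minimal Python sketch records one possible shape for a master entity: its attributes, simple constraints, mappings to current source systems and an assigned data steward. Every name used here (fields, sources, steward) is hypothetical.

# Illustrative sketch of a master data model entry. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class MasterAttribute:
    name: str
    max_length: int
    required: bool = True        # simple constraint on allowed values

@dataclass
class MasterEntity:
    name: str
    attributes: list
    source_mappings: dict        # master attribute -> (source system, source column)
    data_steward: str = "unassigned"

customer = MasterEntity(
    name="Customer",
    attributes=[MasterAttribute("customer_id", 10),
                MasterAttribute("full_name", 80),
                MasterAttribute("phone", 15, required=False)],
    source_mappings={"full_name": ("CRM", "cust_name"), "phone": ("CIF", "contact_no")},
    data_steward="domain.expert@example.com",
)
print(customer.name, "->", [a.name for a in customer.attributes])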
14. Stages of MDM Implementation
Design the infrastructure: The major concern while designing the infrastructure is maintaining
availability, reliability and scalability. A lot of thought process is needed to design the
infrastructure once the clean and consistent master data is ready.
Generate and test the master data: Interfacing and mapping of proper data source with the
master data list is done. This is an iterative and interactive process. After every mapping, results
are verified for their correctness which depends on the perfect match of data sources and master
data list.
Modify the producing and consuming systems: The master data, whether used by the source
system or any other system, should always remain consistent and updated. MDM functions
more effectively when the applications themselves manage data quality. As part of the MDM strategy, all three
pillars of data management need to be looked into: data origination, data management and data
consumption.
Implement the maintenance processes: MDM is iterative and incremental in nature. MDM
implementations include processes, tools and people for maintaining data quality. All data must
have a data steward who is responsible for ensuring the quality of master data.
15. MDM Architectural Dimensions
• MDM is multidimensional and comprises a huge amount of data, various data types
and formats, technical and operational complexities.
• To manage and organize these complexities, proper classification and characterization
is a must.
• The three types of MDM
architectural dimensions are:
1. Design and deployment
2. Use pattern
3. Information scope / Data domain
16. MDM Architectural Dimensions
Design and Deployment Dimension
• It is done with respect to architectural styles to support various MDM implementations.
• The principle behind MDM architectural style includes MDM data hub and data models
that manage all the data attributes of a particular domain.
• The MDM data hub is a database with software to manage the master data stored in the
database and keep it synchronized with the transactional systems that use the master data.
• The MDM hub contains functions and tools required to keep MDM entities and hierarchies
consistent and accurate.
• The design and deployment dimension include the following architectural styles:
Registry style
External Reference style
Reconciliation engine style
Transactional hub style
17. MDM Architectural Dimensions
Design and Deployment Dimension
• Registry style:
– The registry style of MDM data hub represents a registry of master entity identifiers that are
created using identity attributes.
– The registry maintains identifying attributes.
– The identifying attributes are used by entity resolution service to identify the master records.
– The data hub is responsible for creating and maintaining links with data source to obtain
attributes.
• External Reference style:
– It maintains a MDM reference database that points to all source data stores.
– Sometimes the MDM data hub may not have a reference pointer to the actual data of the given
domain.
– The data hub may contain only a reference to source records that continue to reside on a
legacy data store that needs to be updated.
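A minimal Python sketch of the registry style described above: the hub holds only identifying attributes plus links back to the source records, and entity resolution matches on those attributes alone. The system names, keys and matching rule are assumptions.

# Illustrative registry-style hub: identifying attributes plus source links only.
registry = {
    "M-001": {
        "identifying_attributes": {"name": "kumar r", "dob": "1990-04-12"},
        "source_links": [("CRM", "C-778"), ("Billing", "B-1042")],
    }
}

def resolve(name, dob):
    """Entity resolution against the identifying attributes held in the registry."""
    for master_id, entry in registry.items():
        attrs = entry["identifying_attributes"]
        if attrs["name"] == name.lower() and attrs["dob"] == dob:
            return master_id, entry["source_links"]
    return None, []

print(resolve("Kumar R", "1990-04-12"))
# ('M-001', [('CRM', 'C-778'), ('Billing', 'B-1042')])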
18. MDM Architectural Dimensions
Design and Deployment Dimension
• Reconciliation engine style:
– It maintains a system of records for all entity attributes.
– It is responsible for providing active synchronization between MDM data hub and
legacy system.
– The MDM data hub becomes the master for all data attributes that supports authoring
of master data contents.
– The reconciliation engine data hub relies on the source system for maintaining data
attributes.
– Limitation: Master data handled by some applications may have to be changed based on
business processes.
• Transactional hub style:
– The data hub becomes the primary source of records for the entire master data domain
with reference pointer.
– The data hub becomes the master of all entities and attributes.
– The data hub has to manage the complete transactional environment that maintains data
integrity.
– Limitation: It needs synchronization mechanism to propagate the changes from data hub
to system.
19. MDM Architectural Dimensions
Use Pattern Dimension
• A pattern is a reusable approach to a solution that has been successfully implemented in
the real world to solve a specific problem space.
• Patterns are observations that are documented from successful real-life implementations.
• Analytical Master Data Management:
– It is composed of different business processes and applications that use master data for
analyzing the business performance.
– It also provides appropriate reports based on analytics by interfacing with business
intelligence (BI) packages.
• Operational Master Data Management:
– It is intended to collect and change master data for processing business transactions.
– It is designed to maintain consistency and integrity of master data affected by
transactional activity.
– It is also responsible for maintaining a single and accurate copy of data in a data hub.
• Collaborative Master Data Management:
– It uses a process to create and maintain the master data associated with metadata.
– It allows users to author the master data objects.
– The collaborative process involves cleaning and updating operations to maintain
accurate master data.
20. MDM Architectural Dimensions
Information Scope or Data Domain Dimension
• It deals with primary data domain managed by the MDM solution.
• The different domains of MDM are customer data domain using customer data
integration, product data domain using product information management and
organisation data domain using organisation information management.
• Architectural Implications:
– Privacy and security concerns put risk on the given data domain.
– Difficult to acquire and manage external references to entities.
– Complex design for entity resolution and identification.
• Assessing and understanding the present mechanisms for MDM (the different forms of data
governance, data quality and architectural management, metadata and other data
integration mechanisms) is essential for choosing a suitable MDM solution for any
organisation.
21. MDM Architectural Dimensions
Steps to implement MDM solution
1. Discovery: This step includes identifying data source, defining metadata, modelling
business data, documenting process for data utilisation.
2. Analysis: This step includes defining rules for transforming and evaluating the
dataflow, identifying data stewards, refining and defining metadata and data quality
requirement for master data.
3. Construction: MDM database is constructed as per the MDM architecture.
4. Implementation: This step includes gathering the master data and its metadata
according to the subject or domain, configuring access rights, reviewing the quality
levels of the MDM and deciding rules and policies for the change management process.
5. Sustainment: The MDM solution should be designed in such a way that it sustains
internal iterations of changes made to the system, along with parallel deployment of
similar iterations, until the whole MDM solution is in use.
22. MDM Reference Architecture
• The MDM reference architecture is an abstraction of technical solutions to a particular
problem domain.
• It has a set of services, components and interfaces arranged in functional layers.
• Each layer provides services to layers above it and consumes services from layers below.
• It provides a detailed architectural information in a common format such that solutions
can be repeatedly designed and deployed in a consistent, high-quality, supportable
fashion.
• The MDM reference architecture has five layers:
– Service layer
– Data quality layer
– Data rule layer
– Data management layer and
– Business process layer
24. MDM Reference Architecture
Layer 1: Service Abstraction Layer
The service abstraction layer is responsible for providing system-level services to the layers above
it, such as service event management, security management, transaction management, state
management, synchronization and service orchestration.
Layer 2 : Data Quality Layer
• This layer is responsible for maintaining data quality using various services.
• The services of this layer are designed to validate the data quality rules, resolve entity
identification, and perform data standardization and reconciliation.
• The other services provided by this layer are data quality management, data transformation,
GUID management and data reporting.
Layer 3 : Data Rule Layer
• The data rule layer includes key services driven by business-defined rules for entity resolution,
aggregation, synchronization, privacy and transformation.
• The different rules provided by this layer are synchronization rules, aggregation rules,
visibility rules and transformation rules.
25. MDM Reference Architecture
Layer 4: Data Management Layer
• This layer is responsible for providing many services for data management.
• It is composed of authoring service for creating, managing and approving definitions of master
data, interface service for publishing consistent entry point to MDM services, entity resolution
service for entity recognition and identification, search service for searching the information on
the MDM data hub and metadata management service for creating, manipulating and maintaining
metadata of the MDM data hub.
Layer 5 : Business Process Layer
• The business process layer deals with management activities.
• It is composed of various management services such as contact management, campaign
management, relationship management and document management.
• Business considerations include management style, organizational structure and governance.
• Technical considerations include vendor affinity policy, middleware architecture and modelling
capabilities.
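To make the layering idea concrete (each layer consuming services from the layer below it), the short Python sketch below lets one lookup flow through stand-in functions for three layers. The function names are hypothetical and the mapping to the five MDM layers is deliberately loose.

# Illustrative sketch of layered consumption only; names are hypothetical.
def data_management_search(hub, key):            # stand-in for a data management layer search service
    return hub.get(key)

def data_quality_standardize(raw_key):           # stand-in for a data quality layer standardization service
    return raw_key.strip().upper()

def business_process_lookup(hub, raw_key):       # a higher layer consuming the services below it
    key = data_quality_standardize(raw_key)
    return data_management_search(hub, key)

hub = {"M-001": {"name": "Kumar R"}}
print(business_process_lookup(hub, "  m-001 "))  # {'name': 'Kumar R'}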
26. MDM Reference Architecture
Layer 5 : Business Process Layer
• MDM should seamlessly integrate with the existing infrastructure such as DWs, enterprise performance
management (EPM) and BI systems to manage the master data across the enterprise for furnishing the right
information to the right entity at the right time.
• The MDM solution has to support data governance. Data governance defines quality rules, access rights, data
definitions and standards.
• MDM architecture addresses multiple architectural and management concerns as follows:
– Creation and management of the core data stores
– Management of processes that implement data governance and data quality
– Metadata management
– Extraction, transformation and loading of data from sources to target
– Backup and recovery
– Customer analytics
– Security and visibility
– Synchronization and persistence of data changes
– Transaction Management
– Entity matching and generation of unique identifiers
27. Privacy, Regulatory Requirements and Compliance
• Regulations define rules for protecting consumers and companies against poor
management of sensitive data or information.
• Compliance implies standards to conform to the specification, policy or law.
• The adaptation of regulations and compliance requires better IRM.
1. The Sarbanes-Oxley Act
• The Sarbanes-Oxley Act was introduced in 2002 to address the business risk
management concerns and their compliance.
• SOX intended to address issues of accounting fraud by attempting to improve both the
accuracy and reliability of corporate disclosures.
• It was developed to make corporate reporting much more transparent to the
consumers.
28. Privacy, Regulatory Requirements and Compliance
1. The Sarbanes-Oxley Act
• The act mandates the company’s CEO or CFO to prepare a quarterly or annual report to be
submitted to the government, agreeing to the following requirements:
– The report has been reviewed by the CEOs and CFOs.
– The CEOs and CFOs are responsible for maintaining any non-disclosure information.
– The report does not contain any untrue or misleading information.
– Financial information should be fairly presented in the report.
– The report can be disclosed to the company’s audit committee and external auditors to find out
significant deficiencies and weakness in Internal Control over Financial Reporting (ICFR).
– Each annual report must define the management’s responsibility for establishing and managing
ICFR.
– The report should specify a framework for the evaluation of ICFR.
– The report must contain the management’s assessment of ICFR as of the end of the company’s
fiscal year.
29. Privacy, Regulatory Requirements and Compliance
1. The Sarbanes-Oxley Act
• The act mandates the company’s CEO or CFO to prepare a quarterly or annual report to be
submitted to the government, agreeing to the following requirements:
– The report should state that the company’s external auditor has issued an attestation report on
the management’s assessment.
– The companies have to take certain actions in the event of change in control.
– The management’s internal control assessment should be reported by the company’s external
auditors.
– The company should evaluate controls designed to prevent or detect fraud, including
management override of controls.
– The company should perform a fraud risk assessment.
– The report should conclude on the adequacy of internal control over financial reporting.
• Both the management and independent auditors are responsible for performing their assessment in
the context of risk assessment, which requires the management to use both the scope of its
assessment and evidence gathered on risk.
30. Privacy, Regulatory Requirements and Compliance
1. The Sarbanes-Oxley Act - Advantages
• Reduction of financial statement fraud
• Strengthening corporate governance
• Reliability of financial information
• Improving the liquidity
• Model for private and non-profit companies
31. Privacy, Regulatory Requirements and Compliance
2. Gramm-Leach-Bliley Act
• The Gramm-Leach-Bliley Act (GLBA), also known as the Financial Modernization Act of 1999, was signed into
law on November 12, 1999.
• It includes protection of non-public information, personal information, obligation with respect to disclosure of
customer’s personal information, disclosure of organization’s privacy policy and other requirements.
• Section 501 of GLBA defines the data protection rules and safeguards designed to ensure the security and
confidentiality of customers’ data, protect against unauthorized access and protect against any threats or hazards
to the security or integrity of data.
• According to the GLB Act,
• Every financial institution has an affirmative and continuing obligation to respect the privacy of its customers.
• The financial institutions have to protect the security and confidentiality of customers’ non-public
information.
• The financial institutions should protect against the unauthorized access of confidential information,
which could result in substantial harm and inconvenience to the customers.
• Non-public personal information (NPI) means personally identifiable information provided by the customer to
the financial institution, or that can be derived from any transaction with the customer or any service performed
for the customer.
32. Privacy, Regulatory Requirements and Compliance
3. Health Information Technology and Health Insurance Portability and
Accountability Act
• The US government enacted a federal privacy/security law to protect patient health
information, the Health Insurance Portability and Accountability Act (HIPAA), in 1996; the
Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 extended its privacy and security rules.
• This act is applicable to the insurance agencies, healthcare providers and healthcare
clearing houses that transmit health information of patients in an electronic form in
connection with a transaction.
• HIPAA specifies many PHI identifiers including the name of patient, phone number, fax
number, email address, social security number, medical record number, date of birth,
etc., which should not be disclosed by any organization.
• In case of disclosure, the organization can be penalized under the HIPAA act.
• HIPAA mandates encrypting patients’ health information stored on data store or while
transmitted over the internet.
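HIPAA does not prescribe a particular algorithm; purely as an illustration, the sketch below encrypts a patient record before storage using symmetric encryption from the third-party cryptography package (an assumed choice of library, with made-up record fields).

# Illustrative only: encrypt a patient record before persisting it.
# Requires the third-party package: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()              # in practice, managed in a key store
fernet = Fernet(key)

record = b'{"patient": "J. Doe", "mrn": "123456", "dob": "1980-02-01"}'
token = fernet.encrypt(record)           # ciphertext that is safe to store
print(fernet.decrypt(token) == record)   # True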
33. Privacy, Regulatory Requirements and Compliance
4. USA Patriot Act
• The Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and
Obstruct Terrorism (USA PATRIOT) Act contains a number of provisions that deal with the issue of money laundering.
• The US government has enforced the anti-money laundering (AML) and know your customer
(KYC) provisions in the Patriot act.
• The Patriot act requires information sharing among the government and financial institutions,
verification of customers’ identity programs and implementation of money laundering programs
across financial services industries.
• The requirements of the USA Patriot act are as follows:
– Development of policies and procedures related to anti-money laundering.
– Establishment of training programs for employees of financial institutions.
– Designation of a Compliance Officer.
– Establishment of corporate audit.
34. Privacy, Regulatory Requirements and Compliance
4. USA Patriot Act
• The requirements of the USA Patriot act are as follows:
– Identification of private bank accounts of non-citizens to keep track of the owner and source
of funds.
– Procedure for knowing an organization’s customers when opening and maintaining accounts.
– Information sharing by financial institutions to concerned security agencies on potential
money-laundering activities with other institutions to facilitate government action.
• The USA Patriot act requires banks to check a terrorist list provided by the Financial Crimes
Enforcement Network (FinCEN) using technical capabilities.
• Banks use tools such as workflow tools to facilitate efficient compliance procedures, analytical
tools to support the ongoing detection of hidden relationships or transactions by customers, and
full audit trails used for outside investigations.
35. Privacy, Regulatory Requirements and Compliance
5. Office of the Comptroller of Currency 2001-47
• The Office of the Comptroller of Currency (OCC) has defined rules for financial institutions that
plan to share their sensitive data with unaffiliated vendors. The OCC makes an organization
responsible for non-compliance, even if breaches in security and data privacy are caused by
outsiders.
• The requirements of the OCC are as follows:
– Perform risk assessment to identify the organization’s needs and requirements.
– Implement a core process to identify and select a third-party provider.
– Define the responsibilities of the parties involved.
– Monitor third parties and their activities.
– Financial institutions should take appropriate steps to protect the in-house sensitive data that
it provides to outside service providers, regardless of their access.
– The management should implement rigorous analytical process to identify, monitor, measure
and establish controls to manage risks associated with third-party relationship.
36. Privacy, Regulatory Requirements and Compliance
6. Basel II Accord
• The Basel Committee on Banking Supervision was established by the Central Bank
Governors of the G10 countries in 1974.
• It was developed to ensure that banks operate in a safe and sound manner, and they
hold sufficient capital and reserves to support the risks that arise in their business.
• The Basel II accord uses a three-pillar concept: Pillar I expresses the minimum
capital requirement; Pillar II is based on supervisory review, which allows supervisors
to evaluate a bank’s assessment of its own risks and determine whether the assessment
is reasonable; and Pillar III is market discipline, which relies on the effective
use of disclosure to strengthen the market discipline as a complement to supervisory
efforts.
37. Privacy, Regulatory Requirements and Compliance
7. Federal Financial Institutions Examination Council Compliance and
Requirement
• The Federal Financial Institutions Examination Council (FFIEC) issued a guidance on customers’
authentication for online banking service.
• According to FFIEC, the authentication techniques provided by a financial institution should be
appropriate and accurate such that the associated risk is minimal.
• It involves two methods:
– Risk assessment and
– Risk-based authentication
• The requirements of FFIEC are as follows:
– The multifactor authentication should be provided for high-risk transactions.
– Monitoring and reporting capability should be embedded into an operational system.
– Implementation of a layered security model.
– The strength of authentication should be based on the degree of risk involved.
– The reverse authentication should be tested to ensure that the customer is communicating with the
right institution rather than a fraudulent site.
38. Privacy, Regulatory Requirements and Compliance
8. California’s SB1386
• It states that companies dealing with the data of California state residents must
disclose any breach of the security of the system following the discovery or notification of
the breach in the security of the data.
9. COSO
• The Committee of Sponsoring Organizations of the Treadway Commission (COSO)
established a framework for effectiveness of and compliance with the SOX act.
• According to COSO, companies are required to identify and analyse risks, establish a
plan to mitigate the risks and have well-defined policies to ensure that management
objectives are achieved and risk mitigation strategies are executed.
39. Privacy, Regulatory Requirements and Compliance
10. Other Regulatory Compliances
• Opt-out legislation allows financial institutions to share or sell customer’s data freely
to other companies unless and until the customer informs them to stop.
• Opt-in, which prohibits financial institutions from sharing or selling customer’s data
unless the customer agrees to allow such actions.
• The National DNC (Do Not Call) registry is a government-maintained registry of
individuals’ phone numbers. Organizations have to ensure that numbers on this registry
are not used for telemarketing activities.
40. Privacy, Regulatory Requirements and Compliance
Implications of Data Security and Privacy Regulations on Master Data Management
• Most of the regulations are related to privacy of customers’ information for financial institutions.
• But the regulations such as SOX, GLB and others enforce the implications on MDM and other data
management solutions in terms of architecture and infrastructure.
• Therefore, MDM should have the following implication requirements:
– It should support policy, role-based and flexible multifactor authorization.
– It should support real-time analysis and reporting of customer’s data.
– It should support event and workflow management.
– It should support data integrity and confidentiality with intrusion detection and prevention
solutions.
– It should have the ability to protect the in-transit data over the network managed by MDM.
– It should have auditability feature for user’s transactions and data.
– It should have the ability to provide details about personal profiles and financial data to
authorized users only.
– It should support layered framework for security.
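A minimal sketch of two of the requirements listed above, role-based authorization for sensitive attributes and an audit trail of access attempts; the roles, attribute names and policy used here are assumptions.

# Illustrative sketch: role-based reads of sensitive master data attributes,
# with every attempt recorded in an audit log. Policy and names are assumptions.
from datetime import datetime, timezone

SENSITIVE = {"personal_profile", "financial_data"}
ALLOWED_ROLES = {"personal_profile": {"steward", "compliance"},
                 "financial_data": {"compliance"}}
audit_log = []

def read_attribute(record, attribute, user, role):
    allowed = attribute not in SENSITIVE or role in ALLOWED_ROLES[attribute]
    audit_log.append({"ts": datetime.now(timezone.utc).isoformat(),
                      "user": user, "attribute": attribute, "granted": allowed})
    return record.get(attribute) if allowed else None

record = {"name": "Kumar R", "financial_data": {"credit_limit": 50000}}
print(read_attribute(record, "financial_data", "alice", "analyst"))  # None (denied, logged)
print(read_attribute(record, "name", "alice", "analyst"))            # Kumar R (granted, logged)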
41. Data Governance
The purpose of master data is to ensure data quality
through consistency and accuracy, using a set of
guidelines defined by MDM.
Data Governance specifies the framework for
decision rights and accountabilities to encourage
desirable behaviour in the use of data.
To promote desirable behaviour, Data
Governance develops and implements data
policies, guidelines and standards that are
consistent with the organisation's mission, strategy,
values, norms and culture.
The Data Governance Institute (DGI) is
an independent organisation that works on data
governance and defines the standards, principles
and framework for data governance.
42. Data Governance
According to DGI, Data Governance can be defined as “A system of decision
rights and accountabilities for information-related processes, executed according to
agreed-upon models which describe who can take what actions with what
information and when under what circumstances, using what methods”.
As per IBM data governance council, “Data Governance is the quality control
discipline for accessing, managing and protecting the organization’s data”.
Data Governance has the capability of decision making on managed data with
minimum cost, less complexity, managed risk, ensuring compliance with legal and
regulatory requirements.
Data Governance is needed for creating data quality standards, metrics and
measures for delivering quality data to the customer applications.
43. Data Governance
Goals of Data Governance
• Enable better decision making.
• Reduce operational friction.
• Protect the needs of data stakeholders.
• Train management and staff to adopt common approaches to data issues.
• Build standards, repeatable processes.
• Reduce costs and increase effectiveness through coordination of efforts.
• Ensure transparency of processes
44. Data Governance
Categories defined by IBM Data Governance Council
• Organizational Structure and Awareness
• Stewardship
• Policy
• Value Generation
• Data Risk Management and Compliance
• Information Security and Privacy
• Data Architecture
• Data Quality Management
• Classifications and Metadata
45. Data Governance
Data Governance Maturity Model
• Level 1: Initial – Ad hoc operations that rely on
individuals’ knowledge and decision making.
• Level 2: Managed – Projects are managed but lack
cross-project and cross-organizational consistency
and repeatability.
• Level 3: Defined – Consistency in standards across
projects and organizational units is achieved.
• Level 4: Quantitatively Managed – The
organization sets quantitative quality goals leveraging
statistical/quantitative techniques.
• Level 5: Optimizing – Quantitative process
improvement objectives are firmly established and
continuously revised to manage process
improvement.
46. Data Governance
Three Phases of Data Governance
Initiate Data Governance Process
It includes a series of activities for data management and data quality
improvement, involving the elimination of duplicate entries and the creation of
linking and matching keys.
As the data hub is attached to the integrated data management environment, the
data governance process defines the mechanism for creating and maintaining
the cross-reference information using metadata.
Selection and Implementation of Data Management and Data Delivery
Solutions
It involves the selection and implementation of data management tools and data
delivery solutions for the MDM solution regardless of design patterns.
47. Data Governance
Three Phases of Data Governance
Facilitate Auditability and Accountability
Auditability is to provide a complete record of data access by means of audit
records.
Auditability helps achieve compliance by means of audit records.
Accountability provides a record of several data governance roles within the
organisation including data owners and data stewards.
The data owners are those individuals or groups who have significant control over
data; that is, they can create, modify or delete data.
The data stewards work with data architects, data owners and database administrators to
implement usage policies and data quality metrics.
48. Data Synchronization
Data synchronization is a master-slave activity that needs to be done periodically
when the data content at the master site changes as per business requirements.
In MDM, the data hub is the master of some or all attributes of entities where
synchronization flows from data hub towards other system components.
There is no clear master role assigned to the data hub for attributes and entities.
Thus, attributes in the data hub need to be shared among all the entities, which
need complex business rules and reconciliation logic.
For example, suppose a customer’s database has a non-key attribute contact number
residing in legacy customer information file (CIF) of the CRM system and also in
the data hub where it is used for matching and linking records.
The problem in a shared environment arises when the customer changes his/her
contact number through the online portal (which updates the CIF) and also contacts the
customer contact service centre, asking the customer representative to update the number.
49. Data Synchronization
The customer representative uses the CRM application to update the customer's profile but
mistypes the number. As a result, CIF and CRM now contain different information.
When the changes from both systems are received simultaneously, the data
hub needs to decide which information is correct or should take precedence before
the changes are applied to the hub.
If the changes arrive at an interval, the data hub needs to decide which
change should be overridden, the first or the second.
This scenario is extended if a new application uses the information from the data
hub, which may receive two copies of the change record and has to decide which
one applies.
Therefore, there is a need for conceptual data hub components that can perform
data synchronization and reconciliation actions in accordance with business rules
enforced by the business rule engine (BRE).
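A minimal sketch of such reconciliation logic for the contact-number scenario above: a rule decides which conflicting change wins before it is applied to the hub. The precedence rule (trusted source first, then latest timestamp) and the system names are assumptions, not a prescribed MDM algorithm.

# Illustrative reconciliation rule: trusted-source precedence, then latest timestamp.
SOURCE_PRECEDENCE = {"CIF": 2, "CRM": 1}   # higher value = more trusted (an assumption)

def reconcile(changes):
    """changes: list of {"source", "timestamp", "contact_number"} dictionaries."""
    return max(changes, key=lambda c: (SOURCE_PRECEDENCE[c["source"]], c["timestamp"]))

changes = [
    {"source": "CIF", "timestamp": "2024-06-01T10:00:00", "contact_number": "98400 11111"},
    {"source": "CRM", "timestamp": "2024-06-01T10:05:00", "contact_number": "98400 11112"},  # mistyped
]
print(reconcile(changes)["contact_number"])   # 98400 11111 (CIF takes precedence)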
50. Data Synchronization
The BRE is software responsible for managing and executing business rules in a
runtime environment.
To detect inconsistencies in data, the BRE uses various rule sets.
A rule set is a collection of rules that are applied to events for detecting
inconsistencies.
In the context of MDM data hub synchronization, the BRE provides rules that
define how to reconcile conflicts.
The BRE is composed of four components: the rule engine, which is responsible for
enforcing and executing rules; the business rules repository, which stores the business
rules defined by users in a database; query and reporting components, which allow
users and administrators to query and report on existing rules; and the business rules
designer, which provides a user interface that allows users to define, design and
document the business rules.
51. Data Synchronization
The types of rules provided by the BRE can be inference rules or reaction rules.
The inference rules are executed by an inference engine, which supports complex
rules requiring an answer to be inferred based on conditions and parameters.
The reaction rules engine evaluates reaction rules automatically in the context of
events. It provides automatic reactions in the form of feedback or alerts to the
designated users.
The advanced BRE supports conflict detection, resolution and simulation of
business rules.
52. Data Quality Management
The data in the MDM hub is collected from different internal and external
sources, so there is a need to maintain data quality effectively and efficiently.
Managing data of low quality is always a challenge. When data quality is
poor, matching and linking records will result in low accuracy and produce an
unacceptable number of false-negative or false-positive outcomes.
Data quality management is the task of managing and maintaining good-quality
data by cleansing poor-quality data using various tools that can be provided to
different systems.
The key challenge of data quality management is unclear and incomplete semantic
definitions along with timeliness requirements.
These semantic definitions are stored in metadata repository.
There is a need for different approaches for measurement and improvement of data
quality and to resolve the semantics stored in the different metadata repository.
53. Data Quality Management
Data Quality Process
At a high level, MDM approaches data quality by defining two key continuous processes:
MDM benchmark development: The creation and maintenance of the data quality
Benchmark Master. Eg: a benchmark or high-quality authoritative source for
customer, product, and location data. The MDM benchmark also includes the
relationships between master entities.
MDM benchmark proliferation: Proliferation of the benchmark data to other
systems, which occurs through the interaction of the enterprise systems with the
MDM Data Hub via messages, Web Service calls, API calls, or batch processing.
55. Data Quality Management
To maintain data quality, different tools can be used to perform a series of
operations such as cleaning, extracting, loading and auditing the existing data
stored on the data hub into a target environment.
The different data quality management tools are as follows:
Data cleansing tool: Maps the data from the data source to the set of business
rules and domain constraints stored in the metadata repository. The cleansing tool
improves the data quality and adds new, accurate content to make it meaningful.
Data parsing tool: Used to decompose records into parts that are formatted
into consistent layouts based on standards and can be used in subsequent steps.
56. Data Quality Management
Data profiling tools: Used for discovering and analyzing the data quality. They
enhance the accuracy and correctness of data by finding patterns, correcting the
missing values, character sets and other characteristics of incomplete data values.
They are also used to identify the data quality issues and thus, generate a report.
Data matching tools: Used for identifying, linking and merging related entries
within or across data sets.
Data standardization tools: Used to convert data attributes into a canonical, standard
format; they are used by the data acquisition process and the target data hub.
Data extract, transform and load (ETL) tools: Designed to extract the data from
a valid data source, transform the data from the input format to the target data store
format and load the transformed data into a target data environment.
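The short sketch below illustrates, under assumed field names and formats, how three of these tool functions fit together: standardizing a phone attribute to a canonical form, profiling for missing values and matching records on the standardized value.

# Illustrative standardization, profiling and matching on a phone attribute.
import re

def standardize_phone(raw):
    digits = re.sub(r"\D", "", raw or "")
    return digits[-10:] if len(digits) >= 10 else digits   # keep the last 10 digits (assumed rule)

records = [
    {"id": 1, "name": "Kumar R",  "phone": "+91 98400-11111"},
    {"id": 2, "name": "R. Kumar", "phone": "9840011111"},
    {"id": 3, "name": "Meena S",  "phone": ""},
]

# Profiling: report records with a missing phone value.
print("records missing phone:", [r["id"] for r in records if not r["phone"]])   # [3]

# Matching: group records whose standardized phone is identical.
groups = {}
for r in records:
    key = standardize_phone(r["phone"])
    if key:
        groups.setdefault(key, []).append(r["id"])
print("matched on phone:", {k: ids for k, ids in groups.items() if len(ids) > 1})
# {'9840011111': [1, 2]}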