Data Warehouse

Published on December 2016 | Categories: Documents | Downloads: 79 | Comments: 0 | Views: 480
of 7
Download PDF   Embed   Report

Comments

Content

All DSS use DATA, INFORMATION, and/or KNOWLEDGE

DATA
1. Items about things, events, activities, and transactions

2. Unorganized, i.e do not convey any specific meaning 3. Numeric, Alfanumeric, Figures, Sounds or Images INFORMATION 1. Organized Data, Meaningful Data, Results KNOWLEDGE 1. Data items or Information that convey understanding, experience, accumulated learning, and expertise 2. Application of data and information in making a decision.

Need of two types of data:
1. 2. Current representation of state of the business Historical point-in time (snapshots) and prediction

A data warehouse is data management and data analysis Goal: is to integrate enterprise wide corporate data into a single reository from which users can easily run queries Def :- A DW is a subject oriented, integrated, time variant and non volatile collection of data in support of management’s decision making process (Immon) 1. Subject Oriented 1. Organized around major subjects, such as customer, product, sales. 2. Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. 2. Integrated 1. 2. Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process. Constructed by integrating multiple, heterogeneous data sources 2.1. relational databases, flat files, on-line transaction records 3. Data cleaning and data integration techniques are applied. 3.1. Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources E.g., Hotel price: currency, tax, breakfast covered, etc. 3.2. When data is moved to the warehouse, it is converted. 3. Time Variant 1. The source data in the WH is only accurate and valid at some point in time or over some time interval. The time-variance of the data warehouse is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots

2. The time horizon for the data warehouse is significantly longer than that of operational systems. 3. Operational database: current value data. 3.1. Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) 4. Every key structure in the data warehouse contains an element of time, explicitly or implicitly but the key of operational data may or may not contain “time element”.

4.Non Volatile 1. 2. A physically separate store of data transformed from the operational environment. Operational update of data does not occur in the data warehouse environment. -Does not require transaction processing, recovery, and concurrency control mechanisms -Requires only two operations in data accessing: initial loading of data and access of data.

2.1.1. 2.1.2. 2.1.3.

Data Warehouse v/s Database
• Most databases place an emphasis on a single application, based on transactions. Data is analyzed within a single domain. Some of the separate units that may be comprised within a database include payroll or inventory. Each system will place an emphasis on one subject, and it will not deal with other areas. In contrast, data warehouses deal with multiple domains simultaneously. The data warehouse finds connections between multiple subject areas. This allows the data warehouse to show how the company is performing as a whole, rather than in individual areas. Another powerful aspect of data warehouses is their ability to support the analysis of trends. Data warehouses are not volatile, as compared to common database. • There are two types of data: operational data and decision support data. The purpose, format, and structure of these two data types are quite different. The operational data is generally placed in a relational database. In the relational database, tables are frequently used, and they may be normalized. Every time an item is sold to a customer by the company, a record must be made of it. As can be expected, this data will be updated on a frequent basis. To ensure the efficiency of the system, the data must be placed in a certain number of tables, and the tables must have fields. While this system may be highly efficient in an operational database, it is not conducive to queries. In this situation, decision support data is often useful. The managers within an organization will need information that is summarized at various degrees. Data warehouses have become more important in the Information Age, and they are a necessity for many large corporations, as well as some medium sized businesses. They are much more elaborate than a mere database, and they can find connections in data that cannot be readily found within most databases.



Conceptual modelling of DW
1. 2. 3. Represent facts and their properties Connect the temporal dimension to facts Represent objects, capture their properties and the associations among them

4. 5.

Record the associations between objects and facts Distinguish dimensions and categorize them into hierarchies

Characteristics of DW
Subject oriented Summarized Client/server Integrated Not normalized Time-variant (time series) Metadata Nonvolatile Web based (relational/multi-dimensional)

Real-time and/or right-time (active)

Generic DW Architectures
• Three-tier architecture 1. Data acquisition software (back-end) 2. The data warehouse that contains the data & software 3. Client (front-end) software that allows users to access and analyze data from the warehouse • Two-tier architecture … sometime there is only one tier?

Operational data stores (ODS)
A type of database often used as an interim area for a data warehouse • • • Oper marts :- An operational data mart. Enterprise data warehouse (EDW) :- A data warehouse for the enterprise. Metadata :- Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use

The importance of managing meta-data(integration)
1. The integration of meta-data, that is ”data about data” 2. Meta-data is used for a variety of purposes and the management of it is a critical issue in achieving a fully integrated data warehouse 3. The major purpose of meta-data is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse 4. The meta-data associated with data transformation and loading must describe the source data and any changes that were made to the data 5. The meta-data associated with data management describes the data as it is stored in the warehouse 6. The meta-data is required by the query manager to generate appropriate queries, also is associated with the user of queries

7. The major integration issue is how to synchronize the various types of meta-data use throughout the data warehouse. The challenge is to synchronize meta-data between different products from different vendors using different meta-data stores. 8. Two major standards for meta-data and modeling in the areas of data warehousing and component-based development-MDC(Meta Data Coalition) and OMG(Object Management Group)

data martīƒ  a subset of a data warehouse that supports the requirements of particular department or business function 1) The characteristics that differentiate data marts and data warehouses include: a) a data mart focuses on only the requirements of users associated with one department or business function b) data marts do not normally contain detailed operational data, unlike data warehouses c) as data marts contain less data compared with data warehouses, data marts are more easily understood and navigated A departmental data warehouse that stores only relevant data – Dependent data mart

A subset that is created directly from a data warehouse – Independent data mart

A small data warehouse designed for a strategic business unit or a department

Reasons for creating a data mart :1) To give users access to the data they need to analyze most often 2) To provide data in a form that matches the collective view of the data by a group of users in a department or business function 3) To improve end-user response time due to the reduction in the volume of data to be accessed 4) To provide appropriately structured data as ditated by the requirements of end-user access tools 5) Normally use less data so tasks such as data cleansing, loading, transformation, and integration are far easier, and hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse 6) The cost of implementing data marts is normally less than that required to establish a data warehouse 7) The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project

Data Integration and the Extraction, Transformation, and Load (ETL) Process
• • Data integration :- Integration that comprises three major processes: data access, data federation, and change capture. Enterprise application integration (EAI):- A technology that provides a vehicle for pushing data from source systems into a data warehouse

• •

Enterprise information integration (EII):- An evolving tool space that promises real-time data integration from a variety of sources Service-oriented architecture (SOA):- A new way of integrating information systems Extraction, transformation, and load (ETL) process

ETL
• Issues affecting the purchase of and ETL tool – Data transformation tools are expensive – Data transformation tools may have a long learning curve Important criteria in selecting an ETL tool – Ability to read from and write to an unlimited number of data sources/architectures – Automatic capturing and delivery of metadata – A history of conforming to open standards – An easy-to-use interface for the developer and the functional user



Benefits of DW
Direct benefits of a data warehouse
1. Allows end users to perform extensive analysis 2. Allows a consolidated view of corporate data 3. Better and more timely information 4. Enhanced system performance 5. Simplification of data access

• Indirect benefits of data warehouse
1. Enhance business knowledge 2. Present competitive advantage 3. Enhance customer service and satisfaction 4. Facilitate decision making

5.

Help in reforming business processes

Data Warehouse Development
Data warehouse development approaches

a) Top-down approach (EDW approach) i) DW is the center of the analytic environment

ii) Provides an integrated, flexible architecture to support analytic data structures b) Data mart approach (bottom-up) i) Goal: to deliver business value quickly by deploying multidimensional Data Marts, which are later organized into DW

c) Hybrid approach i) Blend of 1st two approaches.

d) Federation approach i) Creates and maintain logical view of a single warehouse

ii) It involves integrating disparate systems

Data warehouse structure
The Star Schema vs. Relational • Star Schema – – • Based upon the concept of DIMENTIONAL MODELING Dimensional modeling is a retrieval based model that supports high volume of query access

Data cube :- A 2D, 3D, or higher-dimensional object in which each dimension of the data represents a measure of interest

DW Structure: Star Schema (a.k.a. Dimensional Modeling)
A Star Schema contains: 1. A central Fact Table which contains I. II. III. the attributes needed to perform decision analysis Descriptive attributes for query reporting Foreign keys to link to dimension tables

2. Dimension Tables which surround the fact table and are linked via foreign keys. 3 Contain attributes that describe the data contained within fact table

Best Practices for Implementing DW
1. The project must fit with corporate strategy 2. There must be complete buy-in to the project 3. It is important to manage user expectations 4. The data warehouse must be built incrementally

5. Adaptability must be built in from the start 6. The project must be managed by both IT and business professionals (a business–supplier relationship must be developed) 7. Only load data that have been cleansed/high quality 8. Do not overlook training requirements 9. Be politically aware.

Risks in Implementing DW
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. No mission or objective Quality of source data unknown Skills not in place Inadequate budget Lack of supporting software Source data not understood Weak sponsor Users not computer literate Political problems or turf wars Unrealistic user expectations Architectural and design risks Scope creep and changing requirements Vendors out of control Multiple platforms Key people leaving the project Loss of the sponsor Too much new technology Having to fix an operational system Geographically distributed environment Team geography and language culture

Sponsor Documents

Or use your account on DocShare.tips

Hide

Forgot your password?

Or register your new account on DocShare.tips

Hide

Lost your password? Please enter your email address. You will receive a link to create a new password.

Back to log-in

Close