Data Warehouse Audits Promote and Sustain Reporting System Value
By Brian Thomas, CISA, CISSP
ata warehouses and their related reported capabilities deliver immense strategic value. Data warehouses amass and organize vast volumes of detailed information from systems throughout the company. These massive data sets support data mining analytics that generate insightful reports for evaluating market segments, productivity, profits, sales and other elements critical to organizational success. This capability enables managers to identify emerging opportunities or underlying causes for concern. An international retailer, for example, operates thousands of stores, and each store stocks thousands of different items. Cultural factors affect overall sales at various locations, as well as sales of individual items. A variety of other factors, including climate, local customer demographics, and nearby pedestrian and vehicle traffic, also influence sales. Data warehouses capture enormous amounts of sales data from all of these locations. Using data mining tools that gather and analyze the most relevant information, data warehouses can generate concise reports and graphs that depict the overall sales trends and patterns within different countries, regions or cities; merchandize items popular at all stores; and allow for differing sales characteristics among diverse locations. These reports drive decisions in marketing, purchasing, distribution and other functions, and enable the retailer to effectively respond to local customer preferences while retaining strong brand identity and enhancing overall efficiency and company performance. Reliance on such critical high-level reports, however, also highlights the importance of evaluating and monitoring data warehouses themselves. The best data mining and analytical functions are useless without accurate, reliable underlying data. Data warehouses contain critical and proprietary information. Acute potential risks accompany vulnerabilities to data manipulation or unauthorized access. Data warehouses need vast amounts of relevant data to produce valid reports. That requires continually replacing outdated data with newer information. Any changes in upstream data sources or the data warehouse itself require thorough analysis, testing and documentation. Recognizing the fundamental characteristics of an effective data warehouse helps companies understand aspects that require evaluation during its development and ongoing operations.
They combine data from multiple upstream systems, which enables managers to standardize data structures and provides a means for analysis across multiple systems. Data warehouse reports typically take a companywide view and address issues that affect the entire organization. While a data mart—a data warehouse with a narrower scope—focuses on a particular department, business unit or market segment, its reports take a much broader perspective than those produced by an operational application or database. Data mining tools for extracting and analyzing supporting information consume considerable processing power. Locating the data warehouse on a separate server lets managers generate reports without straining the IT systems used for operational purposes. Segregating the data warehouse serves other purposes, too. Data warehouses provide a centralized location for storing historical data. This frees storage capacity on production servers and lets managers archive valuable data that reside on legacy systems or hardware scheduled for retirement. Maintaining the data warehouse on a separate system also enables auditors to more clearly define their evaluation scopes. Data warehouses store historical information in read-only format. Since the data are used for analytical functions, they do not need to be transformed to support live operational processes. The read-only format also protects crucial information from data manipulation. Data warehouses deploy “extract, transform, load” (ETL) technology to harvest data from upstream systems. This involves data mapping, as well as scrubbing for inaccurate, repetitive or inconsistent data entries. While data warehouses require immense data sets to generate useful reports, information continually becomes obsolete and loses its importance as newer data enter the warehouse. To reduce IT storage costs and sustain optimal performance, secure disposal policies provide direction for removing older, unnecessary data. Throughout the data warehouse’s operations, defined user provisions and restrictions assure that various levels of data access and reporting capabilities remain limited to individuals whose organizational responsibilities require such access. Though smaller in scope, data marts merit the same audit considerations during development. Such examination increases the likelihood that they will perform effectively, efficiently and securely.
Data Warehouse Characteristics
Data warehouses produce information for decision-making purposes, rather than for executing operational processes.
Auditing Data Warehouse Development
Proper development provides a foundation for effective data warehouse operations and reduces future potential risk.
Development audits should address data warehouse planning, security policies ensure that such information receives secure architecture, implementation, operations and creation of any and proper disposal once it outlives its purpose. data marts. During planning, senior management support provides Auditing an Existing Data Warehouse strong evidence of sufficient resources and high priority for While auditing during development reduces the potential data warehouse development. With any IT system, operational for difficulties, all data warehouses require periodic auditing. difficulties and potential risks arise when actual uses and Such auditing evaluates adherence to existing policies and users do not align with original intentions. Participation in controls, and examines the effectiveness of these policies and planning from all departments that supply data and expect to controls. This examination illuminates vulnerabilities benefit from the data warehouse’s reporting capabilities helps requiring mitigation. Audit areas include security, data mitigate such an audit concern. integrity, change management and testing of reports. Project management methodology imposes discipline and Data-security auditing begins with identifying and documentation on development efforts. Such methodology classifying the types of information in the data warehouse. reduces risks associated with ad hoc planning and Data warehouses may hold sensitive corporate financial data, undocumented activity. A development audit evaluates Social Security numbers, national ID numbers, bank account whether planning incorporates sound methodology. numbers, other personal information for customers or System architecture concerns address whether to develop a employees, or patient treatment records for healthcare two-layer or three-layer data warehouse. A two-layer providers. In addition to evaluating overall security measures, architecture encompasses the upstream data sources as well as compliance with laws and regulations such as the derived, read-only historical data. The three-layer design US Health Insurance Portability and Accountability Act, includes an additional data reconciliation layer for scrubbing US Gramm-Leach-Bliley Act, US Sarbanes-Oxley Act of data and resolving inconsistencies that arise when various 2002, Basel II and other similar standards may require systems contribute identical data entered in different formats. particular controls for specific types of data. Such inconsistencies often arise between legacy and newer Data classification and their level of significance systems used for the same operational purposes. determine which individuals require access and their Development audits examine whether the proposed appropriate levels of access, based on work-related duties and architecture is appropriate for the intended purpose. For a responsibilities. A particular business unit, function, small organization or a business unit developing a data mart, department, location or segment of customers defines access assimilating data from disparate systems or resolving parameters. Access restrictions may also be based on rows, inconsistencies that arise from different data sources may not columns or tables for significant data sets. Similar be crucial issues. Two-layer architecture may work fine for considerations apply to setting report-function limitations. such entities. Upstream systems supply information to the data The data integrity risks associated with that architecture, warehouse, and these data transfers are crucial data integrity however, compound for much larger businesses seeking concerns. Software systems and data warehouses developed enterprise-level reports based on data from a diverse range of by the same vendor, such as SAP or PeopleSoft, incorporate systems and sources. The additional layer for scrubbing and tighter data integration and reduce opportunities for errors or reconciliation is necessary, with test reports evaluated to inconsistencies to arise. ensure that data conversions take place as expected. A custom data warehouse that consolidates data from a Development audits also evaluate the data warehouse’s range of software system sources, however, presents much intended online analytical processing greater potential for errors or data (OLAP) capabilities. These capabilities inconsistencies. Two disparate systems, for drive data mining, querying and reporting Changes in upstream systems example, may deploy differing state functions, and enable managers to review abbreviations—for example, based on US affect data and transfers charts, graphs, tables and other visual Postal Service or the Associated Press summations of company performance. to the data warehouse. styles—for address fields. Parts codes Proposed OLAP capabilities must meet supplied by two systems may use slightly reporting expectations and accommodate different alphanumeric sequences for those the data warehouse’s scope and capacity. field entries. Legacy system limitations may restrict lengths of Data warehouse development evaluations address current other entries in upstream sources. data storage and processing requirements as well as anticipated Such situations call attention to the importance of future needs. Audits also assess needs for data mirroring and evaluating the data mapping processes and mapping engine other means for recovering data following a disaster. used to transfer and scrub data. Critical issues include Development audits require examining proposed security whether data mapping occurs before or after data enter the measures, too. Planned access and reporting privileges must warehouse, what restrictions and maintenance apply to correspond to defined individual work roles, responsibilities mappings, whether or not data are encrypted before they are and user provisions. Safely storing and retrieving critical, sent to the warehouse, and management of the data engine. confidential information require establishing appropriate data IT elements exist in dynamic environments subject to encryption procedures. Applying read-only restriction to continual change, and change management practices must historical information prevents data manipulation. Additional address data warehouse operations. Changes in upstream
systems affect data and transfers to the data warehouse, and must be acknowledged. The data construct within the data warehouse must be documented so that changes to mappings or tables are properly understood. Established policies restricting modifications of reports ensure that users receive valid, accurate assessments. A data warehouse exists to generate reports, and audits must identify the most critical reports and the most crucial report attributes. A parallel testing environment enables auditors to test how critical reports behave under various conditions, and whether or not they generate the expected results. When performed regularly, such audits ensure that data warehouses remain secure, deliver accurate reports and add continual value to organizational decision making.
With such monitoring and oversight, organizations realize the benefits of data warehouse reporting capabilities and the enhanced performance potential that accompanies continual technological advancement.
Levine, M.; Performing a Data Warehouse Audit, Audit Serve Inc., 2006 Browder Edwards, K.; G. Lumpkin.; Security and the Data Warehouse; An Oracle White Paper, Oracle Corp., April 2005 Keay, E.; J. Tedesco.; “Keeping Your Data Warehouse Data Accurate,” DM Direct Newsletter, 7 January 2005 Gonzales, M.; The Data Quality Audit, Intelligent Enterprise, July 2004 Brink, D.; Auditing the Development of a Data Warehouse, ISACA Hong Kong Chapter, 24 January 2001, www.isaca.org.hk/document/cisa_slide/Auditing%20Data%20 Warehouse/ Warigon, S.; “Data Warehouse Control and Security,” Association of College and University Auditors LEDGER, vol. 41, no. 2, April 1997, p. 3-7 Miszczyk, J.; N. Harris.; P. Kocinski; J. Stice; K. Unger; DB2/400: Mastering Data Warehousing Functions, IBM Corporation, International Technical Support Organization, September 1998 Brian Thomas, CISA, CISSP is the IT advisory partner in the Houston, Texas, USA, office of Weaver and Tidwell LLP (www.weaverandtidwell.com). Thomas can be contacted at +1.713.800.1050.
Technological advances enable data warehouses to continually store greater amounts of information. Based on such immense volumes of data, data mining tools let managers generate reports that provide more accurate, more expansive assessments of organizational performance. These reports enhance decision-making skills, benefiting the entire organization. Understanding the fundamental characteristics of a data warehouse and auditing its development apply a proactive approach toward addressing potential difficulties that may arise once the system has been implemented. Continually auditing existing data warehouses highlights vulnerabilities requiring mitigation, as well as opportunities for improving the operation. Auditor attention also emphasizes the importance of regularly heeding established policies and processes for data warehouse use.