Project Management
By: thiago143 • 30/4/2018 • Research project • 6,462 words (26 pages)
Master ETL Architecture Design
For
Vendor Report Card
Submitted to
Target Corporation
Submitted By
[pic 6]
Change History
This document is located in SharePoint at the following URL:
The following table lists the important changes made to this document:
Version | Date | Prepared by | Comments |
1.0 | 10-Jul-2006 | Wipro | |
Table of Contents
1.0 Introduction
1.1 Scope
1.2 Key Contributors
1.3 ETL Design Considerations
2.0 ETL Process - Technical Design
2.1 Data Extraction
2.2 Data Transformation
2.2.1 ETL-Surrogate Key Process
2.2.1.1 Background
2.2.1.2 Purpose
2.2.1.3 Overall Process Flow for Surrogate Key Generation
2.2.1.4 Process to Generate Surrogate Keys for New Records
2.2.1.5 Control Tables for ETL Surrogate Keys
2.2.2
2.2.2.1 Approach 1: Process to Validate Foreign Keys and Business Codes
2.2.2.2 Approach 2: Process to Validate Foreign Keys and Business Codes
ETL Process
2.3 Data Loading
3.0 Scheduling
4.0 Restart and Recovery
4.1 Generic Restart Approach for Known Issues
5.0 Data Integrity
5.1.1 Background
5.1.2 Purpose
5.1.3 Scope
5.1.4 Process Flow
5.1.5 Process Details
5.1.6 Implementation Strategy
5.1.7 DataStage Export File
5.1.8 Assumption
6.0 User View Management
6.1.1 Approach-I
6.1.2 Approach-II
6.1.3 Implementation Strategy
6.1.4 Table Structures
7.0 Job Parameters
7.1.1 Job Parameters Tables
7.1.2 Table Structures
8.0 Data Purge
9.0 Partitioning Strategy
9.1.1 Database Partitioning
9.1.2 ETL Partitioning
10.0 Environment
11.0 Acronyms and Glossary
12.0 Appendix
12.1 Data Integrity
12.2 Data Model for Data Integrity Audit Table
Introduction
The purpose of this document is to present the general design principles and technical approach for capturing, transforming, and loading data into the Vendor Report Card (VRC) using the Ascential DataStage ETL tool.
Each release will follow identical (or very similar) processes: data is extracted from different sources and transformed into uniformly identified records that are available for further processing. These records carry standardized keys that allow information to be traced back to its original source when necessary.
...