Friday, 26 February 2016

Data Integrator Transforms Overview

Data Integrator Transforms List



Hi Everyone,

SAP Data Services is one of the finest ETL (Extract, Transform, Load) tools. It delivers a single enterprise-class solution for data integration, data quality, data profiling, and text data processing, allowing you to integrate, transform, improve, and deliver trusted data to critical business processes.

SAP Data Services transforms are built-in system objects stored in the repository, used whenever we want to transform data from source(s) to target(s).
The transforms can be found under the Transforms tab of the Local Object Library, which provides access to all repository objects (built-in or user-built).

The transforms are broadly classified into the four categories below. Expanding each category shows the list of transforms it contains.

·   Data Integrator
·   Data Quality
·   Platform
·   Text Data Processing

Data Integrator Transforms:

The Data Integrator transforms are:

1. Data_Transfer
2. Date_Generation
3. Effective_Date
4. Hierarchy_Flattening
5. History_Preserving
6. Key_Generation
7. Map_CDC_Operation
8. Pivot (Columns to Rows)
9. Reverse_Pivot (Rows to Columns)
10. Table_Comparison
11. XML_Pipeline

Let’s look at a detailed description of each transform under the Data Integrator category.

1) Data Transfer:

This transform writes the data from a source or the output from another transform into a transfer object and subsequently reads data from the transfer object.
The transfer type can be a relational database table or file.
When the transfer type is a database table, we can use the Data_Transfer transform to push down operations to the database server.
This lets us push down resource-consuming operations such as joins, GROUP BY, and sorts.
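As an illustration (all table and column names here are hypothetical), a join and aggregation staged through a Data_Transfer table could end up running on the database server as a single SQL statement similar to:

-- Join and GROUP BY executed on the database server
-- instead of inside the Data Services engine (illustrative names)
SELECT c.REGION, SUM(o.ORDER_AMOUNT) AS TOTAL_AMOUNT
FROM STG_ORDERS o
JOIN CUSTOMER c ON c.CUST_ID = o.CUST_ID
GROUP BY c.REGION;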
Please go through the article ‘Data Transfer transform in SAP Data Services’ to know more details about this transform.

2) Date Generation:

This transform produces a series of dates incremented as we specify in the transform settings.
We use this transform to produce the key values for a time dimension target.
From this generated sequence we can also populate other fields in the time dimension (such as day_of_week) using functions in a query transform.
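As a rough SQL analogy of what the transform emits (recursive-CTE syntax; the dates and names are illustrative, and date arithmetic syntax varies by database), think of one row per day in the requested range:

-- One row per day from start date to end date,
-- like a Date_Generation output with a daily increment
WITH RECURSIVE date_series (DATE_KEY) AS (
    SELECT DATE '2016-01-01'
    UNION ALL
    SELECT DATE_KEY + 1   -- next day; syntax varies by database
    FROM date_series
    WHERE DATE_KEY < DATE '2016-12-31'
)
SELECT DATE_KEY FROM date_series;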
Please go through the article ‘Date Generation transform in SAP Data Services’ to know more details about this transform.

3) Effective Date:

This transform is used to calculate an “effective-to” value for data that contains an effective date.
The calculated effective-to date and an existing effective date produce a date range that allows queries based on effective dates to produce meaningful results.
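In SQL terms, the effect is similar to deriving the effective-to date from the next row’s effective date for the same key, sketched below with a window function (table and column names are made up for illustration):

-- Effective-to derived from the next effective date of the same key
SELECT CUST_ID,
       EFFECTIVE_DATE,
       LEAD(EFFECTIVE_DATE) OVER (PARTITION BY CUST_ID
                                  ORDER BY EFFECTIVE_DATE) AS EFFECTIVE_TO
FROM CUSTOMER_RATES;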
Please go through the article ‘Effective Date transform in SAP Data Services’ to know more details about this transform.

4) Hierarchy Flattening:

This transform constructs a complete hierarchy from parent/child relationships, then produces a description of the hierarchy in a vertically or horizontally flattened format, which can be used to build star schemas in a data warehouse environment.
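A SQL sketch of vertical flattening (recursive-CTE syntax; the HIER_SOURCE table with PARENT_ID/CHILD_ID columns is hypothetical) would produce one row per ancestor/descendant pair together with its depth:

-- Every ancestor/descendant pair in the hierarchy, with depth
WITH RECURSIVE flat (ANCESTOR_ID, DESCENDANT_ID, DEPTH) AS (
    SELECT PARENT_ID, CHILD_ID, 1
    FROM HIER_SOURCE
    UNION ALL
    SELECT f.ANCESTOR_ID, h.CHILD_ID, f.DEPTH + 1
    FROM flat f
    JOIN HIER_SOURCE h ON h.PARENT_ID = f.DESCENDANT_ID
)
SELECT ANCESTOR_ID, DESCENDANT_ID, DEPTH
FROM flat;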
Please go through the article ‘Hierarchy Flattening transform in SAP Data Services’ to know more details about this transform.

5) History Preserving:

The History_Preserving transform allows us to produce a new row in our target rather than updating an existing row.
We can indicate the columns in which the transform should identify changes to be preserved.
If the value of any of those columns changes, the transform creates a new row for each row flagged as UPDATE in the input data set.
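The net effect on the target resembles the classic slowly changing dimension Type 2 pattern, sketched below in SQL (table, columns, and values are illustrative):

-- Close the current version of the row...
UPDATE DIM_CUSTOMER
   SET VALID_TO = CURRENT_DATE
 WHERE CUST_ID = 1001 AND VALID_TO = DATE '9999-12-31';

-- ...and insert the new version instead of updating in place
INSERT INTO DIM_CUSTOMER (CUST_ID, CITY, VALID_FROM, VALID_TO)
VALUES (1001, 'Berlin', CURRENT_DATE, DATE '9999-12-31');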
Please go through the article ‘History Preserving transform in SAP Data Services’ to know more details about this transform.

6) Key Generation:

This transform is used to generate new keys for new rows/records in a data set.
When it is necessary to generate artificial keys in a table, the Key_Generation transform looks up the maximum existing key value from a table and uses it as the starting value to generate new keys.
The transform expects the generated key column to be part of the input schema.
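In SQL terms, the transform’s lookup is essentially the following (DIM_CUSTOMER and CUST_KEY are illustrative names); new rows then receive MAX + 1, MAX + 2, and so on in the generated key column:

-- Starting value for the new surrogate keys
SELECT COALESCE(MAX(CUST_KEY), 0) AS LAST_KEY
FROM DIM_CUSTOMER;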
Please go through the article ‘Key Generation transform in SAP Data Services’ to know more details about this transform.

7) Map CDC Operation:

Using its input requirements (values for the Sequencing column and a Row operation column), the Map_CDC_Operation transform performs three functions:

·   Sorts input data based on the values in the Sequencing column drop-down list and the (optional) Additional grouping columns box.
·   Maps output data based on the values in the Row operation column drop-down list. Source table rows are mapped to INSERT, UPDATE, or DELETE operations before being passed on to the target.
·   Resolves missing, separated, or multiple before- and after-images for UPDATE rows.

The transform also allows us to filter columns and view UPDATE rows prior to running the job.
While commonly used to support relational or mainframe changed-data capture (CDC), this transform supports any data stream as long as its input requirements are met. Relational CDC sources include Oracle and SQL Server.
This transform is typically the last object before the target in a data flow because it produces INSERT, UPDATE, and DELETE operation codes. Data Services produces a warning if other objects are placed after it.
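To picture the input, a hypothetical CDC stream with a sequencing column and a row-operation column might look like this (values invented for illustration):

CDC_SEQ   CDC_OP   CUST_ID   CITY
1001      I        42        Pune      (insert)
1002      B        42        Pune      (before-image of update)
1003      U        42        Mumbai    (after-image of update)
1004      D        42        Mumbai    (delete)

The transform sorts on CDC_SEQ and maps each CDC_OP value to the corresponding INSERT, UPDATE, or DELETE opcode before the row reaches the target.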
Please go through the article ‘Map CDC Operation transform in SAP Data Services’ to know more details about this transform.

8) Pivot (Columns to Rows):

This transform creates a new row for each value in a column that we identify as a pivot column.
The Pivot transform allows us to change how the relationship between rows is displayed.
For each value in each pivot column, Data Services produces a row in the output data set. We can also create pivot sets to specify more than one pivot column.
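In SQL, the same reshaping could be written as a UNION ALL over the pivot columns (SALES_WIDE and its quarter columns are hypothetical names):

-- Each quarter column becomes its own row
SELECT REGION, 'Q1' AS QUARTER, Q1_SALES AS SALES FROM SALES_WIDE
UNION ALL
SELECT REGION, 'Q2', Q2_SALES FROM SALES_WIDE
UNION ALL
SELECT REGION, 'Q3', Q3_SALES FROM SALES_WIDE
UNION ALL
SELECT REGION, 'Q4', Q4_SALES FROM SALES_WIDE;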
Please go through the article ‘Pivot (Columns to Rows) transform in SAP Data Services’ to know more details about this transform.

9) Reverse Pivot (Rows to Columns):

This transform creates one row of data from several existing rows.
The Reverse Pivot transform allows us to combine data from several rows into one row by creating new columns.
For each unique value in a pivot axis column and each selected pivot column, Data Services produces a column in the output data set.
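A SQL sketch of the same reshaping (SALES_TALL, QUARTER, and SALES are illustrative names; QUARTER plays the role of the pivot axis column):

-- One row per REGION, one column per distinct QUARTER value
SELECT REGION,
       MAX(CASE WHEN QUARTER = 'Q1' THEN SALES END) AS Q1_SALES,
       MAX(CASE WHEN QUARTER = 'Q2' THEN SALES END) AS Q2_SALES,
       MAX(CASE WHEN QUARTER = 'Q3' THEN SALES END) AS Q3_SALES,
       MAX(CASE WHEN QUARTER = 'Q4' THEN SALES END) AS Q4_SALES
FROM SALES_TALL
GROUP BY REGION;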
Please go through the article ‘Reverse Pivot (Rows to Columns) transform in SAP Data Services’ to know more details about this transform.

10) Table Comparison:

This transform compares two data sets and produces the difference between them as a data set with rows flagged as INSERT, UPDATE, or DELETE.
The Table_Comparison transform allows us to detect and forward changes that have occurred since the last time a target was updated.
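Conceptually, the comparison works like the following SQL sketch (names are illustrative; the real transform offers row-by-row select, cached comparison table, and sorted input comparison methods):

-- Flag each row relative to the comparison table; rows that
-- match and are unchanged come out with a NULL opcode
SELECT COALESCE(s.CUST_ID, t.CUST_ID) AS CUST_ID,
       CASE
           WHEN t.CUST_ID IS NULL THEN 'INSERT'
           WHEN s.CUST_ID IS NULL THEN 'DELETE'
           WHEN s.CITY <> t.CITY  THEN 'UPDATE'
       END AS OPCODE
FROM SOURCE_CUSTOMER s
FULL OUTER JOIN DIM_CUSTOMER t ON t.CUST_ID = s.CUST_ID;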
Note that in order to use the Table_Comparison transform with Teradata 13 (and later) tables as the comparison table and target table, we must do the following:

·   On the Teradata server, set the General parameter DBSControl to TRUE to allow uncommitted data to be read.
·   In the Data Services Teradata datastore, add the following statement in the “Additional session parameters” field:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Please go through the article ‘Table Comparison transform in SAP Data Services’ to know more details about this transform.

11) XML Pipeline:

This transform is used to process large XML files one instance of a repeatable structure at a time.
With this transform, Data Services does not need to read the entire XML input into memory and build an internal data structure before performing the transformation.
An NRDM structure is not required to represent the entire XML data input. Instead, the XML_Pipeline transform uses a portion of memory to process each instance of a repeatable structure, then continually releases and reuses memory to steadily flow XML data through the transform.
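For instance, in a hypothetical order file like the one below, each LineItem element is one repeatable instance, so the transform can process and release LineItems one at a time instead of holding the whole document in memory:

<Order id="9001">
  <LineItem><Sku>A-100</Sku><Qty>3</Qty></LineItem>
  <LineItem><Sku>B-200</Sku><Qty>1</Qty></LineItem>
  ...
</Order>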
During execution, Data Services pushes operations of the XML_Pipeline transform to the XML source.
Please go through the article ‘XML Pipeline transform in SAP Data Services’ to know more details about this transform.
[Image: all the transforms available under the Data Integrator category.]
