Friday 26 February 2016

Text Data Processing Transforms Overview

Text Data Processing Transforms:

The Text Data Processing transforms are:

· Entity_Extraction

Let's look at a detailed description of each transform in the Text Data Processing category.
Entity Extraction:
The Entity Extraction transform performs linguistic processing on content by using semantic and syntactic knowledge of words.
We can configure the transform to identify paragraphs, sentences, and clauses and it can extract entities and facts from text.
Typically, we use the Entity Extraction transform when we have text with specific information we want to extract and then use in downstream analytics and applications.
Please go through the article ‘Entity Extraction transform in SAP Data Services’ to know more details about this transform.

Platform Transforms Overview


Platform Transforms List & Overview

The Platform transforms are:

1. Case
2. Data_Mask
3. Map_Operation
4. Merge
5. Query
6. Row_Generation
7. SQL
8. Validation
9. XML_Map

Let's look at a detailed description of each transform in the Platform category.

1. Case:

This transform specifies multiple paths in a single transform (different rows are processed in different ways).
The Case transform simplifies branch logic in data flows by consolidating case or decision making logic in one transform. Paths are defined in an expression table.
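As a rough illustration only (this is not the transform's own expression syntax), the routing behaviour can be sketched in Python; the column names, path labels, and the "first matching path wins" behaviour below are assumptions made for the example:

    # Illustrative sketch of Case-style routing (hypothetical column and label names).
    rows = [
        {"REGION": "EAST", "SALES": 100},
        {"REGION": "WEST", "SALES": 250},
    ]

    # Each case pairs an output path label with a boolean expression.
    cases = [
        ("EAST_PATH", lambda r: r["REGION"] == "EAST"),
        ("WEST_PATH", lambda r: r["REGION"] == "WEST"),
    ]

    outputs = {label: [] for label, _ in cases}
    outputs["DEFAULT_PATH"] = []          # rows that match no expression

    for row in rows:
        for label, condition in cases:
            if condition(row):
                outputs[label].append(row)
                break                     # send the row down the first matching path
        else:
            outputs["DEFAULT_PATH"].append(row)

    print(outputs)
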
Please go through the article ‘Case transform in SAP Data Services’ to know more details about this transform.

2. Data_Mask:

The Data Mask transform enables us to protect personally identifiable information in our data.
Personal information includes data such as credit card numbers, salary information, birth dates, personal identification numbers, or bank account numbers.
We may want to use data masking to support security and privacy policies, and to protect our customer or employee information from possible theft or exploitation.
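As a simple illustration of the masking idea (not the transform's actual options or output), a character-masking function might keep only the last few digits visible; this sketch and its parameter names are hypothetical:

    # Hypothetical character-masking sketch: hide all but the last four digits.
    def mask_number(value, visible=4, mask_char="X"):
        digits = "".join(c for c in value if c.isdigit())
        return mask_char * (len(digits) - visible) + digits[-visible:]

    print(mask_number("4111 1111 1111 1234"))   # XXXXXXXXXXXX1234
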
Please go through the article ‘Data Mask transform in SAP Data Services’ to know more details about this transform.

3. Map Operation:

This transform modifies data based on mapping expressions and the current operation codes, and it can convert operation codes between data manipulation operations.
By writing map expressions per column and per row type (INSERT/UPDATE/DELETE), we can:
Change the value of data in a column.
Execute different expressions on a column, based on its input row type.
Use the before_image function to access the before-image value of an UPDATE row.
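A loose Python sketch of the idea (the opcode handling and field names below are simplified assumptions, not the transform's syntax):

    # Simplified sketch: apply a different column expression per input row type,
    # and convert UPDATE rows into INSERT rows (an example of op-code conversion).
    def map_operation(row):
        if row["opcode"] == "UPDATE":
            # the before-image value is available for UPDATE rows
            row["SALARY_CHANGE"] = row["SALARY"] - row["SALARY_BEFORE"]
            row["opcode"] = "INSERT"
        elif row["opcode"] == "INSERT":
            row["SALARY_CHANGE"] = 0
        return row

    print(map_operation({"opcode": "UPDATE", "SALARY": 5500, "SALARY_BEFORE": 5000}))
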
Please go through the article ‘Map Operation transform in SAP Data Services’ to know more details about this transform.

4. Merge:

This transform combines incoming data sets, producing a single output data set with the same schema as the input data sets.
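Conceptually this is comparable to a SQL UNION ALL; a minimal sketch with two identically structured inputs (the data is invented):

    # Two inputs with the same schema are concatenated into one output, duplicates kept.
    input_a = [{"ID": 1, "NAME": "Ann"}]
    input_b = [{"ID": 2, "NAME": "Bob"}]
    merged = input_a + input_b
    print(merged)
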
Please go through the article ‘Merge transform in SAP Data Services’ to know more details about this transform.

5. Query:

The Query transform retrieves a data set that satisfies conditions that we specify.
A Query transform is similar to a SQL SELECT statement.
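As a rough analogy only, a Query transform's column mappings, WHERE condition, and ORDER BY can be pictured like this hypothetical sketch (field names and data are invented):

    customers = [
        {"ID": 2, "NAME": "Bob", "COUNTRY": "US"},
        {"ID": 1, "NAME": "Ann", "COUNTRY": "US"},
        {"ID": 3, "NAME": "Eve", "COUNTRY": "DE"},
    ]

    # Roughly: SELECT ID, NAME WHERE COUNTRY = 'US' ORDER BY ID
    result = sorted(
        ({"ID": r["ID"], "NAME": r["NAME"]} for r in customers if r["COUNTRY"] == "US"),
        key=lambda r: r["ID"],
    )
    print(result)
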
Please go through the article ‘Query transform in SAP Data Services’ to know more details about this transform.

6. Row Generation:

This transform produces a data set with a single column.
The column values start with the number that we set in the 'Row number starts at' option. The value then increments by one up to the specified number of rows.
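In effect the transform emits a simple integer sequence; a tiny sketch with hypothetical option values and an illustrative column name:

    # 'Row number starts at' = 1, number of rows = 5 (hypothetical values)
    rows = [{"ROW_ID": n} for n in range(1, 1 + 5)]
    print(rows)   # [{'ROW_ID': 1}, ..., {'ROW_ID': 5}]
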
Please go through the article ‘Row Generation transform in SAP Data Services’ to know more details about this transform.

7. SQL:

This transform performs the indicated SQL query operation. Use this transform to perform standard SQL operations when other built-in transforms cannot perform them.
The options for the SQL transform include specifying a datastore, join rank, cache, array fetch size, and entering SQL text.
Note: The SQL transform supports a single SELECT statement only.
Please go through the article ‘SQL transform in SAP Data Services’ to know more details about this transform.

8. Validation:

The Validation transform qualifies a data set based on rules for input schema columns.
We can apply multiple rules per column or bind a single reusable rule (in the form of a validation function) to multiple columns.
The Validation transform can identify the row, column, or columns for each validation failure. We can also use the Validation transform to filter or replace (substitute) data that fails our criteria.
When we enable a validation rule for a column, a check mark appears next to it in the input schema.
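A loose sketch of the pass/fail routing idea (the rule, field name, and data are invented; the real transform also supports substituting failed values):

    # One rule bound to a column; rows failing any rule are routed to a fail output.
    rules = {"ZIP": lambda v: v is not None and len(v) == 5 and v.isdigit()}

    passed, failed = [], []
    for row in [{"ZIP": "90210"}, {"ZIP": "ABC"}]:
        if all(check(row[col]) for col, check in rules.items()):
            passed.append(row)
        else:
            failed.append(row)   # a substitute value could be written here instead

    print(passed, failed)
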
Please go through the article ‘Validation transform in SAP Data Services’ to know more details about this transform.

9. XML Map:

The XML_Map transform is a data transform engine designed for hierarchical data. It provides functionality similar to a typical XQuery or XSLT engine.
The XML_Map transform takes one or more source data sets and produces a single target data set. Flat data structures such as database tables or flat files are also supported as both source and target data sets.
We can use the XML_Map transform to perform a variety of tasks. For example:
We can create a hierarchical target data structure such as XML or IDoc from a hierarchical source data structure.
We can create a hierarchical target data structure based on data from flat tables.
We can create a flat target data set such as a database table from data in a hierarchical source data structure.
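To picture only the last of these cases, here is a tiny sketch that flattens a hierarchical structure into rows (the structure and field names are invented; the real transform is configured graphically, not in code):

    # Flatten a nested order into one flat row per line item.
    order = {"ORDER_ID": 42, "ITEMS": [{"SKU": "A1", "QTY": 2}, {"SKU": "B7", "QTY": 1}]}
    flat_rows = [
        {"ORDER_ID": order["ORDER_ID"], "SKU": item["SKU"], "QTY": item["QTY"]}
        for item in order["ITEMS"]
    ]
    print(flat_rows)
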
The XML_Map transform works in two modes: normal and batch.
In normal mode, data is handled on a row-by-row basis before being sent to the next transform.
In batch mode, data is handled as a block of rows before being sent to the next transform.
There are different transform icons to indicate each mode.
Please go through the article ‘XML Map transform in SAP Data Services’ to know more details about this transform.

Data Quality Transforms Overview


Data Quality Transforms Overview/List


The Data Quality transforms are:

1. Associate
2. Country_ID
3. Data_Cleanse
4. DSF2_Walk_Sequencer
5. Geocoder
6. Global_Address_Cleanse
7. Global_Suggestion_List
8. Match
9. USA_Regulatory_Address_Cleanse
10. User_Defined

Let's look at a detailed description of each transform in the Data Quality category.

1. Associate:

The Associate transform works downstream of Match transforms and provides a way to combine, or associate, their match results by using the Group Number fields that the Match transforms generate.
We may need to add a Group Statistics operation to the Associate transform to gather match statistics.
We can combine the results of two or more Match transforms, two or more Associate transforms, or any combination of the two.
For example, we may use one Match transform to match on name and address, use a second Match transform to match on SSN, and then use an Associate transform to combine the match groups produced by the two Match transforms.
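As a rough illustration of that example (the group values and the simple transitive-combination logic are assumptions for this sketch, not the transform's algorithm):

    # Records carry group numbers from two independent Match passes.
    records = [
        {"ID": 1, "NAME_ADDR_GROUP": "G1", "SSN_GROUP": None},
        {"ID": 2, "NAME_ADDR_GROUP": "G1", "SSN_GROUP": "S9"},
        {"ID": 3, "NAME_ADDR_GROUP": None, "SSN_GROUP": "S9"},
    ]

    # Link records that share a group number from either pass:
    # 1 and 2 share G1, 2 and 3 share S9, so all three associate into one group.
    groups = []
    for rec in records:
        keys = {rec["NAME_ADDR_GROUP"], rec["SSN_GROUP"]} - {None}
        target = next((g for g in groups if g["keys"] & keys), None)
        if target:
            target["keys"] |= keys
            target["ids"].append(rec["ID"])
        else:
            groups.append({"keys": keys, "ids": [rec["ID"]]})

    print([g["ids"] for g in groups])   # [[1, 2, 3]]
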
Please go through the article ‘Associate transform in SAP Data Services’ to know more details about this transform.

2. Country ID:

The Country ID transform parses our input data and then identifies the country of destination for each record.
After identifying the country, the transform can output the country name, any of three different ISO country codes, an ISO script code, and a percentage of confidence in the assignment.
Though we can use the Country ID transform before any transform in a data flow, we will probably find it most useful during a transactional address cleanse job.
Place the Country ID transform before the Global Suggestion List transform. The Global Suggestion List transform needs the ISO_Country_Code_2Char field that the Country ID transform can output.
It is not necessary to use the Country ID transform before the Global Address Cleanse transform in a data flow because the Global Address Cleanse transform contains its own Country ID processing.
It is also not necessary to use the Country ID transform before the USA Regulatory Address Cleanse transform because the input data should contain U.S. addresses only.
Please go through the article ‘Country ID transform in SAP Data Services’ to know more details about this transform.

3. Data Cleanse:

Use the Data Cleanse transform to parse and format custom or person and firm data as well as phone numbers, dates, e-mail addresses, and Social Security numbers.
Custom data includes operational or product data specific to the business.
The cleansing package we specify defines how our data should be parsed and standardized.
Within a data flow, the Data Cleanse transform is typically placed after the address cleansing process and before the matching process.
Please go through the article ‘Data Cleanse transform in SAP Data Services’ to know more details about this transform.

4. DSF2 Walk Sequencer:

To add walk sequence information to our data, include the DSF2 Walk Sequencer transform in the data flow. We can then send our data through presorting software to qualify for the following walk-sequence discounts:
Carrier Route
Walk Sequence
90% Residential Saturation
75% Total Active Saturation
DSF2 walk sequencing is often called “pseudo” sequencing because it mimics USPS walk sequencing.
Where USPS walk-sequence numbers cover every address, DSF2 walk sequence processing provides “pseudo” sequence numbers for the addresses only in that particular file.
Please go through the article ‘DSF2 Walk Sequencer transform in SAP Data Services’ to know more details about this transform.

5. Geocoder:

The Geocoder transform uses geographic coordinates expressed as latitude and longitude, addresses, and point-of-interest (POI) data. Using the transform, we can append addresses, latitude and longitude, census data (US only), and other information to the data.
Based on mapped input fields, the Geocoder transform has three modes of geocode processing:
Address Geocoding
Reverse Geocoding
POI textual search
Please go through the article ‘Geocoder transform in SAP Data Services’ to know more details about this transform.

6. Global Address Cleanse:

The Global Address Cleanse transform identifies, parses, validates, and corrects global address data, such as primary number, primary name, primary type, directional, secondary identifier, secondary number, locality, region and postcode.
Note: The Global Address Cleanse transform does not support CASS certification or produce a USPS Form 3553. If we want to certify our U.S. address data, we must use the USA Regulatory Address Cleanse transform, which supports CASS.
If we perform both address cleansing and data cleansing, the Global Address Cleanse transform typically comes before the Data Cleanse transform in the data flow.
Please go through the article ‘Global Address Cleanse transform in SAP Data Services’ to know more details about this transform.

7. Global Suggestion List:

The Global Suggestion List transform queries addresses that have minimal data and can offer suggestions for possible matches. It is a beneficial research tool for managing unassigned addresses from a batch process.
Global Suggestion List functionality is designed to be integrated into our own custom applications via the Web Service.
The Global Suggestion List transform requires the two character ISO country code on input. Therefore, we may want to place a transform, such as the Country ID transform, that will output the ISO_Country_Code_2Char field before the Global Suggestion List transform.
The Global Suggestion List transform is available for use with the Canada, Global Address, and USA engines.
Please go through the article ‘Global Suggestion List transform in SAP Data Services’ to know more details about this transform.

8. Match:

The Match transform is responsible for performing matching based on the business rules we define. The transform then sends matching and unique records on to the next transform in the data flow.
For best results, the data in which we are attempting to find matches should be cleansed. Therefore, we may need to include other Data Quality transforms before the Match transform.
Please go through the article ‘Match transform in SAP Data Services’ to know more details about this transform.

9. USA Regulatory Address Cleanse:

The USA Regulatory Address Cleanse transform identifies, parses, validates, and corrects U.S. address data according to the U.S. Coding Accuracy Support System (CASS).
This transform can create the USPS Form 3553 and output many useful codes to our records. We can also run in a non-certification mode as well as produce suggestion lists.
If we perform both data cleansing and matching, the USA Regulatory Address Cleanse transform typically comes before the Data Cleanse transform and any of the Match transforms in the data flow.
SAP recommends using a sample job or data flow that is set up according to best practices for a specific use case.
Please go through the article ‘USA Regulatory Address Cleanse transform in SAP Data Services’ to know more details about this transform.

10. User Defined:

The User-Defined transform provides custom processing in a data flow using the full Python scripting language.
The applications for the User-Defined transform are nearly limitless. It can do just about anything that we can write Python code to do.
We can use the User-Defined transform to generate new records, populate a field with a specific value, create a file, connect to a website, or send an email, just to name a few possibilities.
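A very small per-record sketch of the idea follows; the field names are invented, and the GetField/SetField calls are assumptions about the transform's per-record Python API, so the transform's Python documentation should be checked before relying on them:

    # Sketch only: inside the transform the engine supplies the record object.
    def process(record):
        full_name = record.GetField(u'FULL_NAME')
        record.SetField(u'NAME_UPPER', full_name.upper())   # populate a derived field

    # Tiny stand-in so the sketch can be run outside Data Services.
    class FakeRecord(dict):
        def GetField(self, name):
            return self[name]
        def SetField(self, name, value):
            self[name] = value

    r = FakeRecord({u'FULL_NAME': u'jane doe'})
    process(r)
    print(dict(r))   # {'FULL_NAME': 'jane doe', 'NAME_UPPER': 'JANE DOE'}
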
We can place this transform anywhere in a data flow; the only restrictions on where it can be located are those that we place on it ourselves.
Although the User-Defined transform is quite flexible and powerful, we will find that many of the tasks we want to perform can be accomplished with the Query transform.
The Query transform is generally more scalable and faster, and uses less memory than User-Defined transforms.
Please go through the article ‘User Defined transform in SAP Data Services’ to know more details about this transform.
The image below shows a pictorial representation of all the transforms available in the Data Quality category.