Data Quality Transforms Overview/List
The following Data Quality transforms are available:
1. Associate
2. Country_ID
3. Data_Cleanse
4. DSF2_Walk_Sequencer
5. Geocoder
6. Global_Address_Cleanse
7. Global_Suggestion_List
8. Match
9. USA_Regulatory_Address_Cleanse
10. User_Defined
Let's look at each of the transforms in the Data Quality category in detail.
1. Associate:
The Associate transform works downstream from Match transforms to combine, or associate, their match results by using the Group Number fields that the Match transforms generate.
We may need to add a Group Statistics operation to the Associate transform to gather match statistics.
We can combine the results of two or more Match transforms, two or more Associate transforms, or any combination of the two.
For example, we may use one Match transform to match on name and address, use a second Match transform to match on SSN, and then use an Associate transform to combine the match groups produced by the two Match transforms.
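One way to picture the example above is as a merge of overlapping groups. The sketch below is plain Python, not the Associate transform's implementation or configuration. It assumes that association is transitive (records linked through a chain of match groups end up together), and the field names id, name_addr_group, and ssn_group are hypothetical.

```python
from collections import defaultdict


def associate(records, group_fields):
    """Merge match groups from several match passes into association groups.

    Each record is a dict with an "id" and one group-number field per
    match pass (None when the record was unique in that pass).
    """
    parent = {}

    def find(node):
        # Follow parent links to the root, shortening the path as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for rec in records:
        rec_node = ("rec", rec["id"])
        find(rec_node)  # register records that belong to no group at all
        for field in group_fields:
            group = rec.get(field)
            if group is not None:
                # Link the record to the (pass, group number) it belongs to.
                union(rec_node, (field, group))

    groups = defaultdict(list)
    for rec in records:
        groups[find(("rec", rec["id"]))].append(rec["id"])
    return sorted(sorted(ids) for ids in groups.values())


records = [
    {"id": 1, "name_addr_group": 1, "ssn_group": None},
    {"id": 2, "name_addr_group": 1, "ssn_group": 7},
    {"id": 3, "name_addr_group": None, "ssn_group": 7},
    {"id": 4, "name_addr_group": None, "ssn_group": None},
]
result = associate(records, ["name_addr_group", "ssn_group"])
```

Here records 1 and 2 share a name-and-address group, and records 2 and 3 share an SSN group, so all three fall into one association group while record 4 stays on its own.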
For more details on this transform, see the article ‘Associate transform in SAP Data Services’.
2. Country ID:
The Country ID transform parses our input data and then identifies the country of destination for each record.
After identifying the country, the transform can output the country name, any of three different ISO country codes, an ISO script code, and a percentage of confidence in the assignment.
Though we can use the Country ID transform before any transform in a data flow, we will probably find it most useful during a transactional address cleanse job.
Place the Country ID transform before the Global Suggestion List transform. The Global Suggestion List transform needs the ISO_Country_Code_2Char field that the Country ID transform can output.
It is not necessary to use the Country ID transform before the Global Address Cleanse transform in a data flow because the Global Address Cleanse transform contains its own Country ID processing.
It is also not necessary to use the Country ID transform before the USA Regulatory Address Cleanse transform because the input data should contain U.S. addresses only.
For more details on this transform, see the article ‘Country ID transform in SAP Data Services’.
3. Data Cleanse:
Use the Data Cleanse transform to parse and format custom data or person and firm data, as well as phone numbers, dates, e-mail addresses, and Social Security numbers.
Custom data includes operational or product data specific to the business.
The cleansing package we specify defines how our data should be parsed and standardized.
Within a data flow, the Data Cleanse transform is typically placed after the address cleansing process and before the matching process.
For more details on this transform, see the article ‘Data Cleanse transform in SAP Data Services’.
4. DSF2 Walk Sequencer:
To add walk sequence information to our data, include the DSF2 Walk Sequencer transform in the data flow. We can then send our data through presorting software to qualify for the following walk-sequence discounts:
Carrier Route
Walk Sequence
90% Residential Saturation
75% Total Active Saturation
DSF2 walk sequencing is often called “pseudo” sequencing because it mimics USPS walk sequencing.
Where USPS walk-sequence numbers cover every address, DSF2 walk sequence processing provides “pseudo” sequence numbers only for the addresses in that particular file.
For more details on this transform, see the article ‘DSF2 Walk Sequencer transform in SAP Data Services’.
5. Geocoder:
The Geocoder transform uses geographic coordinates expressed as latitude and longitude, addresses, and point-of-interest (POI) data. Using the transform, we can append addresses, latitude and longitude, census data (US only), and other information to the data.
Based on mapped input fields, the Geocoder transform has three modes of geocode processing:
Address Geocoding
Reverse Geocoding
POI textual search
For more details on this transform, see the article ‘Geocoder transform in SAP Data Services’.
6. Global Address Cleanse:
The Global Address Cleanse transform identifies, parses, validates, and corrects global address data, such as primary number, primary name, primary type, directional, secondary identifier, secondary number, locality, region and postcode.
Note: The Global Address Cleanse transform does not support CASS certification or produce a USPS Form 3553. If you want to certify your U.S. address data, you must use the USA Regulatory Address Cleanse transform, which supports CASS.
If we perform both address cleansing and data cleansing, the Global Address Cleanse transform typically comes before the Data Cleanse transform in the data flow.
For more details on this transform, see the article ‘Global Address Cleanse transform in SAP Data Services’.
7. Global Suggestion List:
The Global Suggestion List transform queries addresses with minimal data and can offer suggestions for possible matches. It is a useful research tool for managing unassigned addresses from a batch process.
Global Suggestion List functionality is designed to be integrated into our own custom applications via the Web Service.
The Global Suggestion List transform requires the two-character ISO country code on input. Therefore, we may want to place a transform that outputs the ISO_Country_Code_2Char field, such as the Country ID transform, before the Global Suggestion List transform.
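Because the code is required on input, it can help to check upstream that every record actually carries one. The sketch below is plain Python over dictionaries, not the Data Services API, and the helper name is hypothetical. Only the field name ISO_Country_Code_2Char comes from the text above. It checks the shape of the value (two letters), not whether the code is an assigned ISO code.

```python
import re

# Two ASCII letters, nothing else.
_TWO_LETTERS = re.compile(r"[A-Za-z]{2}")


def has_two_char_country_code(record, field="ISO_Country_Code_2Char"):
    """Return True if the record holds a two-letter value in the given field."""
    value = record.get(field)
    return isinstance(value, str) and _TWO_LETTERS.fullmatch(value.strip()) is not None


rows = [
    {"ADDRESS": "100 Main St", "ISO_Country_Code_2Char": "US"},
    {"ADDRESS": "1 Rue de Rivoli", "ISO_Country_Code_2Char": "FRA"},
    {"ADDRESS": "Unknown"},
]
ready = [row for row in rows if has_two_char_country_code(row)]
```

Records that fail the check could be routed back through country identification before they reach the suggestion-list step.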
The Global Suggestion List transform is available for use with the Canada, Global Address, and USA engines.
For more details on this transform, see the article ‘Global Suggestion List transform in SAP Data Services’.
8. Match:
The Match transform is responsible for performing matching based on the business rules we define. The transform then sends matching and unique records on to the next transform in the data flow.
For best results, the data in which we are attempting to find matches should be cleansed. Therefore, we may need to include other Data Quality transforms before the Match transform.
For more details on this transform, see the article ‘Match transform in SAP Data Services’.
9. USA Regulatory Address Cleanse:
The USA Regulatory Address Cleanse transform identifies, parses, validates, and corrects U.S. address data according to the U.S. Coding Accuracy Support System (CASS).
This transform can create the USPS Form 3553 and output many useful codes to our records. We can also run the transform in non-certification mode and produce suggestion lists.
If we perform both data cleansing and matching, the USA Regulatory Address Cleanse transform typically comes before the Data Cleanse transform and any of the Match transforms in the data flow.
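The placement guidance given in this article can be collected into a small check. The sketch below is plain Python, not a Data Services feature. The function and variable names are hypothetical, and the hints describe typical placement as stated in this article rather than hard rules. For simplicity it compares only the first occurrence of each transform name.

```python
# Ordering hints taken from this article: (earlier transform, later transform).
ORDER_HINTS = [
    ("Country_ID", "Global_Suggestion_List"),
    ("Global_Address_Cleanse", "Data_Cleanse"),
    ("USA_Regulatory_Address_Cleanse", "Data_Cleanse"),
    ("USA_Regulatory_Address_Cleanse", "Match"),
    ("Data_Cleanse", "Match"),
    ("Match", "Associate"),
]


def ordering_violations(flow):
    """Return the hints that a list of transform names does not follow."""
    position = {}
    for index, name in enumerate(flow):
        position.setdefault(name, index)  # remember the first occurrence only
    return [
        (earlier, later)
        for earlier, later in ORDER_HINTS
        if earlier in position
        and later in position
        and position[earlier] > position[later]
    ]


typical_flow = [
    "USA_Regulatory_Address_Cleanse",
    "Data_Cleanse",
    "Match",
    "Match",
    "Associate",
]
problems = ordering_violations(typical_flow)
```

A hint is only checked when both transforms appear in the flow, so a flow that omits a transform is never reported for it.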
SAP recommends using a sample job or data flow that is set up according to best practices for a specific use case.
For more details on this transform, see the article ‘USA Regulatory Address Cleanse transform in SAP Data Services’.
10. User Defined:
The User-Defined transform lets us perform custom processing in a data flow using the full Python scripting language.
The applications for the User-Defined transform are nearly limitless. It can do just about anything that we can write Python code to do.
We can use the User-Defined transform to generate new records, populate a field with a specific value, create a file, connect to a website, or send an email, just to name a few possibilities.
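To give a feel for the kind of per-record logic involved, the sketch below shows two of these tasks (populating a field and generating new records) in plain Python. It deliberately uses ordinary dictionaries instead of the record objects that the transform's own Python editor provides, so it is not the Data Services API. The field names EMAIL, REVIEW_FLAG, PHONES, and PHONE and both function names are hypothetical.

```python
def flag_missing_email(record):
    """Return a copy of the record with REVIEW_FLAG set to "Y" or "N"."""
    updated = dict(record)
    email = (updated.get("EMAIL") or "").strip()
    # A very rough test: anything without "@" is flagged for review.
    updated["REVIEW_FLAG"] = "N" if "@" in email else "Y"
    return updated


def split_phones(record):
    """Generate one new record per phone number in a semicolon-separated field."""
    raw = record.get("PHONES") or ""
    phones = [part.strip() for part in raw.split(";") if part.strip()]
    return [{**record, "PHONE": phone} for phone in phones]


source = {"ID": 1, "EMAIL": None, "PHONES": "555-0100; 555-0101"}
flagged = flag_missing_email(source)
expanded = split_phones(source)
```

Both functions return new dictionaries and leave the input untouched, which keeps the logic easy to test outside a data flow.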
We can place this transform anywhere in our data flow. If we have created our own transform, then the only restrictions on where it can be located in the data flow are those that we place on it.
Although the User-Defined transform is quite flexible and powerful, we will find that many of the tasks we want to perform can be accomplished with the Query transform.
The Query transform is generally more scalable and faster, and uses less memory than User-Defined transforms.
For more details on this transform, see the article ‘User Defined transform in SAP Data Services’.
The image below shows all the transforms available in the Data Quality category.