Thursday, 10 November 2016

History of SAP BODS

History of SAP BODS/ Evolution of SAP BODS:

SAP BusinessObjects Data Services was not originally developed by SAP. SAP acquired it through BusinessObjects, which had in turn acquired it from Acta Technology Inc.
Acta Technology Inc., headquartered in Mountain View, CA, was the provider of the first real-time data integration platform. The two software products provided by Acta were an ETL tool named the Data Integration (DI) tool, also known as ‘Actaworks’, and a Data Management or Data Quality (DQ) tool.
BusinessObjects, a French company and a leading provider of Business Intelligence (BI) solutions, acquired Acta Technology Inc. in 2002. BusinessObjects rebranded the two Acta products as the BusinessObjects Data Integration (BODI) tool and the BusinessObjects Data Quality (BODQ) tool.
In 2007 SAP, a leader in ERP solutions, acquired BusinessObjects and renamed the products SAP BODI and SAP BODQ. Later, in 2008, SAP integrated both software products into a single end-to-end product named SAP BusinessObjects Data Services (BODS), which provides both data integration and data management solutions. In later versions a text data processing solution was also included with the product.
 Why SAP BODS?
In the present market there are many ETL tools that perform Extraction, Transformation and Loading tasks, such as SAP BODS, Informatica, IBM InfoSphere DataStage, Ab Initio, Oracle Warehouse Builder (OWB), etc.
Let’s look at why SAP BODS has gained so much importance in today’s market:
  • Firstly, it is a SAP product; SAP serves a large share of today’s enterprise market, and BODS is tightly integrated with SAP applications as well as most major databases.
  • SAP BODS is a single product that delivers data integration, data quality, data profiling and text data processing solutions.
  • SAP Data Services can move, unlock and govern enterprise data effectively.
  • SAP Data Services delivers its solutions cost-effectively and is a single-window application with an easy-to-use GUI (Graphical User Interface).

 

Architecture of SAP BusinessObjects Data Services (BODS)

SAP BusinessObjects Data Services is a data warehousing product that delivers a single enterprise-class software solution for data integration (ETL), data management (data quality) and text data processing.
  • Data Integration is the Extraction, Transformation and Loading (ETL) technology for moving enterprise data between heterogeneous sources and targets.
Sources and targets can be SAP applications (ERP, CRM, etc.), SAP BW, SAP HANA, any relational database (MS SQL Server, Oracle, etc.), any file (Excel workbook, flat file, XML, HDFS, etc.), unstructured text, web services, and so on.
SAP BODS can perform data integration in both batch mode and real-time mode.
  • Data Management or Data Quality processing cleanses, enhances, matches and consolidates enterprise data to produce an accurate, high-quality form of the data.
  • Text Data Processing analyzes and extracts specific information (entities or facts) from large volumes of unstructured text such as emails, paragraphs, etc.
Architecture:
            The following figure outlines the architecture of standard components of SAP BusinessObjects Data Services.
Note: From the 4.x versions onward, a full SAP BusinessObjects BI Platform or SAP BusinessObjects Information Platform Services (IPS) must be installed on top of SAP BODS for user and rights/security management. Data Services relies on the CMC (Central Management Console) for authentication and security features. In earlier versions this was handled in the Management Console of SAP BODS.
BODS Designer:
SAP BusinessObjects Data Services Designer is a developer or designer tool. It is an easy-to-use graphical user interface where developers can design objects that consist of data mappings, transformations, and control logic.
 Repository
                A repository is a space in a database server that stores the metadata of the objects used in SAP BusinessObjects Data Services. Each repository must be registered in the Central Management Console (CMC) and associated with one or more Job Servers, which run the jobs you create.
There are three types of repositories used with SAP BODS:
  • Local repository:
A local repository stores the metadata of all the objects (like projects, jobs, work flows, and data flows) and source/target metadata defined by developers in SAP BODS Designer.
  • Central repository:
A central repository is used for multi-user development and version management of objects. Developers can check objects in and out of their local repositories to a shared object library provided by central repository. The central repository preserves all versions of an application’s objects, so you can revert to a previous version if needed.
  • Profiler repository:
A profiler repository is used to store all the metadata of profiling tasks performed in SAP BODS designer.
In addition, a CMS repository stores the metadata of all tasks performed in the CMC of the SAP BO BI Platform or IPS.
An Information Steward repository stores the metadata of profiling tasks and objects defined in SAP Information Steward.
Job Server
The SAP BusinessObjects Data Services Job Server retrieves job information from its respective repository and starts the data engine to process the job.
The Job Server can move data in either batch or real-time mode and uses distributed query optimization, multi-threading, in-memory caching, in-memory data transformations, and parallel processing to deliver high data throughput and scalability.
Access Server
The SAP BusinessObjects Data Services Access Server is a real-time, request-reply message broker that collects message requests, routes them to a real-time service, and delivers a message reply within a user-specified time frame.
Management Console
SAP BusinessObjects Data Services Management Console is a web-based application that provides the following capabilities:
Administration, Impact and Lineage Analysis, Operational Dashboard, Auto Documentation, Data Validation, and Data Quality Reports.



How to improve performance of sap bods jobs


  • how to improve performance of sap bods jobs
  • performance optimization techniques in sap bods
  • bods performance tuning


Performance optimization in SAP BODS                                                SAPBODSINFO.BLOGSPOT.COM

Ø  The following execution options can be used to improve the performance of BODS jobs:
Ø  Monitor Sample Rate: If the job processes a large amount of data, set the ‘Monitor Sample Rate’ to a higher value (the maximum is 50,000; the default is 1,000) to reduce the number of I/O calls to the log file, thereby improving performance.
Ø  If a virus scanner is configured on the BODS Job Server, exclude the Data Services log from the virus scan. Otherwise the virus scanner scans the Data Services log repeatedly during execution, which degrades performance.
Ø  Collect Statistics for self-tuning: BODS has a self-tuning capability that determines the cache type by looking at the statistics of previous job executions. Select the Collect Statistics option during the first execution of the job; BODS collects the statistics for that job and stores them in the job's metadata. In the next execution select the ‘Use Collected Statistics’ option to let BODS decide the type of cache to use, thereby improving job performance.
Ø  Set data flow properties such as Degree of Parallelism according to the number of CPUs available for processing, and set the cache type to in-memory if the data volume is small.
Ø  If the source tables are in the same schema and use the same datastore, identify the joins early so that the join can be pushed down to the database.
Ø  Create synonyms for tables in other schemas to push down join conditions on tables belonging to different schemas in a database.
Ø  Use database links to connect different schemas in different databases, so that joins of tables in those schemas can be pushed down to the database.
Ø  Use the Data Transfer transform (type = ‘TABLE’) to push complex logic in a data flow down to the database.
Ø  Avoid advanced features such as Run Distinct as a Separate Process (available in the ‘Advanced’ tab of the Query transform), as they start multiple sub-processes that cause heavy traffic between the processes and can thereby lead to job termination.
Ø  Do not use the Data Transfer transform unless required (if it is needed, use the table type, which is more reliable). SAP notes that the Data Transfer transform is not fully reliable and recommends avoiding it unless necessary.
Ø  Turn off the Cache option for tables with large amounts of data (cache is turned on by default for every source table), and make sure indexes exist on the key columns of the tables for which cache is turned off.
Ø  Do not use BODS functions like job_name() in mappings; instead, initialize a variable in a script and use that variable in the Query transform mappings.
Ø  Use a join wherever applicable in place of the Lookup transform, as the Lookup transform has to access the table for every record, which increases the number of I/O operations against the database.
Ø  Use Query transforms to split data instead of the Case transform, as the Case transform is costly in BODS.
Ø  Increase the Monitor Sample Rate in production environments (for example, to 50,000).
Ø  Exclude the Data Integrator job logs from virus scans.
Ø  When executing a job for the first time, or re-running it after changes, select the Collect Statistics for Optimization option (this is not selected by default).
Ø  From the second execution onward, use the collected statistics (this option is selected by default).
Ø  If you set the Degree of Parallelism (DOP) option for your data flow to a value greater than one, the thread count per transform increases. For example, a DOP of 5 allows five concurrent threads for a Query transform. To run objects within data flows in parallel, use the following Data Integrator features:
• Table partitioning
• File multithreading
• Degree of parallelism for data flows
Ø  Use the Run as a separate process option to split a data flow, or use the Data Transfer transform to create two sub data flows that execute sequentially. Since each sub data flow is executed by a different Data Integrator al_engine process, the number of threads needed for each is about 50% less.
Ø   If you are using the Degree of parallelism option in your data flow, reduce the number for this option in the data flow Properties window.
Ø   Design your data flow to run memory-consuming operations in separate sub data flows that each use a smaller amount of memory, and distribute the sub data flows over  different Job Servers to access memory on multiple machines.
Ø   Design your data flow to push down memory-consuming operations to the database.
Ø   Push-down memory-intensive operations to the database server so that less memory is used on the Job Server computer.
Ø   Use the power of the database server to execute SELECT operations (such as joins, Group By, and common functions such as decode and string functions); the database is often optimized for these operations.
Ø   You can also do a full push down from the source to the target, which means Data Integrator sends SQL INSERT INTO… SELECT statements to the target database.
Ø   Minimize the amount of data sent over the network. Fewer rows can be retrieved when the SQL statements include filters or aggregations.
Ø  Use the following Data Integrator features to improve throughput:
  a) Using caches for faster access to data
  b) Bulk loading to the target.
Ø  Data Integrator does a full push-down operation to the source and target databases when the following conditions are met:
  a) All of the operations between the source table and target table can be pushed down.
  b) The source and target tables are from the same datastore, or they are in datastores that have a database link defined between them.
Ø  A full push-down operation is one in which all Data Integrator transform operations can be pushed down to the databases and the data streams directly from the source database to the target database. Data Integrator sends SQL INSERT INTO… SELECT statements to the target database, where the SELECT retrieves the data from the source.
Ø  Auto correct loading ensures that the same row is not duplicated in a target table, which is useful for data recovery operations. However, an auto correct load prevents a full push-down operation from the source to the target when the source and target are in different data stores.
Ø  For large loads where auto-correct is required, you can put a Data Transfer transform before the target to enable a full push down from the source to the target. Data Integrator generates an SQL MERGE INTO target statement that implements the Ignore columns with value and Ignore columns with null options if they are selected on the target.
Ø  The lookup and lookup_ext functions have cache options. Caching lookup sources improves performance because Data Integrator avoids the expensive task of creating a database query or full file scan on each row.
Ø  You can control the maximum number of parallel Data Integrator engine processes using the Job Server options (Tools > Options > Job Server > Environment). Note that if you have more than eight CPUs on your Job Server computer, you can increase the maximum number of engine processes to improve performance.
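To see why a higher Monitor Sample Rate reduces log I/O, the arithmetic behind the tip above can be sketched. This is illustrative Python, not BODS script; the function name and the one-write-per-sample-interval model are assumptions for the sketch:

```python
import math

def monitor_log_writes(total_rows: int, sample_rate: int) -> int:
    """Approximate monitor-log writes for a data flow, assuming
    one write to the log file per `sample_rate` rows processed."""
    return math.ceil(total_rows / sample_rate)

rows = 10_000_000
default_writes = monitor_log_writes(rows, 1_000)   # default sample rate
tuned_writes = monitor_log_writes(rows, 50_000)    # maximum sample rate
print(default_writes, tuned_writes)  # 10000 200
```

For a 10-million-row job, raising the rate from the default 1,000 to the maximum 50,000 cuts log writes from 10,000 to 200, which is the whole point of the tuning tip.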
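The join-versus-lookup tip above comes down to avoiding one probe of the lookup table per source row. A minimal Python sketch (hypothetical order/customer data, not BODS script) contrasts the two access patterns:

```python
# Hypothetical source rows to enrich, and a lookup table keyed by customer_id.
source_rows = [{"order_id": 1, "customer_id": "C1"},
               {"order_id": 2, "customer_id": "C2"},
               {"order_id": 3, "customer_id": "C1"}]
lookup_table = [{"customer_id": "C1", "region": "EMEA"},
                {"customer_id": "C2", "region": "APAC"}]

def enrich_per_row_lookup(rows, table):
    """Lookup-transform style: scan the table once per source row
    (one 'query' per row, i.e. many I/O operations)."""
    out = []
    for row in rows:
        match = next((t for t in table
                      if t["customer_id"] == row["customer_id"]), None)
        out.append({**row, "region": match["region"] if match else None})
    return out

def enrich_hash_join(rows, table):
    """Join style: build the lookup structure once, then probe in memory."""
    by_key = {t["customer_id"]: t["region"] for t in table}
    return [{**row, "region": by_key.get(row["customer_id"])} for row in rows]

assert enrich_per_row_lookup(source_rows, lookup_table) == \
       enrich_hash_join(source_rows, lookup_table)
```

Both functions produce the same enriched rows, but the join-style version touches the lookup data once instead of once per record, which is why a join (or a cached lookup) scales better on large sources.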
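The full push-down described above amounts to shipping a single INSERT INTO … SELECT statement to the database instead of streaming rows through the ETL engine. A self-contained sqlite3 sketch (hypothetical src_orders/tgt_orders tables, chosen only to show the shape of the generated SQL) illustrates this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL, status TEXT);
    CREATE TABLE tgt_orders (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0, 'OPEN'), (2, 20.0, 'CLOSED'),
                                  (3, 30.0, 'OPEN');
""")

# Filter and projection run entirely inside the database engine;
# no rows travel through the client (ETL) process.
conn.execute("""
    INSERT INTO tgt_orders (order_id, amount)
    SELECT order_id, amount FROM src_orders WHERE status = 'OPEN'
""")
conn.commit()

loaded = conn.execute("SELECT COUNT(*), SUM(amount) FROM tgt_orders").fetchone()
print(loaded)  # (2, 40.0)
```

Because the WHERE filter executes in the database, only the two qualifying rows are ever materialized in the target, which mirrors the "minimize data sent over the network" advice above.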
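The Degree of Parallelism idea, where the row set is split into partitions processed by concurrent threads, can be imitated in a few lines of Python. This is a conceptual sketch, not how the al_engine actually schedules work; the doubling transform and partitioning scheme are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def transform(row):
    # Stand-in for a Query-transform mapping applied to one row.
    return row * 2

def run_dataflow(rows, dop):
    """Split `rows` into `dop` partitions and process them concurrently,
    imitating a data flow running with Degree of Parallelism > 1."""
    partitions = [rows[i::dop] for i in range(dop)]
    with ThreadPoolExecutor(max_workers=dop) as pool:
        results = pool.map(lambda part: [transform(r) for r in part], partitions)
    return sorted(x for part in results for x in part)

print(run_dataflow(list(range(10)), dop=5))  # [0, 2, 4, ..., 18]
```

A DOP of 5 here means five partitions handled by five worker threads, which is the same shape as five concurrent threads for a single Query transform.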






Wednesday, 1 June 2016

SAP BODS online training

SAP BODS
SAP BODS training in bangalore
SAP BODS training in hyderabad
SAP BODS training in mumbai
Faculty: Real-time experience.
SAPBODS INFO is a brand providing quality online and offline training for students worldwide. It provides the best SAP BODS online training in Hyderabad and Bangalore.

HIGHLIGHTS IN OUR TRAINING SERVICE:
All of our faculty have experienced the joy of training, and trained resources are available throughout the world. Training leads to better understanding, new knowledge, skills and expertise. Our trainers, with real-time experience, help fulfill your dreams and create a professionally driven environment, developing your acquaintance with production, development and testing environments. Our trainers also assist with resume preparation, interview skills, sample live projects, assessments, clarifying doubts, providing materials, explaining bugs and critical issues, development activities, and encouraging innovative thinking.

SAPBODS INFO is one of the best online and classroom training centers in Hyderabad, Bangalore and Chennai. We also provide training in India, the USA, the UK, Canada, Japan, Malaysia, Singapore, Australia and more.

SAPBODS INFO provides flexibility to students when choosing among online classes, classroom training, corporate training, and an overview of courses and their scope.

COURSE CONTENT:
Introduction
·  What is a data warehouse?
·  Data Warehouse Functions & Implementation
·  Data Warehouse Products & Vendors
·  About SAP & Their Products
·  SAP's Data Warehouse product-SAP BW
·  SAP BW functions, benefits & limitations
·  About Business Objects & their Products
·  SAP BO Product Portfolio- BI, EIM
SAP BO Data Service Overview
·  About Data Services-Introduction
·  Functions-Data Integration & Data Management
·  Data Services Product Evolution (ATL, DI & DQ)
·  Architecture-by Components
·  BO Data Services Tools & its Functions
·  BO Data Services Objects & Object Hierarchy
·  BODS Objects Naming Standards
·  BODS Objects Comparison with SAP BW Objects
SAP BO Data Services-Basic Level
·  Repository Manager-BODS Repository
·  Repository Types-Local, Central, Profiler, Data Package
·  Repository Creation & Upgrade
·  Server Manager-Job Server & Repository Assignment
·  Management Console- Introduction & Components
·  Data Service Designer-Introduction & GUI
·  Getting Start with Designer to develop First ETL Flow
Data store & Formats
·  Data store-Overview & Types
·  Data store creation - DB, SAP, Adaptor, and Web service
·  Formats - Flat file, DTD, XSD, COBOL, copybooks
·  Data Extraction from Database Tables
·  Data Extraction from Excel Workbook- Multiple Sheets
·  Data Extraction from flat files (CSV, Notepad, and SAP Transport)
·  Data Extraction from XML FILE (DTD, XSD)
·  Data Extraction from COBOL Copybooks
·  Data Distribution to flat file & XML
·  Dynamic Extraction-File Selection & Sheet Selection
SAP BO Data Services-Transforms
·  Transforms & Category (DI, DQ, PF)
·  Data Integrator-Data_Transfer, Date_Generation
·  Data Integrator-Effective_Date, Hierarchy_Flattening
·  Data Integrator-History_Preserving, Key_Generation
·  Data Integrator-Pivot, Reverse_Pivot, XML_Pipeline
·  Platform-Case, Map_Operation, Merge, Query
·  Platform-SQL, Validation, Custom ABAP Transform
·  Data Quality-Introduction, standard content directories
·  Data Quality-Dictionaries Configuration
·  Data Quality-Country_ID, Geocoder, Global_Address_Cleanse
SAP BO Data Services-Advance Level
·  BODS Admin Console-Administration, Auto Reporting
·  Real time Jobs, Embedded Data Flows
·  Variables, Parameters, Substitution Parameters, System Config
·  Debugging, Recovery Mechanism
·  Data Assessment-Data Profiling
·  BODS Performance Tuning Techniques
·  Multi-User Development Environment-Intro & Advantages
·  Multi-User Environment-Implementation & Migration
·  BODS Objects Migration Techniques
Data Services-SAP Systems Handling
·  SAP Systems intro, SAP systems allowed, Terminology
·  SAP BODS and sap ERP/SAP BI Integration
·  SAP BODS User & Role Creation in SAP
·  SAP ERP -Data Flow, Interfaces, and Objects allowed to BODS
·  Creating SAP Application Datastore-Properties
·  SAP ERP-Tables & Hierarchies Data Extraction
·  SAP ERP-Getting RFC enabled Functions Modules
·  SAP ERP-IDOC Data Extraction & IDOC Data Loading
·  SAP ERP- Extracting Flat files data from SAP APP Server
·  SAP BW-Data Flow, Interfaces Objects allowed to BODS
·  Creating SAP BW Source & SAP BW Target Data stores
·  SAP BW-Extracting Data from SAP BW info Providers-OHD
·  SAP BW -Extracting Data from SAP BW bases tables-/BIC/
·  SAP BW -Loading data to SAP BW info Providers (MD & TD)
·  SAP BW -BW Jobs Execution
Our online services are provided worldwide: Asia, Europe, America, Africa, Sweden, North Korea, South Korea, Canada, Netherlands, Italy, Russia, Israel, New Zealand, Norway, Singapore, Malaysia, etc.

Here we provide “SAP BODS online training”, SAP BODS classroom training in Hyderabad, and “SAP BODS corporate training in India”. In India, online services are provided in top cities like Bangalore, Chennai, Pune, Mumbai, Delhi, etc.

sap bods training, sap bods training online usa, sap bods training in Hyderabad, sap bods training in India, sap bods training institutes in India/Hyderabad, best sap bods training in Hyderabad, top training centers in India/Asia, online sap bods Hyderabad, sap bods training centers with project, sap bods training in the Middle East, sap bods online training in Canada, sap bods training in the USA, sap bods training in Malaysia, Singapore, Russia

Contact: sapbodsinfo@gmail.com