- how to improve performance of sap bods jobs
- performance optimization techniques in sap bods
- bods performance tuning
Performance optimization in SAP BODS SAPBODSINFO.BLOGSPOT.COM
Ø
The following execution options
can be used to improve the performance of BODS jobs:
Ø Monitor Sample Rate: If the job processes large amount of data set
the ‘Monitor Sample Rate’ to a higher value (maximum being 50,000, default is
1000) to reduce the number of I/O calls to the log file there by improving the
performance.
Ø If virus scanner is configured on the BODS JobServer, exclude the
Data Services log from the virus scan. Otherwise the virus scan scans the Data
Services log repeatedly during the execution, which causes performance
degradation.
Ø Collect Statistics for self-tuning: BODS has a self-tuning
capability to determine the cache type by looking into the statistics of
previous job executions. The Collect Statistics option needs to be selected
during the first execution of the job. BODS collects the statistics for that
job and stores in the metadata for that job. In the next execution select the
‘Use Collected statistics’ option to allow BODS to decide the type of cache to
be used to execute the job, there by improving the performance of the job.
Ø Set the data flow properties like Degree of Parallelism depending
upon the number of CPUs available for processing and set the cache type to in-memory if the data volume is less.
Ø If source tables are from same schema and using same Data Store,
identify the joins in the early phase so that the join can be pushed to the
database.
Ø Create synonyms for the tables in other schemas to pushdown join
conditions of tables belonging to different schemas in a database.
Ø Use data base links to connect different schemas in different
databases to pushdown the joins of tables available in these schemas to
database.
Ø Use data transfer (type = ‘TABLE’) to pushdown the complex logic
in a dataflow to the database.
Ø Do not use advanced features like Run Distinct as Separate process etc available in ‘Advanced’ tab in
Query transform, as it starts multiple sub-processes which causes heavy traffic
between the processes and there by lead to the termination of job.
Ø Do not use Data
Transfer transform
unless required. (Use table type if required as it is more reliable.). SAP
suggests that Data Transform is not a reliable transform and hence recommends
to not using it unless required.
Ø Turn off the Cache option for the tables with larger amounts of
data. Cache is turned on by default for every source table. Make sure that
there are indexes created on key columns for these tables on which cache is
turned off.
Ø Do not use BODS functions like job_name(), instead initialize a
variable in a script and use that variable for mapping in query transforms.
Ø Use Join where ever applicable in place of look up transform as
the look up transform has to access the table for each and every record which
increases the number of I/O operations to database.
Ø
Use Query transforms to split
the data in place of Case transforms as it is a costly transform in BODS.
Ø Increase monitor sample rate. ex..to 50K in prod environment.
Ø
Exclude virus scan on data
integrator job logs.
Ø
While executing the job for
first time or when changes occur with re-run. Select the option COLLECT
STATISTICS FOR OPTIMIZATION (this is not selected by default).
Ø
While executing the job second
time onwards. Use collected stats.(this is selected by default)
Ø
Degree of parallelism (DOP)
option for your data flow to a value greater than one, the thread count per
transform will increase. For example, a DOP of 5 allows five concurrent threads
for a Query transform. To run objects within data flows in parallel, use the
following Data Integrator features:
• Table partitioning
• File multithreading
• Degree of parallelism for data flows
• Table partitioning
• File multithreading
• Degree of parallelism for data flows
Ø
Use the Run as a separate
process option to split a data flow or use the Data Transfer transform to
create two sub data flows to execute sequentially. Since each sub data flow is
executed by a different Data Integrator al_engine process, the number of
threads needed for each will be 50% less
Ø
If you are using the Degree of parallelism
option in your data flow, reduce the number for this option in the data flow
Properties window.
Ø
Design your data flow to run memory-consuming
operations in separate sub data flows that each use a smaller amount of memory,
and distribute the sub data flows over different Job Servers to access
memory on multiple machines.
Ø
Design your data flow to push down
memory-consuming operations to the database.
Ø
Push-down memory-intensive operations to the
database server so that less memory is used on the Job Server computer.
Ø
Use the power of the database server to
execute SELECT operations (such as joins, Group By, and common functions such
as decode and string functions). Often the database is optimized for these
operations
Ø
You can also do a full push down from the
source to the target, which means Data Integrator sends SQL INSERT INTO… SELECT
statements to the target database.
Ø
Minimize the amount of data sent over the
network. Fewer rows can be retrieved when the SQL statements include filters or
aggregations.
Ø
Using the following Data
Integrator features to improve throughput:
a) Using caches for
faster access to data
b) Bulk loading to the target.
b) Bulk loading to the target.
Ø Data Integrator does a full push-down operation to the source and
target databases when the following conditions are met:
Ø
All of the operations between
the source table and target table can be pushed down.
Ø
The source and target tables
are from the same data store or they are in data stores that have a database
link defined between them.
Ø
A full push-down operation is
when all Data Integrator transform operations can be pushed own to the databases
and the data streams directly from the source database to the target database.
Data Integrator sends SQL INSERT INTO… SELECT statements to the target database
Where the SELECT retrieves data from the source.
Where the SELECT retrieves data from the source.
Ø
Auto correct loading ensures
that the same row is not duplicated in a target table, which is useful for data
recovery operations. However, an auto correct load prevents a full push-down
operation from the source to the target when the source and target are in
different data stores.
Ø
For large loads where
auto-correct is required, you can put a Data Transfer transform before the
target to enable a full push down from the source to the target. Data
Integrator generates an SQL MERGE INTO target statement that implements the
Ignore columns with value and Ignore columns with null options if they are
selected on the target.
Ø
The lookup and lookup_ext
functions have cache options. Caching lookup sources improves performance
because Data Integrator avoids the expensive task of creating a database query
or full file scan on each row.
Ø
20. You can control the maximum
number of parallel Data Integrator engine processes using the Job Server
options (Tools > Options> Job Server > Environment). Note that if you
have more than eight CPUs on your Job Server computer, you can increase Maximum
number of engine processes to improve performance.
ReplyDeleteLooking for the best Catia Training in Bangalore, then don't delay join IntelliMindz Catia Training in Bangalore. Join our Catia training to get hands-on training and practice in Catia Software. Any Queries call @ 9655877677. IntelliMindz is the best IT Training Institute in Bangalore with placement, offering 200 and more software courses with 100% Placement Assistance.
Catia in Bangalore
Catia in Chennai
Catia Online Course