Data Pump is a more flexible and powerful utility than the legacy exp and imp, and it has no NLS_LANG conversion issues to bother us, so we like it in our country, where English is not the native language.
But performance is an issue. I once compared the speed of the two kinds of utility when exporting a terabyte-scale database on Oracle Database 10.2, and legacy exp won the race with direct path and parallel execution. After some fine-tuning, however, expdp could almost catch up with exp. So our topic is: how do we make expdp and impdp faster in Oracle?
In 10.2, Data Pump has no DIRECT=Y option like legacy exp; it chooses the access path on its own, and the DBA cannot control it.
Here are tips to speed up an export job when table mode is used:
- Parallelize executions.
- Decompose a partitioned table into partitions in par files.
- Set the parameter ESTIMATE to STATISTICS
- Turn on Asynchronous Disk IO
- For 11.2 and later, set ACCESS_METHOD to DIRECT_PATH
- For really big tables:
  - Drop their indexes.
  - Disable their constraints.
  - Load the data.
  - Enable constraints with NOVALIDATE.
  - Re-create the indexes.
With some human intervention, split the thousands of tables manually into several evenly sized par files, e.g. 8 par files, and run them in parallel at the OS level. Don't confuse this with the PARALLEL parameter in Data Pump; here we trust our own manual splitting more.
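As a minimal sketch, suppose the tables have been split by hand into eight par files named tables_01.par through tables_08.par (all file names, the directory object DUMP_DIR, and the credentials below are placeholders), then the eight exports can be launched concurrently from the shell:

# tables_01.par -- one of the eight hand-made par files, each covering a similar data volume
DIRECTORY=DUMP_DIR
DUMPFILE=expdp_part01.dmp
LOGFILE=expdp_part01.log
TABLES=(
SCHEMA.TABLE_A,
SCHEMA.TABLE_B
)

# run all eight expdp jobs at the same time at the OS level
for i in 01 02 03 04 05 06 07 08; do
  nohup expdp system/password PARFILE=tables_$i.par > expdp_$i.out 2>&1 &
done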
More parameter comparisons of export and import between releases can be found in these posts: expdp Parameter for All Releases of Oracle Database and impdp Parameter for All Releases of Oracle Database.
If you have a very large partitioned table, don't export it as a single entry like this:
TABLES=(
SCHEMA.VERY_BIG_TABLE,
...
Data Pump will do as you wish with the above par file, but there is a better way: break the table down by partition:
TABLES=(
SCHEMA.VERY_BIG_TABLE:PARTITION01,
SCHEMA.VERY_BIG_TABLE:PARTITION02,
SCHEMA.VERY_BIG_TABLE:PARTITION03,
...
By setting a degree of parallelism via the PARALLEL parameter, we can benefit from this partition-level breakdown, as in the sketch below.
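A minimal command-line sketch, assuming the par file above is saved as big_table.par and a directory object DUMP_DIR exists (both names, and the credentials, are placeholders):

# %U lets Data Pump create multiple dump files, one per worker process
expdp system/password PARFILE=big_table.par DIRECTORY=DUMP_DIR DUMPFILE=big_table_%U.dmp PARALLEL=4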
The default value of ESTIMATE is BLOCKS, and the estimation can take a very long time when Data Pump faces a very big table in the par file and the NETWORK_LINK parameter is also specified; sometimes the job even appears to be stuck at that phase. Setting ESTIMATE to STATISTICS makes Data Pump base the estimate on the optimizer statistics instead, which avoids that wait, provided the statistics are reasonably current.
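For example, with the same placeholder names as above and a hypothetical database link SOURCE_DB:

expdp system/password PARFILE=big_table.par NETWORK_LINK=SOURCE_DB ESTIMATE=STATISTICS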
When Data Pump runs at full capacity, the resource bottleneck is almost always I/O; switching your I/O to asynchronous mode, or using other buffering technology at the OS level, helps Data Pump access the disks efficiently.
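On many platforms this is controlled by the initialization parameters DISK_ASYNCH_IO and FILESYSTEMIO_OPTIONS; here is a sketch in SQL*Plus, assuming your OS and storage actually support asynchronous I/O (verify with your platform documentation first):

-- check the current settings
SHOW PARAMETER disk_asynch_io
SHOW PARAMETER filesystemio_options

-- enable asynchronous and direct file system I/O; takes effect after an instance restart
ALTER SYSTEM SET disk_asynch_io = TRUE SCOPE = SPFILE;
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE = SPFILE;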
Since 11g Release 2, Oracle has provided the Data Pump parameter ACCESS_METHOD, which can be set to DIRECT_PATH, giving the DBA more control and more choices during data manipulation and migration.
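For example (the par file name and credentials are placeholders again):

# ask Data Pump to use direct path instead of choosing the access method itself (11.2 and later)
expdp system/password PARFILE=big_table.par ACCESS_METHOD=DIRECT_PATH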
Big tables are really time-consuming during data loading (impdp), because every new row has to be checked against the constraints over the data already loaded. In other words, the more data has been loaded, the more data there is to check against, so the last row to be inserted is the most expensive one.
How about taking the constraints off and adding them back after the data is loaded? For a 5 TB database, in my experience, it reduced the overall time from several days to several hours.
Here are the steps to remove the constraints of big tables temporarily.
First, drop the indexes of those big tables, keeping their DDL so you can add them back later.
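A sketch of this step, with hypothetical schema, table, and index names:

-- capture the index DDL before dropping
SET LONG 100000 PAGESIZE 0
SELECT DBMS_METADATA.GET_DDL('INDEX', index_name, owner)
  FROM dba_indexes
 WHERE owner = 'SCHEMA' AND table_name = 'VERY_BIG_TABLE';

DROP INDEX schema.very_big_table_idx1;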
Then disable the constraints, keeping their DDL so you can enable them again later.
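Again a sketch with hypothetical names (foreign keys use the object type 'REF_CONSTRAINT' in DBMS_METADATA):

-- capture the constraint DDL before disabling
SELECT DBMS_METADATA.GET_DDL('CONSTRAINT', constraint_name, owner)
  FROM dba_constraints
 WHERE owner = 'SCHEMA' AND table_name = 'VERY_BIG_TABLE'
   AND constraint_type IN ('P', 'U', 'C');

ALTER TABLE schema.very_big_table DISABLE CONSTRAINT very_big_table_pk;
ALTER TABLE schema.very_big_table DISABLE CONSTRAINT very_big_table_fk1;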
Now you can import the data.
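A possible impdp call for this step, with the same placeholder names; TABLE_EXISTS_ACTION=APPEND and CONTENT=DATA_ONLY assume the empty tables already exist on the target:

impdp system/password PARFILE=big_table.par TABLE_EXISTS_ACTION=APPEND CONTENT=DATA_ONLY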
When enabling the constraints again, you may choose VALIDATE or NOVALIDATE; VALIDATE takes longer because it checks the existing rows, while NOVALIDATE returns immediately.
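For example, using NOVALIDATE with the hypothetical names from the earlier steps:

-- trust the imported data; new rows will still be checked from now on
ALTER TABLE schema.very_big_table ENABLE NOVALIDATE CONSTRAINT very_big_table_pk;
ALTER TABLE schema.very_big_table ENABLE NOVALIDATE CONSTRAINT very_big_table_fk1;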
Finally, re-create the indexes. It takes time to build them, but it is worth it.
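A sketch of re-creating one index from the saved DDL, adding parallelism for the build (column and index names are hypothetical):

CREATE INDEX schema.very_big_table_idx1 ON schema.very_big_table (some_column)
  PARALLEL 8 NOLOGGING;
-- return the index to its normal settings after the build
ALTER INDEX schema.very_big_table_idx1 NOPARALLEL LOGGING;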
The above tips can accelerate Data Pump. If you are allowed to use transportable tablespace mode, it will be even faster, but the drawback is that transportable tablespaces are less flexible when migrating data between different databases.
For more information about data migration, you may refer to the following post: A Systematic Approach to Migrate Databases