SQOOP Performance tuning - Partitioning,Controlling Parallelism & Batch
Sqoop Performance Tuning on AWS EMR - Marth

OraOop can create a partitioned table for you. The table is partitioned by the date of the load (one new partition per export), then sub-partitioned by mapper (so 24 mappers would produce 24 sub-partitions). The easiest way is to use the template-table parameter. Here is the command I used: in this example, TMARTH is my existing non-partitioned table, and TMARTH_EXP is a table that does not yet exist and will be created by OraOop. It will have the same columns as TMARTH, plus the columns mentioned above for partitioning.

sqoop export \
  -Dmapred.map.tasks.speculative.execution=false \
  -Doraoop.template.table=tmarth \
  -Doraoop.partitioned=true \
  --connect jdbc:oracle:thin:@//hostname1:1521/service \
  --username username --password password \
  --export-dir tmarth \
  --table tmarth_exp \
  --num-mappers 24


To optimize performance, set the number of map tasks to a value lower than the maximum number of concurrent connections that the database supports.
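As a sanity check before raising the mapper count, you can look up the connection ceiling on the database side. A minimal sketch for MySQL; the hostname and user are placeholders, not values from this article:

```shell
# Check the server-side connection limit (MySQL example; host/user are placeholders).
mysql -h marth.example.com -u sqoop -p \
  -e "SHOW VARIABLES LIKE 'max_connections';"

# Keep --num-mappers comfortably below the reported value, leaving
# headroom for the database's other clients.
```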

Using more mappers leads to a higher number of concurrent data-transfer tasks, which can result in faster job completion.

sqoop import \
  --connect jdbc:mysql://marth.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table employee \
  --num-mappers 10
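Parallelism only pays off when the splits are balanced. Sqoop splits on the table's primary key by default; when that key is missing or skewed, you can pick the split column explicitly with `--split-by`. A hedged sketch building on the command above; the column name `emp_id` is an assumption, not from the article:

```shell
# Sketch: choose an evenly distributed, indexed column for splitting.
# emp_id is a hypothetical column; substitute one from your schema.
sqoop import \
  --connect jdbc:mysql://marth.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table employee \
  --split-by emp_id \
  --num-mappers 10
```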

Inserting Data in Batches

The JDBC interface exposes an API for executing a prepared statement with multiple sets of values as a batch. With the --batch parameter, Sqoop can take advantage of this. The API is present in all JDBC drivers because the JDBC specification requires it.

Enable JDBC batching using the --batch parameter.

sqoop export \
  --connect <<JDBC_URL>> \
  --username <<SQOOP_USER_NAME>> \
  --password <<SQOOP_PASSWORD>> \
  --table <<TABLE_NAME>> \
  --export-dir <<FOLDER_URI>> \
  --batch
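Batching can be tuned further with two Sqoop export properties, sqoop.export.records.per.statement and sqoop.export.statements.per.transaction, which control how many rows go into each INSERT and how many statements form one transaction. A sketch only; the values of 100 are illustrative starting points, not recommendations from this article:

```shell
# Sketch: combine --batch with the export batching knobs.
# 100 rows per statement and 100 statements per transaction are
# illustrative; tune them against your database's limits.
sqoop export \
  -Dsqoop.export.records.per.statement=100 \
  -Dsqoop.export.statements.per.transaction=100 \
  --connect <<JDBC_URL>> \
  --username <<SQOOP_USER_NAME>> \
  --password <<SQOOP_PASSWORD>> \
  --table <<TABLE_NAME>> \
  --export-dir <<FOLDER_URI>> \
  --batch
```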



