SQOOP Performance tuning - Partitioning,Controlling Parallelism & Batch
OraOop can create a partitioned table for you – it will be partitioned by the date of the load (one new partition for each export), then sub partitioned by mapper (so 24 mappers would have 24 sub partitions). The easiest way is to use the template table parameter – so here is the command I used – in this example TMARTH is my existing non-partitioned table. TMARTH_EXP is a table that does not exist and will be created by OraOop – it will have the same columns as TMARTH – but will add the columns I mentioned for partitioning.
sqoop export -Dmapred.map.tasks.speculative.execution=false -Doraoop.template.table=tst_product
-Doraoop.partitioned=true --connect jdbc:oracle:thin:@//hostname1:1521/service --username
username --password password --export-dir tmarth --table tmarth_exp --num-mappers
24
To optimize performance, set the number of map tasks to a value lower than the maximum number of connections that the database supports
Using more mappers will lead to a higher no of concurent data transfer tasks, which can results in faster job completion
sqoop import --connect
jdbc:mysql://marth.example.com/sqoop --username
sqoop --password
sqoop --table
employee --num-mappers 10
Inserting Data in Batches
The JDBC interface exposes an API for doing batches in a prepared statement with multiple sets of values. With the --batch parameter, Sqoop can take advantage of this. This API is present in all JDBC drivers because it is required by the JDBC interface.
Enable JDBC batching using the --batch parameter.
sqoop export --connect <<JDBC URL>> --username <<SQOOP_USER_NAME>> --password <<SQOOP_PASSWOR>> --table <<TABLE_NAME>> --export-dir <<FOLDER_URI>> --batch
Yo !!! Marthu !!!