Implementing Talend Job for Incremental Data Processing

Recently I was stuck on one piece of logic for hours: we have to split rows into batches based on a count received as input. For example, if I have 10 rows and a batch size of 3, the first 3 rows must flow first, then the next 3, then 3, then the last 1.

[Image: Input File]

After splitting, each batch must flow directly into the next subjob; we should not create a temporary file just to split the CSV. This would be an easy task with tFileOutputDelimited, since its advanced settings can split the output into several files.

[Image: tFileOutputDelimited - Advanced settings]

Now I'll show you another way to implement this without creating intermediate files.

[Image: Job preview]

tLoop - This component iterates the flow until the specified condition is met. Here I have set i = 1 and it loops while i is at most 3. The next component is tFileInputDelimited, which picks up the file from the specified folder.

[Image: tLoop - Basic Settings]
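The tLoop "For" settings map onto a plain Java counter loop. A minimal sketch (the variable names are assumptions for illustration): note that with 10 rows and a batch size of 3, the last partial batch is only reached if the loop runs ceil(totalRows / batchSize) = 4 times, so in general the "To" value should be at least that.

```java
public class LoopSketch {
    public static void main(String[] args) {
        int totalRows = 10;   // rows in the input file, from the example above
        int batchSize = 3;    // rows per batch

        // Iterations needed to cover every batch, including the last partial one.
        int iterations = (totalRows + batchSize - 1) / batchSize;  // ceiling division

        // Equivalent of tLoop "For" settings: From = 1, To = iterations, Step = 1.
        for (int i = 1; i <= iterations; i++) {
            System.out.println("iteration i = " + i);
        }
    }
}
```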

After reading the data from the CSV, we implement the logic in a tJavaFlex component. In the start code, counter is 0 and batchSize is 3; batchSize is the maximum number of rows allowed in each iteration. tJavaFlex lets us wrap the flow in code: the main code runs once per row, so we put our condition there and increment counter for every row that flows through. The setNumber variable is what we later filter on: it is assigned incrementally, so 1 goes to the first 3 rows from the input file, 2 goes to the next 3 rows, and so on. Finally, store setNumber in a global variable.

[Image: tJavaFlex - Basic Settings]
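The start/main code described above can be sketched as plain Java outside Talend. This is only an illustration of the batch-numbering logic, not runnable tJavaFlex code; the HashMap stands in for Talend's globalMap, and the 10-row loop stands in for the incoming flow.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchNumberSketch {
    public static void main(String[] args) {
        // Stand-in for Talend's globalMap (assumption: a plain HashMap for the sketch).
        Map<String, Object> globalMap = new HashMap<>();

        // Start code: runs once, before the first row.
        int counter = 0;
        int batchSize = 3;   // maximum rows per batch

        // Main code: runs once per incoming row (here simulated with 10 rows).
        for (int row = 1; row <= 10; row++) {
            counter++;
            // setNumber grows by 1 every batchSize rows: 1,1,1, 2,2,2, 3,3,3, 4
            int setNumber = ((counter - 1) / batchSize) + 1;
            globalMap.put("setNumber", setNumber);
            System.out.println("row " + counter + " -> setNumber " + setNumber);
        }
    }
}
```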

The next component is tFilterRow, which filters rows based on conditions. We have already assigned a setNumber to each row, so now we filter on it, where i comes from the tLoop component. Select the "Use advanced mode" check box to write the filter condition:

((Integer)globalMap.get("setNumber")) == i        
[Image: tFilterRow - Basic Settings]
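The effect of this filter can be simulated outside Talend. A minimal sketch (row values, variable names, and the example loop index i = 2 are assumptions for illustration): only rows whose computed setNumber equals the current loop index pass through.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterSketch {
    public static void main(String[] args) {
        int batchSize = 3;
        int totalRows = 10;
        int i = 2;   // current tLoop index (example value)

        // Rows whose setNumber equals i pass the filter,
        // mirroring: ((Integer)globalMap.get("setNumber")) == i
        List<Integer> batch = new ArrayList<>();
        for (int row = 1; row <= totalRows; row++) {
            int setNumber = ((row - 1) / batchSize) + 1;
            if (setNumber == i) {
                batch.add(row);
            }
        }
        System.out.println("rows in batch " + i + ": " + batch);  // rows 4, 5, 6
    }
}
```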

Now the job produces the data as required. Use tLogRow to print each batch.

[Image: Output from the Talend log console]






Comments

"This is not an incremental load logic, right?"

"Good solution. However, in the above job you will end up reading the same file three times."
