KNIME Recursive Workflow

Recursive functions are something developers will be familiar with, but in a KNIME workflow recursion is less obvious and a bit tricky to create, especially if you first develop it on the desktop. On KNIME Server, calling a workflow recursively is very easy, since KNIME Server creates and manages the copies of the workflow. On the desktop it is trickier, because you are left with the responsibility of creating and deleting the workflow copies yourself to simulate a recursive call.

This article is an updated version of my LinkedIn article from over 2 years ago. I’ve updated the workflow to the latest version of KNIME (4.4.2).

It is a bit challenging to create a workflow that calls itself recursively in KNIME, because you need to execute the workflow, or at least a previous node, before you can configure and execute the next one. I won’t go through the complete process of creating the workflow step by step, but I will describe the process in more detail than I did before. If you are keen on creating a self-referencing workflow, or find yourself in a situation where a self-referencing workflow solves a specific scenario, you can contact me: I do ad-hoc consulting and I’m more than happy to assist you in your KNIME journey, especially with data transformation (ETL) type workflows.

Let me quickly explain what a self-referencing function is and where it is useful. In programming, self-referencing functions are especially useful when you need to traverse a tree-type structure, such as a folder structure where you do not know in advance how many folders or subfolders each folder will contain. An example can be found in figure 1, which shows a folder structure on a Windows-based computer.

[Figure 1: Example folder structure on a Windows-based computer]
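To make the idea concrete outside of KNIME, here is a minimal Python sketch of the same pattern: a function that lists a folder tree by calling itself for every subfolder it finds. The function name and output format are illustrative, not part of the workflow:

```python
import os

def list_tree(path, depth=0):
    """Recursively list a folder and all of its subfolders.

    The function calls itself for every subfolder, so the nesting
    depth does not need to be known in advance."""
    entries = []
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        entries.append("  " * depth + entry.name)
        if entry.is_dir():
            # Recursive call: the same function handles the subfolder.
            entries.extend(list_tree(entry.path, depth + 1))
    return entries
```

The base case is implicit: a folder with no subfolders triggers no further recursive calls, so the recursion terminates on the leaves of the tree.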

Another example would be a database table that references itself, for instance by having a Parent_ID field that references the ID field in that same table. The number of levels could also be endless in this scenario, depending on the functionality of the table and the software that stores data in it. I’ve created a recursive workflow that traversed a self-referencing table like this to flatten out the levels, creating a column for every level and producing a data table that holds the data in the flattened-out structure. With the self-referencing workflow it doesn’t matter if the data suddenly gains another level: the workflow simply processes it, and if need be another call to the self-referencing workflow is executed.

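The same pattern applies to such a Parent_ID table. As a hypothetical sketch (the table shape and function name are mine, not from the workflow), walking from any row up to the root can be written recursively, with the root row acting as the base case:

```python
def path_to_root(rows, node_id):
    """Walk Parent_ID links upward from node_id to the root.

    rows maps ID -> Parent_ID (None for the root). The recursion
    stops at a row with no parent, however deep the table goes."""
    parent = rows[node_id]
    if parent is None:
        return [node_id]  # base case: reached the root
    # Recursive call on the parent, then append this node.
    return path_to_root(rows, parent) + [node_id]
```

For a table {1: None, 2: 1, 4: 2}, `path_to_root(rows, 4)` yields the level path [1, 2, 4]; one such path per row is what gets flattened into level columns.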


First I’ll describe the workflow process and what it is trying to achieve and then I’ll go through the different parts of the workflow to elaborate on what the different parts need to accomplish.

The workflow will simply create a table with one column containing a list of numbers, one number per row, running from 1 to 10, 50 or 100; the number of rows doesn’t really matter in this scenario. The requirement of this workflow is to take the previous row’s number, if there is a previous row, or the value 0 otherwise. The current row’s number is then multiplied by the previous row’s number and the result is stored in a new column.

The remaining rows of the table are then passed to the recursive workflow, together with the value of the top row (which is the “previous” value for the next row) as a variable.

The recursive workflow will check whether there are more rows left to process and, if so, call the recursive workflow again with the remaining rows and the new “Previous” value. If there are no more rows to process, the top row with its result is concatenated with the output of the called recursive workflow. The results bubble up, each level concatenating its row with the output below it, until the first recursive call returns the complete table to the main workflow.

So if we take an example of 3 numbers (3 rows of data) in a table:

1

2

3

The results will be:

1, 0 (1 * 0)

2, 2 (2 * 1)

3, 6 (3 * 2)

The process is as follows:

Main workflow calls the recursive workflow.

The recursive workflow splits the data in two:

Top row:

1

The remaining table:

2

3

The recursive workflow performs the calculation and adds the result:

1, 0

The remaining rows, with the value of the top row, are passed to a new instance of the recursive workflow.

It will split the table

Top row:

2

Remaining row(s)

3

The new recursive workflow will do the calculation and get a result:

2, 2

Remaining row(s) and the top row’s value will be passed to another new instance of the recursive workflow.

This workflow will split the data and get:

Top row:

3

Remaining row(s) will be empty and thus no new instance of the recursive workflow will be created.

The calculation will be done resulting in:

3, 6

This will be passed back to the calling recursive workflow

The result of its calculation will be concatenated with the 3rd-level recursive workflow’s result:

2, 2

3, 6

This will be passed to the 1st-level recursive workflow and then concatenated with its result:

1, 0

2, 2

3, 6

The first recursive workflow will then pass this result to the main workflow.
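The walkthrough above can be condensed into a few lines of Python. This is only a sketch of the logic the KNIME workflow implements (a list of numbers standing in for the table, a keyword argument standing in for the flow variable), not KNIME code:

```python
def process(numbers, previous=0):
    """Mimic the recursive workflow on a list of numbers.

    Split off the top row, multiply it by the previous row's value,
    then call this same function on the remaining rows. Results
    bubble up and are concatenated on the way back."""
    if not numbers:
        return []  # no rows left: no further recursive call
    top, rest = numbers[0], numbers[1:]
    result_row = (top, top * previous)  # e.g. (1, 1 * 0) for the first row
    # Recursive "Call Workflow": pass the remaining rows and the
    # top row's value as the new "Previous" variable.
    return [result_row] + process(rest, previous=top)

print(process([1, 2, 3]))  # → [(1, 0), (2, 2), (3, 6)]
```

Each level contributes exactly one result row and concatenates it with whatever the deeper levels return, which is precisely the bubbling-up described above.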

I’ll go through the logic using the KNIME workflow.

The main workflow will use a loop to create the table, numbering the items in the “Number” column from 1 to the number of records we want to process. There are easier ways of accomplishing this, but it simply serves as a placeholder for whatever logic your workflow may contain before you need to call the self-referencing workflow.


Then I’ve created a couple of flow variables that hold the previous value, the workflow path, the workflow name and the next level’s workflow name. This is mostly only required when running in KNIME desktop, where you have to create a copy of the self-referencing workflow: KNIME desktop doesn’t allow the same workflow to be called while another copy of it is still running. When you run the self-referencing workflow on a KNIME Server you don’t need any of the copying and deleting of physical workflows, since KNIME Server handles that for you and you only need to call the workflow.


Once the initial data and the flow variables have been created, you can call the first instance of the self-referencing workflow, passing the table of data as well as the flow variables using a “Call Workflow (Table Based)” node.


Here is the complete self-referencing workflow. I will go through some of the logic, but won’t go into too much detail on the housekeeping parts where the next-level self-referencing workflow gets created and deleted, as these are only needed for the desktop version to work. That said, I still used them to create the actual workflow, since the design is done on the desktop; I simply remove these parts before publishing the workflow to KNIME Server.


Default/template input data to be used during development can now be created for the workflow’s inputs. The default/template data will be overwritten by the data passed in from the calling workflow.


The top part of the self-referencing workflow is basically the logic that the workflow needs to execute; this will depend on what you need your workflow to do.


The “Merge Variables” node ensures that the variables that have been injected into the workflow get precedence and are available to the top part of the workflow.

The “Row Splitter” splits the data: the top output port contains only the top row, which is what the workflow’s logic needs, and the bottom port contains the rest of the rows that still need to be processed; these are the rows that will be passed to the next level of the recursive workflow. This could differ depending on what your workflow needs to accomplish. The top part executes first, since we also need the value of the top row to pass to the next call of the recursive workflow.


We also have the first part of the housekeeping that needs to execute. It works out the level number of the next recursive call and creates a copy of the recursive workflow with the next level in the workflow name. It then passes the name and path of the next recursive workflow to the top part of the workflow logic; this simply ensures that the next-level workflow is created before the rest of the logic executes.


The “Empty Table Switch” node determines whether there is more data to process. The top output port being active indicates there are more rows, which are passed to the “Call Workflow (Table Based)” node. The “Empty Table Switch” node also forces the workflow logic to follow the path of the top output port.


The “Call Workflow (Table Based)” node then calls the next level of the recursive workflow, passing the remaining rows and the top row’s value. Take note that the inactive port is marked with a red X. From here the logic of this workflow executes recursively until no more calls are needed, and the results bubble up to the first recursive workflow and then to the main workflow. The actual call to the recursive workflow isn’t added until the rest of the workflow’s logic is complete. In other words, you create the workflow as if there were only one execution, and once the rest of the workflow is done you add the “Call Workflow (Table Based)” node. I use the “End If” node because it is great at taking data from one of the two paths created by the “Empty Table Switch” node. Just be mindful when using it this way to ensure that only one of the two paths executes, as the “End If” node cannot handle both paths being active.


From the “End If” node the variable path forces the “Delete Referential Workflow” metanode to execute. This logic deletes the recursive workflow that was created. Again, when executing this on a KNIME Server this step is not needed. The deletion may take a while in the desktop version, depending on how long the operating system takes to release its file locks; the process retries the deletion multiple times. I found that on Windows 10 this sometimes takes up to almost a minute, and other times it happens almost immediately. Again, when running this on a KNIME Server you don’t have these issues.


The “Concatenate” node adds data from different tables together; for people with an SQL background it works like a “UNION” of multiple tables. It adds the data that has been processed in the top part of the workflow to the result of the called recursive workflow. This output is received by the workflow that called this recursive workflow, and bubbles up until the first recursive workflow in turn passes the final result to the workflow that initially called it.
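As a small illustration of the UNION-style behaviour, with plain Python lists standing in for KNIME tables:

```python
# The Concatenate node stacks rows like SQL's UNION ALL:
# rows from both tables are kept, duplicates included, order preserved.
top_result = [(1, 0)]                # row processed at this level
recursive_result = [(2, 2), (3, 6)] # output of the called workflow
combined = top_result + recursive_result
print(combined)  # → [(1, 0), (2, 2), (3, 6)]
```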


Here are the final outputs for a table with 5 rows and one with 10 rows.


This is a very simple workflow but demonstrates the functionality. Download the whole "Recursive" workflow example from KNIME-Hub: https://hub.knime.com/willem/spaces/Public/latest/Recursive/

If you don’t have KNIME Analytics Platform already you can download the fully functional desktop version, which is free, from https://www.knime.com/downloads
