SAP Datasphere Data Modelling: UPSERT And SCD Using Transformation Flow
In a standard SAP Datasphere SQL View, it is not possible to write data manipulation commands such as INSERT, UPDATE and DELETE. We need to use a Transformation Flow to achieve this, and it is tricky. If we also need to maintain Slowly Changing Dimensions (SCD), it becomes even trickier. On top of that, if you want to mark deleted records and/or physically delete records in the SCD, it gets a lot more complex and requires multi-step workarounds.
Let's explore how to achieve all these in Datasphere.
For this article, it is assumed you have a basic understanding of SAP Datasphere Transformation Flows and Data Flows.
Source Table
A standard Relational Dataset table with two columns set as Keys.
Target Table
The Target table is also a standard Relational Dataset type table. It has the same columns and keys.
The Key columns are the same as in the Source table. This is important, as the data is matched based on the Key columns.
How to INSERT data from Source to Target?
We want to achieve the equivalent of the below SQL.
INSERT INTO Target
SELECT * FROM Source
WHERE condition;
In SAP Datasphere, we will use the Transformation Flow as below.
The result is that all the rows from the Source are inserted into the Target table.
Note that the system has updated the Change Type and Change Date columns automatically.
Datasphere INSERT vs. UPSERT
In reality, Transformation Flows are UPSERT commands.
The first time the flow is run, since the Target table is empty, all the rows from the Source table are inserted into the Target.
If we re-run the flow, there will be NO changes in the Target table, as there were no changes in the Source records.
How to update the Target Table column values?
Now, we want to do the equivalent of UPDATE command in SQL.
UPDATE Target
SET
column1 = value1,
column2 = value2, ...
WHERE condition;
To achieve this, first let's change the first row of the Source Table as below.
Now let's re-run the same flow.
We can see that the relevant record in the Target table is updated.
The Change Type is still 'L' (and I don't know why). The values of the remaining records did not change, since the Source values did not change, but the Change Date timestamp has changed for all records - meaning all the records were rewritten based on the Source.
Therefore, the Transformation Flow has performed an UPSERT command.
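The UPSERT behaviour described above can be sketched in plain SQL. Below is a minimal, runnable illustration using Python and SQLite - all table and column names (source, target, k1, k2, amount) are made up for the example, and SQLite merely stands in for the logic; Datasphere itself runs on SAP HANA and generates this behaviour for you:

```python
import sqlite3

# Hypothetical tables: two key columns (k1, k2) plus a value column,
# mirroring the Source/Target layout in the article.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (k1 TEXT, k2 TEXT, amount REAL, PRIMARY KEY (k1, k2))")
con.execute("CREATE TABLE target (k1 TEXT, k2 TEXT, amount REAL, PRIMARY KEY (k1, k2))")
con.executemany("INSERT INTO source VALUES (?, ?, ?)",
                [("A", "1", 100.0), ("B", "2", 200.0)])

# WHERE true disambiguates SQLite's parser for INSERT ... SELECT ... ON CONFLICT.
upsert = """
    INSERT INTO target (k1, k2, amount)
    SELECT k1, k2, amount FROM source WHERE true
    ON CONFLICT (k1, k2) DO UPDATE SET amount = excluded.amount
"""
con.execute(upsert)   # first run: Target is empty, both rows are inserted
con.execute(upsert)   # re-run: rows match on the keys, nothing is duplicated

# Change a Source value and run the "flow" again: the matched Target row is updated.
con.execute("UPDATE source SET amount = 150.0 WHERE k1 = 'A' AND k2 = '1'")
con.execute(upsert)

count = con.execute("SELECT COUNT(*) FROM target").fetchone()[0]
amount_a = con.execute("SELECT amount FROM target WHERE k1 = 'A'").fetchone()[0]
print(count, amount_a)  # 2 150.0
```

The key point is the match on the key columns: existing keys are updated, new keys are inserted, and a re-run with unchanged data is harmless.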
How to update a SINGLE COLUMN in the Target Table ?
To update a single column in the Target table, we retain the mappings for the Key columns and include only the column we want to update in the Target.
Let's say we want to update only the New Amount column.
Let's change the Source data as below.
The Sales Date and New Amount columns of the Source table are changed for the first two records.
We will map only the New Amount column to be updated, along with the Key columns.
Now if we run the flow, we can see that only the New Amount column is updated in the Target table - an UPSERT.
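In SQL terms, mapping only one column corresponds to updating just that column on a key match. A minimal sketch with hypothetical names (sales_date, new_amount), again using SQLite only to show the logic:

```python
import sqlite3

# Target has keys plus sales_date and new_amount; only new_amount is "mapped".
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (k1 TEXT, k2 TEXT, sales_date TEXT, "
            "new_amount REAL, PRIMARY KEY (k1, k2))")
con.execute("INSERT INTO target VALUES ('A', '1', '2024-01-01', 100.0)")
con.execute("CREATE TABLE source (k1 TEXT, k2 TEXT, new_amount REAL, PRIMARY KEY (k1, k2))")
con.execute("INSERT INTO source VALUES ('A', '1', 999.0)")

# Only the key columns and new_amount take part; sales_date is left untouched.
con.execute("""
    INSERT INTO target (k1, k2, new_amount)
    SELECT k1, k2, new_amount FROM source WHERE true
    ON CONFLICT (k1, k2) DO UPDATE SET new_amount = excluded.new_amount
""")
row = con.execute("SELECT sales_date, new_amount FROM target").fetchone()
print(row)  # ('2024-01-01', 999.0) - the unmapped column keeps its old value
```

Note that for a brand-new key the unmapped column would be NULL on insert, since nothing supplies a value for it.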
How to insert new records to Target Table?
If the Source table has new records that are not present in the Target table (based on the Key columns), then these records will be added to the Target table.
How to update a column of a table (without Source Table)?
Let's say we want to update the Target Table column New Amount as New Amount * 2.
In this case you can use the same table as both Source and Target in the Transformation Flow. In the transformation, modify New Amount to New Amount * 2. Run the flow and the column will be updated.
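This self-referencing flow is logically just an in-place UPDATE. A tiny sketch (hypothetical names, SQLite standing in for the underlying SQL):

```python
import sqlite3

# Doubling new_amount in place; the same table acts as Source and Target.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (k1 TEXT PRIMARY KEY, new_amount REAL)")
con.executemany("INSERT INTO target VALUES (?, ?)", [("A", 100.0), ("B", 200.0)])

con.execute("UPDATE target SET new_amount = new_amount * 2")

vals = [r[0] for r in con.execute("SELECT new_amount FROM target ORDER BY k1")]
print(vals)  # [200.0, 400.0]
```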
How to DELETE records in Target Table when Source records are deleted?
I haven't found a way to DELETE records in the Target table when the Source table records are deleted. Records can be deleted manually using the GUI, but not via the Transformation Flow.
Workaround: We can maintain a Record Status column in the Target. In the transformation, you can LEFT JOIN the Target to the Source to identify Target records that no longer exist in the Source and update their Record Status column to DELETED. A scheduled job on the table (Data Integration - Table Monitor) can then delete those records later.
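The flagging step of this workaround boils down to an anti-join: Target rows with no matching Source row get their status set. A runnable sketch with hypothetical names (record_status, k1), using SQLite only to demonstrate the set logic - in Datasphere you would express the anti-join in the Transformation Flow's view and let the UPSERT write the status:

```python
import sqlite3

# Target carries a record_status column; rows missing from Source get flagged.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (k1 TEXT PRIMARY KEY)")
con.execute("CREATE TABLE target (k1 TEXT PRIMARY KEY, record_status TEXT DEFAULT 'ACTIVE')")
con.executemany("INSERT INTO source VALUES (?)", [("A",), ("B",)])
con.executemany("INSERT INTO target (k1) VALUES (?)",
                [("A",), ("B",), ("C",)])  # 'C' was deleted in the Source

con.execute("""
    UPDATE target SET record_status = 'DELETED'
    WHERE NOT EXISTS (SELECT 1 FROM source WHERE source.k1 = target.k1)
""")
flagged = [r[0] for r in con.execute("SELECT k1 FROM target WHERE record_status = 'DELETED'")]
print(flagged)  # ['C']
```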
Can we maintain Slowly Changing Dimensions (SCD) using Transformation Flows?
We can use a combination of the above techniques to MANUALLY maintain a table as an SCD.
In the transformation, use the LEFT JOIN technique to identify new, existing and deleted records, and use the Record Status column to mark them as New, Deleted, etc.
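Putting the LEFT JOINs together, each record can be classified in one pass. A sketch of the classification query with hypothetical names (k1, amount), again with SQLite standing in for the view logic:

```python
import sqlite3

# Classify rows as NEW / CHANGED / UNCHANGED / DELETED using two LEFT JOINs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (k1 TEXT PRIMARY KEY, amount REAL)")
con.execute("CREATE TABLE target (k1 TEXT PRIMARY KEY, amount REAL)")
con.executemany("INSERT INTO source VALUES (?, ?)",
                [("A", 100.0), ("B", 250.0), ("D", 400.0)])
con.executemany("INSERT INTO target VALUES (?, ?)",
                [("A", 100.0), ("B", 200.0), ("C", 300.0)])

status = dict(con.execute("""
    SELECT s.k1,
           CASE WHEN t.k1 IS NULL THEN 'NEW'
                WHEN s.amount <> t.amount THEN 'CHANGED'
                ELSE 'UNCHANGED' END
    FROM source s LEFT JOIN target t ON s.k1 = t.k1
    UNION ALL
    SELECT t.k1, 'DELETED'
    FROM target t LEFT JOIN source s ON t.k1 = s.k1
    WHERE s.k1 IS NULL
"""))
# A unchanged, B changed, D is new, C was deleted in the Source
print(status)
```

The resulting status can then be upserted into the Record Status column of the Target.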
Can we use Data Flows for Slowly Changing Dimensions?
The first thing to note is that we cannot use a Delta-enabled table as the Target in a Data Flow (it can be used as a Source). Data Flows are comparatively slow, and only very limited transformation capabilities are available.
Let's do some quick tests:
1] With APPEND mode, we can add records from Source to Target table (empty table).
This will add the records to the Target table.
2] Again let's rerun the Data Flow with APPEND mode.
It will fail due to a key violation, as we are trying to insert the same records again.
3] Let's change the MODE to APPEND with UPSERT enabled. Now run the Data Flow.
The Target records will be updated. If you have a Timestamp column in the Target with its default value set to Current Timestamp, you can see that it is updated, which indicates the records were actually rewritten.
4] Now add a new record to Source and re-run the Data Flow.
The new record will be added to the Target table. BUT if you look at the Timestamp column, you will notice it has been updated for ALL the records. As per the setup, it has performed the UPSERT operation.
5] Remove one of the Source records and re-run the flow.
No records are deleted in the Target table. This is expected as we are in UPSERT mode.
But you can see the Timestamp for the records still existing in the Source table is updated, while the record that was deleted in the Source shows the old Timestamp.
Therefore, Data Flows also do not delete records in the Target when Source records are deleted.
6] Let's change the mode in the Data Flow to DELETE.
It gives a warning that "Matching records will be deleted". It is very important to understand this!!! You can specify which columns are used to match the data for the deletion.
A Data Flow in DELETE mode does not delete the records in the Target that were deleted in the Source. It takes the Source records and DELETES the MATCHING Target records, where matching is based on the column mappings.
If we want to delete records in the Target based on records deleted in the Source, we need to select the Target records that are missing from the Source using a Transformation Flow or SQL View (LEFT JOIN). Use that as the Source in a Data Flow in DELETE mode, so the matching records are deleted in the Target.
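The two-step idea - build the "missing in Source" set with a LEFT JOIN, then delete exactly those rows - looks like this in plain SQL (hypothetical names, SQLite standing in for the logic; in Datasphere the subquery would be the intermediate view feeding the DELETE-mode Data Flow):

```python
import sqlite3

# Delete the Target rows whose keys no longer exist in the Source.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (k1 TEXT PRIMARY KEY)")
con.execute("CREATE TABLE target (k1 TEXT PRIMARY KEY)")
con.executemany("INSERT INTO source VALUES (?)", [("A",), ("B",)])
con.executemany("INSERT INTO target VALUES (?)", [("A",), ("B",), ("C",)])

con.execute("""
    DELETE FROM target
    WHERE k1 IN (SELECT t.k1
                 FROM target t LEFT JOIN source s ON t.k1 = s.k1
                 WHERE s.k1 IS NULL)
""")
remaining = [r[0] for r in con.execute("SELECT k1 FROM target ORDER BY k1")]
print(remaining)  # ['A', 'B']
```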
7] Updating a single column in the Target table from the Source table.
There is a major catch when updating the single column in the Target.
The Target columns will be updated from the Source based on the column mappings. BUT the columns that we excluded from the mapping will all be updated to NULL !!!!
So be very careful when trying to update a single column with a Data Flow. The workaround is to always map and update all the columns.
Summary
We can use Transformation Flows for UPSERTs and for single-column updates. But Transformation Flows cannot delete records in the Target. We can work around this with a status column and a scheduled delete job on the table.
Data Flows can perform UPSERTs, but they are very slow and have very limited transformation capabilities. Care must be taken with DELETE mode, as it deletes the matching records in the Target. When updating a single column in the Target based on the Source, a Data Flow will set the non-mapped columns to NULL.
Depending on the requirements, a combination of Transformation Flows and/or Data Flows can be used to implement Slowly Changing Dimensions with deletes.
Quirks....
1] You cannot mix Delta-enabled tables with other types of tables in the transformation part of a Transformation Flow.
2] If you are replicating data from SAP S4, any columns with no value will be sent to Datasphere as EMPTY columns. That means the value is NOT NULL and NOT BLANK. It is EMPTY.
You cannot check for Column IS NOT NULL. Since the column is EMPTY, it will return TRUE for NOT NULL as EMPTY is not equal to NULL.
You will need to use Column != ''
ColumnName != '' --two single quotes with no space in between
The weirdest part is that when you preview these EMPTY values in a table or flow data preview, the data grid shows them as '-' (two quotes with a dash in between). When you copy the column value, it copies the dash. But if you check for ColumnName = '-', it returns FALSE.
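The EMPTY-vs-NULL distinction is easy to demonstrate in any SQL engine. A tiny sketch (hypothetical table name, SQLite standing in for HANA) showing that IS NOT NULL matches an empty string while != '' filters it out:

```python
import sqlite3

# One empty string, one real value, one genuine NULL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (c TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("",), ("X",), (None,)])

not_null  = con.execute("SELECT COUNT(*) FROM t WHERE c IS NOT NULL").fetchone()[0]
non_empty = con.execute("SELECT COUNT(*) FROM t WHERE c != ''").fetchone()[0]
print(not_null, non_empty)  # 2 1 - the empty string passes IS NOT NULL but fails != ''
```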
3] If you use any derived columns in the transformation with no value, these columns will contain NULL.
4] Because SAP S4 sends missing values as EMPTY, this creates a problem when the column is one of the Key columns.
Let's say Column A is a key. In the first iteration, Column A has an EMPTY value and the row gets added - even though a key column is always set to NOT NULL, EMPTY is not NULL, so the row is accepted. In the second iteration, if Column A now has a value, a NEW row will be added to the table, since the new value is not equal to EMPTY. Based on the key columns, the flow considers Column A to have a new value and therefore treats it as a new row.