System.Threading.Channel is friend for programmer
Problem statement
Today I was analyzing a system that was ported from physical VM to Pivotal cloud foundry after converting to .NET Core. The system is a batch process run on a schedule coded in. It would need to process millions(close of records 15 to20 million) per month.
The Legacy system was running on VM with some 16 to 32 GB of memory and good hard drive space before conversion. It had good threading logic in place meaning, we can say how many threads can be run concurrently and how many records per policy could be allocated per thread. The defaults for legacy was to have some 5 threads with 5000 policy per thread. Over a period of time there were some additional business logic that was added. This involved additional serialization and deserialization of some binary content. So all was in happy path until the new business logic was added which throwed some 'out of memory exception'. There was a experienced developer, who by trial and error found that by adjusting the policy size per thread he could still get way with the memory issue. The time period was not an issue as it ran mostly after business hours. There was no attempt made then to see what was the real cause of the issue.
Now when back to cloud, since the application was in PCF, we had to specify the memory limit. That is where the problem started. We just initially started with 2 GB. But then the old devil 'out of memory exception' started showing its horns.
Sloutions
One soultion for us was to increase the container size. We could increase the size to may be 6 GB and have number of records per thread to be around 3000 and get the whole thing working.
But what about the time. It would still take such a long time. So we decide that we should put in some optimization effort. 3000 policy was fine but the memory size was unaccpetable. Second the running time.
On investigating we found that one of the portion of the code was doing a SQL bulk copy. 5000 records would be read from one database. On each record there was some processing that was done. Then it was transformed into a data table. This data table was then bulk copied.
Ideally within the thread ever thing was sequential. The data reader was yeilding one record at a time. So two things where happening. The source data reader was blocked until the business process(very processor intensive) was completed. Second holding 3000 to 5000 records per thread in memory in data tabe. That multiplied by number of threads say 5 was at any given point was 15000 to 25000 records in memory. Phew!!!. This was good for system in IaaS but not in container.
This is when we introduced system.threading.channel. We introduced two Bounded Channel. The data reader started puting the retrieved object in a channel. This was received and processed by the business process function in a channel, Say A, once the processing was done, the object was converted to data row. and added to the table. If the number of records in that data table was ,say greater than, 200, we would put that data table in another channel say B. This was received by another listener which then bulk copied the same to database.
This gave us double benefits. From memory foot print aspect, if I ought to reduce the memory of container and I do not want to touch the thread count or record count per thread, I still can decrease the number of records to hold in-memory prior to bulk copy. So I can squeeze the limits. From performance standpoint, since the data reader was just dumping the content to channel A, it was also able to run faster, not waiting for the business logic to complete. All these improved the performance dramatically to an extent where a process that was running for 6 hours is now completed in 1.5 hours.
Conclusion
The ideal solution in such case is to use Dataflow(https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library). But not all the time we will have the luxury of re-writing the entire code using the it.It could be because of time or framework constraints. In such cases we can use the miniature of this in our code, which is, System.Threading.Channel.
Nice one Venkatesh.