Monitoring Windows Processes
In the world of Performance Engineering, we all know how important data is and the different kinds of trends it reveals in the form of graphs, numbers, pie charts, and so on. The data can come from any component of an n-tier architecture, whether on-premises or cloud based, as well as from the operating system on which these servers are deployed.
In this section, I will talk about monitoring one such trend on the Windows operating system, which I recently worked on while tuning one of our applications.
Let me start with the problem statement (I cannot share it in full, but I will provide enough context to cover the topic). The product under test has a server side that runs on the Java platform and a client side that runs on .NET. Both the server and the client run on Windows machines.
Problem Statement: After performance analysis with a lot of tweaks on both the server and client side, it became clear that the performance issue was on the client side. The memory usage trend in Grafana showed memory increasing gradually over time; by the third day of the test it would spike to 85-90% utilization on all the RDP machines being used as simulators to generate load, and as a result the machines would run out of virtual memory.
Approach: Knowing we were running out of memory, the next step was to find out whether the client's application process was causing the issue, so it was important to understand the memory trend of a specific process (the main application process in this case) on the client machine. Remember, in performance engineering the aim should be to eliminate, one by one, the factors contributing to the problem, which eventually helps in isolating and evaluating the real bottleneck. On top of that, we were already monitoring many metrics in Grafana and had no room to add new ones. This brings us to how we created a batch script and used MS Excel to create the graphs.
@echo off
setlocal

set "LOG_FILE=process-monitor-%COMPUTERNAME%.csv"
set "HEADER=DATE_TIME,PID,PRIVATE_SIZE,USERNAME"
set "PROCESS_NAME=<<process.exe>>"

rem Write the CSV header only on the first run.
if not exist "%LOG_FILE%" (
    echo %HEADER% > "%LOG_FILE%"
)

rem Pull one line per matching process from tasklist and append it to the log.
rem The token positions (2,4,7) depend on the tasklist /v column layout and
rem locale; verify them against your own output before relying on the data.
for /f "tokens=2,4,7 delims= " %%i in ('tasklist /nh /v /fi "imagename eq %PROCESS_NAME%"') do (
    echo %date:~4% %time%,%%i,"%%j",%%k >> "%LOG_FILE%"
)
The snippet above is the utility script; run it as a batch file to extract DATE_TIME, PID, PRIVATE_SIZE, and USERNAME for the process. The script can run as a scheduled job and appends the details to a CSV log file.
Note - Enter the name of the process to monitor in place of <<process.exe>>, and adjust the tokens values depending on the data you want to capture.
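For readers who prefer Python, the same line parsing the batch script performs with for /f tokens can be sketched as below. The sample line and the field positions are assumptions based on the default tasklist /nh /v column layout (Image Name, PID, Session Name, Session#, Mem Usage, Status, User Name, CPU Time, Window Title), which can shift with locale, so verify against your own output:

```python
# Sketch: parse one line of `tasklist /nh /v` output into the CSV fields.
# The sample line and field indices are assumptions for the default layout.
sample = 'process.exe  4120 RDP-Tcp#1  2  6,144,000 K Running  CORP\\alice  0:01:23 N/A'

def parse_tasklist_line(line):
    fields = line.split()
    pid = fields[1]                               # PID column
    mem_kb = int(fields[4].replace(",", ""))      # "6,144,000" -> 6144000 (KB)
    user = fields[7]                              # DOMAIN\user column
    return pid, mem_kb, user

pid, mem_kb, user = parse_tasklist_line(sample)
```

Combining this with subprocess to invoke tasklist and the csv module to append rows would give a drop-in replacement for the batch file.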
To run the script repeatedly on Windows, we used the Task Scheduler feature. To open it, open the Run prompt and type 'taskschd.msc'.
Use the Create Task option and fill out the required fields, such as the interval at which to run the task and its timeout; point to the utility script in the Actions tab, and in the 'Start in (optional)' field choose the path where it should output the CSV file.
Run the task and it will create a comma-delimited CSV file in the chosen location.
Generating the graph:
Open MS Excel -> Paste the contents of the CSV file -> Choose the Insert tab at the top and select PivotTable -> Choose the data to place on the x-axis and y-axis respectively -> Insert a 2-D line chart. This plots the available data so the trend can be seen.
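If Excel is not handy, the same pivot (average private size per sampling interval) can be sketched with only the Python standard library. The column names match the script's CSV header; the sample rows are made up purely for illustration:

```python
import csv
import io
from collections import defaultdict

# Illustrative rows in the same shape the batch script logs.
log = io.StringIO(
    "DATE_TIME,PID,PRIVATE_SIZE,USERNAME\n"
    "06/01 09:00,4120,6000000,CORP\\alice\n"
    "06/01 09:00,5312,6100000,CORP\\bob\n"
    "06/01 10:00,4120,6400000,CORP\\alice\n"
    "06/01 10:00,5312,6500000,CORP\\bob\n"
)

# Group the private sizes by timestamp, as the pivot table does.
samples = defaultdict(list)
for row in csv.DictReader(log):
    samples[row["DATE_TIME"]].append(int(row["PRIVATE_SIZE"]))

# Average per interval -- this is the series to plot as the trend line.
trend = {t: sum(v) // len(v) for t, v in samples.items()}
```

Replacing io.StringIO with open("process-monitor-HOSTNAME.csv") runs the same aggregation over the real log.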
We observed the trend below.
The trend shows that over a period of two days the memory of the specific process, across all users, gradually increased from 6000 MB to 8000 MB (an increase of 33.33%), and that if allowed to run for more days it could exhaust the available memory. This behavior pointed to a possible memory leak within the process.
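The 33.33% figure follows directly from the endpoints of the trend:

```python
# Percentage growth between the first and last readings of the trend.
start_mb, end_mb = 6000, 8000
growth_pct = (end_mb - start_mb) / start_mb * 100  # 2000 / 6000 * 100
print(round(growth_pct, 2))  # 33.33
```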
With these figures we were able to go back to the developers, investigate the issue further at the code level, and resolve it.