Arrays in PowerShell and the += operator performance

Fifteen years ago I was highly impressed by the book “The Art of Electronics” by Paul Horowitz and Winfield Hill. It is a perfect example of teaching by example. In addition to examples of good electronic circuits, it also contains examples of bad ideas, with discussions of what makes the good designs good and the bad ones bad. Let’s try to do the same with a problem that slows down every other PowerShell script.

When you need to create an array and put elements into it, the most popular way is something like this:

$arr = @()
for ($i = 0; $i -lt 10; $i++) { 
    $arr +=  "test string $i" 
}

It works fine with a small number of elements, but what if you need to add 10 thousand or 1 million elements?

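I created a simple test script that adds one thousand elements to arrays of different sizes using the += operator and measures the time. The running time increases linearly with the size of the existing array, so building a large array one element at a time takes time roughly proportional to the square of its final length.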
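The test script itself is not reproduced here. As a rough sketch of such a measurement, the following appends a fixed batch of elements to arrays that already hold different numbers of items. The starting sizes, the batch size, and the variable names are assumptions chosen for illustration, not the values behind the timings in this article.

```powershell
# Hypothetical reconstruction of the measurement; sizes and names are assumptions.
$batch = 1000
$timings = foreach ($size in 10000, 20000, 40000) {
    # Start from an array that already holds $size elements
    $arr = @(0) * $size

    # Measure-Command runs the script block in the current scope,
    # so $arr is still visible (and modified) after the measurement
    $elapsed = Measure-Command {
        for ($i = 0; $i -lt $batch; $i++) {
            $arr += "test string $i"
        }
    }

    [pscustomobject]@{
        StartSize = $size
        FinalSize = $arr.Count
        Seconds   = $elapsed.TotalSeconds
    }
}
$timings | Format-Table -AutoSize
```

If the copy-on-append explanation holds, the Seconds column should grow roughly in step with StartSize, although exact numbers depend on the machine and the PowerShell version.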

The reason is very simple. The Microsoft documentation says: “When you use the += operator, PowerShell actually creates a new array with the values of the original array and the added value. This might cause performance issues if the operation is repeated several times or the size of the array is too big”.
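The behavior described in the documentation can be observed directly. The snippet below is a small illustration (the variable names are arbitrary): it keeps a reference to the original array, appends with +=, and then checks whether both variables still point to the same object.

```powershell
$arr = @(1, 2, 3)
$isFixed = $arr.IsFixedSize   # True: a PowerShell array is a fixed-size System.Object[]

$before = $arr                # keep a reference to the original array object
$arr += 4                     # allocates a new 4-element array and copies the old values

# False: $arr now refers to a different array than $before
$same = [object]::ReferenceEquals($before, $arr)

"IsFixedSize: $isFixed"
"Same object after +=: $same"
"Original count: $($before.Count), new count: $($arr.Count)"   # 3 and 4
```

Because every += copies the whole array, the cost of each append grows with the number of elements already stored.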

Let's try to rewrite the code in different ways. All four samples do the same thing: create an array and fill it with 100 thousand elements. The first one is the "bad design"; the others are good. It is up to you which one to choose. In some cases it is easy to estimate the number of elements in the new array in advance, or to rewrite your for loop as a PowerShell pipeline. The other option is to use .NET classes to hold your data. The ArrayList class is a general-purpose replacement for PowerShell arrays that does not have this performance issue.

# Simple demo of difference in array performance 
$ArrSize    = 100000
# Adding elements using += operator
Measure-Command {
    $arr = @()
    for ($i = 0; $i -lt $ArrSize; $i++)
    { 
        $arr += "test" + $i
    }
} | select -ExpandProperty TotalSeconds | 
        % { Write-Host "Adding items to array with += operator: $_ sec" }
# Create array of predefined size and fill it with elements
Measure-Command {
    $arr = @($null)*$ArrSize 
    for ($i = 0; $i -lt $ArrSize; $i++)
    { 
        $arr[$i] = "test" + $i
    }
} | Select -ExpandProperty TotalSeconds | 
        % { Write-Host "Create array with null elements and fill them with values:" $_ "sec" }
# Use Powershell pipeline
Measure-Command {
    # 0..($ArrSize - 1) yields exactly $ArrSize elements (0..$ArrSize would yield one extra)
    $arr = 0..($ArrSize - 1) | % { "test" + $_ } 
} | Select -ExpandProperty TotalSeconds | 
        % { Write-Host "Generating array and save it to variable: $_ sec"}

# Use System.Collections.ArrayList class
Measure-Command {
    $arr = New-Object -TypeName System.Collections.ArrayList
    for ($i = 0; $i -lt $ArrSize; $i++)
    { 
        # Add() returns the index of the new element; discard it
        [void]$arr.Add("test" + $i)
    }
} | Select -ExpandProperty TotalSeconds | 
        % { Write-Host "Using ArrayList class:" $_ "sec" }

The results are impressive:

Adding items to array with += operator: 548.9887574 sec
Create array with null elements and fill them with values: 0.9157684 sec
Generating array and save it to variable: 0.8700861 sec
Using ArrayList class: 0.3603113 sec

No comment required. += is obviously the worst of these ways to fill an array with data. I have no idea why PowerShell uses a fixed-size array (System.Object[]) by default instead of something like ArrayList, but now you know several ways to avoid this performance issue.
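Two further variations on the same ideas may be worth trying; neither was part of the timings above, so treat them as untested alternatives rather than measured results. The first uses the generic System.Collections.Generic.List class, which the .NET documentation recommends over ArrayList for new development. The second assigns the output of the for loop directly to a variable, which is the pipeline idea without ForEach-Object. The sketch assumes PowerShell 5 or later for the ::new() syntax, and the variable names are illustrative.

```powershell
$ArrSize = 100000

# Generic list: Add() returns nothing, so there is no output to suppress
$list = [System.Collections.Generic.List[string]]::new()
for ($i = 0; $i -lt $ArrSize; $i++) {
    $list.Add("test" + $i)
}
# Optional: convert to a fixed-size string[] once the list is complete
$fromList = $list.ToArray()

# Capture loop output: PowerShell collects everything the loop emits
# and assigns it to the variable as a single array
$fromLoop = for ($i = 0; $i -lt $ArrSize; $i++) {
    "test" + $i
}

"List count: $($list.Count); loop output count: $($fromLoop.Count)"
```

Both variants avoid re-copying the collection on every append; the list grows its internal buffer only occasionally, and the loop capture builds the final array once.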

Kudos to Andrei Grigorev for explaining this issue to me a year ago! :)

More articles by Vladimir Provorov