Fastest Way to Delete Empty Records in 1d Array

Here is what I am using, but maybe it can be done faster within this one line, without having to go through the loop to test for empty values?  Thanks in advance!

$tmp_array = StringRegExpReplace($tmp_array, "\|\|+", "|")


#include <Array.au3>

$array = Get_Array()

$array = StripEmptyRecords($array, 0)

Func Get_Array()

    Local $array[7]

    $array[0] = "12345"
    $array[1] = "01621"
    $array[2] = "xyz"
    $array[3] = "abc@defg.com"
    $array[4] = " john smith"
    $array[5] = " sally turner "
    $array[5] = "  "
    $array[6] = "zxy"

    Return $array

EndFunc   ;==>Get_Array

Func StripEmptyRecords($array, $start_row)

    $tmp_array = _ArrayToString($array, "|", $start_row)
    $tmp_array = StringStripCR($tmp_array)

    $tmp_array = StringRegExpReplace($tmp_array, "\|\|+", "|")

    If StringRight($tmp_array, 1) = "|" Then
        $tmp_array = StringTrimRight($tmp_array, 1)
    EndIf

    If StringLeft($tmp_array, 1) = "|" Then
        $tmp_array = StringTrimLeft($tmp_array, 1)
    EndIf

    $final_array = StringSplit($tmp_array, "|", 2)

    For $x = UBound($final_array) - 1 To 0 Step -1
        If StringStripWS($final_array[$x], 8) = "" Then
            _ArrayDelete($final_array, $x)
        EndIf
    Next
    Return $final_array

EndFunc   ;==>StripEmptyRecords


Since _ArrayToString() already traverses the entire array, you can do the traversal yourself and keep only the non-empty data as you go:

#include <Array.au3>

$array = Get_Array()

$array = StripEmptyRecords($array, 0)

Func Get_Array()

    Local $array[7]

    $array[0] = "12345"
    $array[1] = "01621"
    $array[2] = "xyz"
    $array[3] = "abc@defg.com"
    $array[4] = " john smith"
    $array[5] = " sally turner "
    $array[5] = "  "
    $array[6] = "zxy"

    Return $array

EndFunc   ;==>Get_Array

Func StripEmptyRecords(ByRef $aData, $iStart)
    If Not IsArray($aData) Then Return SetError(1, 0, False)
    Local $iElements = UBound($aData)
    If $iStart >= $iElements Then Return SetError(2, 0, False)
    Local $sResult
    For $Index = $iStart To $iElements - 1
        $sResult &= (StringStripWS($aData[$Index], 8) ? $aData[$Index] & '|' : '')
    Next
    Return StringSplit(StringTrimRight($sResult, 1), '|', 2)
EndFunc   ;==>StripEmptyRecords


Doing it without concatenation may be faster for larger arrays.

Based on what you're showing, you could do a simple for/next loop with a blank ret array or you could put in the bells and whistles to catch some errors like so:

#include <Array.au3>

Global $array = Get_Array()

$array = _StripEmptyElements($array)

Func Get_Array()

    Local $array[7]

    $array[0] = "12345"
    $array[1] = "01621"
    $array[2] = "xyz"
    $array[3] = "abc@defg.com"
    $array[4] = " john smith"
    $array[5] = " sally turner "
    $array[5] = "  " & @CRLF & "  "
    $array[6] = "zxy"

    Return $array

EndFunc   ;==>Get_Array

; $bNoWS = White spaces only count as empty element, true by default
Func _StripEmptyElements(ByRef $aArgs, $iStart = 0, $bNoWS = True)

    If UBound($aArgs, 2) Then Return SetError(1, 0, 0) ; out of bounds
    If $iStart = Default Or $iStart == -1 Then $iStart = 0
    If $bNoWS = Default Or $bNoWS == -1 Then $bNoWS = True

    Local $iUB = UBound($aArgs)
    ; catch start out of bounds
    If $iStart < 0 Or $iStart > $iUB - 1 Then Return SetError(2, 0, 0)

    Local $aRet[$iUB]
    Local $iEnum = 0
    ; build array without concatenation
    For $i = $iStart To $iUB - 1
        If StringLen($aArgs[$i]) == 0 Then ContinueLoop
        If $bNoWS Then
            If StringRegExp($aArgs[$i], "(?m)^\s+$") Then ContinueLoop
        EndIf
        $aRet[$iEnum] = $aArgs[$i]
        $iEnum += 1
    Next

    If $iEnum = 0 Then
        ; nothing found, but rather than return a false
        ;  set error and return array where user can do what they want with it
        Return SetError(2, 0, $aArgs)
    EndIf
    ; resize return array
    ReDim $aRet[$iEnum]
    ; return extended as the ubound of new array
    Return SetExtended($iEnum, $aRet)
EndFunc   ;==>_StripEmptyElements


It doesn't seem to be faster, not even with 1 million records.

#include <Array.au3>

$iLen = 1

For $Index = 1 To 6
    $iLen *= 10
    $aData = BuildArray($iLen)
    ConsoleWrite('Records in array: ' & $iLen & @CRLF)
    ConsoleWrite('Empty records: ' & @extended & @CRLF)

    $iTimer = TimerInit()
    $aNew1 = StripEmptyRecords($aData, 0)
    ConsoleWrite(Round(TimerDiff($iTimer), 2) & ' ms' & @TAB)
    ConsoleWrite('Records: ' & UBound($aNew1) & @CRLF)

    $iTimer = TimerInit()
    $aNew2 = _StripEmptyElements($aData)
    ConsoleWrite(Round(TimerDiff($iTimer), 2) & ' ms' & @TAB)
    ConsoleWrite('Records: ' & UBound($aNew2) & @CRLF)
Next


Func BuildArray($iLen)
    Local $aData[$iLen]
    Local $sData, $iEmpty = 0
    For $Index = 0 To $iLen - 1
        $sData = (Random(1, 10, 1) = 5 ? '' : RandomString(Random(10, 50, 1)))
        $aData[$Index] = $sData
        If Not $sData Then $iEmpty += 1
    Next
    Return SetError(0, $iEmpty, $aData)
EndFunc   ;==>BuildArray

Func RandomString($iLen)
    Local Static $aChars = StringSplit('abcdefghijklmnopqrstuvwxyz', '')
    Local $sString
    For $Index = 1 To $iLen
        $sString &= $aChars[Random(1, $aChars[0], 1)]
    Next
    Return $sString
EndFunc   ;==>RandomString

Func StripEmptyRecords(ByRef $aData, $iStart)
    If Not IsArray($aData) Then Return SetError(1, 0, False)
    Local $iElements = UBound($aData)
    If $iStart >= $iElements Then Return SetError(1, 0, False)
    Local $sResult
    For $Index = $iStart To $iElements - 1
        $sResult &= (StringStripWS($aData[$Index], 8) ? $aData[$Index] & '|' : '')
    Next
    Return StringSplit(StringTrimRight($sResult, 1), '|', 2)
EndFunc   ;==>StripEmptyRecords

Func _StripEmptyElements(ByRef $aArgs, $iStart = 0, $bNoWS = True)

    If UBound($aArgs, 2) Then Return SetError(1, 0, 0) ; out of bounds

    If $iStart = Default Or $iStart == -1 Then $iStart = 0
    If $bNoWS = Default Or $bNoWS == -1 Then $bNoWS = True

    Local $iUB = UBound($aArgs)
    ; catch start out of bounds
    If $iStart < 0 Or $iStart > $iUB - 1 Then Return SetError(2, 0, 0)

    Local $aRet[$iUB]
    Local $iEnum = 0

    ; build array without concatenation
    For $i = $iStart To $iUB - 1
        If StringLen($aArgs[$i]) == 0 Then ContinueLoop
        If $bNoWS Then
            If StringRegExp($aArgs[$i], "(?m)^\s+$") Then ContinueLoop
        EndIf
        $aRet[$iEnum] = $aArgs[$i]
        $iEnum += 1
    Next

    If $iEnum = 0 Then
        ; nothing found, but rather than return a false
        ;  set error and return array where user can do what they want with it
        Return SetError(2, 0, $aArgs)
    EndIf

    ; resize return array
    ReDim $aRet[$iEnum]
    ; return extended as the ubound of new array
    Return SetExtended($iEnum, $aRet)
EndFunc   ;==>_StripEmptyElements



Records in array: 10
Empty records: 0
0.1 ms    Records: 10
0.13 ms    Records: 10

Records in array: 100
Empty records: 6
0.5 ms    Records: 94
0.62 ms    Records: 94

Records in array: 1000
Empty records: 109
3.78 ms    Records: 891
5.15 ms    Records: 891

Records in array: 10000
Empty records: 996
31.35 ms    Records: 9004
41.58 ms    Records: 9004

Records in array: 100000
Empty records: 9824
312.28 ms    Records: 90176
399.45 ms    Records: 90176

Records in array: 1000000
Empty records: 100389
3134.44 ms    Records: 899611
4157.58 ms    Records: 899611


That's pretty interesting.  The regex overhead is the only thing that I can think of for that with a single redim.  If I'm being honest, I didn't even see your post before I posted.  And I was too tired to test yours out once I posted lol.  But the proof's in the pudding, so to speak.


The first is quite fast.  I kept the second for reference.

Func StripEmpty2(ByRef $array, $iStart = 0)
  Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "^\s*\||(?<=\|)\s*\||\|(?=[\s\|]*$)|\s*$", "")
  Return StringSplit($sArray, "|", $STR_NOCOUNT)
EndFunc   ;==>StripEmpty2

Both have been modified to resolve a misinterpretation of what $iStart means.

Func StripEmpty3(ByRef $array, $iStart = 0)
  Local $aTemp = _ArrayFindAll($array, "^\s*$", $iStart, Default, 0, 3)
  If @error Then
    ; no whitespace-only elements found: only the elements before $iStart need removing
    If $iStart Then _ArrayDelete($array, "0-" & $iStart - 1)
    Return
  EndIf
  For $i = $iStart - 1 To 0 Step -1
    _ArrayInsert($aTemp, 0, $i)
  Next
  _ArrayInsert($aTemp, 0, UBound($aTemp))
  _ArrayDelete($array, $aTemp)
EndFunc   ;==>StripEmpty3

SmOke_N said:

That's pretty interesting.  The regex overhead is the only thing that I can think of for that with a single redim.  If I'm being honest, I didn't even see your post before I posted.  And I was too tired to test yours out once I posted lol.  But the proof's in the pudding, so to speak.

ReDim is quite expensive with large arrays. With only a few elements, every version is more than fast enough.

I have added a function and packed the whole thing into a speed comparison:

#include <Array.au3>

Global Const $aArrayRaw = Get_Array()
Global $f_DecimalPlaces = 1
Global $iT, $a_Results[0][3]

Func Get_Array()
    Local $aArray[1e6]

    For $i = 0 To UBound($aArray) - 1
        $aArray[$i] = Random(0,2,1) = 2 ? " " : "x"
    Next

    Return $aArray
EndFunc   ;==>Get_Array

; the first measurement
$aArray = $aArrayRaw
ReDim $a_Results[UBound($a_Results) + 1][3]
$a_Results[UBound($a_Results) - 1][0] = "Andreik"
$iT = TimerInit()
$aArray = _strip_Andreik($aArray)
$iT = TimerDiff($iT)
$a_Results[UBound($a_Results) - 1][1] = ($iT)

; the second measurement
$aArray = $aArrayRaw
ReDim $a_Results[UBound($a_Results) + 1][3]
$a_Results[UBound($a_Results) - 1][0] = "SmOke_N"
$iT = TimerInit()
$aArray = _strip_SmOke_N($aArray)
$iT = TimerDiff($iT)
$a_Results[UBound($a_Results) - 1][1] = ($iT)

; the third measurement
$aArray = $aArrayRaw
ReDim $a_Results[UBound($a_Results) + 1][3]
$a_Results[UBound($a_Results) - 1][0] = "Nine 1"
$iT = TimerInit()
$aArray = _stripNine1($aArray)
$iT = TimerDiff($iT)
$a_Results[UBound($a_Results) - 1][1] = ($iT)

; the fourth measurement
$aArray = $aArrayRaw
ReDim $a_Results[UBound($a_Results) + 1][3]
$a_Results[UBound($a_Results) - 1][0] = "Nine 2"
$iT = TimerInit()
$aArray = _stripNine2($aArray)
$iT = TimerDiff($iT)
$a_Results[UBound($a_Results) - 1][1] = ($iT)

; the fifth measurement
$aArray = $aArrayRaw
ReDim $a_Results[UBound($a_Results) + 1][3]
$a_Results[UBound($a_Results) - 1][0] = "AspirinJunkie"
$iT = TimerInit()
_strip_AspirinJunkie($aArray) ; works in place via ByRef, no return value
$iT = TimerDiff($iT)
$a_Results[UBound($a_Results) - 1][1] = ($iT)

; calculate results and print them out
_ArraySort($a_Results, 0, 0, 0, 1)
For $i = 0 To UBound($a_Results) - 1
    $a_Results[$i][2] = Round($a_Results[$i][1] / $a_Results[0][1], 2)
    $a_Results[$i][1] = Round($a_Results[$i][1], $f_DecimalPlaces)
Next
_ArrayDisplay($a_Results, "Measurement Results", "", 16 + 64, Default, "name|time [ms]|factor")

Func _strip_Andreik(ByRef $aData, $iStart = 0)
    If Not IsArray($aData) Then Return SetError(1, 0, False)
    Local $iElements = UBound($aData)
    If $iStart >= $iElements Then Return SetError(2, 0, False)
    Local $sResult
    For $Index = $iStart To $iElements - 1
        $sResult &= (StringStripWS($aData[$Index], 8) ? $aData[$Index] & '|' : '')
    Next
    Return StringSplit(StringTrimRight($sResult, 1), '|', 2)
EndFunc   ;==>_strip_Andreik

Func _stripNine1(ByRef $array, $iStart = 0)
    Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "\|\s*$|(?<=\|)\s*\|", "")
    Return StringSplit(($iStart ? _ArrayToString($array, Default, 0, $iStart - 1) & "|" : "") & $sArray, "|", $STR_NOCOUNT)
EndFunc   ;==>_stripNine1

Func _stripNine2(ByRef $array, $iStart = 0)
    Local $aTemp = _ArrayFindAll($array, "^\s*$", $iStart, Default, 0, 3)
    _ArrayInsert($aTemp, 0, UBound($aTemp))
    _ArrayDelete($array, $aTemp)
    Return $array
EndFunc   ;==>_stripNine2

; $bNoWS = White spaces only count as empty element, true by default
Func _strip_SmOke_N(ByRef $aArgs, $iStart = 0, $bNoWS = True)

    If UBound($aArgs, 2) Then Return SetError(1, 0, 0) ; out of bounds
    If $iStart = Default Or $iStart == -1 Then $iStart = 0
    If $bNoWS = Default Or $bNoWS == -1 Then $bNoWS = True

    Local $iUB = UBound($aArgs)
    ; catch start out of bounds
    If $iStart < 0 Or $iStart > $iUB - 1 Then Return SetError(2, 0, 0)

    Local $aRet[$iUB]
    Local $iEnum = 0
    ; build array without concatenation
    For $i = $iStart To $iUB - 1
        If StringLen($aArgs[$i]) == 0 Then ContinueLoop
        If $bNoWS Then
            If StringRegExp($aArgs[$i], "(?m)^\s+$") Then ContinueLoop
        EndIf
        $aRet[$iEnum] = $aArgs[$i]
        $iEnum += 1
    Next

    If $iEnum = 0 Then
        ; nothing found, but rather than return a false
        ;  set error and return array where user can do what they want with it
        Return SetError(2, 0, $aArgs)
    EndIf

    ; resize return array
    ReDim $aRet[$iEnum]
    ; return extended as the ubound of new array
    Return SetExtended($iEnum, $aRet)
EndFunc   ;==>_strip_SmOke_N

Func _strip_AspirinJunkie(ByRef $A, $iStart = 0, $iEnd = UBound($A) - 1)
    Local $x = $iStart
    For $i = $iStart To $iEnd
        If StringIsSpace($A[$i]) Then ContinueLoop
        $A[$x] = $A[$i]
        $x += 1
    Next
    Redim $A[$x]
EndFunc   ;==>_strip_AspirinJunkie

Nine's first function performs best, but it would still have to be adapted for certain special cases in which it currently does not work, depending on the type of data: 1. if the first array element is empty, it remains in the array; 2. elements containing only line breaks are not recognized as empty strings (which may be correct depending on the context); and 3. pipes ("|") occurring in the strings.


It looks like the function performs much better when the costly StringStripWS() is replaced with StringIsSpace():

Func _strip_Andreik(ByRef $aData, $iStart = 0)
    If Not IsArray($aData) Then Return SetError(1, 0, False)
    Local $iElements = UBound($aData)
    If $iStart >= $iElements Then Return SetError(2, 0, False)
    Local $sResult
    For $Index = $iStart To $iElements - 1
        $sResult &= (StringIsSpace($aData[$Index]) ? '' : $aData[$Index] & '|')
    Next
    Return StringSplit(StringTrimRight($sResult, 1), '|', 2)
EndFunc   ;==>_strip_Andreik

Totally forgot about this function. Thanks @AspirinJunkie.

@AspirinJunkie Thanks for the evaluation.  I edited the pattern of my first script above to meet all (as far as my tests went) of the special cases you mentioned.  As for CR, if it is required, OP can just add it along with \s as alternates.  In the case of pipes included in the array, OP can change the separation character to anything he wants if need be.
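
For the pipe case, here is a minimal sketch of what changing the separation character could look like, using the simple concatenate-and-split loop rather than the regex version. The function name _StripEmptyWithDelim, its $sDelim parameter, and the choice of Chr(1) as the delimiter are assumptions for illustration only, not something posted in this thread; any character that cannot occur in the data would do.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not from the thread): concatenate-and-split with a
; caller-supplied delimiter, so a "|" inside the data no longer splits a record.
Func _StripEmptyWithDelim(ByRef $aData, $sDelim)
    If Not IsArray($aData) Then Return SetError(1, 0, False)
    Local $sResult = ""
    For $i = 0 To UBound($aData) - 1
        ; keep the element only if something is left after removing all whitespace
        If StringLen(StringStripWS($aData[$i], $STR_STRIPALL)) Then $sResult &= $aData[$i] & $sDelim
    Next
    If $sResult = "" Then Return SetError(2, 0, False) ; every element was empty
    ; drop the trailing delimiter, then split on the whole delimiter string
    Return StringSplit(StringTrimRight($sResult, StringLen($sDelim)), $sDelim, $STR_ENTIRESPLIT + $STR_NOCOUNT)
EndFunc   ;==>_StripEmptyWithDelim

; usage: Chr(1) is assumed never to appear in the data
Local $aTest[4] = ["a|b", "", "  ", "c"]
Local $aClean = _StripEmptyWithDelim($aTest, Chr(1))
ConsoleWrite("Records kept: " & UBound($aClean) & @CRLF)
```

The sketch still relies on the delimiter never appearing in any element, so it only moves the limitation to a less likely character.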


The pattern is still not enough to match all empty rows of the array. Try this:

#include <Array.au3>

Local $array[7]
$array[0] = "  "
$array[1] = "01621"
$array[2] = "xyz"
$array[3] = ""
$array[4] = " john smith"
$array[5] = " sally turner "
$array[6] = ""

$array = StripEmpty2($array)

Func StripEmpty2(ByRef $array, $iStart = 0)
  Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "\|\s*$|(?<=\|)\s*\|", "")
  Return StringSplit(($iStart ? _ArrayToString($array, Default, 0, $iStart - 1) & "|" : "") & $sArray, "|", $STR_NOCOUNT)
EndFunc   ;==>StripEmpty2

The first index is not stripped. Also, having two or more empty rows at the end of the array will leave an empty element behind:

#include <Array.au3>

Local $array[8]
$array[0] = "  "
$array[1] = "01621"
$array[2] = "xyz"
$array[3] = ""
$array[4] = " john smith"
$array[5] = " sally turner "
$array[6] = ""
$array[7] = ""

$array = StripEmpty2($array)

Func StripEmpty2(ByRef $array, $iStart = 0)
  Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "\|\s*$|(?<=\|)\s*\|", "")
  Return StringSplit(($iStart ? _ArrayToString($array, Default, 0, $iStart - 1) & "|" : "") & $sArray, "|", $STR_NOCOUNT)
EndFunc   ;==>StripEmpty2



@Nine Try this to see an unexpected behavior:

#include <Array.au3>

Local $array[10]
$array[0] = " "
$array[1] = "  "
$array[2] = "  "
$array[3] = "  test "
$array[4] = "   "
$array[5] = " "
$array[6] = "  more tests  "
$array[7] = " "
$array[8] = ""
$array[9] = "   "

ConsoleWrite(StringLen($array[6]) & @CRLF)

$array = StripEmpty2($array)

ConsoleWrite(StringLen($array[1]) & @CRLF)

Func StripEmpty2(ByRef $array, $iStart = 0)
  Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "^\s*\||(?<=\|)\s*\||\|(?=[\s\|]*$)", "")
  Return StringSplit($sArray, "|", $STR_NOCOUNT)
EndFunc   ;==>StripEmpty2

The spaces of the last row are appended to the last non-empty element, because the regex pattern matches only the last pipe delimiter and not the spaces that follow it.


You need one more update (don't hate me :lol: )

#include <Array.au3>

Local $array[3]
$array[0] = ''
$array[1] = ''
$array[2] = ' some data   '

ConsoleWrite(StringLen($array[2]) & @CRLF)
$array = StripEmpty2($array)
ConsoleWrite(StringLen($array[0]) & @CRLF)


Func StripEmpty2(ByRef $array, $iStart = 0)
  Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "^\s*\||(?<=\|)\s*\||\|(?=[\s\|]*$)|\s*$", "")
  Return StringSplit($sArray, "|", $STR_NOCOUNT)
EndFunc   ;==>StripEmpty2

I think you can remove the positive lookahead from the last capturing group for a simpler pattern like this:


It might be slightly faster but don't expect anything considerable.

Interesting.  I agree your pattern is simpler to follow, but for some reason that escapes me, the positive lookahead is a tad (10-15%) faster on any array length.

Maybe it is the capturing group that is causing it to be slower?

