#1499 closed Feature Request (Fixed)
Performance increase to internal __FTP_ListToArray()
| Reported by: | Beege | Owned by: | J-Paul Mesnage |
|---|---|---|---|
| Milestone: | 3.3.7.0 | Component: | Standard UDFs |
| Version: | Severity: | None | |
| Keywords: | _FTPEx | Cc: |
Description
In the current internal function _FTP_ListToArray, every time a new item is added to the array, the array is Resized by only +1, which can be a major performance hit when getting directory listing of large numbers. If we instead Resize by ubound*2, and then trim unused slots at the end of function, we can avoid a lot or resizing. I learned this method from the original _filelisttoarray() func.
I have included a example with changes to the function and a demonstration of the new vs. old. I have put a set of XXXXXXXXXXXXXXXXXXXXXXXXXXX around any lines that I changed or added. Not that many changes were needed since the function already kept track of array size in index $array[0]. Thank you for your consideration and hard work in creating this wonderful language.:)
Attachments (1)
Change History (11)
by , 16 years ago
| Attachment: | _FTP_Listtoarray.au3 added |
|---|
comment:1 by , 16 years ago
| Version: | 3.3.4.0 |
|---|
comment:2 by , 16 years ago
| Milestone: | → 3.3.7.0 |
|---|---|
| Owner: | changed from to |
| Resolution: | → Fixed |
| Status: | new → closed |
Fixed by revision [5802] in version: 3.3.7.0
comment:3 by , 15 years ago
His version is not going to work as coded.
If $Return_Type is '1' (Directories only) or '2' (Files only) he never bothers to check the value of $Arraycount for the Redim() at the end of the routine. As coded, it simply defaults to returning a one-dimension array. Therefore, with $Return_Type set to '1' or '2', the proposed FTP_ListToArray() internal routine would only work correctly when called from _FTP_ListToArray(), and would fail when called by either _FTP_ListToArray2D() or _FTP_ListToArrayEx().
Also, why all the extra code to build and maintain separate arrays for files and directories, when they are ALWAYS combined at the end?
I'd suggest the following, which fixes the Redim() bug, eliminates a bunch of useless code, and is faster yet:
Func __FTP_ListToArray($l_FTPSession, $Return_Type = 0, $l_Flags = 0, $bFmt = 1, $ArrayCount = 6, $l_Context = 0) Local $tWIN32_FIND_DATA, $tFileTime, $IsDir, $callFindNext Local $Index = 0 If $ArrayCount = 1 Then ; filename only Local $Array[1] Else Local $Array[1][$ArrayCount] EndIf If $Return_Type < 0 Or $Return_Type > 2 Then Return SetError(1, 0, $Array) ;~ Global Const $tagWIN32_FIND_DATA = "DWORD dwFileAttributes; dword ftCreationTime[2]; dword ftLastAccessTime[2]; dword ftLastWriteTime[2]; DWORD nFileSizeHigh; DWORD nFileSizeLow; dword dwReserved0; dword dwReserved1; WCHAR cFileName[260]; WCHAR cAlternateFileName[14];" $tWIN32_FIND_DATA = DllStructCreate($tagWIN32_FIND_DATA) Local $callFindFirst = DllCall($__ghWinInet_FTP, 'handle', 'FtpFindFirstFileW', 'handle', $l_FTPSession, 'wstr', "", 'ptr', DllStructGetPtr($tWIN32_FIND_DATA), 'dword', $l_Flags, 'dword_ptr', $l_Context) If @error Or Not $callFindFirst[0] Then Return SetError(1, _WinAPI_GetLastError(), 0) Do $IsDir = BitAND(DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes"), $FILE_ATTRIBUTE_DIRECTORY) = $FILE_ATTRIBUTE_DIRECTORY If ($IsDir And $Return_Type <> 2) Or (Not $IsDir And $Return_Type <> 1) Then $Index += 1 If $ArrayCount = 1 Then If UBound($Array) < $Index+1 Then ReDim $Array[$Index*2] $Array[$Index] = DllStructGetData($tWIN32_FIND_DATA, "cFileName") Else If UBound($Array) < $Index+1 Then ReDim $Array[$Index*2][$ArrayCount] $Array[$Index][0] = DllStructGetData($tWIN32_FIND_DATA, "cFileName") $Array[$Index][1] = _WinAPI_MakeQWord(DllStructGetData($tWIN32_FIND_DATA, "nFileSizeLow"), DllStructGetData($tWIN32_FIND_DATA, "nFileSizeHigh")) If $ArrayCount = 6 Then $Array[$Index][2] = DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes") $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastWriteTime")) $Array[$Index][3] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftCreationTime")) $Array[$Index][4] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastAccessTime")) $Array[$Index][5] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) EndIf EndIf EndIf $callFindNext = DllCall($__ghWinInet_FTP, 'bool', 'InternetFindNextFileW', 'handle', $callFindFirst[0], 'ptr', DllStructGetPtr($tWIN32_FIND_DATA)) If @error Then Return SetError(2, _WinAPI_GetLastError(), 0) Until Not $callFindNext[0] DllCall($__ghWinInet_FTP, 'bool', 'InternetCloseHandle', 'handle', $callFindFirst[0]) ; No need to test @error. If $ArrayCount = 1 Then ; filename only $Array[0] = $Index ReDim $Array[$Index+1] Else $Array[0][0] = $Index ReDim $Array[$Index + 1][$ArrayCount] EndIf Return $Array EndFunc ;==>__FTP_ListToArray
comment:4 by , 15 years ago
Actually, this with the test "If $Return_Type - $IsDir <> 1 Then" to determine what is included in the returned array is likely quicker than the compound statement in the last example, and I'd think maintaining a simple $ArraySize variable would be cleaner than calling Ubound() for each iteration of the main loop. So, here's a final cleaned-up version:
Func __FTP_ListToArray($l_FTPSession, $Return_Type = 0, $l_Flags = 0, $bFmt = 1, $ArrayCount = 6, $l_Context = 0) Local $tWIN32_FIND_DATA, $tFileTime, $IsDir, $callFindNext Local $Index = 0, $ArraySize = 16 If $ArrayCount = 1 Then ; filename only Local $Array[16] Else Local $Array[16][$ArrayCount] EndIf If $Return_Type < 0 Or $Return_Type > 2 Then Return SetError(1, 0, $Array) ;~ Global Const $tagWIN32_FIND_DATA = "DWORD dwFileAttributes; dword ftCreationTime[2]; dword ftLastAccessTime[2]; dword ftLastWriteTime[2]; DWORD nFileSizeHigh; DWORD nFileSizeLow; dword dwReserved0; dword dwReserved1; WCHAR cFileName[260]; WCHAR cAlternateFileName[14];" $tWIN32_FIND_DATA = DllStructCreate($tagWIN32_FIND_DATA) Local $callFindFirst = DllCall($__ghWinInet_FTP, 'handle', 'FtpFindFirstFileW', 'handle', $l_FTPSession, 'wstr', "", 'ptr', DllStructGetPtr($tWIN32_FIND_DATA), 'dword', $l_Flags, 'dword_ptr', $l_Context) If @error Or Not $callFindFirst[0] Then Return SetError(1, _WinAPI_GetLastError(), 0) Do $IsDir = BitAND(DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes"), $FILE_ATTRIBUTE_DIRECTORY) = $FILE_ATTRIBUTE_DIRECTORY If $Return_Type - $IsDir <> 1 Then $Index += 1 If $ArrayCount = 1 Then If $Index = $ArraySize Then $ArraySize *= 2 ReDim $Array[$ArraySize] EndIf $Array[$Index] = DllStructGetData($tWIN32_FIND_DATA, "cFileName") Else If $Index = $ArraySize Then $ArraySize *= 2 ReDim $Array[$ArraySize][$ArrayCount] EndIf $Array[$Index][0] = DllStructGetData($tWIN32_FIND_DATA, "cFileName") $Array[$Index][1] = _WinAPI_MakeQWord(DllStructGetData($tWIN32_FIND_DATA, "nFileSizeLow"), DllStructGetData($tWIN32_FIND_DATA, "nFileSizeHigh")) If $ArrayCount = 6 Then $Array[$Index][2] = DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes") $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastWriteTime")) $Array[$Index][3] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftCreationTime")) $Array[$Index][4] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastAccessTime")) $Array[$Index][5] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) EndIf EndIf EndIf $callFindNext = DllCall($__ghWinInet_FTP, 'bool', 'InternetFindNextFileW', 'handle', $callFindFirst[0], 'ptr', DllStructGetPtr($tWIN32_FIND_DATA)) If @error Then Return SetError(2, _WinAPI_GetLastError(), 0) Until Not $callFindNext[0] DllCall($__ghWinInet_FTP, 'bool', 'InternetCloseHandle', 'handle', $callFindFirst[0]) ; No need to test @error. If $ArrayCount = 1 Then ; filename only $Array[0] = $Index ReDim $Array[$Index+1] Else $Array[0][0] = $Index ReDim $Array[$Index + 1][$ArrayCount] EndIf Return $Array EndFunc ;==>__FTP_ListToArray
comment:5 by , 15 years ago
Although it should be:
Local $Index = 0, $ArraySize = 16 If $ArrayCount = 1 Then ; filename only Local $Array[$ArraySize] Else Local $Array[$ArraySize]$ArrayCount] EndIf
I didn't need to leave '16' hard-coded in the array definitions...
comment:6 by , 15 years ago
Spiff59 is right about 2D arrays not working when user wants only files or only Dirs returned. The last block of code in the version I submitted should be:
Switch $Return_Type Case 0 If $ArrayCount = 1 Then ReDim $DirectoryArray[$DirectoryArray[0] + $FileArray[0] + 1] For $i = 1 To $FileIndex $DirectoryArray[$DirectoryArray[0] + $i] = $FileArray[$i] Next $DirectoryArray[0] += $FileArray[0] Else ReDim $DirectoryArray[$DirectoryArray[0][0] + $FileArray[0][0] + 1][$ArrayCount] For $i = 1 To $FileIndex For $j = 0 To $ArrayCount-1 $DirectoryArray[$DirectoryArray[0][0] + $i][$j] = $FileArray[$i][$j] Next Next $DirectoryArray[0][0] += $FileArray[0][0] EndIf Return $DirectoryArray Case 1 ;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX If $ArrayCount = 1 Then ReDim $DirectoryArray[$DirectoryIndex+1] Else ReDim $DirectoryArray[$DirectoryIndex+1][$ArrayCount] EndIf ;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Return $DirectoryArray Case 2 ;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX If $ArrayCount = 1 Then ReDim $FileArray[$FileIndex +1] Else ReDim $FileArray[$FileIndex +1][$ArrayCount] EndIf ;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Return $FileArray EndSwitch
comment:8 by , 15 years ago
Not (unnecessarily) building two seperate arrays, which then have to be combined in the end, saves a ton of code, and makes for a more straight-forward, easily-maintained function.
It does however affect the order in which hits are returned in the output array.
The original function would place all folders first, followed by all files.
My (smaller, faster, cleaner) version returns a folder, then any files within that folder, then the next folder, and it's files, and so on...
There is no documentation, or requirement, regarding the order in which the returned array is sorted. There any advantage, nor disadvantage, to either sequence. So, I (of course), vote for the simpler version of the routine.
Maybe a fresh set of eyes will pipe in on this... ?
comment:9 by , 15 years ago
Actually, maybe there is an advantage to the folder/files, folder/files, folder/files sequence of the newer, smaeller, version. It might allow one to use the returned array sequence to determine a folders contents, without having to compare the actual pathnames (although subfolders would complicate that coding). In any case, this array sequencing is no less useful than that of the existing routine.
comment:10 by , 15 years ago
Sorry to be the one who caused a bug, but the mistake pointed out in my code submitted above never got corrected before being applied to the beta. Also my opinion on which is best (Spiff59 or mine), I personally would like to stick to the one I submitted. Not because its mine but because it leaves the return format of the array same. Spiff59s function is a bit quicker, but like he said it changes the return format which could possible cause users scripts to break or act/look differently. I would only push for a change like that if the performance increase was much higher.

Automatic ticket cleanup.