#1499 closed Feature Request (Fixed)
Performance increase to internal __FTP_ListToArray()
Reported by: | Beege | Owned by: | Jpm |
---|---|---|---|
Milestone: | 3.3.7.0 | Component: | Standard UDFs |
Version: | Severity: | None | |
Keywords: | _FTPEx | Cc: |
Description
In the current internal function _FTP_ListToArray, every time a new item is added to the array, the array is Resized by only +1, which can be a major performance hit when getting directory listing of large numbers. If we instead Resize by ubound*2, and then trim unused slots at the end of function, we can avoid a lot or resizing. I learned this method from the original _filelisttoarray() func.
I have included a example with changes to the function and a demonstration of the new vs. old. I have put a set of XXXXXXXXXXXXXXXXXXXXXXXXXXX around any lines that I changed or added. Not that many changes were needed since the function already kept track of array size in index $array[0]. Thank you for your consideration and hard work in creating this wonderful language.:)
Attachments (1)
Change History (11)
Changed 15 years ago by Beege
comment:1 Changed 15 years ago by TicketCleanup
- Version 3.3.4.0 deleted
comment:2 Changed 15 years ago by Jpm
- Milestone set to 3.3.7.0
- Owner changed from Gary to Jpm
- Resolution set to Fixed
- Status changed from new to closed
Fixed by revision [5802] in version: 3.3.7.0
comment:3 Changed 14 years ago by Spiff59
His version is not going to work as coded.
If $Return_Type is '1' (Directories only) or '2' (Files only) he never bothers to check the value of $Arraycount for the Redim() at the end of the routine. As coded, it simply defaults to returning a one-dimension array. Therefore, with $Return_Type set to '1' or '2', the proposed FTP_ListToArray() internal routine would only work correctly when called from _FTP_ListToArray(), and would fail when called by either _FTP_ListToArray2D() or _FTP_ListToArrayEx().
Also, why all the extra code to build and maintain separate arrays for files and directories, when they are ALWAYS combined at the end?
I'd suggest the following, which fixes the Redim() bug, eliminates a bunch of useless code, and is faster yet:
Func __FTP_ListToArray($l_FTPSession, $Return_Type = 0, $l_Flags = 0, $bFmt = 1, $ArrayCount = 6, $l_Context = 0) Local $tWIN32_FIND_DATA, $tFileTime, $IsDir, $callFindNext Local $Index = 0 If $ArrayCount = 1 Then ; filename only Local $Array[1] Else Local $Array[1][$ArrayCount] EndIf If $Return_Type < 0 Or $Return_Type > 2 Then Return SetError(1, 0, $Array) ;~ Global Const $tagWIN32_FIND_DATA = "DWORD dwFileAttributes; dword ftCreationTime[2]; dword ftLastAccessTime[2]; dword ftLastWriteTime[2]; DWORD nFileSizeHigh; DWORD nFileSizeLow; dword dwReserved0; dword dwReserved1; WCHAR cFileName[260]; WCHAR cAlternateFileName[14];" $tWIN32_FIND_DATA = DllStructCreate($tagWIN32_FIND_DATA) Local $callFindFirst = DllCall($__ghWinInet_FTP, 'handle', 'FtpFindFirstFileW', 'handle', $l_FTPSession, 'wstr', "", 'ptr', DllStructGetPtr($tWIN32_FIND_DATA), 'dword', $l_Flags, 'dword_ptr', $l_Context) If @error Or Not $callFindFirst[0] Then Return SetError(1, _WinAPI_GetLastError(), 0) Do $IsDir = BitAND(DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes"), $FILE_ATTRIBUTE_DIRECTORY) = $FILE_ATTRIBUTE_DIRECTORY If ($IsDir And $Return_Type <> 2) Or (Not $IsDir And $Return_Type <> 1) Then $Index += 1 If $ArrayCount = 1 Then If UBound($Array) < $Index+1 Then ReDim $Array[$Index*2] $Array[$Index] = DllStructGetData($tWIN32_FIND_DATA, "cFileName") Else If UBound($Array) < $Index+1 Then ReDim $Array[$Index*2][$ArrayCount] $Array[$Index][0] = DllStructGetData($tWIN32_FIND_DATA, "cFileName") $Array[$Index][1] = _WinAPI_MakeQWord(DllStructGetData($tWIN32_FIND_DATA, "nFileSizeLow"), DllStructGetData($tWIN32_FIND_DATA, "nFileSizeHigh")) If $ArrayCount = 6 Then $Array[$Index][2] = DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes") $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastWriteTime")) $Array[$Index][3] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftCreationTime")) $Array[$Index][4] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastAccessTime")) $Array[$Index][5] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) EndIf EndIf EndIf $callFindNext = DllCall($__ghWinInet_FTP, 'bool', 'InternetFindNextFileW', 'handle', $callFindFirst[0], 'ptr', DllStructGetPtr($tWIN32_FIND_DATA)) If @error Then Return SetError(2, _WinAPI_GetLastError(), 0) Until Not $callFindNext[0] DllCall($__ghWinInet_FTP, 'bool', 'InternetCloseHandle', 'handle', $callFindFirst[0]) ; No need to test @error. If $ArrayCount = 1 Then ; filename only $Array[0] = $Index ReDim $Array[$Index+1] Else $Array[0][0] = $Index ReDim $Array[$Index + 1][$ArrayCount] EndIf Return $Array EndFunc ;==>__FTP_ListToArray
comment:4 Changed 14 years ago by Spiff59
Actually, this with the test "If $Return_Type - $IsDir <> 1 Then" to determine what is included in the returned array is likely quicker than the compound statement in the last example, and I'd think maintaining a simple $ArraySize variable would be cleaner than calling Ubound() for each iteration of the main loop. So, here's a final cleaned-up version:
Func __FTP_ListToArray($l_FTPSession, $Return_Type = 0, $l_Flags = 0, $bFmt = 1, $ArrayCount = 6, $l_Context = 0) Local $tWIN32_FIND_DATA, $tFileTime, $IsDir, $callFindNext Local $Index = 0, $ArraySize = 16 If $ArrayCount = 1 Then ; filename only Local $Array[16] Else Local $Array[16][$ArrayCount] EndIf If $Return_Type < 0 Or $Return_Type > 2 Then Return SetError(1, 0, $Array) ;~ Global Const $tagWIN32_FIND_DATA = "DWORD dwFileAttributes; dword ftCreationTime[2]; dword ftLastAccessTime[2]; dword ftLastWriteTime[2]; DWORD nFileSizeHigh; DWORD nFileSizeLow; dword dwReserved0; dword dwReserved1; WCHAR cFileName[260]; WCHAR cAlternateFileName[14];" $tWIN32_FIND_DATA = DllStructCreate($tagWIN32_FIND_DATA) Local $callFindFirst = DllCall($__ghWinInet_FTP, 'handle', 'FtpFindFirstFileW', 'handle', $l_FTPSession, 'wstr', "", 'ptr', DllStructGetPtr($tWIN32_FIND_DATA), 'dword', $l_Flags, 'dword_ptr', $l_Context) If @error Or Not $callFindFirst[0] Then Return SetError(1, _WinAPI_GetLastError(), 0) Do $IsDir = BitAND(DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes"), $FILE_ATTRIBUTE_DIRECTORY) = $FILE_ATTRIBUTE_DIRECTORY If $Return_Type - $IsDir <> 1 Then $Index += 1 If $ArrayCount = 1 Then If $Index = $ArraySize Then $ArraySize *= 2 ReDim $Array[$ArraySize] EndIf $Array[$Index] = DllStructGetData($tWIN32_FIND_DATA, "cFileName") Else If $Index = $ArraySize Then $ArraySize *= 2 ReDim $Array[$ArraySize][$ArrayCount] EndIf $Array[$Index][0] = DllStructGetData($tWIN32_FIND_DATA, "cFileName") $Array[$Index][1] = _WinAPI_MakeQWord(DllStructGetData($tWIN32_FIND_DATA, "nFileSizeLow"), DllStructGetData($tWIN32_FIND_DATA, "nFileSizeHigh")) If $ArrayCount = 6 Then $Array[$Index][2] = DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes") $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastWriteTime")) $Array[$Index][3] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftCreationTime")) $Array[$Index][4] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) $tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastAccessTime")) $Array[$Index][5] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt) EndIf EndIf EndIf $callFindNext = DllCall($__ghWinInet_FTP, 'bool', 'InternetFindNextFileW', 'handle', $callFindFirst[0], 'ptr', DllStructGetPtr($tWIN32_FIND_DATA)) If @error Then Return SetError(2, _WinAPI_GetLastError(), 0) Until Not $callFindNext[0] DllCall($__ghWinInet_FTP, 'bool', 'InternetCloseHandle', 'handle', $callFindFirst[0]) ; No need to test @error. If $ArrayCount = 1 Then ; filename only $Array[0] = $Index ReDim $Array[$Index+1] Else $Array[0][0] = $Index ReDim $Array[$Index + 1][$ArrayCount] EndIf Return $Array EndFunc ;==>__FTP_ListToArray
comment:5 Changed 14 years ago by anonymous
Although it should be:
Local $Index = 0, $ArraySize = 16 If $ArrayCount = 1 Then ; filename only Local $Array[$ArraySize] Else Local $Array[$ArraySize]$ArrayCount] EndIf
I didn't need to leave '16' hard-coded in the array definitions...
comment:6 Changed 14 years ago by Beege
Spiff59 is right about 2D arrays not working when user wants only files or only Dirs returned. The last block of code in the version I submitted should be:
Switch $Return_Type Case 0 If $ArrayCount = 1 Then ReDim $DirectoryArray[$DirectoryArray[0] + $FileArray[0] + 1] For $i = 1 To $FileIndex $DirectoryArray[$DirectoryArray[0] + $i] = $FileArray[$i] Next $DirectoryArray[0] += $FileArray[0] Else ReDim $DirectoryArray[$DirectoryArray[0][0] + $FileArray[0][0] + 1][$ArrayCount] For $i = 1 To $FileIndex For $j = 0 To $ArrayCount-1 $DirectoryArray[$DirectoryArray[0][0] + $i][$j] = $FileArray[$i][$j] Next Next $DirectoryArray[0][0] += $FileArray[0][0] EndIf Return $DirectoryArray Case 1 ;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX If $ArrayCount = 1 Then ReDim $DirectoryArray[$DirectoryIndex+1] Else ReDim $DirectoryArray[$DirectoryIndex+1][$ArrayCount] EndIf ;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Return $DirectoryArray Case 2 ;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX If $ArrayCount = 1 Then ReDim $FileArray[$FileIndex +1] Else ReDim $FileArray[$FileIndex +1][$ArrayCount] EndIf ;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Return $FileArray EndSwitch
comment:7 Changed 14 years ago by Jpm
so What is the best the Spiff59 or the modified Beege?
comment:8 Changed 14 years ago by Spiff59
Not (unnecessarily) building two seperate arrays, which then have to be combined in the end, saves a ton of code, and makes for a more straight-forward, easily-maintained function.
It does however affect the order in which hits are returned in the output array.
The original function would place all folders first, followed by all files.
My (smaller, faster, cleaner) version returns a folder, then any files within that folder, then the next folder, and it's files, and so on...
There is no documentation, or requirement, regarding the order in which the returned array is sorted. There any advantage, nor disadvantage, to either sequence. So, I (of course), vote for the simpler version of the routine.
Maybe a fresh set of eyes will pipe in on this... ?
comment:9 Changed 14 years ago by Spiff59
Actually, maybe there is an advantage to the folder/files, folder/files, folder/files sequence of the newer, smaeller, version. It might allow one to use the returned array sequence to determine a folders contents, without having to compare the actual pathnames (although subfolders would complicate that coding). In any case, this array sequencing is no less useful than that of the existing routine.
comment:10 Changed 14 years ago by Beege
Sorry to be the one who caused a bug, but the mistake pointed out in my code submitted above never got corrected before being applied to the beta. Also my opinion on which is best (Spiff59 or mine), I personally would like to stick to the one I submitted. Not because its mine but because it leaves the return format of the array same. Spiff59s function is a bit quicker, but like he said it changes the return format which could possible cause users scripts to break or act/look differently. I would only push for a change like that if the performance increase was much higher.
Guidelines for posting comments:
- You cannot re-open a ticket but you may still leave a comment if you have additional information to add.
- In-depth discussions should take place on the forum.
For more information see the full version of the ticket guidelines here.
Automatic ticket cleanup.