Modify

#599 closed Feature Request (Completed)

_FileListToArray costs O(N^2) instead of O(NlogN)

Reported by: code65536 Owned by: Gary
Milestone: 3.2.13.10 Component: Standard UDFs
Version: Severity: None
Keywords: Cc:

Description

Memory re-allocation is an expensive operation and as such, should be avoided. For _FileListToArray to read a directory with N entries, it will need to perform N ReDim operations, and with each ReDim operation costing O(N) (assuming that ReDim is implemented efficiently), _FileListToArray will cost O(N2).

A very common way to mitigate this is to double the array size with each re-allocation instead of just incrementing it. Under such a system, you will greatly reduce the number of ReDim operations, down to log2(N), bringing the final asymptotic cost of the function down to O(NlogN), which is optimal.

I am attaching a patch that will fix this problem.

With large directories with about 300 files, implementing this simple change reduced the time spent by over half (from ~10s for 100 runs down to ~4.8s for 100 runs in my benchmark). Proportionally, the performance win will be even larger for directories with more entries.

Attachments (2)

bug599.patch (705 bytes ) - added by anonymous on Oct 4, 2008 at 5:56:49 AM.
patch
bug599-v2.patch (902 bytes ) - added by code65536 on Oct 4, 2008 at 4:46:00 PM.
patch v2

Download all attachments as: .zip

Change History (9)

by anonymous, on Oct 4, 2008 at 5:56:49 AM

Attachment: bug599.patch added

patch

comment:1 by anonymous, on Oct 4, 2008 at 6:01:48 AM

Note that with this patch, UBound will no longer correspond to the number of entries found, but since the first element of the result array contains the number of entries, I don't think that this would be a problem.

If people do think that this is a problem--that people are using UBound instead of the first array element, you can add a final ReDim to the end, right before the return to "trim" away any excess size from the array. The function will still retain O(NlogN) performance.

comment:2 by Valik, on Oct 4, 2008 at 3:57:46 PM

The function must trim to the correct size. People do use UBound() because it's the correct way. The size in element 0 is a carry-over convention from before UBound() existed and is thus largely pointless and redundant.

by code65536, on Oct 4, 2008 at 4:46:00 PM

Attachment: bug599-v2.patch added

patch v2

comment:3 by Gary, on Oct 10, 2008 at 11:01:05 PM

Type: BugFeature Request

comment:4 by TicketCleanup, on Oct 11, 2008 at 12:00:06 AM

Version: 3.2.12.0

Automatic ticket cleanup.

comment:5 by J-Paul Mesnage, on Oct 20, 2008 at 11:28:56 PM

Milestone: 3.2.13.10
Resolution: Fixed
Status: newclosed

comment:6 by J-Paul Mesnage, on Oct 20, 2008 at 11:38:16 PM

Resolution: Fixed
Status: closedreopened

comment:7 by J-Paul Mesnage, on Oct 20, 2008 at 11:38:42 PM

Resolution: Completed
Status: reopenedclosed

Modify Ticket

Action
as closed The owner will remain Gary.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.