Opened on Oct 4, 2008 at 5:55:39 AM
#599 closed Feature Request (Completed)
_FileListToArray costs O(N^2) instead of O(NlogN)
| Reported by: | code65536 | Owned by: | Gary |
|---|---|---|---|
| Milestone: | 3.2.13.10 | Component: | Standard UDFs |
| Version: | Severity: | None | |
| Keywords: | Cc: |
Description
Memory re-allocation is an expensive operation and as such, should be avoided. For _FileListToArray to read a directory with N entries, it will need to perform N ReDim operations, and with each ReDim operation costing O(N) (assuming that ReDim is implemented efficiently), _FileListToArray will cost O(N2).
A very common way to mitigate this is to double the array size with each re-allocation instead of just incrementing it. Under such a system, you will greatly reduce the number of ReDim operations, down to log2(N), bringing the final asymptotic cost of the function down to O(NlogN), which is optimal.
I am attaching a patch that will fix this problem.
With large directories with about 300 files, implementing this simple change reduced the time spent by over half (from ~10s for 100 runs down to ~4.8s for 100 runs in my benchmark). Proportionally, the performance win will be even larger for directories with more entries.
Attachments (2)
Change History (9)
by , on Oct 4, 2008 at 5:56:49 AM
| Attachment: | bug599.patch added |
|---|
comment:1 by , on Oct 4, 2008 at 6:01:48 AM
Note that with this patch, UBound will no longer correspond to the number of entries found, but since the first element of the result array contains the number of entries, I don't think that this would be a problem.
If people do think that this is a problem--that people are using UBound instead of the first array element, you can add a final ReDim to the end, right before the return to "trim" away any excess size from the array. The function will still retain O(NlogN) performance.
comment:2 by , on Oct 4, 2008 at 3:57:46 PM
The function must trim to the correct size. People do use UBound() because it's the correct way. The size in element 0 is a carry-over convention from before UBound() existed and is thus largely pointless and redundant.
comment:3 by , on Oct 10, 2008 at 11:01:05 PM
| Type: | Bug → Feature Request |
|---|
comment:5 by , on Oct 20, 2008 at 11:28:56 PM
| Milestone: | → 3.2.13.10 |
|---|---|
| Resolution: | → Fixed |
| Status: | new → closed |
comment:6 by , on Oct 20, 2008 at 11:38:16 PM
| Resolution: | Fixed |
|---|---|
| Status: | closed → reopened |
comment:7 by , on Oct 20, 2008 at 11:38:42 PM
| Resolution: | → Completed |
|---|---|
| Status: | reopened → closed |

patch