Jump to content

Recommended Posts

Posted (edited)
19 hours ago, AspirinJunkie said:

I'm not sure if this was adressed to me

I was not aiming my comment at you or anyone in particular. This is the approach I would take in particular cases (as mentioned). I thought my comments might be potentially valuable to some readers. In fact a combination of both approaches also seems doable. It really does depend on the array discrepancies (which we all seem to agree about). Today I'm busy. I'll try it some time during the week.

Edited by czardas
Posted (edited)

As an experiment, I have reinvented the binary search algorithm. I'm sure this method has been tried before and it might turn out to be a fruitless exercise. However - given the idea that many duplicate file names appear in some backup folder along with various differences, using the standard binary search algorithm might not necessarily be the most optimal approach. This code has not been thoroughly tested.

CODE DELETED
See here: https://www.autoitscript.com/forum/topic/190340-parallel-exponential-search-algorithm/

This is a proof of concept and will need further adaptation to work with all unicode strings. Incremental of powers of two are used for the binary search (not division by two). Numbers are used here for testing the algorithm. It's also a parallel search. I might have overlooked something.

 

Edited by czardas
Posted (edited)

I have just run a couple of precursory tests so far, using the following similar arrays:

#include <Array.au3>

Local $aOne[50001]
For $i = 0 To 50000
    $aOne[$i] = Hex($i, 8)
Next
_ArrayShuffle($aOne)
Local $aTwo = $aOne
_ArrayReverse($aTwo)
ReDim $aOne[35000]
ReDim $aTwo[35000]

_ArraySort($aOne)
_ArraySort($aTwo)

_ArrayDisplay($aOne)
_ArrayDisplay($aTwo)

It appears that binary search is less efficient than a straight forward linear search on these arrays. A parallel linear search might be best when there are more matches. Current code differences produce unfair tests, so I haven't got a straight forward answer yet. I can still imagine how these methods will compare given a variety of starting conditions. Inverted/exponential binary search needs testing separately against the standard binary search (outside of this context). Again varied starting conditions will play an important role.

Test results were as follows:

Parallel Binary Search: 603.656865104227
Average Binary search:  2276.60675885885
Average Linear scan:    160.069939565546

 

 

 

Edited by czardas
  • 3 weeks later...

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...