czardas Posted September 6, 2013 (edited)

Replacing the regexp above with _StringToHex() (see next post) makes this more universal (not only for file names) and eliminates any chance of mistaken identity. Including StringLower() adds a little overhead, but it is required for case-insensitive matching. Remove it and the matching becomes case sensitive, which isn't what you want in this circumstance - at least I don't think so.

;
$tname = _StringToHex(StringLower($array[$1]))
;

There is still a chance that a global variable with the same name already exists, although it is unlikely that a name will consist only of an even number of hex characters (this depends on how you name your variables). I don't currently have a solution for this. If you don't use any globals, or always include at least one non-hexadecimal character in the name (for example an underscore, as in $_ABCD), the problem will never occur.

Another problem that could occur (depending on circumstance) is hitting a limit on variable name length. Here it shouldn't be an issue, because the maximum length of a file name is 255 characters, and that is less than half of any variable name length limit. Converting to hex is a reliable method, but it also doubles the string length.

Aaargh - problems with unicode in file names. This is getting complicated.

Edited September 6, 2013 by czardas
operator64 ArrayWorkshop
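As a rough illustration of the hex idea in the post above, here is a minimal sketch. The helper name _MakeHexKey and the sample file names are made up for this example and do not come from the thread. It assumes the standard String.au3 UDF is available.

```autoit
#include <String.au3>

; Hypothetical helper (not from the thread): lower-case the text, then
; hex-encode it so the result contains only the characters 0-9 and A-F.
Func _MakeHexKey($sText)
    Return _StringToHex(StringLower($sText))
EndFunc

Global $g_sName = "Report.TXT"
Global $g_sKey = _MakeHexKey($g_sName)

; For plain ASCII input each character becomes two hex digits,
; so the key is twice as long as the original name.
ConsoleWrite($g_sKey & @CRLF)
ConsoleWrite(StringLen($g_sName) & " characters -> " & StringLen($g_sKey) & " characters" & @CRLF)

; Case variants of the same name produce the same key.
ConsoleWrite((_MakeHexKey("REPORT.txt") == $g_sKey) & @CRLF)
```

Prefixing the key with a non-hex character such as an underscore, as the post suggests, keeps it from colliding with a global whose name happens to be all hex digits.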
czardas Posted September 6, 2013 (edited)

Final attempt at modifying kylomas' code using StringToBinary(). Now it should work fine with unicode file names. Please read the above post. In theory everything mentioned there applies here also (some of the numbers will be different).

;
Func _process_array($array)
    ; $aFinal, $ttFiles, $tUniqueFiles and $tDups are expected to be globals from the rest of the script
    Local Static $aFinal_idx = 0
    Local $tname, $tFiles = $array[0]
    $ttFiles += $array[0]
    ReDim $aFinal[UBound($aFinal) + $tFiles]
    For $1 = 1 To UBound($array) - 1
        ;$tname = StringRegExpReplace($array[$1], '[\.\:\\ ]', '_')
        $tname = StringToBinary(StringLower($array[$1]), 2) ; This replaces the line above
        ; IsDeclared() takes the variable name as a string and returns 0 if no such variable exists
        If IsDeclared('s' & $tname) = 0 Then
            Assign('s' & $tname, 1, 2) ; flag 2 forces creation in the global scope
            $aFinal[$aFinal_idx] = $array[$1]
            $aFinal_idx += 1
            $tUniqueFiles += 1
        Else
            $tDups += 1
        EndIf
    Next
EndFunc
;

I learned something new.

Edited September 6, 2013 by czardas
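For anyone who wants to try the technique without the rest of the script, here is a minimal stand-alone sketch. The function name _UniqueNames and the sample data are assumptions made for this example. It follows the thread's convention that element 0 of an array holds the item count, and it uses the same 's' prefix and StringToBinary() flag 2 (UTF-16 little endian) as the code above.

```autoit
; Hypothetical stand-alone variant (not the thread's function).
; Input and output arrays hold the item count in element 0.
Func _UniqueNames($aNames)
    If UBound($aNames) < 1 Then Return SetError(1, 0, 0)
    Local $aOut[UBound($aNames)], $sKey
    $aOut[0] = 0
    For $i = 1 To UBound($aNames) - 1
        ; Lower-case first so that case variants share one key
        $sKey = 's' & StringToBinary(StringLower($aNames[$i]), 2)
        If IsDeclared($sKey) = 0 Then
            ; First time this name is seen: remember it as a global variable
            Assign($sKey, 1, 2)
            $aOut[0] += 1
            $aOut[$aOut[0]] = $aNames[$i]
        EndIf
    Next
    ReDim $aOut[$aOut[0] + 1]
    Return $aOut
EndFunc

Global $g_aDemo[5] = [4, "Up.txt", "up.TXT", "down.txt", "UP.txt"]
Global $g_aUnique = _UniqueNames($g_aDemo)
For $i = 1 To $g_aUnique[0]
    ConsoleWrite($g_aUnique[$i] & @CRLF)
Next
```

Note that the marker variables are global and persist for the life of the script, so a second call treats names seen in an earlier call as duplicates. That matches the thread's goal of de-duplicating across several arrays, but it may not be what you want elsewhere.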
Cravin (Author) Posted September 6, 2013

Wow, this is some fantastic work, fellas. Thanks for coming together and helping me solve this problem. This has GREATLY reduced the amount of time it takes to remove/exclude duplicates from my arrays. I have all I need to move forward at this point. I just tested it on some large arrays and it's amazing. I spent most of yesterday on my example. Oh well!

Additionally, thanks to you, czardas, for spending as much time on this as you did. This community is great!
czardas Posted September 6, 2013

It has been a learning experience for me too, so I'm glad I made the effort. Did you try replacing the code above? It will be slightly slower but prevents the problems mentioned. There is always a big chance that some of your file names already contain an underscore, not to mention unicode. Perhaps kylomas has further suggestions.
jchd Posted September 6, 2013

Sorry to jump in, but I can't run code right now and am having a hard time catching up on the thread details. What is the issue with Unicode filenames exactly?

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
czardas Posted September 6, 2013 (edited)

jchd said: "Sorry to jump in, but I can't run code right now and have hard time catching up the thread details. What is the issue with Unicode filenames exactly?"

I wasn't aware that variable names which use unicode characters can be declared in AutoIt. The code posted by kylomas assigns a variable with the name of the file.

Edited September 6, 2013 by czardas
kylomas Posted September 6, 2013 (edited)

@czardas

The only other option that I considered last night was to replace each of the invalid chars with its own signature, in case I needed to retain file name integrity to translate the name back to its original. Like this...

$tname = StringRegExpReplace($array[$1], '\.', '_-1-_')
$tname = StringRegExpReplace($tname, '\:', '_-2-_')
$tname = StringRegExpReplace($tname, '\\', '_-3-_')

Still not foolproof. I think I like your solution better, as I did not even consider non-English chars. (English-centric hubris, again.)

Dr.'s appointment in an hour, so no time to work on this now...

kylomas

edit: File names with underscores should not be a problem, unless I'm missing something.

Edited September 6, 2013 by kylomas
Forum Rules Procedure for posting code
"I like pigs. Dogs look up to us. Cats look down on us. Pigs treat us as equals." - Sir Winston Churchill
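To see why the signature approach above is still not foolproof, here is a small sketch. The wrapper name _SignatureName and the two sample names are assumptions for illustration. It applies the three replacements from the post and compares a name that contains a dot with a name that already contains the literal text _-1-_.

```autoit
; Hypothetical wrapper around the three replacements from the post above.
Func _SignatureName($sName)
    Local $sOut = StringRegExpReplace($sName, '\.', '_-1-_')
    $sOut = StringRegExpReplace($sOut, '\:', '_-2-_')
    $sOut = StringRegExpReplace($sOut, '\\', '_-3-_')
    Return $sOut
EndFunc

; "a.b" becomes "a_-1-_b", while the second name contains none of the
; replaced characters and is returned unchanged, so both map to the same text.
ConsoleWrite(_SignatureName("a.b") & @CRLF)
ConsoleWrite(_SignatureName("a_-1-_b") & @CRLF)
ConsoleWrite((_SignatureName("a.b") == _SignatureName("a_-1-_b")) & @CRLF)
```

Two different names end up with one translated form, so the second would be counted as a duplicate of the first.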
Cravin (Author) Posted September 6, 2013

czardas said: "It has been a learning experience for me too, so I'm glad I made the effort. Did you try replacing the code above. It will be slightly slower but prevents problems mentioned. There is always a big chance that some of your file names contain underscore already, not to mention unicode. Perhaps kylomas has further suggestions."

Interestingly, I ran this with the code you changed versus the original that kylomas wrote. What I discovered is that your method uses more memory, but it actually seems to have reduced the processing time by about 15~20% on a final array with about 170k elements.
czardas Posted September 6, 2013 (edited)

Well, it's all very interesting to me. It's hard to pin down exactly the best method because results change with varying numbers of duplicates. The conversion-to-binary idea produces longer duplicate variable names. I expected it to run a bit slower because of the conversion. I'm not too familiar with some of the methods.

@kylomas

MsgBox(0, "False Positive", _
        StringRegExpReplace("123_456.789", '[\.\:\\ ]', '_') = _
        StringRegExpReplace("123.456.789", '[\.\:\\ ]', '_'))

Edited September 7, 2013 by czardas
kylomas Posted September 6, 2013

@czardas - I can see that as a problem in a generalized function (not only DSNs). What do you think of the technique in post #27?
czardas Posted September 6, 2013 (edited)

kylomas, I just assumed a file name can be anything that I can create on my own computer, such as: s_-1-_.txt

I just tested assigning variable names that contain non-word characters and they definitely do not work. Unicode does not work without conversion. Underscore is a problem.

Edited September 6, 2013 by czardas
jchd Posted September 6, 2013

Why would converting the Unicode string into its hex representation of UTF-<something like 8 or 16> not work and cover all bases? This would be easy to convert back and would alleviate any kind of limitation (or I'm misunderstanding something big.)
czardas Posted September 7, 2013 (edited)

jchd said: "Why would converting the Unicode string into its hex representation of UTF-<something like 8 or 16> not work and cover all bases? This would be easy to convert back and would alleviate any kind of limitation (or I'm misunderstanding something big.)"

That's exactly what I'm suggesting. Maybe there's a better way to do it than using StringToBinary(). First it has to be converted to a single case - lowercase or uppercase. Then it's converted to a binary representation.

;
$tname = StringToBinary(StringLower($array[$1]), 2) ; This replaces the line above
;

Where are you, jchd, that you can't run any code? I have visions of you halfway up a mountain or something like that.

Edited September 7, 2013 by czardas
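jchd's remark that the hex form is easy to convert back can be sketched roughly as follows. The function names _NameToKey and _KeyToName are made up for this example, and the 's' prefix is taken from the code earlier in the thread. The sketch relies on Binary() accepting a "0x..." string and on BinaryToString() with flag 2 decoding UTF-16 little endian.

```autoit
; Hypothetical helpers (not from the thread).
Func _NameToKey($sName)
    ; Lower-case, then take the UTF-16 LE bytes; concatenation yields "s0x..."
    Return 's' & StringToBinary(StringLower($sName), 2)
EndFunc

Func _KeyToName($sKey)
    ; Drop the leading 's', turn "0x..." back into binary, decode as UTF-16 LE
    Return BinaryToString(Binary(StringTrimLeft($sKey, 1)), 2)
EndFunc

Global $g_sKey = _NameToKey("UpDown.TXT")
ConsoleWrite($g_sKey & @CRLF)
ConsoleWrite(_KeyToName($g_sKey) & @CRLF)
```

Because of the StringLower() step, the round trip returns the lower-case form of the name, not the original casing. If the original spelling matters, it has to be kept separately, as the thread's code does by storing $array[$1] in $aFinal.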
jchd Posted September 7, 2013

AutoIt strings are already UTF16
czardas Posted September 7, 2013 (edited)

Okay, but try assigning a variable called $_祲礳祳.txt. That doesn't work, so it needs converting to something else. For example:

$_0x797279337973002E007400780074 . . . Big Endian
OR ==>
$_0x7279337973792E00740078007400 . . . Little Endian

Oops, ran the wrong bit of code - fixed. That's what I get using StringToBinary(). I tested that the variables exist and can be assigned a value which can also be read.

Edited September 7, 2013 by czardas
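The two names in the post above can be reproduced with a short sketch. The function name _EndianKey is an assumption for this example. The characters are built with ChrW() so the result does not depend on how the script file is encoded, and the flags are 3 for UTF-16 big endian and 2 for UTF-16 little endian.

```autoit
; Hypothetical helper (not from the thread): prefix plus the UTF-16 bytes of the name.
Func _EndianKey($sName, $iFlag)
    Return '$_' & StringToBinary($sName, $iFlag)
EndFunc

; The three CJK characters from the post, followed by ".txt"
Global $g_sName = ChrW(0x7972) & ChrW(0x7933) & ChrW(0x7973) & ".txt"

ConsoleWrite(_EndianKey($g_sName, 3) & " . . . Big Endian" & @CRLF)
ConsoleWrite(_EndianKey($g_sName, 2) & " . . . Little Endian" & @CRLF)
```

Either byte order works as a key, as long as the same flag is used for every name being compared.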
jchd Posted September 7, 2013

What I'm saying is that what you do is "take the binary representation of the string as UTF16". In this case, no conversion takes place other than string --> binary.

(I was on a friend's tiny tablet.)
czardas Posted September 7, 2013 (edited)

Isn't that what I'm doing? If you don't modify the case first you will get mismatches with different uppercase / lowercase variants.

UpDown.txt <> updOWN.txT <> uPdOwN.TXT

Edited September 7, 2013 by czardas
jchd Posted September 7, 2013

That's the right way to do it. I was just nitpicking about the "Then it's converted to UTF-16." part. I know you know, but other readers might get confused by the wording.
czardas Posted September 7, 2013

Okay, I know my terminology needs to improve. Sorry. LOL
jchd Posted September 7, 2013

I'm the culprit by jumping into the thread without taking the pain to read it in full! (Back to bed for me now)