Leaderboard

Best practise for best 'search in filenames' hits?

Musashi reacted to jchd for a topic

FYI, an SQLite DB is just one file, not a herd of files, which is portable verbatim on any hardware/software platform you can think of. SQLite is zero installation and zero maintainance. Also SQLite is by far the most widespread DB engine ever and one of the top most used software piece: https://www.sqlite.org/mostdeployed.html So you already use SQLite several times daily, without even noticing. A DB file is by far more reliable that anything else, thanks to the ACID properties of DBs. SQLite file format has not changed since introduction of version 3, 2004-07-22, something very few software can be proud of. Yet there have been an enormous number of powerful features added to the library over the years. Given the "around 10¹²" implementations in current use all over the world, it's guaranteed that this format is going to last very long (authors says at least till 2050). SQLite is free: public-domain open source, just needs a C compiler to build from source. Standard SQLite DBs can reach 281 TB, more than you'll ever need! All in all, an SQLite DB is ideal for storage, maintainance and query of any-sized dataset, with the added benefit of possible concurrency and powerful querying language. No other file format offers the same power, speed and flexibility. Of course, there is always a little price to pay: learn to use the features you'll need, but that's what help foras are for, aren't they?

October 24, 2020

1 point

Best practise for best 'search in filenames' hits?

Guy_ reacted to jchd for a topic

Here's an exemple you can play with: a .CSV file and an SQLite database with the same data. You can use SQLite Expert (free version) to navigate the database. It has 4 columns: title, upcase (uppercase unaccented), shorter (without whitespaces) and shortest (without vowels). Removing diacritics with _Unaccent(): there is the option Full, to remove more aggressively (depends on language script). ; Unicode Normalization Forms Global Enum $UNF_NormC = 1, $UNF_NormD, $UNF_NormKC = 5, $UNF_NormKD Func _UNF_Change($sIn, $iForm) If $iForm = $UNF_NormC Or $iForm = $UNF_NormD Or $iForm = $UNF_NormKC Or $iForm = $UNF_NormKD Then Local $aRet = DllCall("Normaliz.dll", "int", "NormalizeString", "int", $iForm, "wstr", $sIn, "int", -1, "ptr", 0, "int", 0) Local $tOut = DllStructCreate("wchar[" & 2 * ($aRet[0] + 20) & "]") $aRet = DllCall("Normaliz.dll", "int", "NormalizeString", "int", $iForm, "wstr", $sIn, "int", -1, "ptr", DllStructGetPtr($tOut, 1), "int", 2 * ($aRet[0] + 20)) Return DllStructGetData($tOut, 1) Else SetError(1, 0, $sIn) EndIf EndFunc ;==>_UNF_Change Func _Unaccent($s, $Full = 0) Local Static $aPat = [ _ "(*UCP)[\x{300}-\x{36F}`'¨^¸¯]", _ ; combining accents only "(*UCP)\p{Mn}|\p{Lm}|\p{Sk}" _ ; " " + all modifying letters ] Return StringRegExpReplace(_UNF_Change($s, $UNF_NormD), $aPat[Mod($Full, 2)], "") EndFunc ;==>_Unaccent A relatively slow version of a fuzzy search (much faster version in my SQLite extension): ; Computes the number of typos (Damerau-Levenshtein distance) between two short strings. ; Four types of differences are counted: ; insertion of a character, abcd ab#cd ; deletion of a character, abcd acd ; exchange of a character abcd ab$d ; inversion of adjacent chars abcd acbd ; ; This function does NOT satisfy the so-called "triangle inequality", which means ; more simply that it makes NO attempt to compute the MINIMUM edit distance in all ; cases. If you need that, you should use more complex algorithms. ; ; This simple function allows a fuzzy compare for e.g. recovering from typical ; human typos in short strings like names, address, cities... while getting rid of ; minor scripting differences. ; ; Strings are lowercased. ; String $st2 can be used as a pattern similar to the SQL 'LIKE' operator: ; '_' and trailing '%' act as in LIKE. These wildcards can be passed as parameters ; but these should contain exactly one character for the function to work properly. ; ; Complexity is in O(n^2) so don't use with long strings! ; Func _Typos(Const $st1, Const $st2, $anychar = '_', $anytail = '%') Local $s1, $s2, $pen, $del, $ins, $subst If Not IsString($st1) Then Return SetError(-1, -1, -1) If Not IsString($st2) Then Return SetError(-2, -2, -1) If $st2 = '' Then Return StringLen($st1) If $st2 == $anytail Then Return 0 If $st1 = '' Then Return(StringInStr($st2 & $anytail, $anytail, 1) - 1) EndIf ;~ $s1 = StringSplit(_LowerUnaccent($st1)), "", 2) ;; _LowerUnaccent() addon function not available here ;~ $s2 = StringSplit(_LowerUnaccent($st2)), "", 2) ;; _LowerUnaccent() addon function not available here $s1 = StringSplit(StringLower($st1), "", 2) $s2 = StringSplit(StringLower($st2), "", 2) Local $l1 = UBound($s1), $l2 = UBound($s2) Local $r[$l1 + 1][$l2 + 1] For $x = 0 To $l2 - 1 Switch $s2[$x] Case $anychar If $x < $l1 Then $s2[$x] = $s1[$x] EndIf Case $anytail $l2 = $x If $l1 > $l2 Then $l1 = $l2 EndIf ExitLoop EndSwitch $r[0][$x] = $x Next $r[0][$l2] = $l2 For $x = 0 To $l1 $r[$x][0] = $x Next For $x = 1 To $l1 For $y = 1 To $l2 $pen = Not ($s1[$x - 1] == $s2[$y - 1]) $del = $r[$x-1][$y] + 1 $ins = $r[$x][$y-1] + 1 $subst = $r[$x-1][$y-1] + $pen If $del > $ins Then $del = $ins If $del > $subst Then $del = $subst $r[$x][$y] = $del If ($pen And $x > 1 And $y > 1 And $s1[$x-1] == $s2[$y-2] And $s1[$x-2] == $s2[$y-1]) Then If $r[$x][$y] >= $r[$x-2][$y-2] Then $r[$x][$y] = $r[$x-2][$y-2] + 1 $r[$x-1][$y-1] = $r[$x][$y] EndIf Next Next Return ($r[$l1][$l2]) EndFunc ;; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ;; example usage ;; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <Array.au3> Local $reference = "lexicographically" Local $Words[11][2] = [ _ [$reference], _ ["Lexicôgraphicaly"], _ ["lexkographicaly"], _ ["Lexico9raphically"], _ ["lexioo9raphically"], _ ["Lexicographical"], _ ["lexicographlcally"], _ ["Lex1cogr@phically"], _ ["lexic0graphïca1yl"], _ ["lexIcOgraphically"], _ ["Lexlcographically"] _ ] For $i = 0 To UBound($Words) - 1 $Words[$i][1] = _Typos($Words[$i][0], $reference) Next _ArrayDisplay($Words, "Number of typos") _ConsoleWrite("Usage of '_' and '%' wildcards in pattern:" & @LF & @TAB & "_Typos('lex1c0gr@fhlâofznho', 'LEx_c_gr%') = " & _Typos('lex1c0gr@fhlofznho', 'lex_c_gr%') & @LF) ; Unicode-aware ConsoleWrite Func _ConsoleWrite($s) ConsoleWrite(BinaryToString(StringToBinary($s, 4), 1)) EndFunc ;==>__ConsoleWrite Library.csv Library.sq3

October 24, 2020

1 point

Best practise for best 'search in filenames' hits?

Guy_ reacted to Nine for a topic

Here something you could start with : #include <Constants.au3> $sText = "L'été des indiens : (äëôìû)." $sText = ReFormat ($sText) MsgBox ($MB_SYSTEMMODAL,"",$sText) Func ReFormat ($sString) Local $aRepl = [["[ÄÀÁÂ]","A"],["[ËÈÉÊ]","E"],["[ÏÌÍÎ]","I"],["[ÖÒÓÔ]","O"],["Ç","C"],["[ÜÙÚÛ]","U"],["[':;\.,()«»""°]"," "]] $sString = StringUpper($sString) For $i = 0 to UBound($aRepl)-1 $sString = StringRegExpReplace($sString,$aRepl[$i][0],$aRepl[$i][1]) Next Return StringStripWS($sString, 7) EndFunc

October 24, 2020

1 point

Getting Value from Last Used Row in a defined range

Siwa reacted to GokAy for a topic

@water Hey, I believe we have a misunderstanding here. I don't select cells or ranges, activate sheets, etc. unless they are necessary as well. You have missed a couple of pointers: - I am new to AutoIt, been using it for the last 2 weeks. - Never used Excel UDF myself. Helped another friend in another post, and now this. - Most examples I have seen, required to select a cell or something. I am not even sure if this is an AutoIt requirement? - In VBA if you need it, you just "Dim MyRange as Range" and "Set MyRange = Range(address)", and use it ("Set MyRange = Nothing", when finished). Or just reference it directly. Depends on what you want to achieve. - What I wrote above is a step-by-step analysis (if you would like to call it that) - This was the final verdict, no selects, no selections. And you can use whatever range definition you got, doesn't have to be UsedRange, Selection, CurrentRegion.: "Array = UsedRange.Resize(,5).Offset(0,6) - (here offset number for column should change since now you are offsetting from the start column rather than C). Get your array in 1 line of code, and it is fast." - If you don't need the initial range, why store it anywhere? (Ofcourse, it is better to store in a variable if it will be reused). Or get the contents in an array for some calculations as arrays are faster. This part depends on what you want to achieve as well.

October 24, 2020

1 point

Best practise for best 'search in filenames' hits?

Guy_ reacted to jchd for a topic

I would remove diacritics, vowels, punctuation and spaces, leaving only uppercase consonants. Indeed it's very unlikely that a search based on such reduction would return a large number of hits: in 99.99% of real-world runs, you would get only one hit. You can also guard against typos by using a fuzzy search, allowing an increasing number of typing errors until something get caught. I can provide code for all of this. Of course, handling only latin-based scripts (languages) would be much easier than non-latin scripts like Arabic, Hebrew, Thai, Indic, ... due to their peculiar/complex writing rules. See https://r12a.github.io/scripts/tutorial/part3 to get a grasp about what I'm talking about. Your large array of books is typically loaded a runtime, hence it must reside in a disk file. Then why not use a database (SQLite would be perfect)? I also have SQLite extensions for fuzzy search and such. The idea would be to create a table with full title, reduced title and possibly many other attributes (ISBN, author(s), editor, edition, ...). The precise schema will have to take your exact requirements into consideration. Anyway an SQLite-based solution would be orders of magnitude faster than bare AutoIt code.

October 24, 2020

1 point

Getting Value from Last Used Row in a defined range

Siwa reacted to water for a topic

I'm not sure I fully understand what you try to tell us With your code you are working with a range implicitly but do not store the range object in a variable. I do not recommend to work with a selection because it just displays the range on the screen. Hence it might interfere with what the user tries to do. That's why I recommend to work with ranges where possible.

October 24, 2020

1 point

I Cannot reach some command in a page...

MarcoMonte reacted to Nine for a topic

Glad it worked for you

October 22, 2020

1 point

Sign In

jchd

Points

Posts

TheSaint

Points

Posts

Nine

Points

Posts

TheDcoder

Points

Posts

Popular Content

Best practise for best 'search in filenames' hits?

Best practise for best 'search in filenames' hits?

Best practise for best 'search in filenames' hits?

Getting Value from Last Used Row in a defined range

Best practise for best 'search in filenames' hits?

Getting Value from Last Used Row in a defined range

I Cannot reach some command in a page...

Browse

AutoIt Resources

Release

Beta