Acce, posted September 13, 2020

Hi, I have a text file with over 160k lines and I'm looking for a way to search it quickly. I'm searching for about 40 pictures. The lines in the text file are built up like this:

b.imageUrl = "/images//G_SS_Preview.png";

The data I have to search for is only "Preview.png", but searching for that alone gives multiple hits in the document, including results I don't want. I'm wondering whether I can require that a line include both "b.imageUrl" and "Preview.png" to count as a match, so that I can quickly change the image variable and search for all my images?

faustf, posted September 13, 2020

Read the file and use StringRegExp? Have you tried that yet?

Zedna, posted September 13, 2020

```autoit
$word1 = "Preview.png"
$word2 = "b.imageUrl"
$file_in = "input_file.txt"
$file_out = "output_file.txt"

; Read the whole file and split it into lines (flag 1 = treat @CRLF as one delimiter)
$text_in = FileRead($file_in)
$text_in = StringSplit($text_in, @CRLF, 1)
$text_out = ''

; Keep only the lines that contain both words
For $i = 1 To $text_in[0]
    $line = $text_in[$i]
    If StringInStr($line, $word1) And StringInStr($line, $word2) Then
        $text_out &= $line & @CRLF
    EndIf
Next

; Replace any previous output file with the new result
FileDelete($file_out)
FileWrite($file_out, $text_out)
```

Resources UDF, ResourcesEx UDF, AutoIt Forum Search

Musashi, posted September 13, 2020 (edited)

Here is an alternative way. You get an array with the matching line numbers.

```autoit
#include <File.au3>
#include <Array.au3>

Global $sFilePath, $aSearchArr, $aResultArr[0], $sSearch1, $sSearch2

$sFilePath = @ScriptDir & '\input_file.txt'
$sSearch1 = "b.imageUrl"
$sSearch2 = "Preview.png"

; Read the file into an array, one element per line ($aSearchArr[0] holds the count)
_FileReadToArray($sFilePath, $aSearchArr)
If @error Then Exit

For $i = 1 To $aSearchArr[0]
    If StringInStr($aSearchArr[$i], $sSearch1) And StringInStr($aSearchArr[$i], $sSearch2) Then
        _ArrayAdd($aResultArr, "LineNo. : " & $i & " => " & $aSearchArr[$i])
    EndIf
Next

_ArrayDisplay($aResultArr, 'Matches : ')
```

EDIT: With 160,000 lines, StringRegExp can possibly be faster than calling StringInStr twice (try it out):

```autoit
If StringRegExp($aSearchArr[$i], '(?i)' & $sSearch1 & '.*' & $sSearch2) Then
    _ArrayAdd($aResultArr, "LineNo. : " & $i & " => " & $aSearchArr[$i])
EndIf
```

Attachment: input_file.txt

Edited September 13, 2020 by Musashi

"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move."

mikell, posted September 13, 2020

@Musashi Just for fun:

```autoit
#include <Array.au3>

$txt = FileRead(@ScriptDir & '\input_file.txt')
Local $sSearch1 = "b.imageUrl", $sSearch2 = "Preview.png"

$aResult = StringRegExp(Execute ( "'" & StringRegExpReplace(StringReplace($txt, "'", "''"), "(?m)^", "' & Assign(""iReplace"", Eval(""iReplace"")+1) * Eval(""iReplace"") & ' - ") & "'" ), '(?i).*?\Q' & $sSearch1 & '\E.*?\Q' & $sSearch2 & '\E.*', 3)
_ArrayDisplay($aResult)
```

Musashi, posted September 13, 2020

25 minutes ago, mikell said:
> @Musashi Just for fun

I would have been deeply grieved if you had not delivered a "one-liner". Nice job (as usual).

mikell, posted September 13, 2020

Thanks. Obviously the best script to use is yours (or the one from Zedna).

BTW, about your edit, may I suggest:

```autoit
If StringRegExp($aSearchArr[$i], '(?i)\Q' & $sSearch1 & '\E.*?\Q' & $sSearch2 & '\E') Then
```

\Q...\E makes the regex engine treat the enclosed text literally, which helps when the search terms contain special characters (such as "b*.imageURL" and so on).

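The effect of \Q...\E can be seen even with the search terms from this thread, since the "." in "b.imageUrl" is itself a regex metacharacter. A minimal sketch, assuming a made-up line (`$sBad`) that should not match:

```autoit
; Sketch: why \Q...\E matters when a search term contains regex metacharacters.
; $sGood follows the format from the first post; $sBad is invented for illustration.
Local $sTerm = "b.imageUrl"
Local $sGood = 'b.imageUrl = "/images//G_SS_Preview.png";'
Local $sBad  = 'bximageUrl = "/images//G_SS_Preview.png";'

; Unquoted: "." matches any character, so the invented line matches too.
ConsoleWrite(StringRegExp($sBad, '(?i)' & $sTerm) & @CRLF)           ; 1

; Quoted: the term is taken literally, so the invented line is rejected...
ConsoleWrite(StringRegExp($sBad, '(?i)\Q' & $sTerm & '\E') & @CRLF)  ; 0

; ...while a real "b.imageUrl" line still matches.
ConsoleWrite(StringRegExp($sGood, '(?i)\Q' & $sTerm & '\E') & @CRLF) ; 1
```

With the default flag, StringRegExp returns 1 for a match and 0 otherwise, which is why it can be used directly in an If condition.
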
Musashi, posted September 13, 2020

11 minutes ago, mikell said:
> BTW, about your edit, may I suggest ...

Your suggestions are always welcome, especially when it comes to regular expressions. I have implemented your (more elegant) variant in the script:

```autoit
#include <File.au3>
#include <Array.au3>

Global $sFilePath, $aSearchArr, $aResultArr[0], $sSearch1, $sSearch2

$sFilePath = @ScriptDir & '\input_file.txt'
$sSearch1 = "b.imageUrl"
$sSearch2 = "Preview.png"

_FileReadToArray($sFilePath, $aSearchArr)
If @error Then Exit

For $i = 1 To $aSearchArr[0]
    ; \Q...\E quotes each term so that characters like "." are matched literally
    If StringRegExp($aSearchArr[$i], '(?i)\Q' & $sSearch1 & '\E.*?\Q' & $sSearch2 & '\E') Then
        _ArrayAdd($aResultArr, "LineNo. : " & $i & " => " & $aSearchArr[$i])
    EndIf
Next

_ArrayDisplay($aResultArr, 'Matches : ')
```

Now @Acce has several solutions to choose from, and that's all that matters.

Acce (author), posted September 14, 2020 (edited)

Thanks guys. Actually, I solved this myself about 5 minutes after posting here. Here is what I did:

```autoit
If StringInStr($Line, $Search) And StringInStr($Line, "b.imageUrl") Then
    ; do something
```

I will look at what you have suggested and see if it is better. StringRegExp looks very interesting to use. Thanks for your help.

Edited September 14, 2020 by Acce

Zedna, posted September 14, 2020 (edited)

21 hours ago, Musashi said:
> EDIT: With 160,000 lines, StringRegExp can possibly be faster than calling StringInStr twice (try it out)

StringInStr() is much faster when the casesense option is ON (1), so when speed is important and it's possible, I use casesense=1:

```autoit
; Lower-case both the search words and the text once, then search case-sensitively
$word1 = StringLower("Preview.png")
$word2 = StringLower("b.imageUrl")
$file_in = "input_file.txt"
$file_out = "output_file.txt"

$text_in = StringLower(FileRead($file_in))
$text_in = StringSplit($text_in, @CRLF, 1)
$text_out = ''

For $i = 1 To $text_in[0]
    $line = $text_in[$i]
    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $text_out &= $line & @CRLF
    EndIf
Next

FileDelete($file_out)
FileWrite($file_out, $text_out)
```

Edited September 14, 2020 by Zedna

Zedna, posted September 14, 2020 (edited)

As I said in my previous post, it seems that StringInStr with casesense=1 is faster than RegExp (in this case).

Here are my test data and scripts. input_file0.txt (10 lines) is expanded to input_file.txt (160,000 lines) by this helper script:

```autoit
$file_in = "input_file0.txt"
$file_out = "input_file.txt"

$text_in = FileRead($file_in)
$text_out = ''

; 10 lines x 16000 copies = 160,000 lines
For $i = 1 To 16000
    $text_out &= $text_in
Next

FileDelete($file_out)
FileWrite($file_out, $text_out)
```

And here is the main test script, which measures both variants (StringInStr vs. RegExp):

```autoit
$word1 = StringLower("Preview.png")
$word2 = StringLower("b.imageUrl")
$file_in = "input_file.txt"
$file_out = "output_file.txt"

$text_in = StringLower(FileRead($file_in))
$text_in = StringSplit($text_in, @CRLF, 1)

; Variant 1: two case-sensitive StringInStr calls
$text_out = ''
$start = TimerInit()
For $i = 1 To $text_in[0]
    $line = $text_in[$i]
    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $text_out &= $line & @CRLF
    EndIf
Next
$end1 = TimerDiff($start)
FileDelete($file_out)
FileWrite($file_out, $text_out)

; Variant 2: one StringRegExp call
$file_out = "output_file2.txt"
$text_out = ''
$start = TimerInit()
For $i = 1 To $text_in[0]
    $line = $text_in[$i]
    If StringRegExp($line, '(?i)' & $word1 & '.*' & $word2) Then
        $text_out &= $line & @CRLF
    EndIf
Next
$end2 = TimerDiff($start)
FileDelete($file_out)
FileWrite($file_out, $text_out)

MsgBox(0, 'Time', 'StringInStr: ' & $end1 & @CRLF & 'RegExp: ' & $end2)
```

The result (in milliseconds, as returned by TimerDiff) is:

StringInStr: 849.7688
RegExp: 1190.2022

... so StringInStr is FASTER than RegExp in this case.

Note: I'm not a RegExp expert, so I can't fix whatever is wrong in this (copied) RegExp that leaves output_file2.txt empty. But I think the comparison is still relevant regardless of this bug.

Attachment: input_file0.txt

Edited September 14, 2020 by Zedna

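One likely reason output_file2.txt comes out empty: the pattern is built as `$word1 & '.*' & $word2`, which requires "preview.png" to appear before "b.imageurl", while the line format from the first post has b.imageUrl first. A minimal sketch of that, assuming a single sample line in that format:

```autoit
; Sample line, lower-cased the same way the test script lower-cases its input.
Local $line  = StringLower('b.imageUrl = "/images//G_SS_Preview.png";')
Local $word1 = StringLower("Preview.png")
Local $word2 = StringLower("b.imageUrl")

; Order used in the test script: "preview.png" must come BEFORE "b.imageurl".
ConsoleWrite(StringRegExp($line, '(?i)' & $word1 & '.*' & $word2) & @CRLF)           ; 0

; Swapped order, with \Q...\E so the dots are matched literally.
; (?i) is dropped here because both the line and the words are already lower case.
ConsoleWrite(StringRegExp($line, '\Q' & $word2 & '\E.*?\Q' & $word1 & '\E') & @CRLF) ; 1
```

If this is indeed the cause, the RegExp timing above was taken on a pattern that never matches, so the two loops did not do the same work (the RegExp loop never appended to $text_out). Re-running the test with the swapped order would give a fairer comparison.
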
Acce (author), posted September 14, 2020

Wow, so many replies to this little question, I really appreciate it. Just a silly question then: how do I activate casesense=1?

Zedna, posted September 14, 2020 (edited)

Look at my examples: it's the third parameter of StringInStr(), used in combination with StringLower() ...

Edited September 14, 2020 by Zedna

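For reference, a small sketch of what the third parameter of StringInStr() does. The named constants come from StringConstants.au3, and the sample line is taken from the first post; the exact positions returned are not shown, only whether the result is 0 (not found) or non-zero (found).

```autoit
#include <StringConstants.au3>

Local $line = 'b.imageUrl = "/images//G_SS_Preview.png";'

; Default (0, $STR_NOCASESENSE): case-insensitive, so "preview.png" is found (non-zero).
ConsoleWrite(StringInStr($line, "preview.png") & @CRLF)

; $STR_CASESENSE (1): case-sensitive, so the lower-case needle is NOT found (0).
ConsoleWrite(StringInStr($line, "preview.png", $STR_CASESENSE) & @CRLF)

; Lower-casing the haystack first makes the case-sensitive search succeed again (non-zero).
ConsoleWrite(StringInStr(StringLower($line), "preview.png", $STR_CASESENSE) & @CRLF)

; $STR_NOCASESENSEBASIC (2): case-insensitive using a basic comparison (non-zero).
ConsoleWrite(StringInStr($line, "preview.png", $STR_NOCASESENSEBASIC) & @CRLF)
```

This is why the examples in this thread call StringLower() on both the file content and the search words before passing 1 as the third argument: the case-sensitive search only finds the words if both sides use the same case.
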
Acce (author), posted September 14, 2020

Nice, I didn't see that the first time, thanks. Searching a text file this big really has to be as fast as possible. Thanks again for the help. I really wasn't expecting this to be faster than RegExp.

Acce (author), posted September 14, 2020 (edited)

What I think is funny about my script is that it browses these 160k lines looking for download links for PNG files, and the files get downloaded faster than it takes to look up the links, lol.

Edited September 14, 2020 by Acce

Zedna, posted September 14, 2020 (edited)

Post a sample of your input TXT file; we can optimize the search with tricks like this one (based on what the real data looks like):

```autoit
For $i = 1 To $text_in[0]
    $line = $text_in[$i]
    If StringLen($line) < 22 Then ContinueLoop ; -----> optimization: skip lines too short to match
    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $text_out &= $line & @CRLF
    EndIf
Next
```

EDIT: here is another little optimization. Instead of

```autoit
$word1 = StringLower("Preview.png")
$word2 = StringLower("b.imageUrl")
```

rather use this

```autoit
$word1 = StringLower("b.imageUrl")
$word2 = StringLower("Preview.png")
```

so that "b.imageUrl" is searched for first.

Edited September 14, 2020 by Zedna

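The reason the order of the two words can matter is presumably that AutoIt's And operator short-circuits: if the first StringInStr() returns 0, the second one is never called. A minimal sketch, assuming three made-up lines and a hypothetical helper `_HasSecond()` that counts how often the second search actually runs:

```autoit
Global $g_iSecondChecks = 0

; Hypothetical helper: performs the second search and counts its calls.
Func _HasSecond($sLine, $sWord)
    $g_iSecondChecks += 1
    Return StringInStr($sLine, $sWord, 1) > 0
EndFunc

; Made-up sample data: two lines contain "preview.png", only one contains "b.imageurl".
Local $aLines[3] = ['b.imageurl = "/images//g_ss_preview.png";', _
                    'a.thumb = "/images//g_ss_preview.png";', _
                    'c.title = "something else";']

For $i = 0 To UBound($aLines) - 1
    ; The rarer word goes first, so most lines are rejected after a single search.
    If StringInStr($aLines[$i], "b.imageurl", 1) And _HasSecond($aLines[$i], "preview.png") Then
        ConsoleWrite("Match: " & $aLines[$i] & @CRLF)
    EndIf
Next

ConsoleWrite("Second search ran " & $g_iSecondChecks & " time(s)" & @CRLF) ; 1
```

Whether this gives a measurable gain on the real file depends on how many lines contain each word, which is why a sample of the actual data is useful.
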
Acce (author), posted September 14, 2020

Thanks for all your help. I'm not sure why I had them in the opposite order. This was a small topic, but there is lots of helpful info here. I can't really say more than thanks again.