Search text file for 2 words


Acce

Hi, I have a text file with over 160k lines and I'm looking for a way to search it quickly.

I'm searching for about 40 pics.

The text file is built up like so:  b.imageUrl = "/images//G_SS_Preview.png";

The data I have to search for is only "Preview.png", but searching for that gives multiple hits in this text document, including results I don't want. I'm wondering if I can search so that a line must include both "b.imageUrl" and "Preview.png" to count as a match, so I can quickly change the image variable and search for all my images?

Link to comment
Share on other sites

$word1 = "Preview.png"
$word2 = "b.imageUrl"

$file_in = "input_file.txt"
$file_out = "output_file.txt"

$text_in = FileRead($file_in)
$text_in = StringSplit($text_in, @CRLF, 1)

$text_out = ''

For $i = 1 To $text_in[0]
    $line = $text_in[$i]
    
    If StringInStr($line, $word1) And StringInStr($line, $word2) Then
        $text_out &= $line & @CRLF
    EndIf
Next

FileDelete($file_out)
FileWrite($file_out, $text_out)

 

Link to comment
Share on other sites

Here is an alternative way. You get an array with the matching line numbers.

#include <File.au3>
#include <Array.au3>
Global $sFilePath, $aSearchArr, $aResultArr[0], $sSearch1, $sSearch2
$sFilePath = @ScriptDir & '\input_file.txt'
$sSearch1  = "b.imageUrl"
$sSearch2  = "Preview.png"
_FileReadToArray($sFilePath, $aSearchArr)
If @error Then Exit
For $i = 1 To $aSearchArr[0]
    If StringInStr($aSearchArr[$i], $sSearch1) And StringInStr($aSearchArr[$i], $sSearch2) Then
        _ArrayAdd($aResultArr, "LineNo. : " &  $i & " => " & $aSearchArr[$i])
    EndIf
Next
_ArrayDisplay($aResultArr, 'Matches : ')

 

EDIT : With 160.000 lines, StringRegExp can possibly be faster than 2 times StringInStr (try it out ;))

If StringRegExp($aSearchArr[$i], '(?i)' & $sSearch1 & '.*' & $sSearch2) Then
        _ArrayAdd($aResultArr, "LineNo. : " &  $i & " => " & $aSearchArr[$i])
    EndIf

 

input_file.txt

Edited by Musashi

Link to comment
Share on other sites

@Musashi

Just for fun :idiot:

#include <Array.au3>

$txt = FileRead(@ScriptDir & '\input_file.txt')
Local $sSearch1  = "b.imageUrl", $sSearch2  = "Preview.png"

$aResult = StringRegExp(Execute ( "'" & StringRegExpReplace(StringReplace($txt, "'", "''"), "(?m)^", "' & Assign(""iReplace"", Eval(""iReplace"")+1) * Eval(""iReplace"") & ' - ") & "'" ), '(?i).*?\Q' & $sSearch1 & '\E.*?\Q' & $sSearch2 & '\E.*', 3)

_ArrayDisplay($aResult)

 

Link to comment
Share on other sites

11 minutes ago, mikell said:

BTW about your edit, may I suggest ...

Your suggestions are always welcome, especially when it comes to regular expressions :).

I have implemented your (more elegant) variant into the script :

#include <File.au3>
#include <Array.au3>
Global $sFilePath, $aSearchArr, $aResultArr[0], $sSearch1, $sSearch2
$sFilePath = @ScriptDir & '\input_file.txt'
$sSearch1  = "b.imageUrl"
$sSearch2  = "Preview.png"
_FileReadToArray($sFilePath, $aSearchArr)
If @error Then Exit
For $i = 1 To $aSearchArr[0]
    If StringRegExp($aSearchArr[$i], '(?i)\Q' & $sSearch1 & '\E.*?\Q' & $sSearch2 & '\E') Then
        _ArrayAdd($aResultArr, "LineNo. : " &  $i & " => " & $aSearchArr[$i])
    EndIf
Next
_ArrayDisplay($aResultArr, 'Matches : ')

Now @Acce has several solutions to choose from, and that's all that matters.

Link to comment
Share on other sites

Thanks guys. I actually solved this myself about 5 minutes after posting here. Here is what I did:

If StringInStr($Line, $Search) And StringInStr($Line, "b.imageUrl") Then
    ; do something with the matching line
EndIf

I will look at what you have suggested and see if that is better.

StringRegExp looks very interesting to use.

Thanks for your help :)

Edited by Acce
Link to comment
Share on other sites

21 hours ago, Musashi said:

EDIT : With 160.000 lines, StringRegExp can possibly be faster than 2 times StringInStr (try it out ;))

 

 

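StringInStr() is much faster when the CaseSensitive option is ON (1), so when speed is important and it's possible, I use casesense=1. Lowercasing the search words and the file text up front keeps the search effectively case-insensitive; note that the lines written to the output file are then lowercase too.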

 

$word1 = StringLower("Preview.png")
$word2 = StringLower("b.imageUrl")

$file_in = "input_file.txt"
$file_out = "output_file.txt"

$text_in = StringLower(FileRead($file_in))
$text_in = StringSplit($text_in, @CRLF, 1)

$text_out = ''

For $i = 1 To $text_in[0]
    $line = $text_in[$i]

    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $text_out &= $line & @CRLF
    EndIf
Next

FileDelete($file_out)
FileWrite($file_out, $text_out)

 

Edited by Zedna
Link to comment
Share on other sites

As I said in my previous post, it seems that StringInStr with CaseSense=1 is faster than RegExp (in this case).

Here are my testing data/script:

input_file0.txt (10 lines) --> input_file.txt (160 000 lines) copied by this helper script:

$file_in = "input_file0.txt"
$file_out = "input_file.txt"

$text_in = FileRead($file_in)
$text_out = ''

For $i = 1 To 16000
    $text_out &= $text_in
Next

FileDelete($file_out)
FileWrite($file_out, $text_out)

 

And here is the main testing script, which measures both variants (StringInStr vs. RegExp):

$word1 = StringLower("Preview.png")
$word2 = StringLower("b.imageUrl")

$file_in = "input_file.txt"
$file_out = "output_file.txt"

$text_in = StringLower(FileRead($file_in))
$text_in = StringSplit($text_in, @CRLF, 1)

$text_out = ''

$start = TimerInit()
For $i = 1 To $text_in[0]
    $line = $text_in[$i]

    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $text_out &= $line & @CRLF
    EndIf
Next
$end1 = TimerDiff($start)

FileDelete($file_out)
FileWrite($file_out, $text_out)

$file_out = "output_file2.txt"
$text_out = ''

$start = TimerInit()
For $i = 1 To $text_in[0]
    $line = $text_in[$i]

    If StringRegExp($line, '(?i)' & $word1 & '.*' & $word2) Then
        $text_out &= $line & @CRLF
    EndIf
Next
$end2 = TimerDiff($start)

FileDelete($file_out)
FileWrite($file_out, $text_out)

MsgBox(0,'Time','StringInStr: ' & $end1 & @CRLF & 'RegExp: ' & $end2)

The result (TimerDiff values, in milliseconds) is:

StringInStr: 849.7688
RegExp: 1190.2022
 

... so StringInStr is FASTER than RegExp in this case.

 

Note:

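output_file2.txt comes out empty with this (copied) RegExp. The likely cause is the order: the pattern asks for $word1 ("preview.png") followed by $word2 ("b.imageurl"), but in the data b.imageUrl comes first on the line.

I still think that this comparison is relevant despite this bug.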
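
For a like-for-like timing, the RegExp loop can be sketched with $word2 placed before $word1, so the pattern follows the order in the data, and with both words wrapped in \Q...\E so the dots are taken literally. This is only a sketch under those assumptions: the file names are the ones from the test script, the $pattern variable is introduced here for illustration, and no timing is claimed for it.

```autoit
; Sketch only: RegExp variant of the test loop with the pattern order matching the data.
Local $word1 = StringLower("Preview.png")
Local $word2 = StringLower("b.imageUrl")

; The text is lowercased once, so the pattern does not need (?i).
Local $text_in = StringSplit(StringLower(FileRead("input_file.txt")), @CRLF, 1)
Local $text_out = ''

; b.imageurl first, then preview.png; \Q...\E quotes the words so "." is literal.
Local $pattern = '\Q' & $word2 & '\E.*?\Q' & $word1 & '\E'

Local $start = TimerInit()
For $i = 1 To $text_in[0]
    If StringRegExp($text_in[$i], $pattern) Then
        $text_out &= $text_in[$i] & @CRLF
    EndIf
Next
ConsoleWrite('RegExp (data order): ' & TimerDiff($start) & @CRLF)

FileDelete("output_file2.txt")
FileWrite("output_file2.txt", $text_out)
```

With this pattern output_file2.txt should contain the same lines as output_file.txt, which makes the two timings easier to compare.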

input_file0.txt

Edited by Zedna
Link to comment
Share on other sites

1 minute ago, Acce said:

Nice, I didn't see that the first time, thanks. Searching a text file this big really has to be as fast as possible. Thanks again for the help; I really wasn't expecting this to be faster than RegExp.

What I think is funny about my script is that it browses these 160k lines looking for download links for png files, and the files get downloaded faster than it takes to look up the links, lol

Edited by Acce
Link to comment
Share on other sites

Post a sample of your input TXT file; we can optimize the search with tricks like this (based on what the real data looks like):

For $i = 1 To $text_in[0]
    $line = $text_in[$i]
    If StringLen($line) < 22 Then ContinueLoop ; -----> optimization

    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $text_out &= $line & @CRLF
    EndIf
Next

 

EDIT:

Here is another little optimization:

instead of

$word1 = StringLower("Preview.png")
$word2 = StringLower("b.imageUrl")

use rather this

$word1 = StringLower("b.imageUrl")
$word2 = StringLower("Preview.png")

--> so that "b.imageUrl" is searched for first
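
Both tweaks can be combined in one loop. The sketch below is not a measured result, just an illustration. It assumes the file names used earlier in the thread, and it derives the length cutoff from the two search words instead of hard-coding 22, which is valid as long as the two words cannot overlap. $min_len is a name made up for this example.

```autoit
; Sketch only: length cutoff plus "b.imageUrl first", using the thread's file names.
Local $word1 = StringLower("b.imageUrl")   ; checked first
Local $word2 = StringLower("Preview.png")

; Assumption: the words do not overlap, so a matching line is at least this long.
Local $min_len = StringLen($word1) + StringLen($word2)

Local $text_in = StringSplit(StringLower(FileRead("input_file.txt")), @CRLF, 1)
Local $text_out = ''
Local $line = ''

For $i = 1 To $text_in[0]
    $line = $text_in[$i]
    If StringLen($line) < $min_len Then ContinueLoop ; too short to hold both words

    ; "And" short-circuits in AutoIt: the second search runs only if the first one hits.
    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $text_out &= $line & @CRLF
    EndIf
Next

FileDelete("output_file.txt")
FileWrite("output_file.txt", $text_out)
```

Because the whole file is lowercased before splitting, the lines written out are lowercase as well. If the original case of the image paths matters, keep a second array split from the unmodified text and write from that one instead.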

Edited by Zedna