
extract a sentence containing word in text file



3-liner:

For $sentence In StringSplit(StringRegExpReplace(FileRead("2.txt", FileDelete("results.txt")-2),"([\!\.\?])","$1.!?"), ".!?",3)
  If StringRegExp($sentence,"\W" & StringStripWS(FileRead("1.txt"),2) & "\W") Then FileWriteLine("results.txt",$sentence)
Next

@mikell, fork and knife fed :)

1.txt 2.txt results.txt

Code hard, but don’t hard code...


11 hours ago, JockoDundee said:

3-liner:

How can I make it extract only 3 lines?

@JockoDundee
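A minimal sketch of one way to cap the 3-liner at three sentences: count the matches and leave the loop once the limit is reached. Like the 3-liner, it assumes sentences end in . ! or ? and that 1.txt holds a single word. The function name _FirstMatches, the |~| split marker and the $iMax parameter are made up for illustration. Unlike the 3-liner, it matches the word case-insensitively with \b word boundaries, so a word at the very start of the file is also found.

```autoit
; Sketch (hypothetical helper): return at most $iMax sentences from $sText
; that contain $sWord as a whole word, ignoring case. Assumes $iMax >= 1.
Func _FirstMatches($sText, $sWord, $iMax = 3)
    ; Turn line breaks and runs of spaces into single spaces so a sentence
    ; wrapped over two lines stays in one piece
    Local $sFlat = StringRegExpReplace($sText, "\s+", " ")
    ; Put a marker after every . ! or ? and split on the whole marker
    ; (flag 3 = STR_ENTIRESPLIT + STR_NOCOUNT, so the result is 0-based)
    Local $aSentences = StringSplit(StringRegExpReplace($sFlat, "([.!?])", "$1|~|"), "|~|", 3)
    ; \Q...\E makes the word literal, \b keeps "cat" from matching "cats"
    Local $sPattern = "(?i)\b\Q" & $sWord & "\E\b"
    Local $aFound[$iMax], $iCount = 0, $sTrimmed
    For $sSentence In $aSentences
        $sTrimmed = StringStripWS($sSentence, 3) ; drop leading/trailing spaces
        If StringRegExp($sTrimmed, $sPattern) Then
            $aFound[$iCount] = $sTrimmed
            $iCount += 1
            If $iCount = $iMax Then ExitLoop ; limit reached, stop reading
        EndIf
    Next
    ReDim $aFound[$iCount] ; shrink if fewer than $iMax sentences matched
    Return $aFound
EndFunc
```

With the files from the 3-liner it would be called as $aHits = _FirstMatches(FileRead("2.txt"), StringStripWS(FileRead("1.txt"), 3), 3), followed by a For $i = 0 To UBound($aHits) - 1 loop that does FileWriteLine("results.txt", $aHits[$i]).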


28 minutes ago, vinnyMS said:

I have a list of words and a 20k-line text file. I need 3 lines per word.

1) if one sentence has two different matching words in it, should the sentence appear just once?

2) what if one of the words is over its 3 match limit? show it once, or don’t show at all?

3) what order should the results come in, original sentence order, or word order, or random?

4) and only the sentence should be displayed, with no indication of which words caused the match?


Close Enough :whisper:

#include <Array.au3>

$txt = 'The light that fills the room feels cold and blue, tinted by the shades across the window. This wind' _
         & 'ow faces south. so the light trickles in slowly and at first I can ignore it. but eventually I must ' _
         & 'open my eyes to this underwater light, and take a deep the breath in.  I used to get up so early. Some da' _
         & 'ys I would go outside and watch the sunrise. warming my hands with a mug of herbal tea. The world wa' _
         & 's quie....'

$sWords = "the|cold|light"
$word_MatchRepeat = 3

$array = _a($txt, $sWords, $word_MatchRepeat)
_ArrayDisplay($array, "Results")

Func _a($s, $wPatt, $WR)
    ; Keep only the sentences that contain at least one of the words, delete the rest
    Local $aWords = StringSplit($wPatt, "|"), $patt = '[^.]+(\b\Q' & StringReplace($wPatt, "|", "\E|\b\Q") & _
            '\E)+([.$]|[^.]+\.)(*SKIP)(*F)|\b[^.]+\.|[^ ]{2,}|[.$]{2,}'
    Local $a = StringRegExp(StringRegExpReplace($s, $patt, ""), '\b[^.]+\.', 3)
    ; Column 0 = sentence, columns 1..n = matched word, last column = number of words matched
    For $i = 0 To $aWords[0]
        _ArrayColInsert($a, 1)
    Next
    For $i = 1 To $aWords[0]
        For $j = 0 To UBound($a) - 1
            If StringRegExp($a[$j][0], $aWords[$i]) Then
                $a[$j][$i] = $aWords[$i]
                $a[$j][UBound($a, 2) - 1] += 1
            EndIf
        Next
    Next
    ; Sentences matching the fewest words come first
    _ArraySort($a, 0, Default, Default, UBound($a, 2) - 1)
    ; $aWords[$i][0] = word, $aWords[$i][1] = how many times it has been shown so far
    _ArrayColInsert($aWords, 1)

    Local $aNew[0][2], $index
    For $i = 1 To $aWords[0][0]
        Local $iCount = 0, $1
        Do
            If $aWords[$i][1] >= $WR Then ExitLoop
            ; First remaining sentence that still contains word $i
            $index = _ArraySearch($a, "\w", Default, Default, Default, 3, Default, $i)
            If $index < 0 Then ExitLoop
            _ArrayAdd($aNew, $a[$index][0] & "|" & $aWords[$i][0], Default, "|")
            For $j = 1 To $aWords[0][0]
                If $j <> $i Then
                    If $a[$index][$j] Then
                        ; Another word is in this sentence too: count it, and if it is now
                        ; over the limit, remove the first result row listed under that word
                        $aWords[$j][1] += 1
                        If $aWords[$j][1] > $WR Then
                            $1 = _ArraySearch($aNew, $aWords[$j][0], Default, Default, 1, Default, Default, 1)
                            If $1 >= 0 Then
                                _ArrayDelete($aNew, $1)
                            EndIf
                        EndIf
                        $aNew[UBound($aNew) - 1][1] &= ", " & $a[$index][$j]
                    EndIf
                Else
                    $aWords[$i][1] += 1
                EndIf
                ; Mark this sentence as used for every word
                $a[$index][$j] = ""
            Next
            $iCount += 1
        Until $iCount > $WR
    Next
    Return $aNew
EndFunc


On 4/20/2021 at 8:48 PM, JockoDundee said:

1) if one sentence has two different matching words in it, should the sentence appear just once?

2) what if one of the words is over its 3 match limit? show it once, or don’t show at all?

3) what order should the results come in, original sentence order, or word order, or random?

4) and only the sentence should be displayed, with no indication of which words caused the match?

@JockoDundee, is it easy to program?


1 hour ago, vinnyMS said:

is it easy to program

yes.

the problem is that I don’t do homework, not my own and certainly not someone else’s.

and this is some form of homework, or test, with a rigid and arbitrary requirement.

why? because what you are asking for is, IMO, not something that solves a problem in the real world.

so let’s say there is a word file with 3 words:

cat
dog
tree

and a sentence file with 6 sentences:

1. The cat ate the dog.
2. The dog ate the tree.
3. The tree ate the dog.
4. The dog ate the cat.
5. The cat ate the tree.
6. The tree ate the cat.

you would say the result file should be sentences

1,4,5,2,3

and yet if I were to *ADD* the word ate to the word file:

ate
cat
dog
tree

the result file would actually be *REDUCED*:

1,2,3

furthermore, changing the order of the words in the word file changes the results file,

and all this without even an indication of which word caused a match to appear, or disappear?

Nope, nobody is solving a real-world problem like this.

Prove me wrong, what is the use case?


I have a book as a text file (not a PDF) and a word list, and I use them to study how the words from the book are used in context. I need to see how each word is placed in its sentences in the book. I picked the important words and need to see how they are used.
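For this use case, one reading of the requirement (an assumption, since questions 1 to 4 above were never answered) is: each word gets its own list of up to 3 sentences in book order, a sentence may appear under more than one word, and every row carries the word that matched. A minimal sketch under those assumptions; _ContextSentences and the |~| marker are hypothetical names, and sentences are again assumed to end in . ! or ?

```autoit
; Sketch (hypothetical helper): for every word in $aWords collect up to $iMax
; sentences from $sText containing it as a whole word, ignoring case.
; Returns a 2D array: column 0 = the word, column 1 = the sentence.
Func _ContextSentences($sText, $aWords, $iMax = 3)
    ; Join wrapped lines, then split after every . ! or ?
    Local $sFlat = StringRegExpReplace($sText, "\s+", " ")
    Local $aSentences = StringSplit(StringRegExpReplace($sFlat, "([.!?])", "$1|~|"), "|~|", 3)
    Local $aResult[UBound($aWords) * $iMax][2], $iRow = 0
    Local $sWord, $sPattern, $iHits
    For $vWord In $aWords
        $sWord = StringStripWS($vWord, 3)
        If $sWord = "" Then ContinueLoop ; skip blank lines in the word list
        $sPattern = "(?i)\b\Q" & $sWord & "\E\b"
        $iHits = 0
        ; Walk the book in order; each word has its own quota
        For $sSentence In $aSentences
            If StringRegExp($sSentence, $sPattern) Then
                $aResult[$iRow][0] = $sWord
                $aResult[$iRow][1] = StringStripWS($sSentence, 3)
                $iRow += 1
                $iHits += 1
                If $iHits = $iMax Then ExitLoop ; quota reached, next word
            EndIf
        Next
    Next
    ReDim $aResult[$iRow][2] ; drop unused rows
    Return $aResult
EndFunc
```

It could be called as $aRes = _ContextSentences(FileRead("book.txt"), FileReadToArray("words.txt"), 3) and shown with _ArrayDisplay($aRes), which needs #include <Array.au3>; book.txt and words.txt are placeholder names. Because every word keeps its own quota, adding a word only adds that word's rows and reordering the list only moves whole groups, which sidesteps the ordering problem raised above.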
