vinnyMS, posted April 19, 2021:

I need a script that extracts every sentence containing a word from a list. The result should be a text file of the extracted sentences, with the period as the sentence delimiter: the extracted sentence is the text between one period and the next.

Word list text file:
    word 1
    word 2
    word 3

Extracted sentences written to the "sentence" text file:
    this is word 1 sentence.
    this is word 2 sentence.
    this is word 3 sentence.

Musashi, posted April 19, 2021:

Just to understand better: is this what you want?

Source text:
    Sentence 1 without the searched term.Sentence 2 is word 1 sentence.Sentence 3 without the searched term.Sentence 4 without the searched term.Sentence 5 is word 2 sentence.Sentence 6 is word 3 sentence.Sentence 7 without the searched term.

Word list (as text file):
    word 1, word 2, word 3

Result text:
    Sentence 2 is word 1 sentence.Sentence 5 is word 2 sentence.Sentence 6 is word 3 sentence.

By the way: it would be helpful if you could provide a source and the word list as text files. Only a few helpers have the time and passion to create the files themselves.

"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move."

mikell, posted April 19, 2021 (edited):

55 minutes ago, Musashi said: "Only a few helpers have time and passion to create the files themselves"

... but some are passionate guys who create something similar themselves, so this allows a first try:

    #include <Array.au3>

    ; Search terms, joined with "|" so they form a regex alternation
    $p = "word 1|word 2|word 3"
    $txt = " Sentence 1 without the searched term. Sentence 2 is word 1 sentence. " & @CRLF & _
            "Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. " & @CRLF & _
            "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term. "
    ; Flag 3 returns an array of all matches: each period-terminated sentence containing one of the terms as a whole word
    $res = StringRegExp($txt, '(?s)\s*([^.]+\b(?|' & $p & ')\b[^.]+\.)', 3)
    _ArrayDisplay($res)

Edit: now waiting for new requirements to come.

Alecsis1, posted April 19, 2021:

Hello! Try something like this. Btw, sorry for my bad English…

Attachment: vimmyMS.zip

JockoDundee, posted April 19, 2021:

59 minutes ago, Alecsis1 said: "Btw, sorry for my bad English…"

I doubt vimmy even cares about such things.

Code hard, but don’t hard code...

Musashi, posted April 19, 2021 (edited):

3 hours ago, JockoDundee said: "I doubt vimmy even cares about such things"

I doubt that too.

@Alecsis1: As far as I could tell from a quick test, your script also delivers the desired result. However, the RegEx variant from @mikell is much shorter (as usual).

BTW: I would remove the following directive:

    [...]
    #pragma compile(UPX, True)
    [...]

AV scanners react badly to UPX-compressed executables. Use one of these instead:

    #AutoIt3Wrapper_UseUpx=N
    #pragma compile(UPX, False)

(False is the default for the #pragma directive.)

Nine, posted April 19, 2021:

An hybrid solution maybe ?

    #include <Constants.au3>

    $p = "word 1|word 2|word 3"
    $txt = "Sentence 1 without the searched term. Sentence 2 is word 1 sentence. " & @CRLF & _
            "Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. " & @CRLF & _
            "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term."
    ; Split on periods (the period itself is dropped), then test each piece for a whole-word match
    $aSentence = StringSplit($txt, ".", $STR_NOCOUNT)
    For $i = 0 To UBound($aSentence) - 2
        If StringRegExp($aSentence[$i], "\b(" & $p & ")\b") Then FileWriteLine("Result.txt", StringStripWS($aSentence[$i], $STR_STRIPLEADING+$STR_STRIPTRAILING))
    Next

“They did not know it was impossible, so they did it” ― Mark Twain

mikell, posted April 19, 2021:

4 hours ago, Musashi said: "delivers the desired result"

I confess I omitted some details because it sounded a bit like spoon-feeding.

    #include <Array.au3>

    #cs
    1.txt :
    word 1
    word 2
    word 3
    #ce

    $p = StringReplace(StringStripWS(FileRead("1.txt"), 3), @CRLF, "|")
    ;$p = "word 1|word 2|word 3"

    #cs
    2.txt :
     Sentence 1 without the searched term. Sentence 2 is word 1 sentence.
    Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence.
    Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term.
    #ce

    $txt = FileRead("2.txt")
    ;$txt = " Sentence 1 without the searched term. Sentence 2 is word 1 sentence. " & @crlf & "Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. " & @crlf & "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term. "

    $res = StringRegExp($txt, '(?s)\s*([^.]+\b(?|' & $p & ')\b[^.]+\.)', 3)
    ;_ArrayDisplay($res)
    FileWrite("result.txt", _ArrayToString($res, @CRLF))

vinnyMS, posted April 19, 2021 (edited April 22, 2021):

On 4/19/2021 at 9:49 AM, mikell said: "I confess I omitted some details because it sounded a bit like spoon-feeding" (quoted together with the full script from that post)

Thank you, it works, except that it adds the words from text file 1 at the end of result.txt.

vinnyMS, posted April 19, 2021:

Can you make it extract only 3 sentences each time?

Nine, posted April 19, 2021 (edited):

This ?

    #include <Constants.au3>

    $p = "\Q" & StringReplace(StringStripWS(FileRead("1.txt"), 3), @CRLF, "\E|\Q") & "\E"
    $NUMBER_OF_LINES = 3
    $txt = "Sentence 1 without the searched term? Sentence 2 is (TCP/IP) sentence. " & @CRLF & _
            "Sentence 3 without the searched term! Sentence 4 without the searched term? Sentence 5, is word 2 sentence. " & @CRLF & _
            "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term ?"
    ;$txt = FileRead("2.txt")
    $aSentence = StringSplit($txt, ".?!", $STR_NOCOUNT)
    For $i = 0 To $NUMBER_OF_LINES - 1
        If StringRegExp($aSentence[$i], "\W" & $p & "\W") Then FileWriteLine("Result.txt", StringStripWS($aSentence[$i], $STR_STRIPLEADING+$STR_STRIPTRAILING))
    Next

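A side note on the pattern built in the script above: in PCRE, alternation binds more loosely than concatenation, so "\W" & $p & "\W" applies the leading \W only to the first word and the trailing \W only to the last one. The small sketch below illustrates this with a made-up word list and sample string (both are assumptions for illustration, not data from the thread); wrapping the alternation in a non-capturing group is one way to apply the boundary to every word.

```autoit
; Hypothetical word list in the same "\Q...\E|\Q...\E" shape the script builds.
Local $sWords = "\Qword 1\E|\Qword 2\E"

; Ungrouped: parsed as "\W\Qword 1\E"  OR  "\Qword 2\E\W".
Local $sLoose = "\W" & $sWords & "\W"
; Grouped: a non-word character is required on both sides of either word.
Local $sTight = "\W(?:" & $sWords & ")\W"

; "word 2" appears here only as the tail of "otherword 2".
Local $sSample = " is otherword 2 sentence"

ConsoleWrite("ungrouped: " & StringRegExp($sSample, $sLoose) & @CRLF) ; 1 (matches "word 2 ")
ConsoleWrite("grouped:   " & StringRegExp($sSample, $sTight) & @CRLF) ; 0 (no match)
```

Note that \W has to consume a character, so even the grouped form would miss a word at the very start or end of the string; lookarounds such as (?<!\w) and (?!\w) avoid that.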
vinnyMS, posted April 19, 2021:

This extracts a sentence up to the period, removes the period, and extracts the next sentence that does have the word in it, then saves a result with all the sentences extracted, including what I don't need.

2.txt:
    Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications. The answer to the question What is a protocol? must begin with the question What is a network?
    Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications. The answer to the question What is a protocol? must begin with the question What is a network?
    Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications. The answer to the question What is a protocol? must begin with the question What is a network?

result.txt:
    Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications
    The answer to the question What is a protocol? must begin with the question What is a network?
    Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications
    The answer to the question What is a protocol? must begin with the question What is a network?
    Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications

Nine, posted April 19, 2021:

I think I gave you enough tools to work with (as well as the others). Adjust the code to fit your needs now.

JockoDundee, posted April 19, 2021:

3 hours ago, Nine said: "An hybrid solution maybe ?"

Did you use “An” because H is silent in French?

Nine, posted April 19, 2021:

38 minutes ago, JockoDundee said: "Did you use “An” because H is silent in French?"

So you are telling I should have used a hybrid ?

JockoDundee, posted April 19, 2021:

3 minutes ago, Nine said: "So you are telling I should have used a hybrid"

Yes. We say “An hour”, but “A history”. Or “An unsigned integer”, but “A Ulimit”. It depends on whether the word after the article starts with a consonant sound or not.

Nine, posted April 19, 2021:

Ahhh. Always had a hard time with languages. One of my profs told me once that I speak better Fortran than I speak French.

JockoDundee, posted April 19, 2021:

25 minutes ago, Nine said: "Always had a hard time with languages."

No, you’re actually correct. Because of your Demain comme jamais tag, whenever I read your posts, I can’t help but hear them (in my mind’s ear) in a thick French accent. So I heard “An eye-brid solution”, which is perfect.

vinnyMS, posted April 19, 2021:

I tried to fix it, but I can't find how to modify it to make it work.

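For readers who land here with the same problem, one possible way to combine the ideas from the posts above is sketched below. It is not any poster's final solution. The helper name _ExtractSentences, the file names, and the reading of "only 3 sentences" as "at most 3 matching sentences" are assumptions. It treats only the period as a sentence end, as in the original request, so abbreviations such as "e.g." would split a sentence.

```autoit
#include <FileConstants.au3>
#include <StringConstants.au3>

; Hypothetical file names and limit, chosen to mirror the thread.
_ExtractSentences("1.txt", "2.txt", "result.txt", 3)

; Writes up to $iMax sentences from $sTextFile that contain a listed word.
; Returns the number of sentences written; sets @error on bad input.
Func _ExtractSentences($sWordFile, $sTextFile, $sOutFile, $iMax)
    ; Build the alternation "\Qword 1\E|\Qword 2\E|...", skipping blank lines.
    Local $aWords = StringSplit(StringReplace(FileRead($sWordFile), @CR, ""), @LF, $STR_NOCOUNT)
    Local $sAlt = ""
    For $sWord In $aWords
        $sWord = StringStripWS($sWord, $STR_STRIPLEADING + $STR_STRIPTRAILING)
        If $sWord = "" Then ContinueLoop
        If $sAlt <> "" Then $sAlt &= "|"
        $sAlt &= "\Q" & $sWord & "\E"
    Next
    If $sAlt = "" Then Return SetError(1, 0, 0)

    ; Each match is a run of non-period characters plus its closing period,
    ; so the period is kept instead of being dropped by a split.
    Local $aSentences = StringRegExp(FileRead($sTextFile), "[^.]+\.?", $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return SetError(2, 0, 0)

    Local $hOut = FileOpen($sOutFile, $FO_OVERWRITE)
    If $hOut = -1 Then Return SetError(3, 0, 0)

    ; The lookarounds reject hits embedded in a longer word ("otherword 3")
    ; and also work for terms that start or end with punctuation, e.g. "(TCP/IP)".
    Local $sPattern = "(?<!\w)(?:" & $sAlt & ")(?!\w)"
    Local $iCount = 0
    For $sSentence In $aSentences
        If $iCount >= $iMax Then ExitLoop
        If StringRegExp($sSentence, $sPattern) Then
            FileWriteLine($hOut, StringStripWS($sSentence, $STR_STRIPLEADING + $STR_STRIPTRAILING))
            $iCount += 1
        EndIf
    Next
    FileClose($hOut)
    Return $iCount
EndFunc
```

The output file is overwritten on each run rather than appended to, which avoids results from earlier runs piling up in result.txt; switch to $FO_APPEND if accumulation is what you want.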