queensoft Posted December 13, 2023

File content (any .SRT subtitle file):

    1
    00:00:01,543 --> 00:00:03,212
    Din episoadele anterioare...

    2
    00:00:03,295 --> 00:00:07,299
    Am vorbit cu un coleg care
    coordonează un program de cercetare

    3
    00:00:07,382 --> 00:00:11,302
    la Universitatea Heidelberg.
    Și cred că te pot înscrie.

    4
    00:00:11,385 --> 00:00:14,180
    Fantastisch.
    Adică "fantastic" în germană.

    5
    00:00:14,263 --> 00:00:17,017
    Ea este nepoata mea, Tonya.
    Va locui cu noi.

    6
    00:00:17,726 --> 00:00:18,936
    - Salut, Missy!
    - Tonya.

    7
    00:00:20,186 --> 00:00:22,146
    - Ce faci?
    - Ies în oraș pe ascuns.

    8
    00:00:22,231 --> 00:00:23,273
    Care-i planul?

    9
    00:00:23,356 --> 00:00:25,733
    Am o jumate de pachet de țigări
    și o sticlă de cherry.

    10
    00:00:25,817 --> 00:00:26,902
    Stai să mă îmbrac.

I want to read the file, split it on empty lines, and return an array:

    $array[0]='1 00:00:01,543 --> 00:00:03,212 Din episoadele anterioare...'
    $array[1]='2 00:00:03,295 --> 00:00:07,299 Am vorbit cu un coleg care coordonează un program de cercetare'

I already tried StringSplit by @CRLF & @CRLF and various RegExp patterns; none of them worked. Right now I'm trying this, but I'm sick as a dog and cannot focus. Read the file, get all empty lines:

    #include <Array.au3> ; needed for _ArrayAdd

    $txt = FileReadToArray("D:\Diverse 2\A87 subtitle video\The.Girl2.srt") ; read file to array
    Global $match01[1] = ['0']
    For $i = 0 To UBound($txt) - 1 ; parse all lines
        If $txt[$i] = '' Then ; if line is empty, add its index to the array of matches
            _ArrayAdd($match01, $i)
        EndIf
    Next

After that, I should be able to concatenate lines (0-2, 3-6, ...) and feed them to another array, but I'm lost here; something's not working. Thank you.
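The "concatenate the lines between empty lines" idea from the post above can be sketched as follows. This is only an illustration, not code from the thread: `_GroupByEmptyLine` and its `$sJoin` parameter are hypothetical names, and the sample array stands in for the result of `FileReadToArray()`.

```autoit
; Sample lines standing in for the array that FileReadToArray() would return.
Local $aLines = ["1", "00:00:01,543 --> 00:00:03,212", "Din episoadele anterioare...", "", _
        "6", "00:00:17,726 --> 00:00:18,936", "- Salut, Missy!", "- Tonya."]

Local $aBlocks = _GroupByEmptyLine($aLines)
If Not @error Then
    For $i = 0 To UBound($aBlocks) - 1
        ConsoleWrite("[" & $i & "] " & $aBlocks[$i] & @CRLF)
    Next
EndIf

; Hypothetical helper: joins consecutive non-empty lines into one string per
; block; an empty line closes the current block.
Func _GroupByEmptyLine(Const ByRef $aLines, $sJoin = " ")
    Local $iMax = UBound($aLines)
    If $iMax = 0 Then Return SetError(1, 0, 0)

    Local $aBlocks[$iMax] ; upper bound: at most one block per input line
    Local $iCount = 0, $sBlock = ""

    For $i = 0 To $iMax - 1
        If $aLines[$i] = "" Then
            ; empty line: store the block collected so far, if any
            If $sBlock <> "" Then
                $aBlocks[$iCount] = $sBlock
                $iCount += 1
                $sBlock = ""
            EndIf
        ElseIf $sBlock = "" Then
            $sBlock = $aLines[$i] ; first line of a new block
        Else
            $sBlock &= $sJoin & $aLines[$i]
        EndIf
    Next

    ; the last block usually has no trailing empty line
    If $sBlock <> "" Then
        $aBlocks[$iCount] = $sBlock
        $iCount += 1
    EndIf

    If $iCount = 0 Then Return SetError(2, 0, 0) ; only empty lines found
    ReDim $aBlocks[$iCount]
    Return $aBlocks
EndFunc
```

Working on an array of lines sidesteps the line-ending question, provided the file was split into lines correctly in the first place. Consecutive empty lines are treated as a single separator.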
AllenAA Posted December 13, 2023

    #include <array.au3>

    Local $sString = '1' & @CRLF & _
            '00:00:01,543 --> 00:00:03,212' & @CRLF & _
            'Din episoadele anterioare...' & @CRLF & _
            @CRLF & _
            @CRLF & _
            '2' & @CRLF & _
            '00:00:03,295 --> 00:00:07,299' & @CRLF & _
            'Am vorbit cu un coleg care' & @CRLF & _
            'coordonează un program de cercetare' & @CRLF & _
            @CRLF & _
            @CRLF & _
            '3' & @CRLF & _
            '00:00:07,382 --> 00:00:11,302' & @CRLF & _
            'la Universitatea Heidelberg.' & @CRLF & _
            'Și cred că te pot înscrie.' & @CRLF & _
            @CRLF & _
            @CRLF & _
            '4' & @CRLF & _
            '00:00:11,385 --> 00:00:14,180' & @CRLF & _
            'Fantastisch.' & @CRLF & _
            'Adică "fantastic" în germană.' & @CRLF & _
            @CRLF & _
            @CRLF & _
            '5' & @CRLF & _
            '00:00:14,263 --> 00:00:17,017' & @CRLF & _
            'Ea este nepoata mea, Tonya.' & @CRLF & _
            'Va locui cu noi.' & @CRLF & _
            @CRLF & _
            @CRLF & _
            '6' & @CRLF & _
            '00:00:17,726 --> 00:00:18,936' & @CRLF & _
            '- Salut, Missy!' & @CRLF & _
            '- Tonya.' & @CRLF & _
            @CRLF & _
            @CRLF & _
            '7' & @CRLF & _
            '00:00:20,186 --> 00:00:22,146' & @CRLF & _
            '- Ce faci?' & @CRLF & _
            '- Ies în oraș pe ascuns.' & @CRLF & _
            @CRLF & _
            @CRLF & _
            '8' & @CRLF & _
            '00:00:22,231 --> 00:00:23,273' & @CRLF & _
            'Care-i planul?' & @CRLF & _
            @CRLF & _
            @CRLF & _
            '9' & @CRLF & _
            '00:00:23,356 --> 00:00:25,733' & @CRLF & _
            'Am o jumate de pachet de țigări' & @CRLF & _
            'și o sticlă de cherry.' & @CRLF & _
            @CRLF & _
            @CRLF & _
            '10' & @CRLF & _
            '00:00:25,817 --> 00:00:26,902' & @CRLF & _
            'Stai să mă îmbrac.' & @CRLF

    Local $aAry = StringRegExp($sString, '(?is)\d+\r\n\d{2}:\d{2}:\d{2},\d+\V+.+?(?=\r\n\r\n|$)', 3)
    If Not @error Then _ArrayDisplay($aAry)
queensoft (Author) Posted December 13, 2023

Your script, with the Local string, works OK. But with FileRead it is NOT working. Check the attached files; start with the ANSI one (easier, right?), then the UTF one (but that one is not necessary). Thanks.

Attachments: The.Girl2-ANSI.srt, The.Girl-UTF.srt
Danp2 Posted December 13, 2023

The files use LF instead of CR/LF for the line delimiters.
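One way to make a plain `StringSplit` independent of the line delimiter is to normalize everything to LF first. The sketch below assumes that approach; `_SplitOnBlankLines` is a hypothetical helper name, and the inline sample strings stand in for the result of `FileRead()`.

```autoit
#include <StringConstants.au3>

; The same two cues, once with LF and once with CRLF line endings.
Local $sLF = "1" & @LF & "00:00:01,543 --> 00:00:03,212" & @LF & "Din episoadele anterioare..." & @LF & @LF & _
        "8" & @LF & "00:00:22,231 --> 00:00:23,273" & @LF & "Care-i planul?"
Local $sCRLF = StringReplace($sLF, @LF, @CRLF)

Local $aFromLF = _SplitOnBlankLines($sLF)
Local $aFromCRLF = _SplitOnBlankLines($sCRLF)
ConsoleWrite("LF input:   " & UBound($aFromLF) & " blocks" & @CRLF)
ConsoleWrite("CRLF input: " & UBound($aFromCRLF) & " blocks" & @CRLF)

; Hypothetical helper: normalizes line endings, then splits on one empty line.
Func _SplitOnBlankLines($sText)
    ; Turn CRLF and lone CR into LF so that a single delimiter covers all cases.
    $sText = StringReplace($sText, @CRLF, @LF)
    $sText = StringReplace($sText, @CR, @LF)

    ; $STR_ENTIRESPLIT: use @LF & @LF as one delimiter (by default each
    ; character of the delimiter string splits on its own).
    ; $STR_NOCOUNT: return a 0-based array without the count in element 0.
    Return StringSplit($sText, @LF & @LF, $STR_ENTIRESPLIT + $STR_NOCOUNT)
EndFunc
```

After normalization the blocks contain LF-only line breaks, so any later processing should expect `@LF` inside each element.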
AllenAA Posted December 13, 2023

5 minutes ago, queensoft said: Your script, with Local string - is working OK. But using FileRead is NOT working. [...]

    #include <array.au3>

    Local $sFile, $sString, $aAry

    $sFile = @ScriptDir & '\The.Girl-UTF.srt'
    $sString = _ReadFile($sFile)
    $aAry = StringRegExp($sString, '(?is)\d+\r?\n\d{2}:\d{2}:\d{2},\d+\V+.+?(?=\r?\n\r?\n|$)', 3)
    If Not @error Then _ArrayDisplay($aAry, $sFile)

    $sFile = @ScriptDir & '\The.Girl2-ANSI.srt'
    $sString = _ReadFile($sFile)
    $aAry = StringRegExp($sString, '(?is)\d+\r?\n\d{2}:\d{2}:\d{2},\d+\V+.+?(?=\r?\n\r?\n|$)', 3)
    If Not @error Then _ArrayDisplay($aAry, $sFile)

    ; Read the whole file using the encoding detected by FileGetEncoding
    Func _ReadFile($sFile)
        Local $sRet
        Local $hFile = FileOpen($sFile, FileGetEncoding($sFile))
        $sRet = FileRead($hFile)
        FileClose($hFile)
        Return $sRet
    EndFunc
queensoft (Author) Posted December 13, 2023

Working GREAT, for both! I'm not even going to try to understand the RegEx pattern!
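For readers who do want to follow the pattern, here is the same expression assembled piece by piece with a comment on each part. The comments are one reading of the PCRE syntax, not an explanation given by the pattern's author, and the inline sample text is a stand-in for the file content.

```autoit
#include <StringConstants.au3>

; Build the pattern from the thread one piece at a time.
Local $sPattern = '(?is)'            ; i: ignore case, s: "." also matches line breaks
$sPattern &= '\d+\r?\n'              ; cue number, followed by LF or CRLF
$sPattern &= '\d{2}:\d{2}:\d{2},\d+' ; start time, e.g. 00:00:01,543
$sPattern &= '\V+'                   ; rest of the timing line (anything but line-break characters)
$sPattern &= '.+?'                   ; the subtitle text, as few characters as possible
$sPattern &= '(?=\r?\n\r?\n|$)'      ; look ahead: stop just before an empty line or the end

; Sample text with LF line endings, like the attached files.
Local $sText = "1" & @LF & "00:00:01,543 --> 00:00:03,212" & @LF & "Din episoadele anterioare..." & @LF & @LF & _
        "6" & @LF & "00:00:17,726 --> 00:00:18,936" & @LF & "- Salut, Missy!" & @LF & "- Tonya." & @LF

; $STR_REGEXPARRAYGLOBALMATCH is the named constant for flag 3: return all matches.
Local $aCues = StringRegExp($sText, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
If Not @error Then
    For $i = 0 To UBound($aCues) - 1
        ConsoleWrite("--- cue " & $i & " ---" & @CRLF & $aCues[$i] & @CRLF)
    Next
EndIf
```

Because the lookahead does not consume the empty line, each match ends at the last character of the subtitle text and the separator is never part of a result.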
Nine Posted December 13, 2023

If you prefer StringSplit, then try this:

    #include <Constants.au3>
    #include <Array.au3>

    Local $aList = StringSplit(FileRead("The.Girl2-ANSI.srt"), @LF & @LF, $STR_ENTIRESPLIT)
    _ArrayDisplay($aList)
queensoft (Author) Posted December 13, 2023

I think I already tried that (all variants: CR, LF, CRLF, for both files) and it didn't work.
Nine Posted December 13, 2023

Well, it works for me using your file...
mikell Posted December 13, 2023

Regex too, but a simpler way that works regardless of the newline type:

    #include <Array.au3>

    $file = @ScriptDir & '\The.Girl-UTF.srt'
    $a = StringRegExp(FileRead($file), '(?s)(\N.+?(?=\R{2}|$))', 3)
    _ArrayDisplay($a)

    $sFile = @ScriptDir & '\The.Girl2-ANSI.srt'
    $a = StringRegExp(FileRead($sFile), '(?s)(\N.+?(?=\R{2}|$))', 3)
    _ArrayDisplay($a)
queensoft (Author) Posted December 13, 2023

I will check both, thank you.
AspirinJunkie Posted December 14, 2023

Even simpler solution with regex:

    #include <Array.au3>

    $sString = FileRead("The.Girl-UTF.srt")
    $aSplit = _regex_split($sString, "\R{2,}")
    _ArrayDisplay($aSplit)

    ; #FUNCTION# ====================================================================================================================
    ; Name ..........: _regex_split
    ; Description ...: Splits a string at separators described by a regular expression.
    ;                  The separators are not part of the result strings.
    ; Syntax ........: _regex_split($sString, $sSplitPattern[, $dFlag = 3])
    ; Parameters ....: $sString       - string which should be split
    ;                  $sSplitPattern - regular expression which defines the separation pattern
    ;                  $dFlag         - flag for StringSplit()
    ; Return values .: 1D array containing the substrings split based on the separators.
    ; Author ........: AspirinJunkie
    ; Modified ......: 2023-03-23
    ; Example .......: $aSplit = _regex_split("Hello:world:foo://:bar", ':(?!\/\/)')
    ; ===============================================================================================================================
    Func _regex_split($sString, Const $sSplitPattern, Const $dFlag = 3)
        Return StringSplit( _
                StringRegExpReplace($sString, $sSplitPattern, Chr(0)), _
                Chr(0), $dFlag)
    EndFunc
queensoft (Author) Posted December 14, 2023

On 12/13/2023 at 5:16 PM, Nine said: If you prefer StringSplit then try this: [...]

Well, it looks like this one is also working great! I must have missed the @LF & @LF combo.
queensoft (Author) Posted December 14, 2023

23 hours ago, mikell said: Regex too but simpler way, works regardless of the newline [...]

Also working OK. Just a little note: the files need to be opened with the matching encoding (ANSI / UTF).
queensoft (Author) Posted December 14, 2023

6 hours ago, AspirinJunkie said: Even simpler solution with regex: [...]

Working OK, with the same FileOpen caveat for ANSI/UTF.