queensoft Posted December 13, 2023

File content (any subtitle .SRT file):

1
00:00:01,543 --> 00:00:03,212
Din episoadele anterioare...

2
00:00:03,295 --> 00:00:07,299
Am vorbit cu un coleg care
coordoneazπ un program de cercetare

3
00:00:07,382 --> 00:00:11,302
la Universitatea Heidelberg.
║i cred cπ te pot εnscrie.

4
00:00:11,385 --> 00:00:14,180
Fantastisch.
Adicπ "fantastic" εn germanπ.

5
00:00:14,263 --> 00:00:17,017
Ea este nepoata mea, Tonya.
Va locui cu noi.

6
00:00:17,726 --> 00:00:18,936
- Salut, Missy!
- Tonya.

7
00:00:20,186 --> 00:00:22,146
- Ce faci?
- Ies εn ora║ pe ascuns.

8
00:00:22,231 --> 00:00:23,273
Care-i planul?

9
00:00:23,356 --> 00:00:25,733
Am o jumate de pachet de ■igπri
║i o sticlπ de cherry.

10
00:00:25,817 --> 00:00:26,902
Stai sπ mπ εmbrac.

Read the file, split it at empty lines, and return an array:

$array[0]='1 00:00:01,543 --> 00:00:03,212 Din episoadele anterioare...'
$array[1]='2 00:00:03,295 --> 00:00:07,299 Am vorbit cu un coleg care coordoneazπ un program de cercetare'

I already tried StringSplit by @crlf & @crlf and various RegExp patterns; none of them worked.

Right now I'm trying this, but I'm sick as a dog and cannot focus. Read the file and get all empty lines:

$txt = FileReadToArray("D:\Diverse 2\A87 subtitle video\The.Girl2.srt") ; read file to array
Global $match01[1]=['0']
For $i=0 to UBound($txt)-1 ; parse all lines
    if $txt[$i]='' Then ; if line is empty, add index to array of matches
        _ArrayAdd($match01, $i)
    EndIf
Next

After that, I should be able to concatenate the lines (0-2, 3-6, ...) and feed them to another array, but I'm lost here; something is not working.

Thank you.
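A minimal sketch of the line-by-line idea described above, for readers who want to finish that approach rather than use a regular expression. Instead of collecting the indices of the empty lines, it appends lines to the current block and closes the block at each empty line. The helper name `_GroupSubtitleLines` is hypothetical (it does not come from the thread), and the sample array stands in for the result of `FileReadToArray`.

```autoit
; Hypothetical helper (not from the thread): joins consecutive non-blank
; lines into blocks, treating blank lines as separators.
Func _GroupSubtitleLines(Const ByRef $aLines)
    Local $iLines = UBound($aLines)
    If $iLines = 0 Then Return SetError(1, 0, 0)

    Local $aBlocks[$iLines] ; upper bound: at most one block per line
    Local $iCount = 0, $sBlock = ""

    For $i = 0 To $iLines - 1
        If StringStripWS($aLines[$i], 3) = "" Then ; 3 = strip leading and trailing whitespace
            ; Blank line: close the current block, if there is one.
            If $sBlock <> "" Then
                $aBlocks[$iCount] = $sBlock
                $iCount += 1
                $sBlock = ""
            EndIf
        Else
            ; Non-blank line: append it to the current block.
            If $sBlock <> "" Then $sBlock &= @CRLF
            $sBlock &= $aLines[$i]
        EndIf
    Next

    ; The last block is usually not followed by a blank line.
    If $sBlock <> "" Then
        $aBlocks[$iCount] = $sBlock
        $iCount += 1
    EndIf

    If $iCount = 0 Then Return SetError(2, 0, 0)
    ReDim $aBlocks[$iCount]
    Return $aBlocks
EndFunc

; Sample input; in real use this would be FileReadToArray("your-file.srt").
Local $aLines[] = ["1", "00:00:01,543 --> 00:00:03,212", "Din episoadele anterioare...", "", _
        "2", "00:00:03,295 --> 00:00:07,299", "Am vorbit cu un coleg care"]

Local $aBlocks = _GroupSubtitleLines($aLines)
For $i = 0 To UBound($aBlocks) - 1
    ConsoleWrite("[" & $i & "] " & StringReplace($aBlocks[$i], @CRLF, " | ") & @CRLF)
Next
```

Because `FileReadToArray` splits on CR, LF, and CRLF alike, this variant should not depend on which line ending the file uses.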
AllenAA Posted December 13, 2023

#include <array.au3>

Local $sString = '1' & @CRLF & _
'00:00:01,543 --> 00:00:03,212' & @CRLF & _
'Din episoadele anterioare...' & @CRLF & _
@CRLF & _
@CRLF & _
'2' & @CRLF & _
'00:00:03,295 --> 00:00:07,299' & @CRLF & _
'Am vorbit cu un coleg care' & @CRLF & _
'coordoneazπ un program de cercetare' & @CRLF & _
@CRLF & _
@CRLF & _
'3' & @CRLF & _
'00:00:07,382 --> 00:00:11,302' & @CRLF & _
'la Universitatea Heidelberg.' & @CRLF & _
'║i cred cπ te pot εnscrie.' & @CRLF & _
@CRLF & _
@CRLF & _
'4' & @CRLF & _
'00:00:11,385 --> 00:00:14,180' & @CRLF & _
'Fantastisch.' & @CRLF & _
'Adicπ "fantastic" εn germanπ.' & @CRLF & _
@CRLF & _
@CRLF & _
'5' & @CRLF & _
'00:00:14,263 --> 00:00:17,017' & @CRLF & _
'Ea este nepoata mea, Tonya.' & @CRLF & _
'Va locui cu noi.' & @CRLF & _
@CRLF & _
@CRLF & _
'6' & @CRLF & _
'00:00:17,726 --> 00:00:18,936' & @CRLF & _
'- Salut, Missy!' & @CRLF & _
'- Tonya.' & @CRLF & _
@CRLF & _
@CRLF & _
'7' & @CRLF & _
'00:00:20,186 --> 00:00:22,146' & @CRLF & _
'- Ce faci?' & @CRLF & _
'- Ies εn ora║ pe ascuns.' & @CRLF & _
@CRLF & _
@CRLF & _
'8' & @CRLF & _
'00:00:22,231 --> 00:00:23,273' & @CRLF & _
'Care-i planul?' & @CRLF & _
@CRLF & _
@CRLF & _
'9' & @CRLF & _
'00:00:23,356 --> 00:00:25,733' & @CRLF & _
'Am o jumate de pachet de ■igπri' & @CRLF & _
'║i o sticlπ de cherry.' & @CRLF & _
@CRLF & _
@CRLF & _
'10' & @CRLF & _
'00:00:25,817 --> 00:00:26,902' & @CRLF & _
'Stai sπ mπ εmbrac.' & @CRLF

Local $aAry = StringRegExp($sString,'(?is)\d+\r\n\d{2}:\d{2}:\d{2},\d+\V+.+?(?=\r\n\r\n|$)',3)
If Not @error Then _ArrayDisplay($aAry)
queensoft (Author) Posted December 13, 2023

Your script, with the Local string, is working OK. But with FileRead it is NOT working. Check the attached files: start with ANSI (easier, right?), then UTF (but not necessary). Thanks.

Attachments: The.Girl2-ANSI.srt, The.Girl-UTF.srt
Danp2 Posted December 13, 2023

The files use LF instead of CR/LF for the line delimiters.
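This is why a split on `@CRLF & @CRLF` finds nothing in these files: the blank line is stored as two LF characters, with no CR in between. One possible workaround, sketched here under the assumption that it is acceptable to rewrite the line endings in memory, is to normalize every newline style to CRLF first and then split. The sample string stands in for the result of `FileRead`.

```autoit
#include <StringConstants.au3>

; Sample text with LF-only line endings, like the attached files.
Local $sText = "1" & @LF & "00:00:01,543 --> 00:00:03,212" & @LF & "Din episoadele anterioare..." & @LF & @LF & _
        "2" & @LF & "00:00:03,295 --> 00:00:07,299" & @LF & "Am vorbit cu un coleg care"

; Report which line ending the text uses.
If StringInStr($sText, @CRLF) Then
    ConsoleWrite("CRLF line endings" & @CRLF)
ElseIf StringInStr($sText, @LF) Then
    ConsoleWrite("LF line endings" & @CRLF)
EndIf

; Convert every newline style (CRLF, lone CR, lone LF) to CRLF.
; CRLF comes first in the alternation so it is matched as one unit.
Local $sNormalized = StringRegExpReplace($sText, "\r\n|\r|\n", @CRLF)

; A split on a blank line (CRLF & CRLF) now works regardless of the input style.
Local $aBlocks = StringSplit($sNormalized, @CRLF & @CRLF, $STR_ENTIRESPLIT + $STR_NOCOUNT)
ConsoleWrite("Blocks found: " & UBound($aBlocks) & @CRLF)
```

If the file separates blocks with more than one blank line, the resulting elements will start with leftover newlines, so a trim step may be needed.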
AllenAA Posted December 13, 2023

On 12/13/2023 at 2:27 PM, queensoft said:
Your script, with Local string - is working OK. But using FileRead is NOT working. Check attached files, start with ANSI (easier, right?) the UTF (but not necessary) Thanks.

#include <array.au3>

Local $sFile , $sString , $aAry

$sFile = @ScriptDir & '\The.Girl-UTF.srt'
$sString = _ReadFile($sFile)
; \r?\n accepts both LF and CRLF line endings
$aAry = StringRegExp($sString,'(?is)\d+\r?\n\d{2}:\d{2}:\d{2},\d+\V+.+?(?=\r?\n\r?\n|$)',3)
If Not @error Then _ArrayDisplay($aAry , $sFile)

$sFile = @ScriptDir & '\The.Girl2-ANSI.srt'
$sString = _ReadFile($sFile)
$aAry = StringRegExp($sString,'(?is)\d+\r?\n\d{2}:\d{2}:\d{2},\d+\V+.+?(?=\r?\n\r?\n|$)',3)
If Not @error Then _ArrayDisplay($aAry , $sFile)

; Reads the whole file, opening it with the encoding detected by FileGetEncoding
Func _ReadFile($sFile)
    Local $sRet
    Local $hFile = FileOpen($sFile , FileGetEncoding($sFile))
    $sRet = FileRead($hFile)
    FileClose($hFile)
    Return $sRet
EndFunc
queensoft (Author) Posted December 13, 2023

Working GREAT, for both!!! I'm not even gonna try to understand the RegEx pattern!
Nine Posted December 13, 2023

If you prefer StringSplit then try this:

#include <Constants.au3>
#include <Array.au3>

Local $aList = StringSplit(FileRead("The.Girl2-ANSI.srt"), @LF & @LF, $STR_ENTIRESPLIT)
_ArrayDisplay($aList)
queensoft (Author) Posted December 13, 2023

I think I already tried that (all variants: cr, lf, crlf, for both files) and it didn't work.
Nine Posted December 13, 2023

Well, it works for me using your file...
mikell Posted December 13, 2023

Regex too, but a simpler way that works regardless of the newline style:

#include <Array.au3>

$file = @ScriptDir & '\The.Girl-UTF.srt'
$a = StringRegExp(FileRead($file), '(?s)(\N.+?(?=\R{2}|$))', 3)
_ArrayDisplay($a)

$sFile = @ScriptDir & '\The.Girl2-ANSI.srt'
$a = StringRegExp(FileRead($sFile),'(?s)(\N.+?(?=\R{2}|$))', 3)
_ArrayDisplay($a)
AspirinJunkie Posted December 14, 2023

Even simpler solution with regex:

#include <Array.au3>

$sString = FileRead("The.Girl-UTF.srt")
$aSplit = _regex_split($sString, "\R{2,}")
_ArrayDisplay($aSplit)

; #FUNCTION# ====================================================================================================================
; Name ..........: _regex_split
; Description ...: Splits a string at separators described by a regular expression.
;                  The separators are not part of the result strings.
; Syntax ........: _regex_split($sString, $sSplitPattern[, $dFlag = 3])
; Parameters ....: $sString       - string which should be split
;                  $sSplitPattern - regular expression which defines the separation pattern
;                  $dFlag         - flag for StringSplit()
; Return values .: 1D array containing the substrings split at the separators.
; Author ........: AspirinJunkie
; Modified ......: 2023-03-23
; Example .......: $aSplit = _regex_split("Hello:world:foo://:bar", ':(?!\/\/)')
; ===============================================================================================================================
Func _regex_split($sString, Const $sSplitPattern, Const $dFlag = 3)
    Return StringSplit( _
        StringRegExpReplace($sString, $sSplitPattern, Chr(0)), _
        Chr(0), $dFlag)
EndFunc
queensoft (Author) Posted December 14, 2023

On 12/13/2023 at 3:16 PM, Nine said:
If you prefer StringSplit then try this:

#include <Constants.au3>
#include <Array.au3>

Local $aList = StringSplit(FileRead("The.Girl2-ANSI.srt"), @LF & @LF, $STR_ENTIRESPLIT)
_ArrayDisplay($aList)

Well, it looks like this one is also working great! I must have missed the @LF & @LF combo.
queensoft (Author) Posted December 14, 2023

On 12/13/2023 at 5:54 PM, mikell said:
Regex too but simpler way, works regardless of the newline

#include <Array.au3>

$file = @ScriptDir & '\The.Girl-UTF.srt'
$a = StringRegExp(FileRead($file), '(?s)(\N.+?(?=\R{2}|$))', 3)
_ArrayDisplay($a)

$sFile = @ScriptDir & '\The.Girl2-ANSI.srt'
$a = StringRegExp(FileRead($sFile),'(?s)(\N.+?(?=\R{2}|$))', 3)
_ArrayDisplay($a)

Also working OK. Just a little note: the files need to be opened with the matching encoding (ANSI / UTF).
queensoft (Author) Posted December 14, 2023

On 12/14/2023 at 11:07 AM, AspirinJunkie said:
Even simpler solution with regex: $aSplit = _regex_split($sString, "\R{2,}")

Working OK, with the same FileOpen note about ANSI/UTF.