Split a file (string) by empty lines



File content (any subtitle .SRT file):

1
00:00:01,543 --> 00:00:03,212
Din episoadele anterioare...

2
00:00:03,295 --> 00:00:07,299
Am vorbit cu un coleg care
coordonează un program de cercetare

3
00:00:07,382 --> 00:00:11,302
la Universitatea Heidelberg.
şi cred că te pot înscrie.

4
00:00:11,385 --> 00:00:14,180
Fantastisch.
Adică "fantastic" în germană.

5
00:00:14,263 --> 00:00:17,017
Ea este nepoata mea, Tonya.
Va locui cu noi.

6
00:00:17,726 --> 00:00:18,936
- Salut, Missy!
- Tonya.

7
00:00:20,186 --> 00:00:22,146
- Ce faci?
- Ies în oraş pe ascuns.

8
00:00:22,231 --> 00:00:23,273
Care-i planul?

9
00:00:23,356 --> 00:00:25,733
Am o jumate de pachet de ţigări
şi o sticlă de cherry.

10
00:00:25,817 --> 00:00:26,902
Stai să mă îmbrac.

Read the file, split it at the empty lines and return an array:

$array[0]='1
00:00:01,543 --> 00:00:03,212
Din episoadele anterioare...'

$array[1]='2
00:00:03,295 --> 00:00:07,299
Am vorbit cu un coleg care
coordoneazπ un program de cercetare'

I already tried StringSplit with @CRLF & @CRLF and various RegExp patterns; none of them worked.
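A likely reason the StringSplit attempt failed: with the default flag, StringSplit treats every single character of the delimiter string as its own delimiter, so @CRLF & @CRLF splits at each @CR and each @LF rather than at blank lines. The sketch below passes $STR_ENTIRESPLIT so the whole string acts as one separator. It assumes Windows line endings and exactly one empty line between blocks; a file with LF-only line endings or extra blank lines would still need one of the regex approaches from the replies.

```autoit
#include <StringConstants.au3>

; Two subtitle blocks separated by one empty line (Windows line endings assumed)
Local $sText = "1" & @CRLF & "00:00:01,543 --> 00:00:03,212" & @CRLF & "Din episoadele anterioare..." & @CRLF & @CRLF & _
        "2" & @CRLF & "00:00:03,295 --> 00:00:07,299" & @CRLF & "Am vorbit cu un coleg care"

; Without $STR_ENTIRESPLIT: @CR and @LF are each a delimiter,
; giving one element per line plus many empty elements
Local $aPerChar = StringSplit($sText, @CRLF & @CRLF, $STR_NOCOUNT)

; $STR_ENTIRESPLIT: the whole @CRLF & @CRLF string is one separator
; $STR_NOCOUNT: element [0] holds data, not the element count
Local $aBlocks = StringSplit($sText, @CRLF & @CRLF, $STR_ENTIRESPLIT + $STR_NOCOUNT)

ConsoleWrite(UBound($aPerChar) & " elements vs. " & UBound($aBlocks) & " blocks" & @CRLF) ; prints: 13 elements vs. 2 blocks
```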

Right now I'm trying this, but I'm sick as a dog and can't focus:

Read the file and collect the indices of all empty lines:

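#include <Array.au3> ; needed for _ArrayAdd

$txt = FileReadToArray("D:\Diverse 2\A87 subtitle video\The.Girl2.srt") ; read file to array, one element per line
If @error Then Exit ; the file could not be read (or is empty)
Global $match01[1] = [0] ; indices of the empty lines, starting with 0
For $i = 0 To UBound($txt) - 1 ; parse all lines
    If $txt[$i] = '' Then ; if the line is empty, add its index to the array of matches
        _ArrayAdd($match01, $i)
    EndIf
Next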

After that, I should be able to concatenate the lines between the empty ones (0-2, 3-6, ...) and feed them into another array, but I'm lost here; something's not working.
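Continuing the line-by-line idea without regex, one option is to collect non-empty lines into a buffer and store the buffer whenever an empty line (or the end of the file) is reached, so no index array is needed. This is only a sketch: _GroupLinesByBlank is a hypothetical helper name, the file is assumed to sit next to the script, and, following the usual AutoIt convention, element [0] of the result holds the block count.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: groups consecutive non-empty lines into blocks.
; $aLines is a 0-based array of lines, e.g. the result of FileReadToArray().
; Returns an array where [0] = number of blocks and [1]..[n] = the blocks,
; with the lines of each block joined by @CRLF.
Func _GroupLinesByBlank($aLines)
    Local $aBlocks[UBound($aLines) + 1] ; each block needs at least one line, so this is large enough
    $aBlocks[0] = 0
    Local $sCurrent = "" ; lines collected for the block being built
    For $i = 0 To UBound($aLines) - 1
        If StringStripWS($aLines[$i], $STR_STRIPLEADING + $STR_STRIPTRAILING) = "" Then
            ; empty (or whitespace-only) line: store the current block, if any
            If $sCurrent <> "" Then
                $aBlocks[0] += 1
                $aBlocks[$aBlocks[0]] = $sCurrent
                $sCurrent = ""
            EndIf
        ElseIf $sCurrent = "" Then
            $sCurrent = $aLines[$i] ; first line of a new block
        Else
            $sCurrent &= @CRLF & $aLines[$i] ; further line of the same block
        EndIf
    Next
    ; the last block is usually not followed by an empty line
    If $sCurrent <> "" Then
        $aBlocks[0] += 1
        $aBlocks[$aBlocks[0]] = $sCurrent
    EndIf
    ReDim $aBlocks[$aBlocks[0] + 1] ; drop the unused elements
    Return $aBlocks
EndFunc

; Usage (assumption: the .srt file is in the script folder)
Local $aSrtLines = FileReadToArray(@ScriptDir & "\The.Girl2-ANSI.srt")
If Not @error Then
    Local $aSrtBlocks = _GroupLinesByBlank($aSrtLines)
    ConsoleWrite("Blocks found: " & $aSrtBlocks[0] & @CRLF)
EndIf
```

Building the block text directly avoids _ArrayAdd, which would split any value containing its default "|" delimiter into several elements.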

Thank you.


#include <array.au3>
Local $sString =  '1' & @CRLF & _
    '00:00:01,543 --> 00:00:03,212' & @CRLF & _
    'Din episoadele anterioare...' & @CRLF & _
     @CRLF & _
     @CRLF & _
    '2' & @CRLF & _
    '00:00:03,295 --> 00:00:07,299' & @CRLF & _
    'Am vorbit cu un coleg care' & @CRLF & _
    'coordonează un program de cercetare' & @CRLF & _
     @CRLF & _
     @CRLF & _
    '3' & @CRLF & _
    '00:00:07,382 --> 00:00:11,302' & @CRLF & _
    'la Universitatea Heidelberg.' & @CRLF & _
    'şi cred că te pot înscrie.' & @CRLF & _
     @CRLF & _
     @CRLF & _
    '4' & @CRLF & _
    '00:00:11,385 --> 00:00:14,180' & @CRLF & _
    'Fantastisch.' & @CRLF & _
    'Adică "fantastic" în germană.' & @CRLF & _
     @CRLF & _
     @CRLF & _
    '5' & @CRLF & _
    '00:00:14,263 --> 00:00:17,017' & @CRLF & _
    'Ea este nepoata mea, Tonya.' & @CRLF & _
    'Va locui cu noi.' & @CRLF & _
     @CRLF & _
     @CRLF & _
    '6' & @CRLF & _
    '00:00:17,726 --> 00:00:18,936' & @CRLF & _
    '- Salut, Missy!' & @CRLF & _
    '- Tonya.' & @CRLF & _
     @CRLF & _
     @CRLF & _
    '7' & @CRLF & _
    '00:00:20,186 --> 00:00:22,146' & @CRLF & _
    '- Ce faci?' & @CRLF & _
    '- Ies în oraş pe ascuns.' & @CRLF & _
     @CRLF & _
     @CRLF & _
    '8' & @CRLF & _
    '00:00:22,231 --> 00:00:23,273' & @CRLF & _
    'Care-i planul?' & @CRLF & _
     @CRLF & _
     @CRLF & _
    '9' & @CRLF & _
    '00:00:23,356 --> 00:00:25,733' & @CRLF & _
    'Am o jumate de pachet de ţigări' & @CRLF & _
    'şi o sticlă de cherry.' & @CRLF & _
     @CRLF & _
     @CRLF & _
    '10' & @CRLF & _
    '00:00:25,817 --> 00:00:26,902' & @CRLF & _
    'Stai să mă îmbrac.' & @CRLF
    
Local $aAry = StringRegExp($sString,'(?is)\d+\r\n\d{2}:\d{2}:\d{2},\d+\V+.+?(?=\r\n\r\n|$)',3)  
If Not @error Then _ArrayDisplay($aAry)

 


5 minutes ago, queensoft said:

Your script, with the string defined locally, is working OK.

But using FileRead is NOT working.

Check the attached files; start with ANSI (easier, right?), then UTF (not strictly necessary).

Thanks.

Attachments: The.Girl2-ANSI.srt (89.26 kB), The.Girl-UTF.srt (92.02 kB)

#include <array.au3>
Local $sFile , $sString , $aAry

$sFile = @ScriptDir & '\The.Girl-UTF.srt'
$sString = _ReadFile($sFile)    
$aAry = StringRegExp($sString,'(?is)\d+\r?\n\d{2}:\d{2}:\d{2},\d+\V+.+?(?=\r?\n\r?\n|$)',3) 
If Not @error Then _ArrayDisplay($aAry , $sFile)

$sFile = @ScriptDir & '\The.Girl2-ANSI.srt'
$sString = _ReadFile($sFile)    
$aAry = StringRegExp($sString,'(?is)\d+\r?\n\d{2}:\d{2}:\d{2},\d+\V+.+?(?=\r?\n\r?\n|$)',3) 
If Not @error Then _ArrayDisplay($aAry , $sFile)

; Reads the whole file, opening it in the encoding reported by FileGetEncoding()
Func _ReadFile($sFile)
    Local $sRet
    Local $hFile = FileOpen($sFile , FileGetEncoding($sFile))
    $sRet = FileRead($hFile)
    FileClose($hFile)
    Return $sRet
EndFunc

 


Even simpler solution with regex:

#include <Array.au3>

$sString = FileRead("The.Girl-UTF.srt")

$aSplit = _regex_split($sString, "\R{2,}")
_ArrayDisplay($aSplit)


; #FUNCTION# ====================================================================================================================
; Name ..........: _regex_split
; Description ...: Splits a string at separators described by a regular expression.
;                  The separators are not part of the result strings.
; Syntax ........: _regex_split($sString, $sSplitPattern[, $dFlag = 3])
; Parameters ....: $sString             - string to be split
;                  $sSplitPattern       - regular expression which defines the separation pattern.
;                  $dFlag               - [optional] flag for StringSplit() (default 3 = $STR_ENTIRESPLIT + $STR_NOCOUNT, i.e. no count in element [0])
; Return values .: 1D-Array containing the substrings split based on the separators.
; Author ........: AspirinJunkie
; Modified ......: 2023-03-23
; Example .......: $aSplit = _regex_split("Hello:world:foo://:bar", ':(?!\/\/)')
; ===============================================================================================================================
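Func _regex_split($sString, Const $sSplitPattern, Const $dFlag = 3)
    ; replace every separator match with Chr(0), then split at that character
    ; (assumes the input itself contains no Chr(0))
    Return StringSplit( _
        StringRegExpReplace($sString, $sSplitPattern, Chr(0)), _
        Chr(0), $dFlag)
EndFunc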

 


23 hours ago, mikell said:

Regex too, but a simpler way that works regardless of the newline type:

#include <Array.au3>

$file = @ScriptDir & '\The.Girl-UTF.srt'
$a = StringRegExp(FileRead($file), '(?s)(\N.+?(?=\R{2}|$))', 3) 
_ArrayDisplay($a)

$sFile = @ScriptDir & '\The.Girl2-ANSI.srt'
$a = StringRegExp(FileRead($sFile),'(?s)(\N.+?(?=\R{2}|$))', 3) 
_ArrayDisplay($a)

 

Also working OK.

Just a small note: the files need to be opened with the matching encoding (ANSI or UTF).
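To make the encoding explicit instead of relying on detection, the file can be opened with a fixed FileOpen mode and then split with mikell's pattern. This is a sketch: _ReadSrtBlocks is a hypothetical helper, and which flag to pass ($FO_ANSI or $FO_UTF8) is an assumption that has to match the actual file.

```autoit
#include <FileConstants.au3>
#include <StringConstants.au3>

; Hypothetical helper: reads $sFile with an explicitly chosen encoding
; ($FO_ANSI or $FO_UTF8) and returns a 0-based array of subtitle blocks.
; \R matches CRLF, LF or CR, so the line-ending style does not matter.
Func _ReadSrtBlocks($sFile, $iEncoding = $FO_ANSI)
    Local $hFile = FileOpen($sFile, $FO_READ + $iEncoding)
    If $hFile = -1 Then Return SetError(1, 0, 0) ; file could not be opened
    Local $sText = FileRead($hFile)
    FileClose($hFile)
    Local $aBlocks = StringRegExp($sText, '(?s)(\N.+?(?=\R{2}|$))', $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return SetError(2, 0, 0) ; no block found
    Return $aBlocks
EndFunc

; Usage (assumption: both files are in the script folder)
Local $aAnsi = _ReadSrtBlocks(@ScriptDir & "\The.Girl2-ANSI.srt", $FO_ANSI)
If Not @error Then ConsoleWrite("ANSI blocks: " & UBound($aAnsi) & @CRLF)
Local $aUtf = _ReadSrtBlocks(@ScriptDir & "\The.Girl-UTF.srt", $FO_UTF8)
If Not @error Then ConsoleWrite("UTF-8 blocks: " & UBound($aUtf) & @CRLF)
```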


6 hours ago, AspirinJunkie said:

Even simpler solution with regex:


Working OK too; the same note about opening the file as ANSI/UTF applies.
