
Fast delete multiple lines from text file


stamat

I have to automate going through a text file and deleting quite a lot of lines.

I use _FileReadToArray() to read the file into an array. But deleting hundreds of elements from the array one by one using _ArrayDelete() takes a long time. Do you know any more time-efficient methods?


Welcome to AutoIt and the forum!

How large in megabytes is your file?

Edited by water

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2024-07-28 - Version 1.6.3.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 


thanks water!

The file is a couple of MB, but it has over 30k lines.

Since I usually have to delete a range of lines (array elements) e.g. from 2000 to 3000 I thought I may use an array split function and then concat the new arrays. But there is no such array split function in AutoIt. :( Any suggestions?


You delete the lines and then write the array back to disk?


This reads the file into an array, ignores some records and writes the rest to an output file.

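#include <File.au3>

Global $aInput[1]
; Read the whole file; $aInput[0] receives the line count
_FileReadToArray("C:\temp\Input.txt", $aInput)
; Mode 2 = open for writing and erase previous contents
$hOutput = FileOpen("C:\temp\Output.txt", 2)
For $iIndex = 1 To $aInput[0]
    ; "..." stands for whatever marks a record to be skipped
    If $aInput[$iIndex] <> "..." Then FileWriteLine($hOutput, $aInput[$iIndex])
Next
FileClose($hOutput)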


This also uses a loop to go through each individual index. This is slow. I need a function which splits an array. E.g.

$avArray[0] = "JPM"
$avArray[1] = "Holger"
$avArray[2] = "Jon"
$avArray[3] = "Larry"
$avArray[4] = "Jeremy"
$avArray[5] = "Valik"
$avArray[6] = "Cyberslug"
$avArray[7] = "Nutster"
$avArray[8] = "JdeB"
$avArray[9] = "Tylo"

Let's say I need to get rid of lines 3 through 7. I could split the array at positions 3 and 7. Then concatenate the first and last arrays. This will be faster than iterating through all the elements. Is this possible? Or is there a better way? Thanks.
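
For illustration, the "split and concatenate" idea can be approximated with a single pass that copies only the elements to keep into a new array. This is a minimal sketch, not a built-in: _RemoveRange is a hypothetical helper name, and it assumes a 1D, 0-based array like the one above. It still loops once over the array, but it allocates the result only once instead of resizing on every deletion.

```autoit
; Hypothetical helper (not part of AutoIt): returns a copy of $aSource
; without the elements at indexes $iFrom..$iTo (0-based, inclusive).
Func _RemoveRange(Const ByRef $aSource, $iFrom, $iTo)
    Local $iCount = UBound($aSource)
    ; Reject ranges that fall outside the array
    If $iFrom < 0 Or $iTo >= $iCount Or $iFrom > $iTo Then Return SetError(1, 0, $aSource)
    Local $iRemoved = $iTo - $iFrom + 1
    ; Refuse to remove every element, so the result always has at least one
    If $iRemoved = $iCount Then Return SetError(2, 0, $aSource)

    Local $aResult[$iCount - $iRemoved]
    Local $iOut = 0
    For $i = 0 To $iCount - 1
        ; Skip the range to drop, copy everything else
        If $i >= $iFrom And $i <= $iTo Then ContinueLoop
        $aResult[$iOut] = $aSource[$i]
        $iOut += 1
    Next
    Return $aResult
EndFunc

; Usage with the sample array: drops "Larry" through "Nutster"
Global $avArray[10] = ["JPM", "Holger", "Jon", "Larry", "Jeremy", "Valik", "Cyberslug", "Nutster", "JdeB", "Tylo"]
Global $avShort = _RemoveRange($avArray, 3, 7)
ConsoleWrite(UBound($avShort) & " elements left" & @CRLF)
```

If the array comes from _FileReadToArray(), element [0] holds the line count, so the caller would have to update that count after removing a range.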


Looping through an array is quite fast. The following example fills an array with 30000 elements and checks each element in less than half a second.

What's "slow" is reading and writing a file.

Global $aArray[30000]
Global $iTimer = TimerInit()
For $i = 0 To UBound($aArray) - 1
    $aArray[$i] = Random(1,100000, 1)
Next
ConsoleWrite(TimerDiff($iTimer) & @LF)
Global $bFlag
$iTimer = TimerInit()
ConsoleWrite(UBound($aArray) - 1 & @LF)
For $i = 0 To UBound($aArray) - 1
    If $aArray[$i] > 1000 Then $bFlag = True
Next
ConsoleWrite(TimerDiff($iTimer) & @LF)


BTW: How often do you need to process the file (every 5 minutes, daily, once)? And how fast do you need it?


True, looping through an array is fast, but the _ArrayDelete() function is slow. What I need is an "array splice" function. Thanks for the help, water. I have a 5 post per day limit so I won't be able to answer till tomorrow. :(


stamat,

Melba23 has lifted your 5 posts limit for the first 24 hours so we can go on discussing your problem.


$sText = ''
For $i = 1 To 20
    $sText &= $i & ' Line' & @CRLF
Next
; MsgBox(0, 'Message', $sText)
$iStartingLine = 10
$iEndingLine = 15

$iStartingLine -= 1
; $sText = FileRead(@ScriptDir & '\file.txt')
; Position of the line break that ends the last line to keep
$iPos1 = StringInStr($sText, @CRLF, 1, $iStartingLine)
; Position of the line break that ends the last line to delete
$iPos2 = StringInStr($sText, @CRLF, 1, $iEndingLine - $iStartingLine, $iPos1 + 1)
; MsgBox(0, 'Message', $iPos1 & @CRLF & $iPos2)
$sText = StringLeft($sText, $iPos1) & StringTrimLeft($sText, $iPos2)
MsgBox(0, 'Message', $sText)

Edited by AZJIO

I modified my example from post #6 so it "deletes" records 3 to 7:

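#include <File.au3>

Global $aInput[1]
; Read the whole file; $aInput[0] receives the line count
_FileReadToArray("C:\temp\Input.txt", $aInput)
; Mode 2 = open for writing and erase previous contents
$hOutput = FileOpen("C:\temp\Output.txt", 2)
For $iIndex = 1 To $aInput[0]
    ; Write every line except lines 3 to 7
    If $iIndex < 3 Or $iIndex > 7 Then FileWriteLine($hOutput, $aInput[$iIndex])
Next
FileClose($hOutput)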

BTW:

As you can see my example does not call _ArrayDelete so it should be quite fast.

Edited by water


$sText = ''
For $i = 1 To 20
    $sText &= $i & ' Line' & @CRLF
Next
; $sText ='1 Line'
; MsgBox(0, 'Preview', $sText)
_StringDelete($sText, 2, 18)
MsgBox(0, 'After', $sText)
; MsgBox(0, 'After, @error=' & @error, $sText)

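; Deletes lines $iStart..$iEnd (1-based, inclusive) from $sText in place
Func _StringDelete(ByRef $sText, $iStart, $iEnd)
    ; Swap the bounds if they were passed in reverse order
    If $iStart > $iEnd Then
        Local $tmp = $iStart
        $iStart = $iEnd
        $iEnd = $tmp
    EndIf
    Local $iPosStart, $iPosEnd
    $iStart -= 1
    If $iStart < 1 Then
        ; Deleting from the first line: nothing to keep in front
        $iPosStart = 0
        $iStart = 0
    Else
        ; Line break that ends the last line to keep
        $iPosStart = StringInStr($sText, @CRLF, 1, $iStart)
        If Not $iPosStart Then Return ; start is past the end of the text
    EndIf
    ; Line break that ends the last line to delete
    $iPosEnd = StringInStr($sText, @CRLF, 1, $iEnd - $iStart, $iPosStart + 1)
    If $iPosEnd Then
        $sText = StringLeft($sText, $iPosStart) & StringTrimLeft($sText, $iPosEnd)
    Else
        ; Range runs past the last line: keep only the leading part
        $sText = StringLeft($sText, $iPosStart)
    EndIf
EndFunc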

Edited by AZJIO

water, I tried your code again. And it is fast - 60k rows written in 1.3 sec. So instead of deleting array elements and writing the array back to disk, I will write each line individually using the file write stream. Thanks!! I will try to complete the code asap but I'm sure this solves my problem.
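
For anyone who wants to reproduce a timing like this, here is a rough, self-contained sketch of the approach described above: read the file into an array, then write every line outside the unwanted range through one open file handle. The temp file names, the 60000-line test file and the 2000 to 3000 range are assumptions made up for the demo; actual timings will depend on the machine and disk.

```autoit
#include <File.au3>

; Hypothetical paths and range, chosen only for this demo
Global $sInput = @TempDir & "\range_demo_in.txt"
Global $sOutput = @TempDir & "\range_demo_out.txt"
Global $iFirst = 2000, $iLast = 3000

; Build a 60000-line test file (mode 2 = overwrite)
Global $hFile = FileOpen($sInput, 2)
For $i = 1 To 60000
    FileWriteLine($hFile, "Line " & $i)
Next
FileClose($hFile)

; Time the read + filtered write
Global $aLines
Global $iTimer = TimerInit()
If Not _FileReadToArray($sInput, $aLines) Then Exit 1
$hFile = FileOpen($sOutput, 2)
For $i = 1 To $aLines[0]
    ; Skip lines $iFirst..$iLast, write everything else
    If $i < $iFirst Or $i > $iLast Then FileWriteLine($hFile, $aLines[$i])
Next
FileClose($hFile)
ConsoleWrite("Filtered in " & Round(TimerDiff($iTimer)) & " ms" & @CRLF)
```

Opening the output file once and passing the handle to FileWriteLine() is what keeps this quick; passing a file name instead would reopen the file on every call.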

AZJIO, your code works with strings and not arrays. So to use it I will have to convert the array to string which will make it hard for me to work with. Thanks for the input.
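
As a side note, moving between the two representations can be short. The sketch below is only an illustration with made-up sample data: it joins an array laid out like the result of _FileReadToArray() (count in element [0]) into one @CRLF-separated string with _ArrayToString(), then splits it back with StringSplit().

```autoit
#include <Array.au3>

; Sample data in the layout _FileReadToArray() produces: [0] holds the count
Global $aLines[4] = [3, "first", "second", "third"]

; Array -> string: join elements 1..end, separated by @CRLF
Global $sText = _ArrayToString($aLines, @CRLF, 1)

; String -> array: flag 1 treats @CRLF as one delimiter instead of two characters
Global $aBack = StringSplit($sText, @CRLF, 1)
ConsoleWrite($aBack[0] & " lines" & @CRLF)
```

One thing to watch: if the string ends with a trailing @CRLF, StringSplit() returns an extra empty element at the end.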


Glad to hear it's working for you :D

That's perfect to end the day. Now it's time for bed!


Is reading the file into an array really necessary? Reading it as a single string is far faster.

You can also do it like this, as long as you check that the indexes do not exceed the size of the array:

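#include <File.au3>

Global $aInput[1]
_FileReadToArray("C:\temp\Input.txt", $aInput)
; Mode 2 = open for writing and erase previous contents
$hOutput = FileOpen("C:\temp\Output.txt", 2)
; Lines before the deleted range (3 to 7)
For $iIndex = 1 To 2
    FileWriteLine($hOutput, $aInput[$iIndex])
Next
; Lines after the deleted range
For $iIndex = 8 To $aInput[0]
    FileWriteLine($hOutput, $aInput[$iIndex])
Next
FileClose($hOutput)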
Edited by AZJIO