Jump to content

make script faster


andygo
 Share

Recommended Posts

hello,

 

i have a script which can handle huge data.

i feed it with textfiles, it reads each line into array, then compare the lines, do some string operations. this all happens in a for.to.next loop.

 

the script only use 15% cpu of my 8core.amd

 

can i force the script to use more cpuload an therefore being faster?

 

would it make a speed difference to compile it as 64bit exe on 64bit systems?

 

thank you for commemts :)

Link to comment
Share on other sites

the script only use 15% cpu of my 8core.amd

 

1/8 = 12 % 50''

AutoIt is not multi process / multi thread

 

Signature beginning:
Please remember: "AutoIt"..... *  Wondering who uses AutoIt and what it can be used for ? * Forum Rules *
ADO.au3 UDF * POP3.au3 UDF * XML.au3 UDF * IE on Windows 11 * How to ask ChatGPT for AutoIt Codefor other useful stuff click the following button:

Spoiler

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind. 

My contribution (my own projects): * Debenu Quick PDF Library - UDF * Debenu PDF Viewer SDK - UDF * Acrobat Reader - ActiveX Viewer * UDF for PDFCreator v1.x.x * XZip - UDF * AppCompatFlags UDF * CrowdinAPI UDF * _WinMergeCompare2Files() * _JavaExceptionAdd() * _IsBeta() * Writing DPI Awareness App - workaround * _AutoIt_RequiredVersion() * Chilkatsoft.au3 UDF * TeamViewer.au3 UDF * JavaManagement UDF * VIES over SOAP * WinSCP UDF * GHAPI UDF - modest begining - comunication with GitHub REST APIErrorLog.au3 UDF - A logging Library * Include Dependency Tree (Tool for analyzing script relations) * Show_Macro_Values.au3 *

 

My contribution to others projects or UDF based on  others projects: * _sql.au3 UDF  * POP3.au3 UDF *  RTF Printer - UDF * XML.au3 UDF * ADO.au3 UDF SMTP Mailer UDF * Dual Monitor resolution detection * * 2GUI on Dual Monitor System * _SciLexer.au3 UDF * SciTE - Lexer for console pane

Useful links: * Forum Rules * Forum etiquette *  Forum Information and FAQs * How to post code on the forum * AutoIt Online Documentation * AutoIt Online Beta Documentation * SciTE4AutoIt3 getting started * Convert text blocks to AutoIt code * Games made in Autoit * Programming related sites * Polish AutoIt Tutorial * DllCall Code Generator * 

Wiki: Expand your knowledge - AutoIt Wiki * Collection of User Defined Functions * How to use HelpFile * Good coding practices in AutoIt * 

OpenOffice/LibreOffice/XLS Related: WriterDemo.au3 * XLS/MDB from scratch with ADOX

IE Related:  * How to use IE.au3  UDF with  AutoIt v3.3.14.x * Why isn't Autoit able to click a Javascript Dialog? * Clicking javascript button with no ID * IE document >> save as MHT file * IETab Switcher (by LarsJ ) * HTML Entities * _IEquerySelectorAll() (by uncommon) * IE in TaskSchedulerIE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) * PDF Related:How to get reference to PDF object embeded in IE * IE on Windows 11

I encourage you to read: * Global Vars * Best Coding Practices * Please explain code used in Help file for several File functions * OOP-like approach in AutoIt * UDF-Spec Questions *  EXAMPLE: How To Catch ConsoleWrite() output to a file or to CMD *

I also encourage you to check awesome @trancexx code:  * Create COM objects from modules without any demand on user to register anything. * Another COM object registering stuffOnHungApp handlerAvoid "AutoIt Error" message box in unknown errors  * HTML editor

winhttp.au3 related : * https://www.autoitscript.com/forum/topic/206771-winhttpau3-download-problem-youre-speaking-plain-http-to-an-ssl-enabled-server-port/

"Homo sum; humani nil a me alienum puto" - Publius Terentius Afer
"Program are meant to be read by humans and only incidentally for computers and execute" - Donald Knuth, "The Art of Computer Programming"
:naughty:  :ranting:, be  :) and       \\//_.

Anticipating Errors :  "Any program that accepts data from a user must include code to validate that data before sending it to the data store. You cannot rely on the data store, ...., or even your programming language to notify you of problems. You must check every byte entered by your users, making sure that data is the correct type for its field and that required fields are not empty."

Signature last update: 2023-04-24

Link to comment
Share on other sites

Do not read the file line by line but either as a whole or by reading bigger chunks into an array and then process the array.

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2024-07-28 - Version 1.6.3.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

If you can use CSV files, I have had really good efficiency with this script (built by czardas).

 

; #FUNCTION# ====================================================================================================================
; Name...........: _CSVSplit
; Description ...: Converts a string in CSV format to a two dimensional array (see comments)
; Syntax.........: CSVSplit ( $aArray [, $sDelim ] )
; Parameters ....: $aArray  - The array to convert
;                  $sDelim  - Optional - Delimiter set to comma by default (see 2nd comment)
; Return values .: Success  - Returns a two dimensional array or a one dimensional array (see 1st comment)
;                  Failure  - Sets @error to:
;                 |@error = 1 - First parameter is not a valid string
;                 |@error = 2 - Second parameter is not a valid string
;                 |@error = 3 - Could not find suitable delimiter replacements
; Author ........: czardas
; Comments ......; Returns a one dimensional array if the input string does not contain the delimiter string
;                ; Some CSV formats use semicolon as a delimiter instead of a comma
;                ; Set the second parameter to @TAB To convert to TSV
; ===============================================================================================================================

Func _CSVSplit($string, $sDelim = ",") ; Parses csv string input and returns a one or two dimensional array
    If Not IsString($string) Or $string = "" Then Return SetError(1, 0, 0) ; Invalid string
    If Not IsString($sDelim) Or $sDelim = "" Then Return SetError(2, 0, 0) ; Invalid string

    $string = StringRegExpReplace($string, "[\r\n]+\z", "") ; [Line Added] Remove training breaks
    Local $iOverride = 63743, $asDelim[3] ; $asDelim => replacements for comma, new line and double quote
    For $i = 0 To 2
        $asDelim[$i] = __GetSubstitute($string, $iOverride) ; Choose a suitable substitution character
        If @error Then Return SetError(3, 0, 0) ; String contains too many unsuitable characters
    Next
    $iOverride = 0

    Local $aArray = StringRegExp($string, '\A[^"]+|("+[^"]+)|"+\z', 3) ; Split string using double quotes delim - largest match
    $string = ""

    Local $iBound = UBound($aArray)
    For $i = 0 To $iBound -1
        $iOverride += StringInStr($aArray[$i], '"', 0, -1) ; Increment by the number of adjacent double quotes per element
        If Mod ($iOverride +2, 2) = 0 Then ; Acts as an on/off switch
            $aArray[$i] = StringReplace($aArray[$i], $sDelim, $asDelim[0]) ; Replace comma delimeters
            $aArray[$i] = StringRegExpReplace($aArray[$i], "(\r\n)|[\r\n]", $asDelim[1]) ; Replace new line delimeters
        EndIf
        $aArray[$i] = StringReplace($aArray[$i], '""', $asDelim[2]) ; Replace double quote pairs
        $aArray[$i] = StringReplace($aArray[$i], '"', '') ; Delete enclosing double quotes - not paired
        $aArray[$i] = StringReplace($aArray[$i], $asDelim[2], '"') ; Reintroduce double quote pairs as single characters
        $string &= $aArray[$i] ; Rebuild the string, which includes two different delimiters
    Next
    $iOverride = 0

    $aArray = StringSplit($string, $asDelim[1], 2) ; Split to get rows
    $iBound = UBound($aArray)
    Local $aCSV[$iBound][2], $aTemp
    For $i = 0 To $iBound -1
        $aTemp = StringSplit($aArray[$i], $asDelim[0]) ; Split to get row items
        If Not @error Then
            If $aTemp[0] > $iOverride Then
                $iOverride = $aTemp[0]
                ReDim $aCSV[$iBound][$iOverride] ; Add columns to accomodate more items
            EndIf
        EndIf
        For $j = 1 To $aTemp[0]
            If StringLen($aTemp[$j]) Then
                If Not StringRegExp($aTemp[$j], '[^"]') Then ; Field only contains double quotes
                    $aTemp[$j] = StringTrimLeft($aTemp[$j], 1) ; Delete enclosing double quote single char
                EndIf
                $aCSV[$i][$j -1] = $aTemp[$j] ; Populate each row
            EndIf
        Next
    Next

    If $iOverride > 1 Then
        Return $aCSV ; Multiple Columns
    Else
        For $i = 0 To $iBound -1
            If StringLen($aArray[$i]) And (Not StringRegExp($aArray[$i], '[^"]')) Then ; Only contains double quotes
                $aArray[$i] = StringTrimLeft($aArray[$i], 1) ; Delete enclosing double quote single char
            EndIf
        Next
        Return $aArray ; Single column
    EndIf

EndFunc ;==> _CSVSplit

; #INTERNAL_USE_ONLY# ===========================================================================================================
; Name...........: __GetSubstitute
; Description ...: Searches for a character to be used for substitution, ie one not contained within the input string
; Syntax.........: __GetSubstitute($string, ByRef $iCountdown)
; Parameters ....: $string   - The string of characters to avoid
;                  $iCountdown - The first code point to begin checking
; Return values .: Success   - Returns a suitable substitution character not found within the first parameter
;                  Failure   - Sets @error to 1 => No substitution character available
; Author ........: czardas
; Comments ......; This function is connected to the function _CSVSplit and was not intended for general use
;                  $iCountdown is returned ByRef to avoid selecting the same character on subsequent calls to this function
;                  Initially $iCountown should be passed with a value = 63743
; ===============================================================================================================================

Func __GetSubstitute($string, ByRef $iCountdown)
    If $iCountdown < 57344 Then Return SetError(1, 0, "") ; Out of options
    Local $sTestChar
    For $i = $iCountdown To 57344 Step -1
        $sTestChar = ChrW($i)
        $iCountdown -= 1
        If Not StringInStr($string, $sTestChar) Then
            Return $sTestChar
        EndIf
    Next
    Return SetError(1, 0, "") ; Out of options
EndFunc ;==> __GetSubstitute

You could make it multi-threaded though by compiling the script as an exe and then build a script that does a run command on the exe (I've got a script that pulls the top record from my database then marks it as being processed and does work on the record which I run 2-3 of them at a time using a run command)

Link to comment
Share on other sites

@Jewtus, please don't duplicate code around the forum. Instead be polite and post a link back to where you got the original code from.

Providing a backlink is better than copying, because no doubt @czardas will update that particular function in the near future. Therefore by linking, you guarantee an up to date function for those who come across your post. Unless you will be updating this post as well with every update they make?

UDF List:

 
_AdapterConnections()_AlwaysRun()_AppMon()_AppMonEx()_ArrayFilter/_ArrayReduce_BinaryBin()_CheckMsgBox()_CmdLineRaw()_ContextMenu()_ConvertLHWebColor()/_ConvertSHWebColor()_DesktopDimensions()_DisplayPassword()_DotNet_Load()/_DotNet_Unload()_Fibonacci()_FileCompare()_FileCompareContents()_FileNameByHandle()_FilePrefix/SRE()_FindInFile()_GetBackgroundColor()/_SetBackgroundColor()_GetConrolID()_GetCtrlClass()_GetDirectoryFormat()_GetDriveMediaType()_GetFilename()/_GetFilenameExt()_GetHardwareID()_GetIP()_GetIP_Country()_GetOSLanguage()_GetSavedSource()_GetStringSize()_GetSystemPaths()_GetURLImage()_GIFImage()_GoogleWeather()_GUICtrlCreateGroup()_GUICtrlListBox_CreateArray()_GUICtrlListView_CreateArray()_GUICtrlListView_SaveCSV()_GUICtrlListView_SaveHTML()_GUICtrlListView_SaveTxt()_GUICtrlListView_SaveXML()_GUICtrlMenu_Recent()_GUICtrlMenu_SetItemImage()_GUICtrlTreeView_CreateArray()_GUIDisable()_GUIImageList_SetIconFromHandle()_GUIRegisterMsg()_GUISetIcon()_Icon_Clear()/_Icon_Set()_IdleTime()_InetGet()_InetGetGUI()_InetGetProgress()_IPDetails()_IsFileOlder()_IsGUID()_IsHex()_IsPalindrome()_IsRegKey()_IsStringRegExp()_IsSystemDrive()_IsUPX()_IsValidType()_IsWebColor()_Language()_Log()_MicrosoftInternetConnectivity()_MSDNDataType()_PathFull/GetRelative/Split()_PathSplitEx()_PrintFromArray()_ProgressSetMarquee()_ReDim()_RockPaperScissors()/_RockPaperScissorsLizardSpock()_ScrollingCredits_SelfDelete()_SelfRename()_SelfUpdate()_SendTo()_ShellAll()_ShellFile()_ShellFolder()_SingletonHWID()_SingletonPID()_Startup()_StringCompact()_StringIsValid()_StringRegExpMetaCharacters()_StringReplaceWholeWord()_StringStripChars()_Temperature()_TrialPeriod()_UKToUSDate()/_USToUKDate()_WinAPI_Create_CTL_CODE()_WinAPI_CreateGUID()_WMIDateStringToDate()/_DateToWMIDateString()Au3 script parsingAutoIt SearchAutoIt3 PortableAutoIt3WrapperToPragmaAutoItWinGetTitle()/AutoItWinSetTitle()CodingDirToHTML5FileInstallrFileReadLastChars()GeoIP databaseGUI - Only Close ButtonGUI ExamplesGUICtrlDeleteImage()GUICtrlGetBkColor()GUICtrlGetStyle()GUIEventsGUIGetBkColor()Int_Parse() & Int_TryParse()IsISBN()LockFile()Mapping CtrlIDsOOP in AutoItParseHeadersToSciTE()PasswordValidPasteBinPosts Per DayPreExpandProtect GlobalsQueue()Resource UpdateResourcesExSciTE JumpSettings INISHELLHOOKShunting-YardSignature CreatorStack()Stopwatch()StringAddLF()/StringStripLF()StringEOLToCRLF()VSCROLLWM_COPYDATAMore Examples...

Updated: 22/04/2018

Link to comment
Share on other sites

I have tried to find the original link sometime back to no avail. I'm now in the habit of putting the URL in the header when I copy code for that very reason.

Link to comment
Share on other sites

This?

I found the link this way:

Edited by water

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2024-07-28 - Version 1.6.3.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

ok, i tried 64 compile and process priority, both didnt change the speed.

icleaned my loop to the needed times:

$totalloops = 0
for x = 1 to array[0] - 1
$totalloops += 1
next


for a 3.000 Items array i need (2.999 + 2.998 + 2.997.... + 1) loops and this takes now only 13 seconds.

i can deal with that :)

Link to comment
Share on other sites

Post your code: 13s for 3000 items seems pretty slow.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

#include <File.au3>
#include <StaticConstants.au3>
#include <EditConstants.au3>
#include <ComboConstants.au3>
#include <GuiConstants.au3>
#include <WindowsConstants.au3>
#include <ButtonConstants.au3>
#include <IE.au3>
#include <Timers.au3>
#include <Process.au3>
#Include <Misc.au3>
#include <Constants.au3>
Break(0)
Opt("TrayAutoPause",0)
Opt("GuiOnEventMode", 1)
if msgbox(65, "Hinweis", "Das Tool wird ggf. von einigen Schutzprogrammen negativ erkannt."& @CRLF & @CRLF &"Ich distanziere mich hiermit ausdrücklich von Schadsoftware"& @CRLF &"in jeglicher Form!"& @CRLF & @CRLF &"©2015 andygo") = 2 then exit
global $loopstop = 0, $v = 0, $what, $writecount, $aRecords, $starttime

$pre = GuiCreate("EMM.Tool", 290, 150)
$laden = GUICtrlCreateButton ("Datei laden", 4, 4, 90, 20)
GUICtrlSetOnEvent($laden, "_verarbeiten")
$breakup = GUICtrlCreateCheckbox("STOP", 100, 4, 90, 20)
GUICtrlSetState($breakup, $GUI_DISABLE)
$save = GUICtrlCreateButton ("Datei speichern", 195, 4, 90, 20)
GUICtrlSetState($save, $GUI_DISABLE)
GUICtrlSetOnEvent($save, "_ausgabe")
$inf1 = GUICtrlCreateLabel("", 4, 55, 282, 20, $SS_CENTER)
$inf2 = GUICtrlCreateLabel("", 4, 75, 282, 20, $SS_CENTER)
$label = GUICtrlCreateLabel("Bereit...", 4, 110, 282, 40, $SS_CENTER)
GUICtrlSetFont($label, 16, 500)
GUISetOnEvent($GUI_EVENT_CLOSE, "_Quit2")
GUISetState(@SW_SHOW)

while 1
    Sleep(50)
wend
Func _Quit2()
    if msgbox(68,"EMM.Tool","Wirklich beenden?") = 6 then Exit
EndFunc

func addup($a, $b, $c, $d)
    GUICtrlSetData($label, $what & $v)
    GUICtrlSetData($inf2, "In: "&$aRecords[0] & " Datensätze ** " & "Out: " &$writecount & " Datensätze")
    if GUICtrlRead($breakup) = 1 then $loopstop = 1
endfunc
func addsave($a, $b, $c, $d)
    GUICtrlSetData($label, "Schreibe... " & $writecount)
endfunc

func _verarbeiten()
    Opt("GuiOnEventMode", 0)
    GUICtrlSetState($laden, $GUI_DISABLE)
    GUICtrlSetState($save, $GUI_DISABLE)
    GUICtrlSetState($breakup, $GUI_UNCHECKED)
    GUICtrlSetData($label, "Bereit...")
    GUICtrlSetData($inf1, "")
    GUICtrlSetData($inf2, "")
    $infile = FileOpenDialog("Datei", @ScriptDir & "\", "emm-log (*.*)", 1)
    If @error Then
        ;MsgBox(4096, "", "No File(s) chosen")
    Else
        If Not _FileReadToArray($infile, $aRecords) Then
            MsgBox(4096, "Fehler", "Fehlerhafte Datei")
        Else
            $writecount = 0
            $v = StringSplit($infile, "\")
            GUICtrlSetData($inf1, $infile)
            GUICtrlSetData($inf2, $aRecords[0] & " Datensätze gelesen")
            $v = $aRecords[0]
            $what = "Prüfe... "
            GUICtrlSetState($breakup, $GUI_enABLE)
            global $dRecords[$aRecords[0]+1], $eRecords[$aRecords[0]+1], $i, $x, $y, $date
            _Timer_SetTimer($pre,50,"addup")
            For $x = 1 To $aRecords[0]
                if $loopstop = 1 then exitloop
                $v -= 1
                if StringLen ($aRecords[$x]) > 0 Then
                    $bRecords = StringLeft($aRecords[$x], 59)
                    $bRecords = StringReplace($bRecords, StringMid ( $bRecords, 22, 20), " ")
                    $dRecords[$x] = StringReplace($bRecords, 33, "xxxxxx")
                    $bRecords = StringSplit($aRecords[$x], "   ", 1)
                    $eRecords[$x] = $bRecords[3]
                endif
            Next
            $v = 0
            for $x = 1 to ($aRecords[0] - 1)
                $v += $x
            next
            $what = "Vergleiche... "
            $i = 1
            For $y = 1 To $aRecords[0]-1
                if $loopstop = 1 then ExitLoop
                For $x = $y+1 To $aRecords[0]
                    if $loopstop = 1 then exitloop 2
                    $v -= 1
                    if $eRecords[$y] = $eRecords[$x] and StringLen ($eRecords[$y]) > 0 then
                        $i += 1
                        $eRecords[$x] = ""
                        $date = StringLeft($dRecords[$x], 19)
                    endif
                next
                if StringLen($eRecords[$y]) > 0 then
                    $writecount += 1
                    if $i = 1 then $dRecords[$y] = $dRecords[$y] &"   "&  $i
                    if $i > 1 then $dRecords[$y] = $dRecords[$y] &"   "&  $i &"   "& $date
                endif
                $i = 1
            next
            _Timer_KillAllTimers($pre)
            GUICtrlSetData($inf2, "In: "&$aRecords[0] & " Datensätze ** " & "Out: " &$writecount & " Datensätze")
            GUICtrlSetState($breakup, $GUI_DISABLE)
            if $loopstop = 1 Then
                GUICtrlSetData($label, "Abbruch durch Benutzer")
                GUICtrlSetState($breakup, $GUI_UNCHECKED)
                GUICtrlSetData($inf1, "")
                GUICtrlSetData($inf2, "")
                $loopstop = 0
            Else
                GUICtrlSetData($label, "Fertig :-)")
                GUICtrlSetState($save, $GUI_enABLE)
            endif
        endif
    endif
    GUICtrlSetState($laden, $GUI_enABLE)
    Opt("GuiOnEventMode", 1)
endfunc

func _ausgabe()
    GUICtrlSetState($save, $GUI_DISABLE)
    GUICtrlSetState($laden, $GUI_disABLE)
    Local $file = FileSaveDialog("Datei", @ScriptDir, "emm-log (*.txt)", 2+16)
    If @error Then
        GUICtrlSetData($label, "Fertig :-)")
    Else
        if StringInStr(StringRight($file, 4), ".") = 0 then $file = $file & ".txt"
        FileOpen($file, 2)
        If $file = -1 Then
            GUICtrlSetData($label, "Fertig :-)")
            FileClose($file)
            MsgBox(0, "Fehler", "Speichern nicht möglich.")
        else
            _Timer_SetTimer($pre,50,"addsave")
            GUICtrlSetState($breakup, $GUI_enABLE)
            FileWriteLine($file, "[code]")
            For $x = 1 To $aRecords[0]
                if StringLen($eRecords[$x]) > 0 then
                    FileWriteLine($file, $dRecords[$x])
                    $writecount -= 1
                endif
            next
            FileWriteLine($file, "[/code]")
            FileClose($file)
            _Timer_KillAllTimers($pre)
            GUICtrlSetState($breakup, $GUI_DISABLE)
            if $loopstop = 1 Then
                GUICtrlSetData($label, "Abbruch durch Benutzer")
                GUICtrlSetState($breakup, $GUI_UNCHECKED)
                $loopstop = 0
            Else
                GUICtrlSetData($label, "Datei gespeichert")
            endif
        Endif
    endif
    GUICtrlSetState($save, $GUI_enABLE)
    GUICtrlSetState($laden, $GUI_enABLE)
endfunc

The main loop-inloop is between line 91 and line 108. this is what takes time.

 

here is an example of 5 lines, how a file looks like to feed the script with. you can create a 3.000 lines file from it.

the lines always start with date-time-stamp in always same format. then 3 spaces, then a block of 16 chars, then again 3 spaces.

in each line this is always the same count of chars. now it follows a long long string, this could be different long in each line. it is followed always by 3 spaces and then the word 'written' or 'blocked'

 

so, now what my script does: it counts how many times the long long string exists in a file, so each line will be compared with each. it counts the same lines,

cut them of a little bit, writes the count behind the cutted string and behind the count, the date-time from the last same string.

 

the output file then looks like this:

2015/02/15 04:48:55   8270870102xxxxxx81   1
2015/02/15 04:48:56   8270300102xxxxxx0F   1
2015/02/15 04:49:03   8270870102xxxxxx16   1
2015/02/15 04:49:07   8270870102xxxxxx70   1
2015/02/15 04:49:07   8270200102xxxxxx0F   1
2015/02/15 04:39:57   8270870102xxxxxx81   142   2015/02/20 12:01:20
2015/02/15 04:39:58   8270870102xxxxxx81   144   2015/02/20 11:51:56
2015/02/15 04:39:58   8270310102xxxxxx0F   1
2015/02/15 04:40:05   8270870102xxxxxx81   142   2015/02/20 11:51:57
2015/02/15 04:40:09   8270970102xxxxxx91   77   2015/02/20 11:57:40

 

inexample.txt

Link to comment
Share on other sites

Sorting the data will make counting much faster. About 10 times faster.

#include <Array.au3>

Opt( "MustDeclareVars", 1 )

Example()


Func Example()
  ; 1. Read file
  Local $aArray1 = FileReadToArray( "inexample1.txt" )
  ;_ArrayDisplay( $aArray1 )

  ; 2. Create 2D-array
  Local $iRows = UBound( $aArray1 )
  Local $aArray2[$iRows][4], $aLine
  For $i = 0 To $iRows - 1
    $aLine = StringSplit ( $aArray1[$i], "   ", 3 ) ; 3 = $STR_ENTIRESPLIT + $STR_NOCOUNT
    $aArray2[$i][0] = $aLine[0] ; Col 0 : Time
    $aArray2[$i][1] = $aLine[1] ; Col 1 : 16 chars
    $aArray2[$i][2] = $aLine[2] ; Col 2 : Long string
    $aArray2[$i][3] = $aLine[3] ; Col 3 : blocked/written
  Next
  ;_ArrayDisplay( $aArray2 )

  ; 3. Index based sort of 2D-array, $aArray2 ($aArray2 is not changed)
  Local $tIndex = DllStructCreate( "uint[" & $iRows & "]" )
  Local $pIndex = DllStructGetPtr( $tIndex )
  ; Sort 2D-array by column 2 (long string)
  ; Sort duplicates by column 0 (time)
  Local $aCmps[2][3] = [ _
    [ 2, 1, +1 ], _ ; Col 2: Compared as strings, asc
    [ 0, 1, +1 ] ]  ; Col 0: Compared as strings, asc
  SortArray( $aArray2, $pIndex, $tIndex, $aCmps )
  ; http://www.autoitscript.com/forum/index.php?showtopic=173279
  ; See post 15

  ; 4. Extract rows in sorted order
  Local $aSorted[$iRows][5], $k
  For $i = 0 To $iRows - 1
    $k = DllStructGetData($tIndex,1,$i+1)
    $aSorted[$i][0] = $aArray2[$k][0] ; Col 0 : Time
    $aSorted[$i][1] = $aArray2[$k][1] ; Col 1 : 16 chars
    $aSorted[$i][2] = $aArray2[$k][2] ; Col 2 : Long string
    $aSorted[$i][3] = $aArray2[$k][3] ; Col 3 : blocked/written
    $aSorted[$i][4] = $k              ; Col 4 : Index in $aArray2 (to display results in file order instead of sort order)
  Next
  ;_ArrayDisplay( $aSorted )

  ; 5. Count equal strings in col 2
  Local $aResults[$iRows][5], $iRes = 0, $iCnt
  For $i = 0 To $iRows - 2
    $iCnt = 1
    $aResults[$iRes][0] = $aSorted[$i][0] ; First time
    $aResults[$iRes][1] = StringLeft( $aSorted[$i][2], 10 ) & "xxxxxx" & StringMid ( $aSorted[$i][2], 17, 2 )
    $aResults[$iRes][4] = $aSorted[$i][4] ; File index
    While $i < $iRows - 1 And $aSorted[$i][2] = $aSorted[$i+1][2]
      $i += 1
      $iCnt += 1
    WEnd
    $aResults[$iRes][2] = $iCnt ; Count
    $aResults[$iRes][3] = $aSorted[$i][0] ; Last time
    $iRes += 1
  Next
  ReDim $aResults[$iRes][5]
  ;_ArrayDisplay( $aResults )

  ; If order of results doesn't matter, you can skip step 6.
  ; and create $aResults2 from $aResults wihtout a sorting.

  ; 6. Sort results in file order instead of sort order
  Local $tIndex = DllStructCreate( "uint[" & $iRes & "]" )
  Local $pIndex = DllStructGetPtr( $tIndex )
  ; Sort 2D-array by column 4 (file index)
  Local $aCmps[1][3] = [ _
    [ 4, 0, +1 ] ]  ; Col 4: Compared as numbers, asc
  SortArray( $aResults, $pIndex, $tIndex, $aCmps )

  ; 7. Extract the rows in file order
  Local $aResults2[$iRes], $k
  For $i = 0 To $iRes - 1
    $k = DllStructGetData($tIndex,1,$i+1)
    $aResults2[$i] = $aResults[$k][0] & "   " & _ ; First time
                     $aResults[$k][1] & "   " & _ ; xxxxxx-string
                     StringFormat( "%3d", $aResults[$k][2] ) ; Count
    If $aResults[$k][2] > 1 Then $aResults2[$i] &= "   " & $aResults[$k][3] ; Last time
    ConsoleWrite( $aResults2[$i] & @CRLF )
  Next
  ;_ArrayDisplay( $aResults2 )

  ; 8. Store results
  Local $s = "[code]" & @CRLF
  For $i = 0 To $iRes - 1
    $s &= $aResults2[$i] & @CRLF
  Next
  $s &= "[/code]" & @CRLF
  FileWrite( "inexample2.txt", $s )
EndFunc


Func SortArray( ByRef $aItems, $pIndex, $tIndex, $aCmps )
  Local $iCmps = UBound( $aCmps ), $c, $r, $v[$iCmps]
  Local $lo, $hi, $mi
  For $i = 0 To UBound( $aItems ) - 1
    For $j = 0 To $iCmps - 1
      $v[$j] = $aItems[$i][$aCmps[$j][0]] ; Values
    Next
    $lo = 0
    $hi = $i - 1
    While $lo <= $hi ; Binary search
      $r = 0 ; Compare result (-1,0,1)
      $j = 0 ; Index in $aCmps array
      $mi = Int( ( $lo + $hi ) / 2 )
      While Not $r And $j < $iCmps                    ; This While-loop handles sorting by multiple
        $c = $aCmps[$j][0]   ; Column                 ; columns. Column values of the two rows are
        Switch $aCmps[$j][1] ; Number/string          ; compared until a difference is found.
          Case 0 ; Compare column values as numbers. The following line is an implementation of the spaceship or three-way comparison operator for numbers like StringCompare is for strings.
            $r = ( $v[$j] < $aItems[DllStructGetData($tIndex,1,$mi+1)][$c] ? -1 : $v[$j] = $aItems[DllStructGetData($tIndex,1,$mi+1)][$c] ? 0 : 1 ) * $aCmps[$j][2] ; * $iCmpAsc
          Case 1 ; Compare column values as strings. StringCompare is a spaceship or three-way comparison operator for strings.
            $r = StringCompare( $v[$j], $aItems[DllStructGetData($tIndex,1,$mi+1)][$c] ) * $aCmps[$j][2] ; * $iCmpAsc
        EndSwitch
        $j += 1
      WEnd
      Switch $r
        Case -1
          $hi = $mi - 1
        Case  1
          $lo = $mi + 1
        Case  0 ; Equal
          ExitLoop
      EndSwitch
    WEnd
    If $i > $mi Then _ ; Make space for the row number in index
      DllCall( "kernel32.dll", "none", "RtlMoveMemory", "struct*", $pIndex+($mi+1)*4, "struct*", $pIndex+$mi*4, "ulong_ptr", ($i-$mi)*4 )
    DllStructSetData( $tIndex, 1, $i, $mi+1+($lo=$mi+1) ) ; Insert row number $i at position $mi+1+($lo=$mi+1) in index
  Next
EndFunc

The zip contains inexample1.txt with 3000 lines and results in inexample2.txt and inexample2-a.txt. Code in tst00.au3 and tst00-a.au3. The a-versions are your versions.

Files.7z

Edited by LarsJ
Link to comment
Share on other sites

LarsJ,

Beware you've a little glitch in grabbing the end of the hex part.

Here's a completely different approach, using SQLite:

#include <SQLite.au3>
;~ #include <SQLite.Dll.au3>    ; comment this after installing the SQLite3.dll library

_SQLite_Startup()
Local $hDB = _SQLite_Open()     ; memory DB

_SQLite_Exec($hDB, "begin")
_SQLite_Exec($hDB, _
"CREATE TABLE andygo (" & _
"  Date CHAR, " & _
"  Hex CHAR, " & _
"  Count INTEGER DEFAULT 1, " & _
"  LastDate CHAR DEFAULT '');" & _
"" & _
"CREATE INDEX ixHex ON andygo (Hex);" & _
"" & _
"CREATE TRIGGER trIns " & _
"BEFORE INSERT " & _
"ON andygo " & _
"WHEN exists (select 1 from andygo A where A.hex = new.hex) " & _
"BEGIN " & _
"     update andygo set count = count + 1, lastdate = new.date where hex = new.hex;" & _
"     select raise(ignore);" & _
"END;" _
)

Local $aIn = StringRegExp(FileRead("inexample1.txt"), "(?m)^(.{19})(?:.{22})([^ ]+)", 3)
Local $iRounds = 20 * Int(UBound($aIn) / 20)
Local $iRest = Mod(UBound($aIn), 20)
For $i = 0 To $iRest - 1 Step 2
    _SQLite_Exec($hDB, "insert into andygo (date, hex) values ('" & $aIn[$i] & "','" & $aIn[$i + 1] & "')")
Next
For $i = $iRest To UBound($aIn) - 1 Step 20
    _SQLite_Exec($hDB, "insert into andygo (date, hex) values " & _
        "('" & $aIn[$i     ] & "','" & $aIn[$i +  1] & "')," & _
        "('" & $aIn[$i +  2] & "','" & $aIn[$i +  3] & "')," & _
        "('" & $aIn[$i +  4] & "','" & $aIn[$i +  5] & "')," & _
        "('" & $aIn[$i +  6] & "','" & $aIn[$i +  7] & "')," & _
        "('" & $aIn[$i +  8] & "','" & $aIn[$i +  9] & "')," & _
        "('" & $aIn[$i + 10] & "','" & $aIn[$i + 11] & "')," & _
        "('" & $aIn[$i + 12] & "','" & $aIn[$i + 13] & "')," & _
        "('" & $aIn[$i + 14] & "','" & $aIn[$i + 15] & "')," & _
        "('" & $aIn[$i + 16] & "','" & $aIn[$i + 17] & "')," & _
        "('" & $aIn[$i + 18] & "','" & $aIn[$i + 19] & "')" _
    )
Next
_SQLite_Exec($hDB, "commit")

Local $aOut, $iRows, $iCols
; you may use the order by clause of your choice or remove the clause if you don't need any sort order
_SQLite_GetTable($hDB,  "select Date || '   ' || substr(hex, 1, 10) || 'xxxxxx' || substr(hex, -2, 2) || '   '" & _
                        " || count || '   ' || lastdate from andygo order by count desc, lastdate", $aOut, $iRows, $iCols)
_ArrayDelete($aOut, "0-1")
ConsoleWrite(_ArrayToString($aOut, @LF) & @LF)

_SQLite_Close($hDB)
_SQLite_Shutdown()

Here, we use a "before insert" trigger to check whether the hex part already exists. If it does, we simply increment the count in the existing row and drop the insertion, else we insert the new row with default count = 1. If the row already exists, we also sore the new date in the lastdate column of the existing row.

We also use multiple VALUES per insert statement (here, 20) to speed up insertion. Wrapping the whole insertion block in a transaction also speeds things up.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

No jchd, you have a little glitch. The two characters after "xxxxxx" are not the two last characters of the long string, but character no. 17 and 18.

No matter what, an SQL-solution is always interesting. And in this case pretty fast too. Too much overhead is often a problem with SQL and small amounts of data.

Link to comment
Share on other sites

Ha, sorry for misinterpreting. Then retrieval goes like this:

_SQLite_GetTable($hDB,  "select Date || '   ' || substr(hex, 1, 10) || 'xxxxxx' || substr(hex, 17, 2) || '   '" & _
                        " || count || '   ' || lastdate from andygo -- order by count desc, lastdate", $aOut, $iRows, $iCols)

SQLite triggers are a bit slow compared to some other engines (e.g. client/servers) and this shows here. It isn't named "lite" without reason. Yet the speed is still acceptable and I liked the idea of only inserting rows having unique hex strings. Another nice feature is that if the use case requires some odd formatting and/or sorting and/or selection within a given date range, then SQL is quite powerful and efficient.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

jchd, I like your idea too: Only inserting rows having unique hex strings.

andygo, If you need even more speed because of large files, there is probably room for more optimization by using jchd's idea. Especially if there is a large number of duplicates. The sorting code is just a standard binary sorting. All rows are sorted. Including duplicates.

Link to comment
Share on other sites

realisitc input size in most cases is ~ 3.000 to 5.000 lines. to test a 25.000 lines file was a pal's idea just to see the scripts result and time. but i like it and no matter if a user dont recognize timediffernce between yours and jchd sql solution, i will test it too :) feedback follows.

Link to comment
Share on other sites

Good evening, here are the results with a 25.000 lines input.

 

LarsJ code needs ~ 10 seconds

jchd sql code needs ~ 4,5 seconds.

:thumbsup:

 

special test with a 156.000 lines file (34mb input): solved in 21 seconds. crazy fast

Edited by andygo
additional information
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...