DCCD, posted January 30, 2015 (edited January 30, 2015):

Hi, I wrote a script that replaces multiple strings in an XML file. It works fine, but it is very slow. I've tried StringReplace, _ReplaceStringInFile and StringRegExpReplace, and all of them are equally slow. The file needs about 8000 replacements. Any help would be greatly appreciated.

    #include <File.au3>

    $path = @ScriptDir & '\xmlfo.xml'
    $OXML = FileOpen($path, 256)
    $XML = FileRead($OXML)
    $term = 'post'
    $nofr = 1
    Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)
    FileClose($OXML)
    $XL = $XML
    If Not @error Then
        For $i = 0 To UBound($aArray) - 1
            ;get data start
            ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
            $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
            If @error Then
                $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
            ElseIf Not @error Then
                ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
            EndIf
            $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
            If @error Then
                $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
                ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
            ElseIf Not @error Then
                ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
            EndIf
            If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then
                _ReplaceStringInFile($path, $aArray[$i], '')
                If Not @error Then
                    ;MsgBox(16,'',$XL)
                    ConsoleWrite($nofr & ' ' & $i & @CRLF)
                    $nofr = $nofr + 1
                EndIf
                ;FileDelete(@ScriptDir & '\XML_output.xml')
                ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
            Else
                ConsoleWrite ('err0x0'& @CRLF)
            EndIf
        Next
    EndIf

SmOke_N (Moderator), posted January 30, 2015 (edited January 30, 2015):

Well, you have a huge issue with loading and unloading 100 MB into memory over and over. Every call to _ReplaceStringInFile opens the file twice.

So, two suggestions I can think of:

1. Ditch _ReplaceStringInFile() and just read the file into memory once, enumerate each line, keep what you want, and remove what you don't. This requires a second string to write back to the file. I say a second string because _ArrayDelete ReDims the array every time.
2. Read the file in chunks and repeat step 1 for each chunk.

Edit: If this is some type of database script, SQLite would make a lot more sense.

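The first suggestion can be sketched roughly as follows. This is only an illustration under several assumptions: the output path, the variable names and the `_ShouldRemove()` helper are hypothetical, the helper checks only the category (the date test from the original script is left out because `Data()` and `getdate()` are not shown in the thread), and entries are located with a plain search for `<entry` and `</entry>` rather than the regular expression used in the original script.

```autoit
#include <FileConstants.au3>
#include <StringConstants.au3>

; Hypothetical paths, for illustration only.
Local $sInPath = @ScriptDir & '\xmlfo.xml'
Local $sOutPath = @ScriptDir & '\XML_output.xml'

; Read the whole file into memory exactly once.
Local $hIn = FileOpen($sInPath, $FO_READ + $FO_UTF8_NOBOM)
If $hIn = -1 Then Exit 1
Local $sXml = FileRead($hIn)
FileClose($hIn)

Local $sOut = '' ; the "second string": everything that is kept
Local $iPos = 1  ; current scan position in $sXml (1-based)
Local $iStart, $iEnd, $sEntry

While 1
    $iStart = StringInStr($sXml, '<entry', $STR_NOCASESENSEBASIC, 1, $iPos)
    If $iStart = 0 Then ExitLoop
    $iEnd = StringInStr($sXml, '</entry>', $STR_NOCASESENSEBASIC, 1, $iStart)
    If $iEnd = 0 Then ExitLoop
    $iEnd += StringLen('</entry>') ; first position after the closing tag

    ; Keep the text between the previous entry and this one.
    $sOut &= StringMid($sXml, $iPos, $iStart - $iPos)

    ; Keep the entry itself unless it matches the removal rule.
    $sEntry = StringMid($sXml, $iStart, $iEnd - $iStart)
    If Not _ShouldRemove($sEntry) Then $sOut &= $sEntry

    $iPos = $iEnd
WEnd

; Keep whatever follows the last entry.
$sOut &= StringMid($sXml, $iPos)

; Write the result exactly once.
Local $hOut = FileOpen($sOutPath, $FO_OVERWRITE + $FO_UTF8_NOBOM)
If $hOut = -1 Then Exit 1
FileWrite($hOut, $sOut)
FileClose($hOut)

; Hypothetical predicate: True when an <entry> block should be dropped.
; Only the category test is shown; the date test is omitted.
Func _ShouldRemove($sEntry)
    Local $aKind = StringRegExp($sEntry, '(?i)<category>(.*?)</category>', $STR_REGEXPARRAYMATCH)
    If @error Then Return False ; no <category> tag: keep the entry
    Return ($aKind[0] = 'post')
EndFunc   ;==>_ShouldRemove
```

Because the source string is scanned once from left to right and never modified, nothing is searched or rewritten again for each removed entry. Writing to a separate output file also leaves the original untouched.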
kylomas, posted January 30, 2015:

DCCD,

See jdelaney's signature for working with XML files directly.

kylomas

jguinch, posted January 30, 2015:

This:

    StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)

and this:

    StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)

make no sense. Can you post a sample of your XML file and explain exactly what you want to replace, and with what?

DCCD (Author), posted January 30, 2015:

Thank you all, I really appreciate your help and support. I'm still stuck on this slow loop: I've tried many approaches and none of them work.

SmOke_N (Moderator), posted January 30, 2015:

You're going to have us guess without your code and an example file of what you've tried, aren't you?

DCCD (Author), posted January 30, 2015:

@SmOke_N, thanks a lot. I'll try something different. Thanks again for all your help.

DCCD (Author), posted February 1, 2015:

[In reply to SmOke_N's suggestions to drop _ReplaceStringInFile() and read the file into memory once]

Each text string that needs to be replaced may contain more than 500 characters or digits.

SmOke_N (Moderator), posted February 1, 2015 (edited February 1, 2015):

I'm sorry, I don't see the relevance of your statement/reply.

Edit: This should speed up your script considerably.

    #include <File.au3>

    $path = @ScriptDir & '\xmlfo.xml'

    ; Read the whole file into memory once (256 = $FO_UTF8_NOBOM).
    $OXML = FileOpen($path, 256)
    $XML = FileRead($OXML)
    FileClose($OXML)

    $term = 'post'
    $nofr = 1

    ; Collect every <entry>...</entry> block (3 = array of global matches).
    Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)
    If Not @error Then
        For $i = 0 To UBound($aArray) - 1
            ;get data start
            $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
            If @error Then
                ; No <published> tag: fall back to a one-element array holding "date err".
                $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
            EndIf
            $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
            If @error Then
                ; No <category> tag: fall back to a one-element array holding "kind err".
                $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            EndIf
            ; Data() and getdate() come from the original script and are not shown in this thread.
            If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then
                ; Replace in memory instead of calling _ReplaceStringInFile($path, $aArray[$i], '').
                $XML = StringReplace($XML, $aArray[$i], '')
                ; StringReplace reports the number of replacements in @extended.
                If @extended Then
                    ConsoleWrite($nofr & ' ' & $i & @CRLF)
                    $nofr = $nofr + 1
                EndIf
            Else
                ConsoleWrite('err0x0' & @CRLF)
            EndIf
        Next
    EndIf

    ; Write the result back to the file once.
    Global $ghOpen = FileOpen($path, $FO_UTF8_NOBOM + $FO_OVERWRITE)
    FileWrite($ghOpen, $XML)
    FileClose($ghOpen)

Here, as suggested before, we open the file, read it, and write to it only once. Your way, the file was opened, read into memory, and written as many times as the loop ran.

One thing is different: the FileOpen at the bottom of the script. You never told _ReplaceStringInFile how to write the data back to the file, so it was writing it with its default settings. I added $FO_UTF8_NOBOM strictly because that's how you opened the file for reading in your code example. So you may want to back up your XML file before using this code (just FYI).

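The backup suggested above could look something like this minimal sketch. The `.bak` file name and the variable names are assumptions for illustration; the copy would need to run before the script overwrites the XML file.

```autoit
#include <FileConstants.au3>

Local $sPath = @ScriptDir & '\xmlfo.xml'
Local $sBackup = $sPath & '.bak' ; hypothetical backup name

; FileCopy returns 1 on success and 0 on failure.
; $FC_OVERWRITE replaces an older backup if one exists.
If Not FileCopy($sPath, $sBackup, $FC_OVERWRITE) Then
    ConsoleWrite('Backup failed, leaving ' & $sPath & ' untouched' & @CRLF)
    Exit 1
EndIf
```

Stopping when the copy fails means the original file is never overwritten without a fallback copy on disk.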
DCCD (Author), posted February 4, 2015:

@SmOke_N, thank you for all your help, and I apologize for the late response.