leuce, posted November 9, 2023 (edited)

Hello everyone

I'm using StringReplace to perform a series of replacements in a file. Each replacement only needs to be done (or attempted) once, since each search text occurs only once in the file, and each search text always comes AFTER the previous one in the file. It is a very, very large file, though. Currently, StringReplace evaluates the entire file each time. Is there a way to tell StringReplace to search for the next text starting only from the position where the previous replacement was made?

My code is (inside a For...Next loop):

    $sdlfileread = StringReplace($sdlfileread, $columnsplit[1], $columnsplit[2])

...in which $columnsplit represents an array of replacements that have been read from a separate tab-delimited text file. The search texts are numbers of increasing value, so larger values are lower down in the file and thus lower down on the replacement list as well.

I wrote this script for small files and few replacements, but now that I'm trying it on a larger file, I find that this is not the ideal way of doing it.

Thanks
Samuel

Added: Edited to make example simpler.

Andreik, posted November 9, 2023

You could trim off the left part of your string that doesn't need any more replacements, but probably even better would be to use StringRegExpReplace(). Give us a sample of your data and some info about what needs to be replaced, and we might find a regex pattern that works optimally for your case.

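The "trim the left part" idea can also be sketched without any regex. StringInStr accepts a start position, so each search can begin where the previous match ended, and the result can be assembled from pieces instead of rewriting the whole string on every pass. This is only a minimal sketch and has not been timed on a 45 MB file; the helper name _ReplaceForward and the two-column array layout ([n][0] = find, [n][1] = replace) are assumptions made for illustration.

```autoit
; Hypothetical helper: applies each find/replace pair once, scanning forward only.
; Assumes the search texts occur in $sText in the same order as the rows of $aPairs.
Func _ReplaceForward($sText, $aPairs)
    Local $sOut = ""
    Local $iStart = 1 ; 1-based position where the next search begins
    Local $iPos
    For $i = 0 To UBound($aPairs) - 1
        ; casesense = 1 (case-sensitive), occurrence = 1, start = $iStart
        $iPos = StringInStr($sText, $aPairs[$i][0], 1, 1, $iStart)
        If $iPos = 0 Then ContinueLoop ; not found after $iStart: skip this pair
        ; Copy the untouched text before the match, then the replacement
        $sOut &= StringMid($sText, $iStart, $iPos - $iStart) & $aPairs[$i][1]
        ; Continue right after the text that was just matched
        $iStart = $iPos + StringLen($aPairs[$i][0])
    Next
    ; Append whatever follows the last match
    Return $sOut & StringMid($sText, $iStart)
EndFunc

; Small demo using the cat/dog style of example
Local $aDemo[3][2] = [["{cat}", "{dog}"], ["{apple}", "{pear}"], ["{bike}", "{car}"]]
ConsoleWrite(_ReplaceForward("a {cat} b {apple} c {bike} d", $aDemo) & @CRLF)
```

Because the source string is never modified, the positions stay valid for the whole loop, and each character of the file is scanned roughly once.
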
leuce, posted November 9, 2023

Thanks, Andreik, for the tip about using regex replace. I tested it, but the speed is the same if I use one regular expression per replacement. I'm not sure if it's possible to do multiple replacements inside a single regex. For example, if I want to replace "{cat}" with "{dog}", "{apple}" with "{pear}" and "{bike}" with "{car}", can I perform those replacements with a single line of regular expression?

You asked for sample data, but I'm not sure if that'll help. Better you give me an example with cat, dog, apple etc. 🙂 See attached.

So, currently, I split the replacements.txt file by line and then by tab, and then use a For...Next loop to make those find/replacements in the XML file. If there is a way to put those six items into a single regular expression ... wow.

Samuel

Attachment: replacements.txt

Andreik, posted November 9, 2023

leuce said: "I tested it but the speed is the same if I use a single regular expression for each replacement."

If it's indeed a large text, I doubt that the speed is the same. If these IDs are in some sort of order, it might be done in a single call; otherwise a different approach is needed. Also, can you post the text above as plain text, not as a picture?

leuce, posted November 9, 2023

As requested (attached).

Granted, my test was small: I made only 6 replacements in a 45 MB XML file, and it took 25 seconds whether I used StringReplace or StringRegExpReplace:

    $sdlfileread = StringReplace($sdlfileread, $columnsplit[1], $columnsplit[2])
    $sdlfileread = StringRegExpReplace($sdlfileread, $columnsplit[1], $columnsplit[2])

The IDs in the replacements.txt file are in order from small to large, and they occur in the XML file in that same order, but each ID is paired with an attribute that is different for each ID. So, for example, ID 1106 must be replaced with something that says 1106 and "nbs", and ID 1180 must be replaced with something that says 1180 and "quo", etc.

Anyway, if the cat/dog example mentioned previously is not possible using regex, then it looks like I'm going to have to experiment with StringMid etc., splitting the file into hundreds of little chunks and then merging it all together at the end. 🙂

Samuel

Attachment: input example.txt

Andreik, posted November 9, 2023

I was thinking of something like this:

    ; Replacements map: ID => ctype value to add
    Local $mReplace[]
    $mReplace['1104'] = 'nbs'
    $mReplace['1105'] = 'quo'
    $mReplace['1106'] = 'apo'
    $mReplace['1107'] = 'nbs'
    $mReplace['1108'] = 'nbs'

    $sData = '<trans-unit translate="no" id="bf7c95b8-58a9-4352-988c-7679b8942d49"><source><x id="1104"/><x id="1105"/>' & _
            '</source></trans-unit><trans-unit id="47e1477a-98e9-4759-997f-a4e722e63ce5"><source>The quick brown fox<x id="1106"/>' & _
            'ABCDE</source><seg-source><mrk mtype="seg" mid="180">Jumps over the lazy dog<x id="1106"/>FGHIJ</mrk></seg-source>' & _
            '<target><mrk mtype="seg" mid="180"/></target><sdl:seg-defs><sdl:seg id="180"/></sdl:seg-defs></trans-unit>' & _
            '<trans-unit translate="no" id="aa1f3715-586f-437b-8d7b-d2324b27fc8d"><source><x id="1107"/><x id="1108"/></source></trans-unit>'

    Local $sPattern, $aRegEx
    Local $iOffset = 1
    Local $aKeys = MapKeys($mReplace)
    For $vKey In $aKeys
        $sPattern = '<x id="' & $vKey & '"'
        ; Search only from the current offset onwards
        $aRegEx = StringRegExp($sData, $sPattern, 1, $iOffset)
        If Not @error Then
            ; @extended holds the position just after the match
            $iOffset = @extended
            ; Skip the characters before the match, then add the ctype attribute
            $sData = StringRegExpReplace($sData, '^(.{' & $iOffset - StringLen($sPattern) - 1 & '})' & $sPattern & '(.*?)', '$1<x id="' & $vKey & '" ctype="' & $mReplace[$vKey] & '"$2')
        EndIf
    Next

    MsgBox(0, '', $sData)

This is more optimized than a simple string replace because the offset increases with each replacement, so the part of the string where replacements have already been made is not searched again. The replace is also fast because it jumps to the exact position where the next replacement will be made. Your replacement rules are vaguely described, so this is just a proof of concept and has some requirements, such as the IDs being in order and without duplicates, but more refined code could be written.

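The offset mechanism that the loop relies on can be seen in isolation. With flag 1, StringRegExp returns the first match at or after the given offset and sets @extended to the position just past that match, so the start of the match is @extended minus the length of the matched text. The sketch below uses made-up sample data; $sDemo and the other variable names are assumptions for illustration, not taken from the attached files.

```autoit
Local $sDemo = '<x id="1104"/><x id="1105"/><x id="1106"/>'
Local $iOffset = 1
Local $aMatch, $iNext, $iMatchStart

While 1
    ; Flag 1: return an array of matches; the 4th parameter is the start offset.
    ; The capturing group makes $aMatch[0] hold the whole matched text.
    $aMatch = StringRegExp($sDemo, '(<x id="\d+")', 1, $iOffset)
    If @error Then ExitLoop ; no further match
    $iNext = @extended ; position just after the match
    $iMatchStart = $iNext - StringLen($aMatch[0])
    ConsoleWrite($aMatch[0] & ' starts at ' & $iMatchStart & ', next search from ' & $iNext & @CRLF)
    $iOffset = $iNext
WEnd
```

One thing that may be worth checking on the real 45 MB file: the `^(.{N})` prefix in the replace puts the character count into a regex quantifier, and PCRE caps how large such a count can be; `.` also does not match line breaks by default. If either turns out to be a problem, the same start position could be used to splice the string with StringLeft and StringMid instead of a second regex.
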
leuce, posted November 10, 2023 (edited)

Thanks for your effort -- I'll have a look.

Added: Aaah, I didn't notice "Check @extended for next offset" in the help file for StringRegExp. That's very useful.

mikell, posted November 10, 2023

leuce said: "can I perform that replacement with a single line of regular expression?"

Maybe this...

    $sd = ObjCreate("Scripting.Dictionary")
    $sd.add("1104", "nbs1")
    $sd.add("1105", "nbs2")
    $sd.add("1106", "nbs3")
    $sd.add("1107", "nbs4")

    $str = '<trans-unit translate="no" id="bf7c95b8-58a9-4352-988c-7679b8942d49"><source><x id="1103"/><x id="1104"/>' & _
            '</source></trans-unit><trans-unit id="47e1477a-98e9-4759-997f-a4e722e63ce5"><source>The quick brown fox<x id="1105"/>' & _
            'ABCDE</source><seg-source><mrk mtype="seg" mid="180">Jumps over the lazy dog<x id="1106"/>FGHIJ</mrk></seg-source>' & _
            '<target><mrk mtype="seg" mid="180"/></target><sdl:seg-defs><sdl:seg id="180"/></sdl:seg-defs></trans-unit>' & _
            '<trans-unit translate="no" id="aa1f3715-586f-437b-8d7b-d2324b27fc8d"><source><x id="1107"/><x id="1108"/></source></trans-unit>'

    $r = Execute("'" & StringRegExpReplace($str, "(<x id=""(\d{4})"")/>", _
            "' & ($sd.exists('$2') ? '$1 ctype=""' & $sd.item('$2') & '""/>' & $sd.remove('$2') : '$1/>') & '") & "'")
    Msgbox(0,"", $r)

Each replacement will be done only once.