StringReplace next only from the previous position onwards


leuce
Hello everyone

I'm using StringReplace to perform a series of replacements in a file. Each replacement only needs to be done (or attempted) once, since each search string occurs only once in the file, and each replacement always comes AFTER the previous one in the file. It is a very, very large file, though. Currently, StringReplace evaluates the entire file each time. Is there a way to tell StringReplace to search for the next string only from the position where the previous replacement was made?

My code is (inside a For...Next loop):

$sdlfileread = StringReplace ($sdlfileread, $columnsplit[1], $columnsplit[2])

...in which $columnsplit represents an array of replacements that have been read from a separate tab delimited text file.  This array is a set of numbers of increasing values, so larger values are lower down in the file and thus lower down on the replacement list as well.

I wrote this script for small files and few replacements, but I'm now trying it on a larger file, and I discover that this is not the ideal way of doing it.

Thanks

Samuel

Added: Edited to make example simpler.

Edited by leuce

You could trim off the left part of your string that doesn't need any more replacements, but it would probably be even better to use StringRegExpReplace(). Give us a sample of your data and some info about what needs to be replaced, and we might find a regex pattern that works optimally for your case.
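
The "skip the part that is already done" idea can be sketched with StringInStr, whose fifth parameter is the position at which the search starts. This is only a minimal sketch, not code from the thread: the function name _ReplaceForward and the two-column array $aPairs are hypothetical, and it assumes the pairs are listed in the same order in which they occur in the text.

```autoit
; Hypothetical helper: replace each search string once, scanning forward only.
; $aPairs is an assumed 2D array: column 0 = search text, column 1 = replacement.
Func _ReplaceForward($sText, $aPairs)
    Local $sOut = ""   ; output, assembled chunk by chunk
    Local $iPos = 1    ; first character of $sText not yet copied to $sOut
    Local $iHit
    For $i = 0 To UBound($aPairs) - 1
        ; Arguments: string, substring, casesense = 1, occurrence = 1, start = $iPos
        $iHit = StringInStr($sText, $aPairs[$i][0], 1, 1, $iPos)
        If $iHit = 0 Then ContinueLoop ; not found after $iPos: skip this pair
        ; Copy the untouched text before the hit, then the replacement
        $sOut &= StringMid($sText, $iPos, $iHit - $iPos) & $aPairs[$i][1]
        $iPos = $iHit + StringLen($aPairs[$i][0])
    Next
    ; Append whatever follows the last replacement
    Return $sOut & StringMid($sText, $iPos)
EndFunc

Local $aPairs[3][2] = [["{cat}", "{dog}"], ["{apple}", "{pear}"], ["{bike}", "{car}"]]
ConsoleWrite(_ReplaceForward("a {cat} b {apple} c {bike} d", $aPairs) & @CRLF)
```

Because the output is built from chunks, the text before the current position is never searched again and the large input string itself is never rewritten.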

When the words fail... music speaks.


Thanks, Andreik, for the tip about using regex replace.  I tested it but the speed is the same if I use a single regular expression for each replacement. 

I'm not sure if it's possible to do multiple replacements inside a single regex.  For example, if I want to replace "{cat}" with "{dog}", replace "{apple}" with "{pear}" and replace "{bike}" with "{car}", can I perform that replacement with a single line of regular expression?

You asked for sample data, but I'm not sure if that'll help. It would be better if you gave me an example with cat, dog, apple, etc.  🙂 See attached.

So, currently, I split the replacements.txt file by line and then by tab, and then use a For...Next loop to make those find/replacements in the XML file.  If there is a way to put those six items into a single regular expression ... wow. 
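
For the cat/dog question, one possible approach is a single alternation pattern combined with a lookup table, walking the text once from left to right. The sketch below is an illustration under assumptions, not something posted in the thread: the variable names are hypothetical, the table is a Scripting.Dictionary COM object (Windows only), and the loop relies on StringRegExp accepting a start offset as its fourth parameter and reporting the position just past the match in @extended.

```autoit
; Hypothetical lookup table for the cat/dog example
Local $oMap = ObjCreate("Scripting.Dictionary")
$oMap.Add("cat", "dog")
$oMap.Add("apple", "pear")
$oMap.Add("bike", "car")

Local $sText = "a {cat} b {apple} c {bike} d {fish}"
Local $sOut = "", $iPos = 1, $aHit, $iNext, $iStart

While 1
    ; Flag 1: return the capture groups of the first match at or after $iPos
    $aHit = StringRegExp($sText, "\{(cat|apple|bike)\}", 1, $iPos)
    If @error Then ExitLoop            ; no further match
    $iNext = @extended                 ; position just past the match
    $iStart = $iNext - StringLen($aHit[0]) - 2 ; match start (the 2 is for the braces)
    ; Copy the text before the match, then the looked-up replacement
    $sOut &= StringMid($sText, $iPos, $iStart - $iPos) & "{" & $oMap.Item($aHit[0]) & "}"
    $iPos = $iNext
WEnd
$sOut &= StringMid($sText, $iPos)      ; remainder after the last match

ConsoleWrite($sOut & @CRLF)
```

Here {fish} is left alone because it is not part of the alternation. With many keys, a generic pattern such as \{(\w+)\} plus an $oMap.Exists() check might be easier to maintain than a long alternation.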

Samuel

before after.jpg

replacements.txt


15 minutes ago, leuce said:

I tested it but the speed is the same if I use a single regular expression for each replacement. 

If it's indeed a large text, I doubt that the speed is the same. If these IDs are in some sort of order, it might be done in a single call; otherwise a different approach is needed. Also, can you post the text above as plain text rather than as a picture?


As requested (attached).

Granted, my test was small: I made only 6 replacements in a 45 MB XML file and it took 25 seconds whether I use StringReplace or StringRegExpReplace:

$sdlfileread = StringReplace ($sdlfileread, $columnsplit[1], $columnsplit[2])
$sdlfileread = StringRegExpReplace ($sdlfileread, $columnsplit[1], $columnsplit[2])

The IDs in the replacements.txt file are in order from small to large, and they occur in the XML file in that same order, but each ID is paired with an attribute that is different for each ID.  So e.g. ID 1106 must be replaced with something that says 1106 and "nbs", and ID 1180 must be replaced with something that says 1180 and "quo", etc.

Anyway, if the cat/dog example mentioned previously is not possible using regex, then it looks like I'm going to have to experiment with StringMid etc., splitting the file into hundreds of little chunks and then merging it all together in the end. 🙂 

Samuel

input example.txt


I was thinking of something like this:

; Replacements map
Local $mReplace[]
$mReplace['1104'] = 'nbs'
$mReplace['1105'] = 'quo'
$mReplace['1106'] = 'apo'
$mReplace['1107'] = 'nbs'
$mReplace['1108'] = 'nbs'

$sData = '<trans-unit translate="no" id="bf7c95b8-58a9-4352-988c-7679b8942d49"><source><x id="1104"/><x id="1105"/>' & _
'</source></trans-unit><trans-unit id="47e1477a-98e9-4759-997f-a4e722e63ce5"><source>The quick brown fox<x id="1106"/>' & _
'ABCDE</source><seg-source><mrk mtype="seg" mid="180">Jumps over the lazy dog<x id="1106"/>FGHIJ</mrk></seg-source>' & _
'<target><mrk mtype="seg" mid="180"/></target><sdl:seg-defs><sdl:seg id="180"/></sdl:seg-defs></trans-unit>' & _
'<trans-unit translate="no" id="aa1f3715-586f-437b-8d7b-d2324b27fc8d"><source><x id="1107"/><x id="1108"/></source></trans-unit>'

Local $sPattern, $aRegEx
Local $iOffset = 1
Local $aKeys = MapKeys($mReplace)

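For $vKey In $aKeys
    $sPattern = '<x id="' & $vKey & '"'
    ; Flag 1 returns the first match; the 4th parameter makes the search start at $iOffset
    $aRegEx = StringRegExp($sData, $sPattern, 1, $iOffset)
    If Not @error Then
        ; @extended holds the position just past the match
        $iOffset = @extended
        ; Insert the attribute at that position. Plain string splicing is used instead of a
        ; '^(.{N})' regex prefix, because PCRE caps counted quantifiers at 65535 and '.' does
        ; not match line breaks by default, so that prefix fails on large or multi-line input.
        $sData = StringLeft($sData, $iOffset - 1) & ' ctype="' & $mReplace[$vKey] & '"' & StringMid($sData, $iOffset)
    EndIf
Next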

MsgBox(0, '', $sData)

This is more efficient than a simple string replace because the offset increases with each replacement, so the part of the string where replacements have already been made is never searched again. The replacement itself is also fast because it jumps to the exact position where the next replacement has to be made. Your replacement rules are only vaguely described, so this is just a proof of concept, and it has some requirements (the IDs must be in order and without duplicates), but more refined code could be written.


14 hours ago, leuce said:

can I perform that replacement with a single line of regular expression?

Maybe this...

$sd = ObjCreate("Scripting.Dictionary")
$sd.add("1104", "nbs1")
$sd.add("1105", "nbs2")
$sd.add("1106", "nbs3")
$sd.add("1107", "nbs4")

$str = '<trans-unit translate="no" id="bf7c95b8-58a9-4352-988c-7679b8942d49"><source><x id="1103"/><x id="1104"/>' & _
'</source></trans-unit><trans-unit id="47e1477a-98e9-4759-997f-a4e722e63ce5"><source>The quick brown fox<x id="1105"/>' & _
'ABCDE</source><seg-source><mrk mtype="seg" mid="180">Jumps over the lazy dog<x id="1106"/>FGHIJ</mrk></seg-source>' & _
'<target><mrk mtype="seg" mid="180"/></target><sdl:seg-defs><sdl:seg id="180"/></sdl:seg-defs></trans-unit>' & _
'<trans-unit translate="no" id="aa1f3715-586f-437b-8d7b-d2324b27fc8d"><source><x id="1107"/><x id="1108"/></source></trans-unit>'

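; The regex turns the text into one AutoIt expression: literal chunks joined by a ternary
; per <x id="NNNN"/> tag, which Execute() then evaluates in a single call.
$r = Execute("'" & StringRegExpReplace($str, "(<x id=""(\d{4})"")/>", _
            "' & ($sd.exists('$2') ? '$1 ctype=""' & $sd.item('$2') & '""/>' & $sd.remove('$2') : '$1/>') & '") & "'")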

Msgbox(0,"", $r)

Each replacement will be done only once, because the key is removed from the dictionary as soon as it has been used.
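
One caveat with the Execute() approach: the whole text is wrapped in single-quoted string literals, so an apostrophe anywhere in the input would end a literal early and make the expression invalid. A possible guard, sketched here with hypothetical variable names and a shortened sample string, is to double every single quote before building the expression. The Remove() call from the code above is left out to keep the sketch short.

```autoit
Local $oDict = ObjCreate("Scripting.Dictionary")
$oDict.Add("1104", "nbs")

; Sample input containing an apostrophe (assumed for illustration)
Local $sIn = "<source>It's here<x id=""1104""/></source>"

; Inside an AutoIt '...' literal, '' stands for one single quote,
; so doubling the quotes lets the text survive the wrapping below.
Local $sSafe = StringReplace($sIn, "'", "''")

; Build the expression: literal chunks joined by one ternary per <x id="NNNN"/> tag
Local $sExpr = "'" & StringRegExpReplace($sSafe, "(<x id=""(\d{4})"")/>", _
        "' & ($oDict.Exists('$2') ? '$1 ctype=""' & $oDict.Item('$2') & '""/>' : '$1/>') & '") & "'"

Local $sOut = Execute($sExpr)
ConsoleWrite($sOut & @CRLF)
```

Whether Execute() copes well with an expression built from a 45 MB file is not something this sketch settles, so it would be worth timing on real data.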
