Split string with StringRegExpReplace


Hi everybody,

I am new here and I have a little question for a simple problem:

I would like to quickly access a specific token without using StringSplit().

With this code, I can access the first token, [AA]:

$sTOKENFULL = "[AA]__(BB)__{CC}__#DD#"
$sTOKEN1 = StringRegExpReplace($sTOKENFULL, '__.*', '')
MsgBox(64, "", $sTOKEN1)

But how can I get the second, third, or fourth token?

I hope you understand what I mean.

Thank you all in advance

$sTOKENFULL = "[AA]__(BB)__{CC}__#DD#"
$aTOKENs = StringRegExp($sTOKENFULL, '(.+?)(?:__|$)', 3)

But why not StringSplit?


6 minutes ago, jchd said:

But why not StringSplit ?

I would also prefer StringSplit :

#include <Array.au3>
#include <StringConstants.au3>

Global $g_sTokenFull  = '[AA]__(BB)__{CC}__#DD#'
Global $g_sTokenDelim = '__'
Global $g_aTokenArr   = StringSplit ($g_sTokenFull, $g_sTokenDelim, $STR_ENTIRESPLIT)

; Display results :
_ArrayDisplay($g_aTokenArr, 'Token Array')

ConsoleWrite('> Token 1 = ' & $g_aTokenArr[1] & @CRLF)
ConsoleWrite('> Token 2 = ' & $g_aTokenArr[2] & @CRLF)
ConsoleWrite('> Token 3 = ' & $g_aTokenArr[3] & @CRLF)
ConsoleWrite('> Token 4 = ' & $g_aTokenArr[4] & @CRLF)



Jeez, my attempt at a regexp generator is definitely a pile of sh*t: it would take a much more advanced approach than the one it currently uses to produce simple and fast patterns. Inferring a suitable pattern from a given {subject, expected result} pair requires too much high-level logic, close to what the current hype calls "AI". I'm giving up on this idea.

I didn't take the time to think about that pattern myself.

4 hours ago, atvaxn said:

without using stringsplit()

Regular expressions are very powerful, but they can also be a burden if you don't use them regularly. As you can see, the solutions from @Marc and @jchd already differ from each other.

If you are new to this topic, you may be better off avoiding it; otherwise you may not be able to understand or extend your own code three months later. Don't use regular expressions as an end in themselves or because they look cool. There is nothing wrong with plain string operations when they serve their purpose.

"Just my 2 cents" ;)



hey everybody,

wow, what a great forum. Thank you for so many answers :)
All of them work really great.

Still, is there any way to make it a one-liner?

for example:

MsgBox(64, "", StringRegExpReplace("[AA]__(BB)__{CC}__#DD#", '__.*', ''))

This compact one-liner displays "[AA]" with just one command.
Now I want it to display the second token "(BB)", or the third, and so on.

I actually use StringSplit a lot and love it too, but now I want to understand whether it is also possible to "regex" it without splitting it into an array.
Just for the sake of knowledge :)

Thanks, all

The last snippet does the job I want.

Unfortunately, it counts the tokens from right to left.

; shows [AA]__(BB)__{CC}
MsgBox(64, "", StringRegExpReplace("[AA]__(BB)__{CC}__#DD#", '__[^__]*$', ''))

; shows [AA]__(BB)
MsgBox(64, "", StringRegExpReplace(StringRegExpReplace("[AA]__(BB)__{CC}__#DD#", '__[^__]*$', ''), '__[^__]*$', ''))

; shows the THIRD token, from right to left (BB)
MsgBox(64, "", StringRegExpReplace(StringRegExpReplace(StringRegExpReplace("[AA]__(BB)__{CC}__#DD#", '__[^__]*$', ''), '__[^__]*$', ''), '.*__', ''))

What pattern do I have to use to make it count from left to right?
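One possible way to count from left to right in a single StringRegExpReplace call is to skip a fixed number of "token plus delimiter" groups at the start and keep only the next token. This is a minimal sketch, assuming the delimiter is exactly "__" and the subject contains no line breaks; the variable name $sFull is only for illustration.

```autoit
Local $sFull = "[AA]__(BB)__{CC}__#DD#"

; The number in {N} is how many leading tokens to skip, so {0} selects the first token.
; (.*?) captures the wanted token; (?:__.*)?$ swallows everything after it.
MsgBox(64, "", StringRegExpReplace($sFull, '^(?:.*?__){1}(.*?)(?:__.*)?$', '$1')) ; shows (BB)
MsgBox(64, "", StringRegExpReplace($sFull, '^(?:.*?__){2}(.*?)(?:__.*)?$', '$1')) ; shows {CC}
MsgBox(64, "", StringRegExpReplace($sFull, '^(?:.*?__){3}(.*?)(?:__.*)?$', '$1')) ; shows #DD#
```

If the subject has fewer tokens than requested, the pattern does not match and StringRegExpReplace returns the subject unchanged, so this form is best kept for input whose token count is known.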

#include <Array.au3> ; for _ArrayToString
#include <MsgBoxConstants.au3> ; for $MB_SYSTEMMODAL
MsgBox ($MB_SYSTEMMODAL,"",_ArrayToString (StringRegExp("[AA]__(BB)__{CC}__#DD#", '[^_]+',3),", ")) ; show all 4
MsgBox ($MB_SYSTEMMODAL,"",StringRegExp("[AA]__(BB)__{CC}__#DD#", '[^_]+',3)[2]) ; show third


Just list the delimiters inside [^ ] like this:

MsgBox ($MB_SYSTEMMODAL,"",_ArrayToString (StringRegExp("[AA]__(BB)||{CC};;#DD#", '[^_|;]+',3),", ")) ; show all 4
MsgBox ($MB_SYSTEMMODAL,"",StringRegExp("[AA]__(BB)||{CC};;#DD#", '[^_|;]+',3)[2]) ; show third


Thank you, that's also useful to know :)

But what I actually mean is how to use an exact delimiter sequence:

For example:
String: "[AA]_(BB)__{CC}___#DD#"
Delimiter: "___" (exactly three underscores)
Token1: should be "[AA]_(BB)__{CC}"
Token2: should be "#DD#"

It must be something like [^__{3}]+ but it doesn't work
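The reason a pattern like [^__{3}]+ cannot express "three underscores" is that a character class always matches one single character from a set, and quantifiers written inside the brackets are taken literally. So [^__{3}] means "any character except _, {, 3 and }". The sketch below illustrates this on the example string and contrasts it with a quantifier placed outside a class; the variable name $sSubject is only for illustration.

```autoit
#include <Array.au3> ; for _ArrayToString

Local $sSubject = "[AA]_(BB)__{CC}___#DD#"

; Inside [...] the characters { 3 } are literals, so the braces around CC act as separators too.
ConsoleWrite(_ArrayToString(StringRegExp($sSubject, '[^__{3}]+', 3), ", ") & @CRLF) ; [AA], (BB), CC, #DD#

; Outside a class, _{3} really means "three underscores in a row".
ConsoleWrite(_ArrayToString(StringRegExp($sSubject, '(.+?)(?:_{3}|$)', 3), ", ") & @CRLF) ; [AA]_(BB)__{CC}, #DD#
```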

Use jchd solution :

MsgBox ($MB_SYSTEMMODAL,"",_ArrayToString (StringRegExp("[AA]__(BB)||{CC};;#DD#", '(.+?)(?:$|_{2}|;{2}|\|{2})',3),", ")) ; show all 4
MsgBox ($MB_SYSTEMMODAL,"",_ArrayToString (StringRegExp("[AA]_(BB)__{CC}___#DD#", '(.+?)(?:$|_{3})',3),", ")) ; show both tokens


Thanks @Nine

It works perfectly. I am really happy that my code works like a charm now :)
This is a really great forum. I'm sure it wasn't my last question :D

1 hour ago, mikell said:

For the fun :>

But helpful fun :) It also helps me understand regex a little better. Thanks

Just a fair warning:

Accessing arrays like this is nice and concise, and fine for fixed data, but bad for anything dynamic or supplied from outside the script.


MsgBox ($MB_SYSTEMMODAL,"",StringRegExp("[AA]__(BB)||{CC};;#DD#", '[^_|;]+',3)[2]) ; show third


When the pattern does not match, or there are fewer than three matches, it's going to give you an array error with very little context.

Local $aItem = StringRegExp("[AA]__(BB)||{CC};;#DD#", '[^_|;]+',3)
If Not @error Then
    MsgBox ($MB_SYSTEMMODAL,"",$aItem[2]) ; show third
Else
    ConsoleWrite("Item not found in foo" & @CRLF)
EndIf
; Alternative: check the array size, which also covers "fewer than three matches"
;Local $aItem = StringRegExp("[AA]__(BB)||{CC};;#DD#", '[^_|;]+',3)
;If UBound($aItem) >= 3 Then
;    MsgBox ($MB_SYSTEMMODAL,"",$aItem[2]) ; show third
;Else
;    ConsoleWrite("Item not found in foo" & @CRLF)
;EndIf

Here we can catch the error,  give a detailed message, try again, etc.
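The same kind of check can be wrapped once in a small function so that every caller gets an @error value instead of an array subscript crash. This is only a sketch: _GetToken is a hypothetical helper, not a function shipped with AutoIt, and it uses StringSplit with an exact delimiter rather than a regular expression.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not part of AutoIt): returns the 1-based token $iIndex,
; or sets @error = 1 and returns "" when that token does not exist.
Func _GetToken($sSubject, $sDelim, $iIndex)
    ; $STR_NOCOUNT gives a 0-based array without the count in element 0
    Local $aTokens = StringSplit($sSubject, $sDelim, $STR_ENTIRESPLIT + $STR_NOCOUNT)
    If $iIndex < 1 Or $iIndex > UBound($aTokens) Then Return SetError(1, 0, "")
    Return SetError(0, 0, $aTokens[$iIndex - 1])
EndFunc

Local $sToken = _GetToken("[AA]__(BB)__{CC}__#DD#", "__", 3)
If @error Then
    ConsoleWrite("Token 3 not found" & @CRLF)
Else
    ConsoleWrite("Token 3 = " & $sToken & @CRLF) ; Token 3 = {CC}
EndIf
```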

I guess this is the opposite of the SRER (StringRegExpReplace) approach, but all the others looked weird

$str = "[AA]__(BB)_______{CC}__#DD#"
$n = 3

msgbox(0, '' , stringregexp($str , "([A-Z]+)" , 3)[$n -1])


