Array contains elements with Chr(0) after StringRegExp


I have a problem that seems quite trivial at first glance. But maybe today is one of those days when you just can't see the obvious :mad2:.

#include <Array.au3>
Local $aLat, $aLon, $aLatLon
Local $XMLString = _
        "<lat>10.0</lat>" & @CRLF & _
        "<lon>1.10</lon>" & @CRLF & _
        "<lat>20.0</lat>" & @CRLF & _
        "<lon>2.20</lon>" & @CRLF & _
        "<lat></lat>" & @CRLF & _
        "<lon></lon>" & @CRLF & _
        "<lat>40.0</lat>" & @CRLF & _
        "<lon>4.40</lon>"

$aLat    = StringRegExp($XMLString, '<lat>(.+)<\/lat>', 3)
$aLon    = StringRegExp($XMLString, '<lon>(.+)<\/lon>', 3)
$aLatLon = StringRegExp($XMLString, '<lat>(.+)<\/lat>|<lon>(.+)<\/lon>', 3)
_ArrayDisplay($aLat, '$aLat')
_ArrayDisplay($aLon, '$aLon')
_ArrayDisplay($aLatLon, '$aLatLon')

The first two arrays ($aLat and $aLon) show the expected result. The third regex ($aLatLon), which uses |, returns 'empty matches' (more precisely, Chr(0)):

[Image: LatLon.png, _ArrayDisplay output of $aLatLon]

A check on https://regex101.com/r/CXR2oL/1 indicates no problems. By the way: the backslashes in <\/ are probably unnecessary, but are required by RegEx101.

Certainly someone is smarter than I am today (shouldn't be too hard :lol:).

Thanks in advance.

"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move."

Here's one of many ways that it could be done:

#include <Array.au3>

Local $aLat, $aLon, $aLatLon
Local $XMLString = _
        "<lat>10.0</lat>" & @CRLF & _
        "<lon>1.10</lon>" & @CRLF & _
        "<lat>20.0</lat>" & @CRLF & _
        "<lon>2.20</lon>" & @CRLF & _
        "<lat></lat>" & @CRLF & _
        "<lon></lon>" & @CRLF & _
        "<lat>40.0</lat>" & @CRLF & _
        "<lon>4.40</lon>"

$aLat    = StringRegExp($XMLString, '<lat>([^<]*)'        , 3)
$aLon    = StringRegExp($XMLString, '<lon>([^<]*)'        , 3)
$aLatLon = StringRegExp($XMLString, '<(?:lat|lon)>([^<]*)', 3)

_ArrayDisplay($aLat,    '$aLat')
_ArrayDisplay($aLon,    '$aLon')
_ArrayDisplay($aLatLon, '$aLatLon')

 

Edited by TheXman
Removed unnecessary matching criteria from my first example

21 minutes ago, TheXman said:

Here's one of many ways that it could be done:

Yes, thanks for your effort, this is without doubt a feasible way (of many, as usual with RegEx) ;)!
It is also possible to write the following :

$aLatLon = StringRegExp($XMLString, '<(?:lat|lon)>([^<]+)</(?:lat|lon)>', 3)

BTW : You use * (0 or more) which also matches empty values like <lat></lat>. Small adjustment : + (1 or more) would be more suitable in this particular case.

I'm still a bit curious why the variant from my first post doesn't work, but that's another story.


2 minutes ago, Musashi said:

BTW : You use * (0 or more) which also matches empty values like <lat></lat>. Small adjustment : + (1 or more) would be more suitable in this particular case.

Yes, I only used an * because I didn't know exactly what you wanted in your result set, all lat/lon or just ones with values.
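
The difference between the two quantifiers can be checked with a small sketch like the one below. It reuses a shortened version of the XML sample from the first post; the variable names ($sXML, $aStar, $aPlus) are only illustrative.

```autoit
#include <Array.au3>

; Shortened sample: one filled element, one empty element, one filled element.
Local $sXML = "<lat>10.0</lat>" & @CRLF & "<lat></lat>" & @CRLF & "<lat>40.0</lat>"

; * (0 or more): the empty <lat></lat> element also matches and yields an empty entry.
Local $aStar = StringRegExp($sXML, '<lat>([^<]*)', 3)

; + (1 or more): the empty element does not match and is skipped.
Local $aPlus = StringRegExp($sXML, '<lat>([^<]+)', 3)

_ArrayDisplay($aStar, '* quantifier')
_ArrayDisplay($aPlus, '+ quantifier')
```

Which one fits depends on whether empty elements should appear in the result set or not.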


44 minutes ago, Musashi said:

I'm still a bit curious why the variant from my first post doesn't work, but that's another story.

I know why but I'm not sure I can explain it in a way that others can understand.  :)

You had 2 capture groups, let's call them G1 & G2.  For every match, the array gets one entry per group, up to the highest-numbered group that took part in that match.  So when it found a "lat", only G1 took part; it put that value in the array and nothing was added for G2 (shown as "stopped" below).  When it was on a "lon" line, the first alternative failed, so G1 stayed empty, and the second alternative ("lon") matched and filled G2.  Because G1 comes before G2, it put a blank in the array for G1 and then the "lon" value.  So as you can see below, every "lon" line had an empty G1 entry.

I hope I was able to explain it in a way that was understandable.  :sweating:

 

Regex:  <lat>(.+)<\/lat>|<lon>(.+)<\/lon>


Line              G1      G2
<lat>10.0</lat>   10.0    stopped
<lon>1.10</lon>   blank   1.10
<lat>20.0</lat>   20.0    stopped
<lon>2.20</lon>   blank   2.20
<lat>40.0</lat>   40.0    stopped
<lon>4.40</lon>   blank   4.40
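
The behaviour in the table can be reproduced with a short sketch, together with one possible way around it. The workaround is an assumption that does not come from the thread: it uses the PCRE "branch reset" group (?|...), in which the capture group of each alternative is numbered 1, so no lower-numbered unused group is left over to appear as an empty entry. The variable names ($sXML, $aTwoGroups, $aReset) are only illustrative.

```autoit
#include <Array.au3>

; Shortened version of the sample from the first post.
Local $sXML = _
        "<lat>10.0</lat>" & @CRLF & _
        "<lon>1.10</lon>" & @CRLF & _
        "<lat></lat>" & @CRLF & _
        "<lon></lon>" & @CRLF & _
        "<lat>40.0</lat>" & @CRLF & _
        "<lon>4.40</lon>"

; Two separately numbered groups: each "lon" match also adds an empty entry for group 1.
Local $aTwoGroups = StringRegExp($sXML, '<lat>(.+)</lat>|<lon>(.+)</lon>', 3)

; Branch reset (assumption, not from the thread): both alternatives share group 1.
Local $aReset = StringRegExp($sXML, '(?|<lat>(.+)</lat>|<lon>(.+)</lon>)', 3)

_ArrayDisplay($aTwoGroups, 'two groups')
_ArrayDisplay($aReset, 'branch reset')
```

The <(?:lat|lon)>([^<]+) pattern shown earlier in the thread avoids the issue in a similar way, by using a single capture group for both tags.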

 

Edited by TheXman

30 minutes ago, UEZ said:

What about

Thanks, this variant obviously works too.

11 minutes ago, TheXman said:

I hope I was able to explain in a way that was understandable.  :sweating:

Yeah, it's understandable to me. Now that I read it, it's relatively obvious :).


1 hour ago, TheXman said:
Regex:  <lat>(.+)<\/lat>|<lon>(.+)<\/lon>


Line              G1      G2
<lat>10.0</lat>   10.0    stopped
<lon>1.10</lon>   blank   1.10
<lat>20.0</lat>   20.0    stopped
<lon>2.20</lon>   blank   2.20
<lat>40.0</lat>   40.0    stopped
<lon>4.40</lon>   blank   4.40

Oh yes, my first thought was that it is because two groups are used, but this behavior cannot be prevented. Now I'm a little disappointed by regex101.com, because everything looked perfect there.

Well, thanks to this problem and your good explanation, especially your great table, I am now a big step further.

Big thanks!
