
RegExp, nested look ahead



The data I have comes from an XML file, and I want to match data using a regular expression. This is working fine, but I am wondering if it is possible to nest look-aheads. Reduced to the basics, I want to match the "Text" in the following example:

<Node id="Name"><SubNode>Text</SubNode></Node>

But only if

  • it is between the tags <SubNode> and </SubNode>
  • and <SubNode> / </SubNode> are themselves between the tags <Node id="Name"> and </Node>

I know that I can first match the nodes and then match the SubNode text within the resulting array, but it would be nice if this could be done with a single regular expression.

#include <Array.au3>

$_lv_sgData = ''
$_lv_sgData &= '<NodeName id="firstId">' & @CRLF
$_lv_sgData &= '  <SubNode>Node Text 1</SubNode>' & @CRLF
$_lv_sgData &= '  <SubNode>Node Text 2</SubNode>' & @CRLF
$_lv_sgData &= '  <OtherSubNode>Node Text 10</OtherSubNode>' & @CRLF
$_lv_sgData &= '  <SubNode>Node Text 3</SubNode>' & @CRLF
$_lv_sgData &= '</NodeName>' & @CRLF
$_lv_sgData &= '...' & @CRLF
$_lv_sgData &= '<IgnoreNode>' & @CRLF
$_lv_sgData &= '  <OtherSubNode>Node Text 10</OtherSubNode>' & @CRLF
$_lv_sgData &= '  <SubNode>Node Text 4</SubNode>' & @CRLF
$_lv_sgData &= '  <SubNode>Node Text 5</SubNode>' & @CRLF
$_lv_sgData &= '  <OtherSubNode>Node Text 10</OtherSubNode>' & @CRLF
$_lv_sgData &= '  <SubNode>Node Text 6</SubNode>' & @CRLF
$_lv_sgData &= '</IgnoreNode>' & @CRLF
$_lv_sgData &= '...' & @CRLF
$_lv_sgData &= '<NodeName id="secondId">' & @CRLF
$_lv_sgData &= '  ...' & @CRLF
$_lv_sgData &= '  <SubNode>Node Text 7</SubNode>' & @CRLF
$_lv_sgData &= '  <SubNode>Node Text 8</SubNode>' & @CRLF
$_lv_sgData &= '  ...' & @CRLF
$_lv_sgData &= '  <SubNode>Node Text 9</SubNode>' & @CRLF
$_lv_sgData &= '  <OtherSubNode>Node Text 10</OtherSubNode>' & @CRLF
$_lv_sgData &= '</NodeName>' & @CRLF
$_lv_sgData &= '...' & @CRLF
$_lv_sgData &= '<NodeName id="thirdId">' & @CRLF
$_lv_sgData &= '  <OtherSubNode>Node Text 10</OtherSubNode>' & @CRLF
$_lv_sgData &= '  <OtherSubNode>Node Text 11</OtherSubNode>' & @CRLF
$_lv_sgData &= '  <OtherSubNode>Node Text 12</OtherSubNode>' & @CRLF
$_lv_sgData &= '</NodeName>' & @CRLF

; Capture the id (everything up to the closing quote) and the node body
$_lv_arRslt = StringRegExp($_lv_sgData, '(?sm)<NodeName id="([^"]+)">(.*?)(?=</NodeName>)', 3)

_ArrayDisplay($_lv_arRslt)

 


How about an array of arrays?

Local $aArray = StringRegExp($_lv_sgData, '(?sm)<NodeName id="(.+?)">(.*?)</NodeName>', 3)
Local $aFinal[UBound($aArray) / 2][2]
For $i = 0 to UBound($aArray) - 1 Step 2
  $aFinal[$i/2][0] = $aArray[$i]
  $aFinal[$i/2][1] = StringRegExp($aArray[$i+1], '(?sm)<SubNode>(.*?)<\/SubNode>', 3)
Next
_ArrayDisplay($aFinal)
_ArrayDisplay($aFinal[0][1])
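
For the single-expression variant asked about in the first post, one possible sketch is to nest a negative look-ahead inside a positive one. It assumes that <NodeName> blocks are never nested inside each other and that every opened <NodeName> is closed; the shortened sample data and the variable names ($sData, $sPattern, $aText) are made up for illustration.

```autoit
#include <Array.au3>

; Shortened sample shaped like the data in the first post (illustrative only).
Local $sData = ''
$sData &= '<NodeName id="firstId">' & @CRLF
$sData &= '  <SubNode>Node Text 1</SubNode>' & @CRLF
$sData &= '  <OtherSubNode>Node Text 10</OtherSubNode>' & @CRLF
$sData &= '</NodeName>' & @CRLF
$sData &= '<IgnoreNode>' & @CRLF
$sData &= '  <SubNode>Node Text 4</SubNode>' & @CRLF
$sData &= '</IgnoreNode>' & @CRLF
$sData &= '<NodeName id="secondId">' & @CRLF
$sData &= '  <SubNode>Node Text 7</SubNode>' & @CRLF
$sData &= '</NodeName>' & @CRLF

; Outer (?= ... ): a </NodeName> must follow the matched SubNode.
; Inner (?! ... ): no new <NodeName may open before that closing tag,
; which is what rules out SubNodes sitting in <IgnoreNode>.
Local $sPattern = '(?s)<SubNode>([^<]*)</SubNode>(?=(?:(?!<NodeName\b).)*?</NodeName>)'

Local $aText = StringRegExp($sData, $sPattern, 3)
If @error Then
    ConsoleWrite('No matching SubNode found' & @CRLF)
Else
    ; For the sample above this should list "Node Text 1" and "Node Text 7".
    _ArrayDisplay($aText)
EndIf
```

Note that this only checks what comes after each SubNode, so it does not return the id of the enclosing node; the two-step approach keeps that information.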

P.S. Maybe you should look at the Microsoft.XMLDOM COM object for this kind of task.
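
A minimal sketch of the COM route, assuming the data can be wrapped into one well-formed document. The <root> wrapper, the shortened sample and the variable names are assumptions made for illustration, and the literal '...' placeholder lines from the first post are left out.

```autoit
; Wrap the fragment in a made-up <root> element so it parses as a single document.
Local $sXml = '<root>' & _
        '<NodeName id="firstId"><SubNode>Node Text 1</SubNode><OtherSubNode>Node Text 10</OtherSubNode></NodeName>' & _
        '<IgnoreNode><SubNode>Node Text 4</SubNode></IgnoreNode>' & _
        '<NodeName id="secondId"><SubNode>Node Text 7</SubNode></NodeName>' & _
        '</root>'

Local $oXml = ObjCreate('Microsoft.XMLDOM')
If Not IsObj($oXml) Then
    ConsoleWrite('Could not create Microsoft.XMLDOM' & @CRLF)
    Exit 1
EndIf

$oXml.async = False
If Not $oXml.loadXML($sXml) Then
    ConsoleWrite('Parse error: ' & $oXml.parseError.reason & @CRLF)
    Exit 1
EndIf

; Select only SubNode elements that are direct children of a NodeName carrying an id.
Local $oNodes = $oXml.selectNodes('//NodeName[@id]/SubNode')
For $oNode In $oNodes
    ; Print the id of the enclosing node together with the SubNode text.
    ConsoleWrite($oNode.parentNode.getAttribute('id') & ': ' & $oNode.text & @CRLF)
Next
```

Because the parser understands the nesting, the <IgnoreNode> content is skipped without any look-ahead tricks.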
