Parsing .... again

dickep · November 4, 2009

OK, I have looked and looked on this forum about parsing text. My poor old feeble brain just does not get the StringRegExp stuff.

Here are my thoughts.

I want to make a source code cross-reference tool (mainly for AutoIt, but I want to be able to use it with other languages). For that I need to read each line of code and strip out commas, arithmetic signs, parens, etc.: anything that is not a word, keyword, or variable. Having said that, I am having trouble with the syntax of StringRegExp. Any further help would be greatly appreciated.

P.S. I would LIKE to make a module to call into to do this per line (can store the results in an array if necessary).

Just can't get it in my head how to use the call.

E

JRowe · November 4, 2009

Write out an example of a single line of input and the output you want.

dickep · November 4, 2009

I have attached a snippet of what I am attempting to do. You will notice that (1) it skips any comment, whether it starts at the beginning of a line or later in a line of code, and (2) it ignores the line numbers I added.

Hope this will help you understand what I would LIKE to accomplish.

Thanks

middae · November 4, 2009

Well, I don't see a snippet here, but I'd just read the file into a string, break the string into an array at CRLF (chars 13 and 10), then make a second array of equal length.

Then iterate through Array1 index by index, take each string, and rebuild it character by character, keeping only letters and digits:

$strOut = ""

For $x = 1 To StringLen($strIN)

$ch = StringMid($strIN, $x, 1)

If StringIsAlNum($ch) Then $strOut &= $ch

Next

That's rough code for ya. Or you can use the Replace functions (StringReplace in AutoIt) to remove unnecessary characters. Regex for stripping characters from strings isn't such a good idea in my experience. Too complex.
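A minimal sketch of the split-then-replace route mentioned above, assuming the whole script is already in a string. The sample text, the variable names, and the list of characters to drop are illustrative assumptions, not taken from the thread; StringSplit and StringReplace are standard AutoIt functions.

```autoit
; Sketch only: $sText, $aIn, $aOut and $aDrop are illustrative names.
Local $sText = "$i = $i + 1" & @CRLF & "ReDim $mLineRead[$i + 1]"

; flag 1 = treat the whole delimiter string (CRLF) as one separator
Local $aIn = StringSplit($sText, @CRLF, 1)

; second array of equal length; element [0] holds the line count
Local $aOut[$aIn[0] + 1]
$aOut[0] = $aIn[0]

; assumed list of punctuation to drop, split on the spaces between entries
Local $aDrop = StringSplit("( ) [ ] , + - * / = &", " ")

For $n = 1 To $aIn[0]
    $aOut[$n] = $aIn[$n]
    For $d = 1 To $aDrop[0]
        ; replace every occurrence with a space so adjacent names stay apart
        $aOut[$n] = StringReplace($aOut[$n], $aDrop[$d], " ")
    Next
Next
```

Replacing with a space rather than an empty string keeps neighbouring names from running together. The drop list would have to grow for every new symbol, which is the main weakness of this route.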

dickep · November 4, 2009

OK, since I screwed up my last post (no attachment), I will put it here:

***** Snippet to process ******

*******************************

1 #include <Array.au3>

2 $i = 1

3 While 1

4

5 $line = FileReadLine($hFileHandle1)

6 If @error = -1 Then ExitLoop

7 $mLineRead[$i] =$line

8 ;_ArrayDisplay($mLineRead)

9 ;MsgBox(0, "Line read:", $i & ": " & $line)

10 $mLineRead[0] = $i

11 $i= $i +1

12 redim $mLineRead[$i + 1]

13 Wend

14

15 ; now that file has been read in, we need to parse out the stuff!

16

17 _ArrayDisplay($mLineRead)

*********************************

***** Results ******************

*********************************

$i - lines 2,7,10,11,12

$line - lines 5,7

$hFileHandle1 - lines 5

$mLineRead - lines 7,10,12,17

While - lines 3

If - lines 6

FileReadLine - lines 5

redim - lines 12

_ArrayDisplay - lines 17

ExitLoop - lines 6

@error - lines 6

#include - lines 1

<Array.au3> - lines 1

**************************************

***** END OF SNIPPET ****************

The result should also ignore formatting and any blank lines, spaces, or tabs.

Thanks again

E

Bowmore · November 4, 2009

You may be interested to know that Tidy.exe, when run with the /gd option, produces some nice documentation, including an xref report at the bottom.

This is the output for the code snippet you posted.

========================================================================================================
===  Tidy report for :C:\AutoIt3Data\Scripts\test.au3
========================================================================================================

00001    #Region ;**** Directives created by AutoIt3Wrapper_GUI ****
00002    #Tidy_Parameters=/gd
00003    #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****
00004    #include <Array.au3>
00005    $i = 1
00006  +-While 1
00007  |    
00008  |    $line = FileReadLine($hFileHandle1)
00009  v----If @error = -1 Then ExitLoop
00010  |    $mLineRead[$i] = $line
00011  |    ;_ArrayDisplay($mLineRead)
00012  |    ;MsgBox(0, "Line read:", $i & ": " & $line)
00013  |    $mLineRead[0] = $i
00014  |    $i = $i + 1
00015  |    ReDim $mLineRead[$i + 1]
00016  +-WEnd
00017    
00018    ; now that file has been read in, we need to parse out the stuff!
00019    
00020    _ArrayDisplay($mLineRead)

======================
=== xref reports =====
======================

== User functions =================================================================================================
                          Func
Function name             Row     Referenced at Row(s)
========================= ====== ==================================================================================

#### indicates that this specific variable only occurs one time in the script.
---- indicates that this specific variable isn't declared with Dim/Local/Global/Const.

== Variables ======================================================================================================
Variable name             Dim   Used in Row(s)
========================= ===== ===================================================================================
$hFileHandle1             ----- 00008
$i                        ----- 00005 00010 00013 00014 00015
$line                     ----- 00008 00010
$mLineRead                ----- 00010 00013 00015 00020
@error                    ----- 00009

dickep · November 4, 2009

No, I did not know about Tidy.

So, that brings up some questions:

- how do you learn about Tidy?

- can it also display the functions?

Thanks

However, I still am not understanding StringRegExp. Maybe I could get someone else to explain it better. I did find a "tutorial" on the forum, but it still left me puzzled.

Authenticity · November 5, 2009

Writing a language tokenizer is not a trivial task. You don't need extraordinarily complex regular expressions so much as you need the elements and semantics of the language to fit together correctly. For example, rvalue expressions never appear on the left of the assignment operator. Another example is that nested parentheses are evaluated from the inside out. A tokenizer may have a lot in common with the regular expression toolbox, but from my limited knowledge of tokenizers, RegExp plays only a small part in it, and I might be wrong.
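For the StringRegExp call itself, here is a minimal sketch of pulling the names out of a single line. It is an illustration under stated assumptions, not a real tokenizer: `_LineTokens` is a hypothetical helper name, the comment stripping is naive, and text inside quoted strings is still picked up.

```autoit
; Sketch only: _LineTokens is a hypothetical helper, not an AutoIt built-in.
Func _LineTokens($sLine)
    ; crude comment strip: drop everything from the first ";" onward
    ; (this would also cut a ";" that sits inside a quoted string)
    $sLine = StringRegExpReplace($sLine, ";.*", "")
    ; optional $, @ or # prefix, then a letter or underscore, then word characters
    ; flag 3 = return an array of all matches; @error is set when nothing matches
    Local $aTok = StringRegExp($sLine, "[$@#]?[A-Za-z_]\w*", 3)
    If @error Then Return SetError(1, 0, 0)
    Return $aTok
EndFunc

Local $aTok = _LineTokens("If @error = -1 Then ExitLoop")
If Not @error Then
    For $t = 0 To UBound($aTok) - 1
        ConsoleWrite($aTok[$t] & @CRLF)
    Next
EndIf
```

For that sample line the matches should be If, @error, Then and ExitLoop; the -1 is skipped because a match has to start with a letter or underscore. A token such as `<Array.au3>` would come back as two separate names, which is the kind of language-specific rule a plain pattern does not capture.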
