dickep Posted November 4, 2009
OK, I have looked and looked on this forum for help with parsing text. My poor old feeble brain just does not get the StringRegExp stuff. Here is my idea. I want to make a source code cross reference tool (mainly for AutoIt, but I want to be able to use it with other languages too). For that I need to read each line of code and strip out the commas, arithmetic signs, parentheses, etc., that is, everything that is not a word, keyword, or variable. Having said that, I am having trouble with the syntax of StringRegExp. Any further help would be greatly appreciated.
P.S. I would LIKE to make a function that I can call once per line to do this (it can store the results in an array if necessary). I just can't get my head around how to use the call.
E
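A minimal sketch of the kind of per-line function asked for above, assuming AutoIt 3.3.x. The name `_GetLineTokens` is hypothetical, and the pattern is an assumption: it treats a token as an optional `$`, `@` or `#` prefix followed by a letter or underscore and then word characters. Commas, operators, parentheses and bare numbers never match, so they drop out without having to be stripped first. One known gap: `<Array.au3>` comes back as two tokens, `Array` and `au3`.

```autoit
; Hypothetical helper: return all identifier-like tokens on one line of AutoIt code.
; Returns a 0-based array of tokens, or an empty string if the line has none.
Func _GetLineTokens($sLine)
    ; [$@#]?        optional variable ($), macro (@) or directive (#) prefix
    ; [A-Za-z_]\w*  a name starting with a letter or underscore, so bare numbers are skipped
    ; Flag 3 returns an array holding every match on the line.
    Local $aTokens = StringRegExp($sLine, '[$@#]?[A-Za-z_]\w*', 3)
    If @error Then Return ""   ; no matches on this line
    Return $aTokens
EndFunc

; Usage: print the tokens of one line, one per console line.
Local $aResult = _GetLineTokens('$mLineRead[$i] = $line')
If IsArray($aResult) Then
    For $j = 0 To UBound($aResult) - 1
        ConsoleWrite($aResult[$j] & @CRLF)
    Next
EndIf
```

Calling it once per line and storing each returned array gives the per-line results the P.S. describes.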
JRowe Posted November 4, 2009
Write out an example of a single line of input and the output you want.
dickep (Author) Posted November 4, 2009
I have attached a snippet of what I am attempting to do. You will notice that (1) it skips comments (whether a whole comment line or a comment later in a line of code) and (2) it does not list the numbers. I hope this helps you understand what I would LIKE to accomplish.
Thanks
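Both requirements can be handled before any tokenizing. A sketch, assuming only `;` line comments matter (block comments between `#cs` and `#ce` are not handled); `_StripStringsAndComments` is a hypothetical name. String literals are blanked first so that a `;` inside quotes is not mistaken for the start of a comment. Numbers need no special step if the token pattern only accepts names that start with a letter or underscore.

```autoit
; Hypothetical helper: blank out string literals and any trailing ";" comment
; so that neither contributes tokens to the cross reference.
Func _StripStringsAndComments($sLine)
    ; "double" and 'single' quoted strings. AutoIt escapes a quote by doubling it
    ; ("a""b"), which this pattern simply removes as two adjacent strings.
    $sLine = StringRegExpReplace($sLine, '"[^"]*"' & "|'[^']*'", ' ')
    ; Everything from the first remaining ";" to the end of the line is a comment.
    $sLine = StringRegExpReplace($sLine, ';.*', '')
    Return $sLine
EndFunc

; Usage: the ";" inside the string is kept out of the way, the real comment is dropped.
ConsoleWrite('[' & _StripStringsAndComments('$x = "a;b" ; note') & ']' & @CRLF)
ConsoleWrite('[' & _StripStringsAndComments(';_ArrayDisplay($mLineRead)') & ']' & @CRLF)
```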
middae Posted November 4, 2009
Well, I don't see a snippet over here, but I'd just read the file into a string, break the string into an array at CRLF (chars 13 and 10), then make a second array of equal length. Then iterate through Array1 index by index, take each string, get its length, and rebuild it character by character:

$strOut = ""
For $x = 1 To StringLen($strIN)
    ; keep letters and digits; turn anything else into a space so words stay separate
    If StringIsAlNum(StringMid($strIN, $x, 1)) Then
        $strOut &= StringMid($strIN, $x, 1)
    Else
        $strOut &= " "
    EndIf
Next
$Array2[$y] = $strOut

That's roughly it ($strIN is the current line from Array1, $y its index). Or, you can use the Replace functions to remove unnecessary characters. Regex for stripping characters out of strings isn't such a good idea in my experience. Too complex.
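For comparison with the character loop, a sketch of the same split-then-clean idea in which one StringRegExpReplace call does the per-line cleanup. The sample text, the variable names, and the choice to also keep `_`, `$`, `@` and `#` (so AutoIt variables, macros and directives survive) are assumptions for illustration.

```autoit
; Sketch: split text into lines at CRLF, then clean each line with one regex replace.
; $sText stands in for the file contents (an assumption for the example).
Local $sText = "$i = 1" & @CRLF & "$i = $i + 1"
Local $aLines = StringSplit($sText, @CRLF, 1)   ; flag 1: split on the whole @CRLF string; [0] holds the count
Local $aClean[$aLines[0] + 1]                    ; second array of equal length
$aClean[0] = $aLines[0]
For $y = 1 To $aLines[0]
    ; Turn every run of characters other than letters, digits, _, $, @ and # into one space,
    ; then trim both ends (StringStripWS flag 3 = leading and trailing).
    $aClean[$y] = StringStripWS(StringRegExpReplace($aLines[$y], '[^A-Za-z0-9_$@#]+', ' '), 3)
    ConsoleWrite($y & ": " & $aClean[$y] & @CRLF)
Next
```

Here `$aClean[1]` becomes `$i 1` and `$aClean[2]` becomes `$i $i 1`. The negated character class says "anything that is not one of these", which is the same test the loop makes one character at a time.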
dickep (Author) Posted November 4, 2009
OK, since I screwed up my last post (no attachment), I will put it here.

***** Snippet to process ******
*******************************
1  #include <Array.au3>
2  $i = 1
3  While 1
4
5      $line = FileReadLine($hFileHandle1)
6      If @error = -1 Then ExitLoop
7      $mLineRead[$i] =$line
8      ;_ArrayDisplay($mLineRead)
9      ;MsgBox(0, "Line read:", $i & ": " & $line)
10     $mLineRead[0] = $i
11     $i= $i +1
12     redim $mLineRead[$i + 1]
13 Wend
14
15 ; now that file has been read in, we need to parse out the stuff!
16
17 _ArrayDisplay($mLineRead)
*********************************
***** Results ******************
*********************************
$i            - lines 2,7,10,11,12
$line         - lines 5,7
$hFileHandle1 - lines 5
$mLineRead    - lines 7,10,12,17
While         - lines 3
If            - lines 6
FileReadLine  - lines 5
redim         - lines 12
_ArrayDisplay - lines 17
ExitLoop      - lines 6
@error        - lines 6
#include      - lines 1
<Array.au3>   - lines 1
**************************************
***** END OF SNIPPET ****************

The results also ignore the formatting and any blank lines, spaces, or tabs.
Thanks again
E
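A sketch of how the "token - lines" results above could be built, assuming Windows (it uses the COM `Scripting.Dictionary` object) and a 1-based array of source lines with the count in element 0. `_BuildXref` is a hypothetical name. Only `;` comments are removed, and `<Array.au3>` is split into `Array` and `au3`, so this is a starting point rather than a full tokenizer. To run it on a real script, `_FileReadToArray` from `File.au3` can load the file into that kind of array.

```autoit
; Hypothetical helper: map each name to the line numbers it appears on.
; $aLines is a 1-based array of source lines with the line count in element 0.
; Needs Windows, because it uses the COM Scripting.Dictionary object.
Func _BuildXref(ByRef $aLines)
    Local $oXref = ObjCreate("Scripting.Dictionary")
    $oXref.CompareMode = 1   ; text compare: $I and $i count as one name, as in AutoIt
    Local $sCode, $aTok
    For $n = 1 To $aLines[0]
        ; Blank out string literals first, so a ";" inside quotes is not taken as a comment.
        $sCode = StringRegExpReplace($aLines[$n], '"[^"]*"' & "|'[^']*'", ' ')
        ; Drop the comment, if any, from ";" to the end of the line.
        $sCode = StringRegExpReplace($sCode, ';.*', '')
        ; Names: optional $, @ or # prefix, then a letter or underscore (numbers never match).
        $aTok = StringRegExp($sCode, '[$@#]?[A-Za-z_]\w*', 3)
        If @error Then ContinueLoop   ; blank, comment-only, or no names on this line
        For $t In $aTok
            If Not $oXref.Exists($t) Then
                $oXref.Add($t, String($n))
            ElseIf Not StringRegExp($oXref.Item($t), '(^|,)' & $n & '$') Then
                ; Append the line number unless this line is already recorded for the token.
                $oXref.Item($t) = $oXref.Item($t) & "," & $n
            EndIf
        Next
    Next
    Return $oXref
EndFunc

; Usage with a small in-memory "file" (element 0 holds the line count).
Local $aSrc[4] = [3, '$i = 1 ; start', 'MsgBox(0, "Line read:", $i)', '$i = $i + 1']
Local $oX = _BuildXref($aSrc)
Local $aKeys = $oX.Keys()
For $k In $aKeys
    ConsoleWrite($k & " - lines " & $oX.Item($k) & @CRLF)
Next
```

For this input it reports `$i` on lines 1, 2 and 3 and `MsgBox` on line 2. The dictionary keeps tokens in first-seen order; sorting the keys is left out to keep the sketch short.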
Bowmore Posted November 4, 2009
You may be interested to know that Tidy.exe, when run with the /gd option, produces some nice documentation, including an xref report at the bottom. This is the output for the code snippet you posted:

========================================================================================================
=== Tidy report for :C:\AutoIt3Data\Scripts\test.au3
========================================================================================================
00001 #Region ;**** Directives created by AutoIt3Wrapper_GUI ****
00002 #Tidy_Parameters=/gd
00003 #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****
00004 #include <Array.au3>
00005 $i = 1
00006 +-While 1
00007 |
00008 |   $line = FileReadLine($hFileHandle1)
00009 v----If @error = -1 Then ExitLoop
00010 |   $mLineRead[$i] = $line
00011 |   ;_ArrayDisplay($mLineRead)
00012 |   ;MsgBox(0, "Line read:", $i & ": " & $line)
00013 |   $mLineRead[0] = $i
00014 |   $i = $i + 1
00015 |   ReDim $mLineRead[$i + 1]
00016 +-WEnd
00017
00018 ; now that file has been read in, we need to parse out the stuff!
00019
00020 _ArrayDisplay($mLineRead)

======================
=== xref reports =====
======================

== User functions =================================================================================================
Func Function name           Row    Referenced at Row(s)
========================= ====== ==================================================================================

#### indicates that this specific variable only occurs one time in the script.
---- indicates that this specific variable isn't declared with Dim/Local/Global/Const.

== Variables ======================================================================================================
Variable name             Dim   Used in Row(s)
========================= ===== ===================================================================================
$hFileHandle1             ----- 00008
$i                        ----- 00005 00010 00013 00014 00015
$line                     ----- 00008 00010
$mLineRead                ----- 00010 00013 00015 00020
@error                    ----- 00009
dickep (Author) Posted November 4, 2009
No, I did not know about Tidy. So, that brings up some questions:
- How do you learn about Tidy?
- Can it also display the functions?
Thanks
However, I still don't understand StringRegExp. Maybe someone else could explain it better. I did find a "tutorial" on the forum, but it still left me puzzled.
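A short sketch of the StringRegExp modes that matter most for this job, run against line 5 of the snippet. The patterns are illustrative assumptions: `\$` is a literal dollar sign and `\w+` is one or more letters, digits or underscores. The flag values are AutoIt's own (0 = test, 1 = first match, 3 = all matches).

```autoit
; Sketch: the StringRegExp modes used most often, on one line of the snippet.
Local $sLine = '$line = FileReadLine($hFileHandle1)'

; Flag 0 (default): returns 1 if the pattern matches anywhere, 0 if not.
ConsoleWrite(StringRegExp($sLine, 'FileReadLine') & @CRLF)          ; 1

; Flag 1: returns an array of the capture groups from the FIRST match only.
Local $aFirst = StringRegExp($sLine, '(\$\w+)', 1)
ConsoleWrite($aFirst[0] & @CRLF)                                     ; $line

; Flag 3: returns an array holding EVERY match on the line.
Local $aAll = StringRegExp($sLine, '\$\w+', 3)
ConsoleWrite(UBound($aAll) & ": " & $aAll[0] & " " & $aAll[1] & @CRLF)  ; 2: $line $hFileHandle1

; StringRegExpReplace: swap every match for replacement text.
ConsoleWrite(StringRegExpReplace($sLine, '\$\w+', '<var>') & @CRLF)  ; <var> = FileReadLine(<var>)
```

For a cross reference, flag 3 is usually the one to reach for: describe what a token looks like, and it hands back all of them, leaving the punctuation behind.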
Authenticity Posted November 5, 2009
A language tokenizer is not a trivial task. You don't need extraordinarily complex regular expressions so much as you need the elements and semantics of the language to fit together correctly. For example, an rvalue never appears on the left of the assignment operator. Another example is that nested parentheses are evaluated from the inside out. Regular expressions have a lot in common with what a tokenizer needs, but from my limited knowledge of tokenizers, RegExp only plays a small part in it, and I might be wrong.
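To make the point about language knowledge concrete, a minimal sketch that labels a name found by a regex using AutoIt's own conventions: `$` for variables, `@` for macros, `#` for directives, and a keyword list for the rest. `_TokenKind` is a hypothetical name and the keyword list is a deliberately partial assumption. Anything unrecognized is simply guessed to be a function, which is exactly the kind of guess a real tokenizer would replace with grammar rules (for example, `Array` and `au3` from `#include <Array.au3>` would be mislabeled).

```autoit
; Hypothetical helper: guess what kind of token a name is from its prefix or a keyword list.
; The keyword list is a partial sample; a real cross reference would need the full list.
Func _TokenKind($sTok)
    Switch StringLeft($sTok, 1)
        Case "$"
            Return "variable"
        Case "@"
            Return "macro"
        Case "#"
            Return "directive"
    EndSwitch
    ; Whole-word, case-insensitive match against a sample of AutoIt keywords.
    If StringRegExp($sTok, '(?i)^(If|Then|Else|ElseIf|EndIf|While|WEnd|For|To|Next|ExitLoop|ReDim|Func|EndFunc|Return|Local|Global|Dim|And|Or|Not)$') Then Return "keyword"
    Return "function"   ; assumption: any other bare name is a function call
EndFunc

; Usage: classify a few tokens from the snippet.
Local $aSample[5] = ['$mLineRead', '@error', 'redim', 'FileReadLine', '#include']
For $s In $aSample
    ConsoleWrite($s & " -> " & _TokenKind($s) & @CRLF)
Next
```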