GimK, posted August 7, 2015 (edited)

This is my first post, so first of all, hello everyone! I have already looked around for an answer to my question, but if I missed it, please redirect me.

I am trying to organize a text file into an array by comparing it to another file. One of my files is a messy document full of text, spaces and tabs that I got by copying a form. The other file is a list of every title in the form.

To make it clear, here is an example.

File 1:
Name:Antony Lastname : Kob Age 15 height :1.95 Hobbies football, tennis, autoit

File 2:
Name Lastname Age Height Hobbies

This is of course much simpler than what I have, but the principle is the same. In the end, I would like an array with all the content of file 1 organised like this:

Name      Antony
Lastname  Kob
Age       15
Height    1.95
Hobbies   football tennis autoit

How can I do that? Thanks!

EDIT: I forgot to say that sometimes the title is composed of multiple words, like "Owned By :" for example, and the text that follows can be empty.

GimK, posted August 7, 2015 (edited)

Up! I actually managed to write part of the code.

```autoit
#include <MsgBoxConstants.au3>
#include <StringConstants.au3>
#include <AutoItConstants.au3>
#include <FileConstants.au3>
#include <Array.au3>
#include <File.au3>

HotKeySet("{END}", "Terminate")

Local $formTitlesPath = @ScriptDir & "\FormTitles.txt"
Local $formTitles
Local $all
Local $current

If NOT (_FileReadToArray($formTitlesPath, $formTitles)) Then
    fileMsgBox(@error, "FormTitles.txt")
    Terminate()
EndIf

Local $testFile = FileOpen(@ScriptDir & "\test.txt")
If ($testFile == -1) Then
    MsgBox(0, "Oops, there's an error", "Can't open test file")
    Terminate()
EndIf
$all = FileRead($testFile)

Local $charPos
Local $finalSize = 2*UBound($formTitles)
Local $finalArray[$finalSize]

While 1
    For $i = 1 To UBound($formTitles)-1
        $current = $formTitles[$i]
        $finalArray[2*$i] = $current
        $charPos = StringinStr($all, $current) + StringLen($current)
        $finalArray[(2*$i)+1] = $charPos
    Next
    _ArrayDisplay($finalArray)
WEnd

Terminate()

Func Terminate()
    Exit
EndFunc ;==>Terminate

;File opening error function
Func fileMsgBox($error, $file)
    MsgBox(0, "Oops, there's an error type " & $error, "Can't open the '" & $file & "' file.")
EndFunc
```

This should only create a duplicate of the $formTitles array with a free slot after each title, filled with what I believe is the starting position of the text between that title and the next. However, judging by the result, the positions seem wrong, and I can't figure out how to capture the text that is in there.

mikell, posted August 7, 2015

Just a try:

```autoit
#Include <Array.au3>

$txt = " Owned By : Name:Antony Lastname : Kob" & @crlf & _
    "Age 15 height :1.95 Hobbies football, tennis, autoit"
$ref = "Owned By|Name|Lastname|Age|Height|Hobbies"

; Strip leading/trailing whitespace and replace existing line breaks by tabs
$txt = StringReplace(StringStripWS($txt, 3), @crlf, @TAB)
; Insert a @crlf before and after each keyword
$txt1 = StringRegExpReplace($txt, '(?i)(?<!^|\w)(?=' & $ref & ')|(?<=' & $ref & ')\h*:?', @crlf)
; Msgbox(0,"1", $txt1)

; Split on the whole @crlf, without a count in element 0
$res = StringSplit($txt1, @crlf, 3)

; Even indexes hold the titles, odd indexes hold the values
Local $array[UBound($res)/2][2]
For $i = 0 to UBound($res)-1 step 2
    $array[$i/2][0] = $res[$i]
    $array[$i/2][1] = StringStripWS($res[$i+1], 3)
Next
_ArrayDisplay($array)
```

GimK, posted August 7, 2015

Hi! Thank you for the answer.

Sorry, I'm pretty new to AutoIt, so I don't understand everything. Could you explain roughly what you are doing? Even with the function reference for StringRegExp, I don't really understand your pattern, nor the code that follows it.

Thanks for your help!

water, posted August 7, 2015

Regular Expressions aren't easy to understand until you work with them on a daily basis. That's at least my impression.

mikell, posted August 7, 2015 (edited)

water, your impression is sooo correct.

GimK, the first String* funcs are easy to understand. Explanations for the StringRegExpReplace:

```
'(?i)(?<!^|\w)(?=' & $ref & ')|(?<=' & $ref & ')\h*:?'

(?i)               : case insensitive
(?<! )             : negative lookbehind, means 'not preceded by'
^|\w               : beginning of string OR a word char
(?=' & $ref & ')   : positive lookahead, means 'followed by' (by the content of the $ref variable)
|                  : or (alternation)
(?<=' & $ref & ')  : positive lookbehind, means 'preceded by' (by the content of the $ref variable)
\h*:?              : 0 or more horizontal whitespace + an optional colon

$ref = "Owned By|Name|Lastname|Age|Height|Hobbies"
This string contains the subpattern with the keywords alternation.
It means ("Owned By" OR "Name" OR "Lastname" ... etc)
```

So in usual language this regex says: "Find
- positions (not preceded by the beginning of string OR by a word char), because "name" must not match in "Lastname", and (followed by a keyword)
or
- some horizontal spaces (or none) with a colon (or not), preceded by a keyword
and replace them by a @crlf."

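To see what the negative lookbehind contributes, here is a minimal sketch that runs only the first half of the pattern with and without `(?<!^|\w)`. The sample text and the shortened keyword list are assumptions modelled on the example in the first post, and `#` is just a visible stand-in for the @crlf used in the thread.

```autoit
; Hypothetical sample data, modelled on the example in the first post
Local $sText = "Name:Antony Lastname : Kob"
Local $sKeys = "Name|Lastname"

; Without the lookbehind: a marker goes before every keyword match,
; including the start of the string and the "name" inside "Lastname"
Local $sNaive = StringRegExpReplace($sText, '(?i)(?=' & $sKeys & ')', '#')

; With the lookbehind: no marker at the start of the string
; and none after a word character
Local $sGuarded = StringRegExpReplace($sText, '(?i)(?<!^|\w)(?=' & $sKeys & ')', '#')

ConsoleWrite($sNaive & @CRLF)   ; #Name:Antony #Last#name : Kob
ConsoleWrite($sGuarded & @CRLF) ; Name:Antony #Lastname : Kob
```

The guarded version leaves exactly one marker per title boundary, which is what the later StringSplit relies on.
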
GimK, posted August 10, 2015

Alright, thanks! I think I understood, and the rest is clear now. (Sorry for the delay, I couldn't work on it this weekend.)

water, you must be right, because this looks a little bit like Brainfuck to me at the moment.

GimK, posted August 10, 2015

```autoit
If NOT (_FileReadToArray($formTitlesPath, $formTitles)) Then
    fileMsgBox(@error, "FormTitles.txt")
    Terminate()
EndIf

Local $testFile = FileOpen(@ScriptDir & "\test.txt")
If ($testFile == -1) Then
    MsgBox(0, "Oops, there's an error", "Can't open test file")
    Terminate()
EndIf
$all = FileRead($testFile)

Local $fAll
Local $res

$formTitles = _ArrayToString($formTitles, "|")
$all = StringReplace(StringStripWS($all, 3), @crlf, @TAB)
$fAll = StringRegExpReplace($all, '(?i)(?<!^|\w)(?=' & $formTitles & ')|(?<=' & $formTitles & ')\h*:?', @crlf)
Msgbox(0,"1", $formTitles)
MsgBox(0,"1", $fAll)

$res = StringSplit($fAll, @crlf, 3)
_ArrayDisplay($res)

Local $array[UBound($res)/2][2]
For $i = 0 to UBound($res)-1 step 2
    $array[$i/2][0] = $res[$i]
    $array[$i/2][1] = StringStripWS($res[$i+1], 3)
Next
_ArrayDisplay($array)

Terminate()
```

Well, I still have an issue. The list of titles seems okay, as does the $fAll string (= $txt1 in your code). But the $res array has only its first element filled, with all titles and answers run together without any whitespace. I guess that is why I get this error:

"Array variable has incorrect number of subscripts or subscript dimension range exceeded : $array[$i/2][0] = $res[$i]^ ERROR"

I don't see where it is coming from.

mikell, posted August 10, 2015

Probably because you have not disabled the count return in element 0 when using _FileReadToArray:

```autoit
If NOT (_FileReadToArray($formTitlesPath, $formTitles, $FRTA_NOCOUNT)) Then
```

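A minimal sketch of why the count matters here, using a hand-built array as a stand-in for what _FileReadToArray returns by default (the three titles are assumptions taken from the example in the first post). When the whole array is joined, the count ends up as one more alternative in the regex; passing a start row of 1 to _ArrayToString is another way to skip it.

```autoit
#include <Array.au3>

; Stand-in for the default _FileReadToArray result of a three-line file:
; element 0 holds the number of lines
Local $aWithCount[4] = [3, "Name", "Lastname", "Age"]

; Joining the whole array puts the count into the alternation
ConsoleWrite(_ArrayToString($aWithCount, "|") & @CRLF)    ; 3|Name|Lastname|Age

; Starting at row 1 leaves the count out
ConsoleWrite(_ArrayToString($aWithCount, "|", 1) & @CRLF) ; Name|Lastname|Age
```

With the count included, every occurrence of that number in the text would be treated as a title by the pattern.
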
GimK, posted August 10, 2015 (edited)

I still have the same error. I changed this line

```autoit
$res = StringSplit($fAll, @crlf, 3)
```

into this

```autoit
$res = StringSplit($fAll, @TAB, 3)
```

and I now have a readable array in $res, even if there are a lot of blank lines, and the same dimension error with $array.

But I have to admit I don't understand what is happening, since this

```autoit
$all = StringReplace(StringStripWS($all, 3), @crlf, @TAB)
$fAll = StringRegExpReplace($all, '(?i)(?<!^|\w)(?=' & $formTitles & ')|(?<=' & $formTitles & ')\h*:?', @crlf)
```

should put @crlf between each item, and not @TAB, right? Unless the StringRegExpReplace() doesn't work properly.

mikell, posted August 10, 2015

Hmm, regex needs accuracy. The pattern in post #3 was intended to work on your sample text 'File 1' in post #1. So if you are currently using a different text, could you please post an exact copy of the current content of "test.txt"?

BTW, the regex uses @crlf as a delimiter for the output, so if one or more @crlf already exist in the original text they must be removed first (the reason why I replaced them by a tab).

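The effect of a leftover @crlf can be sketched as follows. Both strings are made-up stand-ins for the output of the StringRegExpReplace step, and the named constants from StringConstants.au3 are used in place of the literal flag 3 from the thread.

```autoit
#include <StringConstants.au3>

; Hypothetical strings: one clean title/value pair, and one whose value
; still contains a line break from the source file
Local $sClean = "Name" & @CRLF & "Antony"
Local $sStray = "Name" & @CRLF & "Ant" & @CRLF & "ony"

; Flag 3 in the thread equals $STR_ENTIRESPLIT + $STR_NOCOUNT:
; split on the whole @CRLF pair and keep no count in element 0
Local $iFlag = $STR_ENTIRESPLIT + $STR_NOCOUNT

Local $aClean = StringSplit($sClean, @CRLF, $iFlag)
Local $aStray = StringSplit($sStray, @CRLF, $iFlag)

ConsoleWrite(UBound($aClean) & @CRLF) ; 2: title, value
ConsoleWrite(UBound($aStray) & @CRLF) ; 3: the title/value pairing is now off by one
```

An odd or shifted element count is what breaks the `step 2` loop that fills the two-column array, hence the StringReplace to @TAB before the regex runs.
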
GimK, posted August 10, 2015

Alright, here is the text file, with the titles attached. Sorry, I didn't post it before because I thought there would be a general solution to the problem!

Thank you a lot for your time.

Attachments: test.txt, FormTitles.txt

mikell, posted August 10, 2015 (edited)

OMG, I dreaded something like this. Where does this text come from? A web page? If so, there is certainly a better / easier / more reliable way to go.

Edit: OK, the problem was in the file "FormTitles.txt", with some titles containing either special characters or typos. Please use the one below, as is, and this code:

```autoit
#include <Array.au3>
#include <File.au3>

Local $formTitlesPath = @ScriptDir & "\FormTitles.txt"
Local $formTitles
If NOT (_FileReadToArray($formTitlesPath, $formTitles)) Then
    ; MsgBox needs a flag, a title and a text
    MsgBox(0, "Oops, there's an error type " & @error, "Can't open the 'FormTitles.txt' file.")
    Terminate()
EndIf

; Build the alternation, quoting each title so special characters are taken literally
Local $titles
For $i = 1 to $formTitles[0]
    $titles &= "\Q" & $formTitles[$i] & "\E|"
Next
$titles = StringTrimRight($titles, 1)
; Msgbox(0,"1", $titles)

Local $testFile = FileOpen(@ScriptDir & "\test.txt")
If ($testFile == -1) Then
    MsgBox(0, "Oops, there's an error", "Can't open test file")
    Terminate()
EndIf
$all = FileRead($testFile)

$all = StringReplace(StringStripWS($all, 3), @crlf, @TAB)
$fAll = StringRegExpReplace($all, '(?i)(?<!^|\w)(?=' & $titles & ')|(?<=' & $titles & ')\h*:?', @crlf)
; MsgBox(0,"1", $fAll)

$res = StringSplit($fAll, @crlf, 3)
; _ArrayDisplay($res)

Local $array[UBound($res)/2][2]
For $i = 0 to UBound($res)-1 step 2
    $array[$i/2][0] = $res[$i]
    $array[$i/2][1] = StringStripWS($res[$i+1], 3)
Next
_ArrayDisplay($array)

Func Terminate()
    Exit
EndFunc ;==>Terminate
```

Attachment: FormTitles.txt

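The `\Q...\E` quoting in the loop above can be illustrated on a single title. This is only a sketch: the title is the one quoted later in the thread, while the surrounding text ("Plan A") is a made-up assumption.

```autoit
; Title containing regex metacharacters (parentheses)
Local $sTitle = "Drawing Title : (Match Drawing Title)"
; Hypothetical form text that contains the title followed by a value
Local $sText = "Drawing Title : (Match Drawing Title) Plan A"

; Unquoted, the parentheses form a group, so the literal "(" in the text is never matched
ConsoleWrite(StringRegExp($sText, $sTitle) & @CRLF)                ; 0

; Quoted with \Q...\E, every character of the title is taken literally
ConsoleWrite(StringRegExp($sText, "\Q" & $sTitle & "\E") & @CRLF) ; 1
```

Note that the loop in the post starts at index 1 and reads `$formTitles[0]`, so it expects _FileReadToArray to be called with its default count in element 0.
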
GimK, posted August 11, 2015 (edited)

Nope, it comes from IBM Notes, a collaboration platform. I looked for a COM interface or any other way to gather the data, but I didn't succeed.

Thanks a lot! The FormTitles.txt you gave me is the same as the one I had, so maybe it is the wrong one? I still have the same error as before.

EDIT: Oh, my bad, I forgot to change the parameters of _FileReadToArray. This is working perfectly! Thank you a lot, I don't know what I would have done without your help.

mikell, posted August 11, 2015

Glad I could help (© M23).

BTW, FormTitles.txt looks the same but is not exactly the same. Example: there was a missing space in "Drawing Title : (Match Drawing Title)", and as regex requires perfect accuracy, such a typo is enough to make the whole thing fail.

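One way to catch such typos before running the regex is to check that every title occurs literally in the text. This is a rough sketch rather than part of the solution above; the titles, including the one with the missing space, and the text are made-up assumptions.

```autoit
; Hypothetical data: the second title has a missing space after the colon
Local $aTitles[2] = ["Drawing Number :", "Drawing Title :(Match Drawing Title)"]
Local $sText = "Drawing Number : 42 Drawing Title : (Match Drawing Title) Plan A"

For $i = 0 To UBound($aTitles) - 1
    ; StringInStr returns 0 when the substring is not found (case-insensitive by default)
    If StringInStr($sText, $aTitles[$i]) = 0 Then
        ConsoleWrite("Not found in text: " & $aTitles[$i] & @CRLF)
    EndIf
Next
```

Any title reported here would leave its value merged into the previous field instead of raising an error, which makes this kind of mismatch hard to spot by eye.
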