tezhihi Posted May 4, 2017

I have a file (see attached file) whose whole content is a single string on one line. I want to split it at every $00:, $03:, $10:, $20:, $25:, $30:, $40:, $45:, $110:, $115:, $120: and $T, so that each of these $ markers starts a new line (a new paragraph).

I tried regular expressions in Notepad++, for example:

Find the numeric markers ($00:, $01:, $03: and so on) with the regex (\$)([0-9]+): and replace with \r\n\1\2 (I think \r\n is @CRLF, but I am not sure).
Find $T with the regex (\$T)(.*?)(\$T) and replace with \1\2\r\n\3

When I use the same regexes with StringRegExpReplace, the results are incorrect. I have only read a few simple examples about regex. Please advise me how to do this in AutoIt, with an example. The expected result is in the attached photo. Thanks.

ahihi.txt
JLogan3o13 (Moderators) Posted May 4, 2017

@tezhihi please wait 24 hours before bumping your thread. All of our forum members assist as they are able, and not all are in the same time zone; the person best able to assist you may not be online right now.
tezhihi Posted May 4, 2017 Author

11 minutes ago, JLogan3o13 said: @tezhihi please wait 24 hours before bumping your thread. All of our forum members assist as they are able, and not all are in the same time zone; the person best able to assist you may not be online right now.

Sorry about that. Thank you.
mikell Posted May 4, 2017 (edited)

What about this?

    $txt = FileRead("ahihi.txt")
    $output = StringRegExpReplace($txt, '(?<!^|:)(?=\$(?:00|03|10|20|25|30|40|45|110|115|120|T))', @crlf)
    FileWrite("output.txt", $output)

Meaning: search for positions that are not preceded by the start of the text or by a colon, and that are followed by $00, $03 and so on, and insert a CRLF at these positions.

Edited May 4, 2017 by mikell
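To see what the zero-width pattern does, it can help to run it on a short string first. The sketch below uses a made-up sample line (an assumption, not taken from ahihi.txt) and the same pattern as in the post above.

```autoit
; Hypothetical sample: a shortened, invented line in the style of ahihi.txt.
Local $sSample = '$00:Header$03:Court$120:$TFirst paragraph.$TSecond paragraph.'

; Same pattern as in the post above. Both parts are zero-width assertions,
; so nothing is consumed and @CRLF is inserted at each matching position.
Local $sPattern = '(?<!^|:)(?=\$(?:00|03|10|20|25|30|40|45|110|115|120|T))'
Local $sResult = StringRegExpReplace($sSample, $sPattern, @CRLF)

; Expected lines:
;   $00:Header
;   $03:Court
;   $120:$TFirst paragraph.
;   $TSecond paragraph.
ConsoleWrite($sResult & @CRLF)
```

The leading $00: gets no break because it sits at the start of the text, and the $T right after $120: gets none because it is preceded by a colon.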
tezhihi Posted May 5, 2017 Author (edited)

In reply to mikell's post above:

Awesome. I have one more question. I want to join two $ markers in the text below:

$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - - $Tn1 Subsequent to oral argument by letter, dated December 16, 2005 Scrushy's attorney submitted to the Court a "Memo" dated 7/19/04 and apparently prepared by a Pamela Anderson. This "Memo" contains unverified, unsworn rank hearsay and the Court has not considered the contents of this document in reaching its decision.$%$?$%- - - - - - - - - - - - - - - - -End Footnotes- - - - - - - - - - - - - - - - -$E$=P1298*3

I want to join $F with $Tn1 and delete $%$?$%- - - - - - - - - - - - - - - - -End Footnotes- - - - - - - - - - - - - - - - -, so the result should be:

$F$Tn1 Subsequent to oral argument by letter, dated December 16, 2005 Scrushy's attorney submitted to the Court a "Memo" dated 7/19/04 and apparently prepared by a Pamela Anderson. This "Memo" contains unverified, unsworn rank hearsay and the Court has not considered the contents of this document in reaching its decision.$E$=P1298*3

Kindly check and advise me. Thank you.

Edited May 5, 2017 by tezhihi
mikell Posted May 5, 2017

Here it is:

    $str = '$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - -' & @crlf & _
        '$Tn1 Subsequent to oral argument by letter, dated December 16, 2005 Scrushy''s attorney submitted to the Court a "Memo" dated 7/19/04 and apparently prepared by a Pamela Anderson. This "Memo" contains unverified, unsworn rank hearsay and the Court has not considered the contents of this document in reaching its decision.$%$?$%- - - - - - - - - - - - - - - - -End' & @crlf & _
        ' Footnotes- - - - - - - - - - - - - - - - -$E$=P1298*3'
    ;msgbox(0,"", $str)
    $str2 = StringRegExpReplace($str, '\$%\$\?\$%[^\$]+', "")
    msgbox(0,"", $str2)

This removes every non-$ character from "$%$?$%" up to the next $ character.

Please note that you first have to handle any single quote(s) included in the string (here it is doubled, as in Scrushy''s).
tezhihi Posted May 5, 2017 Author (edited)

In reply to mikell's post above:

Can I match the string "$=<$=T3*1 $=L00180000845001096*001106" with one of these regexes?

1. "\$=<\$=T([0-9]+)\*([0-9]+)\s\$=L.+?\*(([0-9]+){6})"
2. "\$=<\$=T[^\$]+\$=L[^\$]+"

I want to remove it.

Edited May 5, 2017 by tezhihi
mikell Posted May 5, 2017

I saw that this string exists (in various flavours) several times in the file. To remove them all you might use a more selective expression to limit the risk of errors:

    StringRegExpReplace($str, '\$=<\$=T3\*1\s+\$=L\d+\*\d+', "")
tezhihi Posted May 5, 2017 Author

In reply to mikell's post above:

Oh, thank you so much. Here is another part of the text:

He received the following bonuses for the years shown:$M05,06,13,13,13,33$GFOR$JANNUAL$JTARGET$JTOTAL$JDESCRIPTION$GYEAR$JBONUS$JBONUS$JINCENTIVE$G$J$J$JBONUS$D$Q2002$B$ 10,000,000$B$ 1,200,000$Y$ 11,200,000$BAnnual Bonus in and for 2002$Q$B$B$Y$Balso paid in 2002. Net income$Q$B$B$Y$Bwas negative $ (467 million.)$Q$B$B$Y$BApproved by the Compensation$Q$B$B$Y$BCommittee on April 29, 2002.$Q$Q2001$B$ 6,500,000$B$ 2,400,000$Y$ 8,900,000$BNet income for 2001 was negative$Q$B$B$Y$B$ (191 million). Bonuses$Q$B$B$Y$Breported in the proxy filed$Q$B$B$Y$BApril 12, 2002.$Q$Q2000$BNone.$B$ 2,154,849$Y$ 2,154,849$BNet income for 2000 was a$Q$B$B$Y$Bnegative $ (364 million).$Q$Q1999$BNone.$B$ 134,031$Y$ 134, 031$BFinancials fraudulent but no$Q$B$B$Y$Baudited restatement.$Q$Q1998$BNone.$B$ 1,577,829$Y$ 1,577,829$BFinancials fraudulent but no$Q$B$B$Y$Baudited restatement.$Q$Q1997$B$ 10,000,000$B$ 2,400,000$Y$ 12,400,000$BAnnual Bonus was reported in$Q$B$B$Y$Bproxy dated April 17, 1998, as$Q$B$B$Y$Bbeing for and earned in 1997.$Q$Q1996$B$ 8,000,000$B$ 2,400,000$Y$ 10,400,000$BAnnual Bonus was reported in$Q$B$B$Y$Bproxy dated April 9, 1997, as$Q$B$B$Y$Bbeing for, and earned in, 1996.$Q$QTotal$B$ 34,500,000$B$ 12,266,709$Y$ 46,766,709$BBefore Prejudgment Interest$X$=P1298*7 $T5. 
In each annual proxy on Form 14A, HealthSouth disclosed the following criteria for Incentive Bonuses paid to its executives:$=S$%$?$%Incentive Compensation: In addition to base salary, the $(Compensation$) Committee recommends to the Board of Directors cash incentive compensation for HealthSouth's executives, based on each executive's success in meeting qualitative and quantitative performance goals on an annual basis.

Am I correct if I use the code below?

    StringRegExpReplace(FileRead("ahihi.txt"), '(?=\$(?:M))', @crlf) ; start a new paragraph at $M (ex: $M05,06,13,13,13,33$GFOR$JA ......)
    StringRegExpReplace(FileRead("ahihi.txt"), '\$=P\d+\*\d+', '') ; remove all $=P1298*x
    StringRegExpReplace(FileRead("ahihi.txt"), '\$=>', '') ; remove all $=>
mikell Posted May 5, 2017

Yes you are. They will all work correctly; some remarks though:

The first one will insert a CRLF just before any $M encountered, so it could be written a little more simply:

    StringRegExpReplace(FileRead("ahihi.txt"), '(?=\$M)', @crlf)

If the second one is meant to remove precisely all "$=P1298*x", then it could be made more selective, which is always a good idea when using regex:

    StringRegExpReplace(FileRead("ahihi.txt"), '\$=P1298\*\d+', '')

Nothing to say about the third.
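One point worth illustrating: StringRegExpReplace returns the modified string and does not change the file, so three separate calls on FileRead("ahihi.txt") each start again from the original text. A minimal sketch of chaining the three replacements on one variable, using a made-up sample string (an assumption, not data from the file):

```autoit
; Hypothetical sample combining the three cases discussed above.
Local $sText = 'Text$=P1298*7 $T5. More$=>text$M05,06$GFOR'

; Each call receives the result of the previous one.
$sText = StringRegExpReplace($sText, '(?=\$M)', @CRLF)        ; new line before $M
$sText = StringRegExpReplace($sText, '\$=P1298\*\d+', '')     ; remove $=P1298*x
$sText = StringRegExpReplace($sText, '\$=>', '')              ; remove $=>

; Expected lines:
;   Text $T5. Moretext
;   $M05,06$GFOR
ConsoleWrite($sText & @CRLF)
```

With a real file, the same chain would start from a single FileRead and end with a single FileWrite.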
tezhihi Posted May 8, 2017 Author (edited)

In reply to mikell's post above:

Case 1: After I used the code below:

    #include <File.au3>
    $a = FileRead(@ScriptDir & '\ahihi.txt')
    $b = StringRegExpReplace($a, '(\$%\$\?\$%)(?=\$=B)', '\1' & @CRLF & '\2')
    $c = StringRegExpReplace($b, '\$%\$\?\$%[^\$]+', "")
    $d = StringRegExpReplace($c, '(?<!^|:)(?=\$(?:00|01|03|10|20|25|30|40|45|110|115|120|T|F|200|220))', @crlf)
    $e = StringRegExpReplace($d, '(?=\$M)', @crlf)
    $f = StringRegExpReplace($e, '(\$F)(\R)(\$Tn)', '\1\3')
    $g = StringRegExpReplace($f, '\$=<\$=T\d+\*\d+\s+\$=L\d+\*\d+', '')
    $h = StringRegExpReplace($g, '\$=P\d+\*\d+', '')
    $i = StringRegExpReplace($h, '(\$E)\$%\$\?\$%(\$=B)', '')
    $j = StringRegExpReplace($i, '\$I\s+\$U', '$I$U')
    $k = StringRegExpReplace($j, '\$=>', '')
    FileWrite("output.txt", $k)

the output file has a problem (please see the red box in the image: all $T preceded by the symbol ':'). I think this problem will be solved when I use the code below:

    StringRegExpReplace(FileRead("ahihi.txt"), '(\:)(\$T)(\$=B)', '\1' & @CRLF & '\2\3')
    StringRegExpReplace(FileRead("ahihi.txt"), '([a-z]+)(\:)(\$T)', '\1\2' & @CRLF & '\3')

Otherwise, please advise me.

Case 2:

    $str = '$T$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - - n1 Liggett has been rewarded handsomely in the settlement with the Attorneys General for its historic cooperation. $%$?$%- - - - - - - - - - - - - - - - -End Footnotes- - - - - - - - - - - - - - - - -$E' ; this is one line, with no @CRLF
    $str2 = StringRegExpReplace($str, '\$%\$\?\$%[^\$]+', "")

The result is '$F$E', without the text data inside. I will resolve it with the code below:

    $str = '$T$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - - n1 Liggett has been rewarded handsomely in the settlement with the Attorneys General for its historic cooperation. $%$?$%- - - - - - - - - - - - - - - - -End Footnotes- - - - - - - - - - - - - - - - -$E'
    $str2 = StringRegExpReplace($str, '(\$%\$\?\$%[^\$]+)(n\d+)', '\2')

Otherwise, please advise me.

Edited May 8, 2017 by tezhihi
mikell Posted May 8, 2017

Case 2 could be done like this:

    StringRegExpReplace($str, '\$%\$\?\$%.+?(?=n\d|\$)', "")

For case 1 this is possible:

    StringRegExpReplace($str, '(?<=:)(?=\$T\$=B)', @CRLF)

But it is becoming a little difficult for me, as I don't know exactly which "$x" things you want to keep and which ones you want to remove.

Maybe all this could be done more simply, depending on what the final result should be after complete treatment of the initial file.
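The difference between the two footnote patterns in case 2 can be checked on a shortened string. The sample below is invented for illustration (an assumption: a single line with no $ inside the footnote text, as in the case 2 example).

```autoit
; Hypothetical, shortened version of the case 2 string.
Local $sStr = '$T$F$%$?$%- - -Footnotes- - - n1 Some footnote text. $%$?$%- - -End Footnotes- - -$E'

; Earlier pattern: [^\$]+ runs up to the next $, so the footnote text goes too.
Local $sGreedy = StringRegExpReplace($sStr, '\$%\$\?\$%[^\$]+', "")

; Pattern from the post above: the lazy .+? stops before "n" + digit or before a $.
Local $sLazy = StringRegExpReplace($sStr, '\$%\$\?\$%.+?(?=n\d|\$)', "")

ConsoleWrite($sGreedy & @CRLF) ; $T$F$E
ConsoleWrite($sLazy & @CRLF)   ; $T$Fn1 Some footnote text. $E
```

Note that the lazy version relies on the footnote label being "n" followed by a digit, and that "." does not match line breaks by default, so it is meant for single-line input like this.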
mikell Posted May 8, 2017

@tezhihi

6 hours ago, mikell said: Maybe all this could be done simpler depending on what should be the final result

For instance, could this be correct?

    $s = FileRead(@ScriptDir & '\ahihi.txt')
    $s = StringRegExpReplace($s, '\R|(\$F)?\$%\$\?\$%[^\$]*', "") ; footnotes
    $s = StringRegExpReplace($s, '(?<!^|:)(?=\$(?|00|01|03|10|20|25|30|40|45|110|115|120|200|220|T|M))', @crlf)
    $s = StringRegExpReplace($s, '(\$(?|[DXOENUI\?%]|=[BRIS<>]|=[PLT]\d+\*\d+))+', "")
    $s = StringRegExpReplace($s, '(\$[MBJGQY])+', " ")
    $s = StringRegExpReplace($s, '\$T', "")
    FileWrite("output.txt", $s)
tezhihi Posted May 8, 2017 Author (edited)

13 hours ago, mikell said: @tezhihi For instance could this be correct ?

    $s = FileRead(@ScriptDir & '\ahihi.txt')
    $s = StringRegExpReplace($s, '\R|(\$F)?\$%\$\?\$%[^\$]*', "") ; this line deletes some text data
    $s = StringRegExpReplace($s, '(?<!^|:)(?=\$(?|00|01|03|10|20|25|30|40|45|110|115|120|200|220|T|M))', @crlf)
    $s = StringRegExpReplace($s, '(\$(?|[DXOENUI\?%]|=[BRIS<>]|=[PLT]\d+\*\d+))+', "") ; this line deletes every $ marker with its value, but each $ marker corresponds to a font, an indent at the start of a paragraph, or some other data
    $s = StringRegExpReplace($s, '(\$[MBJGQY])+', " ") ; I just need a new paragraph that starts with $M and ends with $X; I will modify it after all replacements are done
    $s = StringRegExpReplace($s, '\$T', "") ; $T is a required segment in the text file (indent at the start of a paragraph)
    FileWrite("output.txt", $s)

@mikell Oh no, the output should be the same as the output.txt produced by my code. Please see the remarks in your code. I will send you the list of data that needs to be deleted from the text file. Please see new 2.txt

I have multiple files like that to modify, and I need to make a tool to un-string all of them.

Please check the correctness of the code below:

    #include <File.au3>
    $a = FileRead(@ScriptDir & '\ahihi.txt')
    $a = StringRegExpReplace($a, '(\$%\$\?\$%)(?=\$=B)', '\1' & @CRLF & '\2')
    $a = StringRegExpReplace($a, '(\$%\$\?\$%[^\$]+Footnotes[^\$]+)(n\d+)', '\2')
    $a = StringRegExpReplace($a, '\$%\$\?\$%[^\$]+Footnotes[^\$]+', '')
    $a = StringRegExpReplace($a, '(\$%\$\?\$%[^\$]+)(\$E)', '\2')
    $a = StringRegExpReplace($a, '(?<!^|:)(?=\$(?:00|01|03|10|20|25|30|40|45|110|115|120|T|F|200|220))', @CRLF)
    $a = StringRegExpReplace($a, '(?=\$M)', @CRLF)
    $a = StringRegExpReplace($a, '(?=\$=S)', @CRLF)
    $a = StringRegExpReplace($a, '(?=\$%\$\?\$%)', @CRLF)
    $a = StringRegExpReplace($a, '(\$F)(\R)(\$Tn)', '\1\3')
    $a = StringRegExpReplace($a, '(\$T)(\R)(\$F)', '\3\1')
    $a = StringRegExpReplace($a, '(\s)(\$E)', '\2')
    $a = StringRegExpReplace($a, '\$=<\$=T\d+\*\d+\s+\$=L\d+\*\d+', '')
    $a = StringRegExpReplace($a, '\$=<\$=T\d+\*[^\$]+\$=L\d+\*\d+', '')
    $a = StringRegExpReplace($a, '\$=P\d+\*\d+', '')
    $a = StringRegExpReplace($a, '(\$E)\$%\$\?\$%(\$=B)', '')
    $a = StringRegExpReplace($a, '\$I\s+\$U', '$I$U')
    $a = StringRegExpReplace($a, '\$=>', '')
    $a = StringRegExpReplace($a, '(?<=:)(?=\$T\$=B)', @CRLF)
    $a = StringRegExpReplace($a, '([A-Z][a-z].+\:)(\$T)', '\1' & @CRLF & '\2')
    $a = StringRegExpReplace($a, '(\$E)(\$=B)', '\1' & @CRLF & '\2')
    $a = StringRegExpReplace($a, '(\$00:)(.+)', '\1')
    $a = StringRegExpReplace($a, '\$03:.+\R', '')
    $a = StringRegExpReplace($a, '\$30:.+\R', '')
    $a = StringRegExpReplace($a, '(\$200:)(.+)', '\1')
    $a = StringRegExpReplace($a, '(\$%\$\?\$%)(\R)(\$=B)', '\1\3')
    $a = StringRegExpReplace($a, '(\$120:)(\R)(\$T)', '\1\3')
    $a = StringRegExpReplace($a, '(\$120:)(\R)(\$%\$\?\$%)', '\1\3')
    $a = StringRegExpReplace($a, '(\$120:\$%\$\?\$%)(\R)(\$=B)', '\1\3')
    $a = StringRegExpReplace($a, '(\$=S)(\R)(\$%\$\?\$%)', '\1\3')
    $a = StringRegExpReplace($a, '(\$%\$\?\$%)(\s)', '\1')
    $a = StringRegExpReplace($a, '(\$01:.+)(\R\$%\$\?\$%.+)', '\1')
    $a = StringRegExpReplace($a, '\$(?|=L\d+\*\d+)', '')
    FileWrite("output.txt", $a)

Edited May 9, 2017 by tezhihi
mikell Posted May 10, 2017

I do follow... my previous post was just a little play, because obviously you need a final file with a very particular formatting.

I tried to guess the meaning of the various $x (e.g. $F = Footnotes, and so on) but I quickly gave up.

Actually you have a bunch of SRER (StringRegExpReplace) calls and some of them are redundant, but they are not so bad and they do the job.

To help you I need precise information... the best would be to build a final file manually, so that by comparing I can know exactly what you want.
tezhihi Posted May 10, 2017 Author

In reply to mikell's post above:

@mikell OK, I will send you the completed file for comparison when I am in the office. This file was completed by the best person. I will send you the full information for processing with regex, with notes on all of it.
mikell Posted May 10, 2017

Nice. The more information I have, the more suggestions I can make.
tezhihi Posted May 11, 2017 Author (edited)

6 hours ago, mikell said: Nice. The more I have infos, the more I can make suggestions

Hi @mikell, can you check the For Mikell.xls file with its remarks, check EXAMPLE.TXT (this is the result for ahihi.txt), and help me? If you need more information, please tell me. Oh, I also have a new file for you to try processing: New For Process.txt

Edited May 11, 2017 by tezhihi