StringRegExReplace in File.

tezhihi · May 4, 2017

I have a file (see the attached file) whose content is one long line. I want to split it at every $00:, $03:, $10:, $20:, $25:, $30:, $40:, $45:, $110:, $115:, $120: and $T, meaning that each of these $ codes starts a new line (a new paragraph). I tried regular expressions in Notepad++, for example:

Find ($00:, $01:, $03: and so on) with the regex (\$)([0-9]+): and replace with \r\n\1\2 (I think \r\n corresponds to @CRLF, but I am not sure)

Find $T with the regex (\$T)(.*?)(\$T) and replace with \1\2\r\n\3

When I use these Notepad++ regexes with StringRegExpReplace, the result is incorrect. I have read some simple examples about regex. Please advise me how to do this in AutoIt, with an example. The expected result is shown in the attached image. Thanks.

ahihi.txt

tezhihi · May 4, 2017

aaaa

Edited May 4, 2017 by tezhihi

JLogan3o13 · May 4, 2017

@tezhihi please wait 24 hours before bumping your thread. All of our forum members assist as they are able, and not all are in the same time zone; the person best able to assist you may not be online right now.

tezhihi · May 4, 2017

Sorry about that. Thank you.

mikell · May 4, 2017

What about this ?

$txt = FileRead("ahihi.txt")

$output = StringRegExpReplace($txt, '(?<!^|:)(?=\$(?:00|03|10|20|25|30|40|45|110|115|120|T))', @crlf)
FileWrite("output.txt", $output)

Meaning:
Search for positions that are not preceded by the start of the text or a colon, and that are followed by $00, $03 and so on,
and insert a CRLF at these positions
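
A minimal sketch of the same lookaround idea on a short inline string, so it can be tried without the attachment. The sample text and the shortened alternation (00|120|T) are made up for illustration; the real file needs the full list used above.

```autoit
; Hypothetical one-line sample; the real input is ahihi.txt.
Local $sSample = '$00:Header$120:$TFirst paragraph$TSecond paragraph'

; Zero-width match: nothing is consumed. A CRLF is inserted at each position
; that is followed by $00, $120 or $T and is not preceded by the start of the
; text or by a colon.
Local $sSplit = StringRegExpReplace($sSample, '(?<!^|:)(?=\$(?:00|120|T))', @CRLF)

ConsoleWrite($sSplit & @CRLF)
; Expected lines:
;   $00:Header
;   $120:$TFirst paragraph
;   $TSecond paragraph
```

The $T right after "$120:" stays on the same line because of the colon check in the lookbehind.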

Edited May 4, 2017 by mikell

tezhihi · May 5, 2017

Awesome. I have one more question.

I want to join two $ codes in the text below:

$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - -
$Tn1 Subsequent to oral argument by letter, dated December 16, 2005 Scrushy's attorney submitted to the Court a "Memo" dated 7/19/04 and apparently prepared by a Pamela Anderson. This "Memo" contains unverified, unsworn rank hearsay and the Court has not considered the contents of this document in reaching its decision.$%$?$%- - - - - - - - - - - - - - - - -End
 Footnotes- - - - - - - - - - - - - - - - -$E$=P1298*3

Join $F with $Tn1, and delete $%$?$%- - - - - - - - - - - - - - - - -End Footnotes- - - - - - - - - - - - - - - - -, so the result should be

$F$Tn1 Subsequent to oral argument by letter, dated December 16, 2005 Scrushy's attorney submitted to the Court a "Memo" dated 7/19/04 and apparently prepared by a Pamela Anderson. This "Memo" contains unverified, unsworn rank hearsay and the Court has not considered the contents of this document in reaching its decision.$E$=P1298*3

Kindly check and advise me. Thank you.

Edited May 5, 2017 by tezhihi

mikell · May 5, 2017

Here it is

$str = '$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - -' & @crlf & _ 
'$Tn1 Subsequent to oral argument by letter, dated December 16, 2005 Scrushy''s attorney submitted to the Court a "Memo" dated 7/19/04 and apparently prepared by a Pamela Anderson. This "Memo" contains unverified, unsworn rank hearsay and the Court has not considered the contents of this document in reaching its decision.$%$?$%- - - - - - - - - - - - - - - - -End' & @crlf & _ 
' Footnotes- - - - - - - - - - - - - - - - -$E$=P1298*3'

;msgbox(0,"", $str)

$str2 = StringRegExpReplace($str, '\$%\$\?\$%[^\$]+', "")
msgbox(0,"", $str2)

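This removes "$%$?$%" together with every following non-$ character, up to (but not including) the next $ char
Please note that you first have to handle any single quote(s) included in the string (doubled here, as in Scrushy''s, because the literal is single-quoted)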
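
A shortened sketch of the same replacement, assuming a made-up footnote block, which also reads @extended to see how many replacements were made. That can be a quick sanity check before running the pattern on a whole file.

```autoit
; Shortened, made-up version of the footnote block from the question.
Local $sNote = '$F$%$?$%- - -Footnotes- - -' & @CRLF & _
        '$Tn1 Note text.$%$?$%- - -End' & @CRLF & _
        ' Footnotes- - -$E$=P1298*3'

; [^\$]+ also matches the line breaks, so the header and trailer are removed
; together with the CRLF that follows or splits them.
Local $sClean = StringRegExpReplace($sNote, '\$%\$\?\$%[^\$]+', '')
Local $iCount = @extended ; number of replacements performed

ConsoleWrite($iCount & ' replacement(s): ' & $sClean & @CRLF)
; Expected: 2 replacement(s): $F$Tn1 Note text.$E$=P1298*3
```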

tezhihi · May 5, 2017

Can I match the string "$=<$=T3*1 $=L00180000845001096*001106" with one of these regexes?

1. "\$=<\$=T([0-9]+)\*([0-9]+)\s\$=L.+?\*(([0-9]+){6})"

or

2. "\$=<\$=T[^\$]+\$=L[^\$]+"

I want to remove it.

Edited May 5, 2017 by tezhihi

mikell · May 5, 2017

I saw that this string occurs several times in the file, in various flavours. To remove them all, you might use a more selective expression to limit the risk of errors

StringRegExpReplace($str, '\$=<\$=T3\*1\s+\$=L\d+\*\d+', "")
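
To illustrate why the selective form is safer, here is a small comparison. The surrounding text is invented, and the \d+ after T assumes that the digits in the marker can vary; only the marker itself comes from the question.

```autoit
; Made-up context around the marker quoted in the question.
Local $sLine = 'end of page.$=<$=T3*1 $=L00180000845001096*001106 Text that must stay.$T'

; Broad: [^\$]+ keeps going until the next $, so the following text is lost.
Local $sBroad = StringRegExpReplace($sLine, '\$=<\$=T[^\$]+\$=L[^\$]+', '')

; Selective: only digits, '*' and whitespace are allowed inside the marker.
Local $sSelective = StringRegExpReplace($sLine, '\$=<\$=T\d+\*\d+\s+\$=L\d+\*\d+', '')

ConsoleWrite($sBroad & @CRLF)     ; end of page.$T
ConsoleWrite($sSelective & @CRLF) ; end of page. Text that must stay.$T
```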

tezhihi · May 5, 2017

Oh, thank you so much.

He received the following bonuses for the years shown:$M05,06,13,13,13,33$GFOR$JANNUAL$JTARGET$JTOTAL$JDESCRIPTION$GYEAR$JBONUS$JBONUS$JINCENTIVE$G$J$J$JBONUS$D$Q2002$B$ 10,000,000$B$ 1,200,000$Y$ 11,200,000$BAnnual Bonus in and for 2002$Q$B$B$Y$Balso paid in 2002. Net income$Q$B$B$Y$Bwas negative $ (467 million.)$Q$B$B$Y$BApproved by the Compensation$Q$B$B$Y$BCommittee on April 29, 2002.$Q$Q2001$B$ 6,500,000$B$ 2,400,000$Y$ 8,900,000$BNet income for 2001 was negative$Q$B$B$Y$B$ (191 million). Bonuses$Q$B$B$Y$Breported in the proxy filed$Q$B$B$Y$BApril 12, 2002.$Q$Q2000$BNone.$B$ 2,154,849$Y$ 2,154,849$BNet income for 2000 was a$Q$B$B$Y$Bnegative $ (364 million).$Q$Q1999$BNone.$B$ 134,031$Y$ 134, 031$BFinancials fraudulent but no$Q$B$B$Y$Baudited restatement.$Q$Q1998$BNone.$B$ 1,577,829$Y$ 1,577,829$BFinancials fraudulent but no$Q$B$B$Y$Baudited restatement.$Q$Q1997$B$ 10,000,000$B$ 2,400,000$Y$ 12,400,000$BAnnual Bonus was reported in$Q$B$B$Y$Bproxy dated April 17, 1998, as$Q$B$B$Y$Bbeing for and earned in 1997.$Q$Q1996$B$ 8,000,000$B$ 2,400,000$Y$ 10,400,000$BAnnual Bonus was reported in$Q$B$B$Y$Bproxy dated April 9, 1997, as$Q$B$B$Y$Bbeing for, and earned in, 1996.$Q$QTotal$B$ 34,500,000$B$ 12,266,709$Y$ 46,766,709$BBefore Prejudgment Interest$X$=P1298*7 $T5. In each annual proxy on Form 14A, HealthSouth disclosed the following criteria for Incentive Bonuses paid to its executives:$=S$%$?$%Incentive Compensation: In addition to base salary, the $(Compensation$) Committee recommends to the Board of Directors cash incentive compensation for HealthSouth's executives, based on each executive's success in meeting qualitative and quantitative performance goals on an annual basis.

Am I correct if I use the code below?

StringRegExpReplace(FileRead("ahihi.txt"), '(?=\$(?:M))', @crlf)     ; start a new paragraph at each $M (e.g. $M05,06,13,13,13,33$GFOR$JA ......)
StringRegExpReplace(FileRead("ahihi.txt"), '\$=P\d+\*\d+', '')       ; remove every $=P1298*x
StringRegExpReplace(FileRead("ahihi.txt"), '\$=>', '')               ; remove every $=>

mikell · May 5, 2017

Yes you are. They will all work correctly, with a few remarks:

The first one inserts a crlf just before every $M encountered, so it could be written a little more simply

StringRegExpReplace(FileRead("ahihi.txt"), '(?=\$M)', @crlf)

If the second one is meant to remove precisely every "$=P1298*x", then it could be made more selective, which is always a good idea when using regex

StringRegExpReplace(FileRead("ahihi.txt"), '\$=P1298\*\d+', '')

Nothing to say about the third.
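
One practical point, sketched under the assumption that all three replacements should end up in a single output file: each call above reads the file again and its return value is not stored, so the results have to be chained through one variable. The @ScriptDir paths and the output.txt name are only examples.

```autoit
; Read the file once, then feed each result into the next call.
Local $sText = FileRead(@ScriptDir & '\ahihi.txt')
If @error Then Exit MsgBox(16, 'Error', 'Could not read ahihi.txt')

$sText = StringRegExpReplace($sText, '(?=\$M)', @CRLF)    ; new line before each $M
$sText = StringRegExpReplace($sText, '\$=P1298\*\d+', '') ; drop every $=P1298*x
$sText = StringRegExpReplace($sText, '\$=>', '')          ; drop every $=>

; FileWrite appends when given a file name that already exists,
; so remove any old copy first.
FileDelete(@ScriptDir & '\output.txt')
FileWrite(@ScriptDir & '\output.txt', $sText)
```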

tezhihi · May 8, 2017

Case 1:

After I used the code below:

#include <File.au3>
$a = FileRead(@ScriptDir & '\ahihi.txt')
$b = StringRegExpReplace($a, '(\$%\$\?\$%)(?=\$=B)', '\1' & @CRLF & '\2')
$c = StringRegExpReplace($b, '\$%\$\?\$%[^\$]+', "")
$d = StringRegExpReplace($c, '(?<!^|:)(?=\$(?:00|01|03|10|20|25|30|40|45|110|115|120|T|F|200|220))', @crlf)
$e = StringRegExpReplace($d, '(?=\$M)', @crlf)
$f = StringRegExpReplace($e, '(\$F)(\R)(\$Tn)', '\1\3')
$g = StringRegExpReplace($f, '\$=<\$=T\d+\*\d+\s+\$=L\d+\*\d+', '')
$h = StringRegExpReplace($g, '\$=P\d+\*\d+', '')
$i = StringRegExpReplace($h, '(\$E)\$%\$\?\$%(\$=B)', '')
$j = StringRegExpReplace($i, '\$I\s+\$U', '$I$U')
$k = StringRegExpReplace($j, '\$=>', '')
FileWrite("output.txt", $k)

The output file has a problem (please see the red box in the image: every $T that follows a ':'). I think this problem will be solved if I use the code below:

StringRegExpReplace(FileRead("ahihi.txt"), '(\:)(\$T)(\$=B)', '\1' & @CRLF & '\2\3')

StringRegExpReplace(FileRead("ahihi.txt"), '([a-z]+)(\:)(\$T)', '\1\2' & @CRLF & '\3')

Otherwise, please advise me.

Case 2:

$str = '$T$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - - n1 Liggett has been rewarded handsomely in the settlement with the Attorneys General for its historic cooperation. $%$?$%- - - - - - - - - - - - - - - - -End Footnotes- - - - - - - - - - - - - - - - -$E'   ;This is one line not include @CRLF

$str2 = StringRegExpReplace($str, '\$%\$\?\$%[^\$]+', "")

The result is ' $F$E ', without the text data inside.

I would resolve it with the code below:

$str = '$T$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - - n1 Liggett has been rewarded handsomely in the settlement with the Attorneys General for its historic cooperation. $%$?$%- - - - - - - - - - - - - - - - -End Footnotes- - - - - - - - - - - - - - - - -$E'

$str2 = StringRegExpReplace($str, '(\$%\$\?\$%[^\$]+)(n\d+)', '\2')

Otherwise, please advise me.

Edited May 8, 2017 by tezhihi

mikell · May 8, 2017

Case 2 could be done like this

StringRegExpReplace($str, '\$%\$\?\$%.+?(?=n\d|\$)', "")

For case 1, this is possible

StringRegExpReplace($str, '(?<=:)(?=\$T\$=B)', @CRLF)

But it is becoming a little difficult for me, as I don't know exactly which "$x" codes you want to keep and which ones you want to remove :sweating:
Maybe all this could be done more simply, depending on what the final result should be once the initial file has been completely processed
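
A small sketch of why the Case 2 string behaves differently, using a shortened, made-up copy of it. It compares the earlier [^\$]+ pattern with the lazy .+? pattern suggested for case 2.

```autoit
; Shortened, made-up version of the Case 2 string: a single line with
; no $ between the footnote header and "n1".
Local $sFoot = '$T$F$%$?$%- - -Footnotes- - - n1 Liggett text. $%$?$%- - -End Footnotes- - -$E'

; [^\$]+ runs to the next $, so it swallows the footnote text as well.
Local $sGreedy = StringRegExpReplace($sFoot, '\$%\$\?\$%[^\$]+', '')

; .+? stops as early as possible: just before "n<digit>" or before the next $.
Local $sLazy = StringRegExpReplace($sFoot, '\$%\$\?\$%.+?(?=n\d|\$)', '')

ConsoleWrite($sGreedy & @CRLF) ; $T$F$E
ConsoleWrite($sLazy & @CRLF)   ; $T$Fn1 Liggett text. $E
```

By default the dot does not match line breaks, so this form suits the single-line case described in the question. It would also stop early at any other "n" followed by a digit in the header, which is worth checking against the real data.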

mikell · May 8, 2017

@tezhihi

6 hours ago, mikell said:

Maybe all this could be done simpler depending on what should be the final result

For instance, could this be correct?

$s = FileRead(@ScriptDir & '\ahihi.txt')
$s = StringRegExpReplace($s, '\R|(\$F)?\$%\$\?\$%[^\$]*', "")  ; footnotes: remove line breaks and each (optional $F +) $%$?$% run up to the next $
$s = StringRegExpReplace($s, '(?<!^|:)(?=\$(?|00|01|03|10|20|25|30|40|45|110|115|120|200|220|T|M))', @crlf)  ; new line before each listed code
$s = StringRegExpReplace($s, '(\$(?|[DXOENUI\?%]|=[BRIS<>]|=[PLT]\d+\*\d+))+', "")  ; remove runs of these codes
$s = StringRegExpReplace($s, '(\$[MBJGQY])+', " ")  ; replace runs of $M/$B/$J/$G/$Q/$Y with one space

$s = StringRegExpReplace($s, '\$T', "")  ; remove the remaining $T
FileWrite("output.txt", $s)

tezhihi · May 8, 2017

13 hours ago, mikell said:

@tezhihi

For instance could this be correct ?

$s = FileRead(@ScriptDir & '\ahihi.txt')
$s = StringRegExpReplace($s, '\R|(\$F)?\$%\$\?\$%[^\$]*', "")  ; this line also deletes some text data
$s = StringRegExpReplace($s, '(?<!^|:)(?=\$(?|00|01|03|10|20|25|30|40|45|110|115|120|200|220|T|M))', @crlf)
$s = StringRegExpReplace($s, '(\$(?|[DXOENUI\?%]|=[BRIS<>]|=[PLT]\d+\*\d+))+', "") ; this line deletes every $ code. Each $ code corresponds to a font, an indent at the start of a paragraph, or some other data.
$s = StringRegExpReplace($s, '(\$[MBJGQY])+', " ") ; I only want a new paragraph that starts with $M and ends with $X. I will modify this after all the replacements are done

$s = StringRegExpReplace($s, '\$T', "") ; $T is a required segment in the text file: the indent at the start of a paragraph
FileWrite("output.txt", $s)

@mikell oh no, the output should be the same as the output.txt produced by my code. Please see the remarks on your code.

I will send you the list of data that needs to be deleted from the text file. Please see new 2.txt

I have multiple files like this to modify. I need to make a tool to split up all of them. Please check the correctness of the code below

#include <File.au3>
$a = FileRead(@ScriptDir & '\ahihi.txt')
$a = StringRegExpReplace($a, '(\$%\$\?\$%)(?=\$=B)', '\1' & @CRLF & '\2')
$a = StringRegExpReplace($a, '(\$%\$\?\$%[^\$]+Footnotes[^\$]+)(n\d+)', '\2')
$a = StringRegExpReplace($a, '\$%\$\?\$%[^\$]+Footnotes[^\$]+', '')
$a = StringRegExpReplace($a, '(\$%\$\?\$%[^\$]+)(\$E)', '\2')
$a = StringRegExpReplace($a, '(?<!^|:)(?=\$(?:00|01|03|10|20|25|30|40|45|110|115|120|T|F|200|220))', @CRLF)
$a = StringRegExpReplace($a, '(?=\$M)', @CRLF)
$a = StringRegExpReplace($a, '(?=\$=S)', @CRLF)
$a = StringRegExpReplace($a, '(?=\$%\$\?\$%)', @CRLF)
$a = StringRegExpReplace($a, '(\$F)(\R)(\$Tn)', '\1\3')
$a = StringRegExpReplace($a, '(\$T)(\R)(\$F)', '\3\1')
$a = StringRegExpReplace($a, '(\s)(\$E)', '\2')
$a = StringRegExpReplace($a, '\$=<\$=T\d+\*\d+\s+\$=L\d+\*\d+', '')
$a = StringRegExpReplace($a, '\$=<\$=T\d+\*[^\$]+\$=L\d+\*\d+', '')
$a = StringRegExpReplace($a, '\$=P\d+\*\d+', '')
$a = StringRegExpReplace($a, '(\$E)\$%\$\?\$%(\$=B)', '')
$a = StringRegExpReplace($a, '\$I\s+\$U', '$I$U')
$a = StringRegExpReplace($a, '\$=>', '')
$a = StringRegExpReplace($a, '(?<=:)(?=\$T\$=B)', @CRLF)
$a = StringRegExpReplace($a, '([A-Z][a-z].+\:)(\$T)', '\1' & @CRLF & '\2')
$a = StringRegExpReplace($a, '(\$E)(\$=B)', '\1' & @CRLF & '\2')
$a = StringRegExpReplace($a, '(\$00:)(.+)', '\1')
$a = StringRegExpReplace($a, '\$03:.+\R', '')
$a = StringRegExpReplace($a, '\$30:.+\R', '')
$a = StringRegExpReplace($a, '(\$200:)(.+)', '\1')
$a = StringRegExpReplace($a, '(\$%\$\?\$%)(\R)(\$=B)', '\1\3')
$a = StringRegExpReplace($a, '(\$120:)(\R)(\$T)', '\1\3')
$a = StringRegExpReplace($a, '(\$120:)(\R)(\$%\$\?\$%)', '\1\3')
$a = StringRegExpReplace($a, '(\$120:\$%\$\?\$%)(\R)(\$=B)', '\1\3')
$a = StringRegExpReplace($a, '(\$=S)(\R)(\$%\$\?\$%)', '\1\3')
$a = StringRegExpReplace($a, '(\$%\$\?\$%)(\s)', '\1')
$a = StringRegExpReplace($a, '(\$01:.+)(\R\$%\$\?\$%.+)', '\1')
$a = StringRegExpReplace($a, '\$(?|=L\d+\*\d+)', '')
FileWrite("output.txt", $a)

Edited May 9, 2017 by tezhihi

tezhihi · May 10, 2017

@mikell following up, I need your help

mikell · May 10, 2017

I do follow... my previous post was just a bit of play, because obviously you need a final file with very particular formatting
I tried to guess the meaning of the various $x (e.g. $F = Footnotes, and so on) but I quickly gave up :sweating:
Actually you have a bunch of SRER (StringRegExpReplace) calls and some of them are redundant, but they are not so bad and they do the job
To help you I need precise information... the best would be to build a final file manually, so that by comparing I can know exactly what you want
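
As a side note on the long chain of SRER calls, one way to keep such a list readable is a table of pattern/replacement pairs applied in order. This is only a sketch: _ApplyRules is a hypothetical helper, not an AutoIt built-in, and the three rules are just examples taken from the thread.

```autoit
; Each row is: pattern, replacement. Rules are applied from top to bottom.
Local $aRules[3][2] = [ _
        ['\$=<\$=T\d+\*\d+\s+\$=L\d+\*\d+', ''], _
        ['\$=P\d+\*\d+', ''], _
        ['\$=>', '']]

Local $sDemo = 'A$=P1298*3B$=>C' ; made-up input
Local $sResult = _ApplyRules($sDemo, $aRules)
ConsoleWrite($sResult & @CRLF) ; ABC

; Hypothetical helper: runs every rule in $aRules over $sText.
Func _ApplyRules($sText, $aRules)
    Local $sNew
    For $i = 0 To UBound($aRules) - 1
        $sNew = StringRegExpReplace($sText, $aRules[$i][0], $aRules[$i][1])
        If @error Then
            ; Skip a rule whose pattern does not compile, keep the text as is.
            ConsoleWrite('Bad pattern in rule ' & $i & @CRLF)
            ContinueLoop
        EndIf
        $sText = $sNew
    Next
    Return $sText
EndFunc   ;==>_ApplyRules
```

With this layout, adding, removing or reordering a rule is a one-line change, which may make redundant rules easier to spot.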

tezhihi · May 10, 2017

@mikell OK, I will send you the completed file for comparison when I am in the office. This file was completed by our best person. I will send you the full information for the regex processing, with notes on all of it.

mikell · May 10, 2017

Nice. The more information I have, the more suggestions I can make

tezhihi · May 11, 2017

Hi @mikell, can you check the For Mikell.xls file with its remarks, check EXAMPLE.TXT (this is the result for ahihi.txt), and help me? If you need more information, please let me know.

Oh, I have one new file for you to try processing: New For Process.txt

Edited May 11, 2017 by tezhihi
