seadoggie01 Posted January 9, 2020

I've built this regular expression that works great on https://regex101.com/, but when I enter it and the same data into AutoIt, it gives me 4 extra empty capturing groups before each set of data. Can someone explain to me why this happens and possibly how to fix it?

    (?i)(?m)
    (?(DEFINE)
    (?<NS>(?:[^ \n]+))
    (?<PaymentType>(?:Pharmacy|Hospital Costs|Physical Therapy costs|Medical Payment|Physician Payment|Medical Supplies, DME|Bill Review|Network Access Fee|Chiropractic Expenses))
    (?<Money>\$[\d,]*\.\d*)
    (?<Date>\d{1,2}\/\d{1,2}\/\d{2,4})
    )
    ^((?&NS)) ((?&NS)) ((?&Date)) ((?&NS)) ((?&NS)) (.*) ((?&PaymentType)) ((?&Money)) ((?&Money))

NS stands for "no spaces". I added the newlines to help make the definitions more visible. I'm splitting data out of a PDF and use this RegEx to turn it into a CSV.

Here's some random test data:

    Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7 Column 8 Column 9
    784070 475086 2/21/2019 951612 774400 19 Some text Network Access Fee $9,818.00 $9,818.00
    321538 697220 10/16/2019 584345 157837 90 Some text Medical Supplies, DME $4,893.00 $4,893.00
    717049 131510 11/24/2019 591540 434357 80 Some text Hospital Costs $9,890.00 $9,890.00
    441658 578030 1/6/2019 920334 593618 92 Some text Network Access Fee $2,912.00 $2,912.00
    934772 726402 12/27/2019 262470 659210 41 Some text Network Access Fee $3,515.00 $3,515.00
    456371 782567 3/22/2019 232286 569047 76 Some text Bill Review $845.00 $845.00
    733793 243027 10/24/2019 827310 509902 30 Some text Physician Payment $9,401.00 $9,401.00
    446456 289749 12/14/2019 399924 975049 73 Some text Physical Therapy costs $5,212.00 $5,212.00
    657106 762255 6/13/2019 858558 157695 53 Some text Medical Payment $5,931.00 $5,931.00
    631262 523757 12/10/2019 221874 270665 85 Some text Medical Supplies, DME $592.00 $592.00
    705439 821105 7/9/2019 807562 429778 32 Some text Bill Review $1,802.00 $1,802.00
pixelsearch Posted January 9, 2020 (edited)

It seems that each named group inside your DEFINE block adds an empty group. You have 4 definitions => 4 empty groups. Examples with fewer definitions follow.

Match text (1 line):

    784070 475086 2/21/2019 951612 774400 19 Some text Network Access Fee $9,818.00 $9,818.00

2 definitions => 2 empty groups

    (?i)(?m)(?(DEFINE)(?<NS>(?:[^ \n]+))(?<Date>\d{1,2}\/\d{1,2}\/\d{2,4}))^((?&NS)) ((?&NS)) ((?&Date))

    0:
    1:
    2: 784070
    3: 475086
    4: 2/21/2019

1 definition => 1 empty group

    (?i)(?m)(?(DEFINE)(?<NS>(?:[^ \n]+)))^((?&NS))

    0:
    1: 784070

I tried what follows to get rid of the empty group. It works, but I don't know why:

1 definition => no empty group

    (?i)(?m)(?(DEFINE)(?<NS>(?:[^ \n]+)))^(?&NS)

    0: 784070

Hope it will help you to solve your problem.

Edit: what follows shows that the 4 empty groups are not caused by the 4 ((?&NS)) in your search pattern.

1 definition => 1 empty group (though there are 2 ((?&NS)) in the search pattern)

    (?i)(?m)(?(DEFINE)(?<NS>(?:[^ \n]+)))^((?&NS)) ((?&NS))

    0:
    1: 784070
    2: 475086
seadoggie01 Posted January 9, 2020 (edited)

Thanks! At least I can just replace all of the groups for now with their definitions. I would love to know why they're being captured though 😐
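The "replace the groups with their definitions" workaround mentioned above could look roughly like the following. This is only a sketch, not the poster's actual code: the variable names ($sNS, $sDate, $sMoney, $sPaymentType, $sPattern, $sLine, $aFields) are made up for illustration, and the sub-patterns are copied from the original expression. Because there is no (?(DEFINE)...) block, the only capturing groups left are the nine written out in the final pattern.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; Hypothetical names: the sub-patterns live in ordinary string variables
; instead of a (?(DEFINE)...) block, so they add no capturing groups.
Local $sNS = "[^ \n]+"
Local $sDate = "\d{1,2}/\d{1,2}/\d{2,4}"
Local $sMoney = "\$[\d,]*\.\d*"
Local $sPaymentType = "(?:Pharmacy|Hospital Costs|Physical Therapy costs|" & _
        "Medical Payment|Physician Payment|Medical Supplies, DME|" & _
        "Bill Review|Network Access Fee|Chiropractic Expenses)"

; Nine capturing groups, one per output column.
Local $sPattern = "(?im)^(" & $sNS & ") (" & $sNS & ") (" & $sDate & ") (" & _
        $sNS & ") (" & $sNS & ") (.*) (" & $sPaymentType & ") (" & _
        $sMoney & ") (" & $sMoney & ")"

; One row of the test data from the first post.
Local $sLine = "784070 475086 2/21/2019 951612 774400 19 Some text " & _
        "Network Access Fee $9,818.00 $9,818.00"

; Flag 1: return the captured groups of the first match as an array.
Local $aFields = StringRegExp($sLine, $sPattern, $STR_REGEXPARRAYMATCH)
If @error Then Exit MsgBox(0, "", "No match")

; Print the captured fields joined with semicolons.
ConsoleWrite(_ArrayToString($aFields, ";") & @CRLF)
```

The trade-off is readability: the pattern is assembled in AutoIt rather than inside the regex, but the returned array should then contain only the fields that were asked for.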
jchd Posted January 9, 2020

That's a "feature" of the legacy PCRE versions. The currently supported branch is PCRE2 and is a complete rewrite of the library. Unfortunately AutoIt is still using the legacy version.
seadoggie01 Posted January 10, 2020

jchd said: "feature"

Ugh, okay, thank you! I'll just write my code around it 🙄
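One way to "write code around it" while still getting an array is sketched below. It relies on the behaviour reported earlier in this thread (4 empty elements in front of each set of 9 fields in the array returned for all matches), which is an observation from the posts rather than documented behaviour. All names here ($sData, $sPattern, $aFlat, $aRows, $iDefined, $iFields, $iStride) are hypothetical; the pattern is the one posted in this thread with \d+ for NS.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; Two rows of the test data from the first post.
Local $sData = "784070 475086 2/21/2019 951612 774400 19 Some text Network Access Fee $9,818.00 $9,818.00" & @CRLF & _
        "321538 697220 10/16/2019 584345 157837 90 Some text Medical Supplies, DME $4,893.00 $4,893.00"

Local $sPattern = '(?(DEFINE)' & _
        '(?<NS>\d+)' & _
        '(?<Date>\d{1,2}/\d{1,2}/\d{2,4})' & _
        '(?<PaymentType>Pharmacy|Hospital Costs|Physical Therapy costs|Medical Payment|Physician Payment|Medical Supplies, DME|Bill Review|Network Access Fee|Chiropractic Expenses)' & _
        '(?<Money>\$[\d,]*\.\d*))' & _
        '(?im)^((?&NS)) ((?&NS)) ((?&Date)) ((?&NS)) ((?&NS)) (.+) ((?&PaymentType)) ((?&Money)) ((?&Money))'

Local Const $iDefined = 4 ; named groups inside (?(DEFINE)...), reported as empty
Local Const $iFields = 9 ; real capturing groups per row
Local Const $iStride = $iDefined + $iFields

; Flag 3: flat array holding the groups of every match, one after another.
Local $aFlat = StringRegExp($sData, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
If @error Then Exit MsgBox(0, "", "No match")

; Assumption: each match contributes exactly $iStride elements.
Local $iRows = Int(UBound($aFlat) / $iStride)
Local $aRows[$iRows][$iFields]
For $r = 0 To $iRows - 1
    For $c = 0 To $iFields - 1
        ; Skip the leading empty elements of each match.
        $aRows[$r][$c] = $aFlat[$r * $iStride + $iDefined + $c]
    Next
Next

_ArrayDisplay($aRows)
```

If the flat array turns out not to be a multiple of 13 elements long on a given AutoIt version, the stride assumption does not hold and the inlined-pattern approach is the safer route.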
mikell Posted January 10, 2020

A funny way to work around it (and BTW get a CSV):

    #include <Array.au3>

    $txt = "Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7 Column 8 Column 9" & @CRLF & _
            "784070 475086 2/21/2019 951612 774400 19 Some text Network Access Fee $9,818.00 $9,818.00" & @CRLF & _
            "321538 697220 10/16/2019 584345 157837 90 Some text Medical Supplies, DME $4,893.00 $4,893.00" & @CRLF & _
            "717049 131510 11/24/2019 591540 434357 80 Some text Hospital Costs $9,890.00 $9,890.00" & @CRLF & _
            "441658 578030 1/6/2019 920334 593618 92 Some text Network Access Fee $2,912.00 $2,912.00" & @CRLF & _
            "934772 726402 12/27/2019 262470 659210 41 Some text Network Access Fee $3,515.00 $3,515.00" & @CRLF & _
            "456371 782567 3/22/2019 232286 569047 76 Some text Bill Review $845.00 $845.00" & @CRLF & _
            "733793 243027 10/24/2019 827310 509902 30 Some text Physician Payment $9,401.00 $9,401.00" & @CRLF & _
            "446456 289749 12/14/2019 399924 975049 73 Some text Physical Therapy costs $5,212.00 $5,212.00" & @CRLF & _
            "657106 762255 6/13/2019 858558 157695 53 Some text Medical Payment $5,931.00 $5,931.00" & @CRLF & _
            "631262 523757 12/10/2019 221874 270665 85 Some text Medical Supplies, DME $592.00 $592.00" & @CRLF & _
            "705439 821105 7/9/2019 807562 429778 32 Some text Bill Review $1,802.00 $1,802.00"

    ; The four named groups in the DEFINE block are numbered 1 to 4.
    $p = '(?(DEFINE)' & _
            '(?<NS>\d+)' & _
            '(?<Date>\d{1,2}/\d{1,2}/\d{2,4})' & _
            '(?<PaymentType>Pharmacy|Hospital Costs|Physical Therapy costs|Medical Payment|Physician Payment|Medical Supplies, DME|Bill Review|Network Access Fee|Chiropractic Expenses)' & _
            '(?<Money>\$[\d,]*\.\d*))' & _
            '(?im)^((?&NS)) ((?&NS)) ((?&Date)) ((?&NS)) ((?&NS)) (.+) ((?&PaymentType)) ((?&Money)) ((?&Money))'

    ; The real captures are therefore groups 5 to 13; join them with semicolons.
    $s = StringRegExpReplace($txt, $p, "$5;$6;$7;$8;$9;${10};${11};${12};${13}")
    MsgBox(0, "", $s)
seadoggie01 Posted January 13, 2020

mikell said: "A funny way to work around (and BTW get a csv)"

Thanks! I never would've thought of that. I was trying to keep it in an array to input it into Excel, but I could just use text to columns for that.