xcaliber13 Posted January 2, 2019

Hello,

OK, let's get this out there: I really struggle with regular expressions, so I would like a little help with this issue. I have a text file that I need to extract records from. Each record that I need to extract begins with:

    MSH|^~\&|Flexilab|CART

The end of the record, the last line I need (the whole line), begins with:

    FT1|

How would you write the regex to extract this data? The beginning and ending strings have special characters. I am pretty sure I can figure out how to loop through the file and get the data needed. I just cannot figure out the regex to select the needed data.

Thank you
ViciousXUSMC Posted January 2, 2019

This should be pretty easy, but even with your description of the data, it would be much better if you actually provided a sample (of course, change the parts that are sensitive, but leave all the special characters and such that would be needed for the regex), especially since it looks like there could be some simpler patterns to run based on the description.
xcaliber13 (Author) Posted January 2, 2019

Attached is a sample test file: ExampleData.txt
BigDaddyO Posted January 2, 2019

This is a lot like the files I get from the state to validate. Depending on the size of the files you are processing, you may not want to use RegEx unless you have to, as it's one of the slowest ways to find data. (I process millions of rows like these to validate data.)

I'm assuming you're doing a FileReadToArray and then looping through. Just do:

    If StringLeft($aLines[$i], 4) = "FT1|" Then
        $aSegments = StringSplit($aLines[$i], "|")
        ; Since your lines end with the separator, the last value in that line will be in $aSegments[$aSegments[0] - 1]
    EndIf
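For readers who want to try the non-regex route suggested above, here is a minimal sketch of what the full loop might look like. It assumes a record runs from a line starting with the MSH marker to the first following line starting with FT1|. The function name _ExtractRecords and its parameters are hypothetical, not from the thread, and the sketch has not been run against the attached sample file.

```autoit
#include <Array.au3>

; Hypothetical helper (not from the thread): returns an array of records, where a
; record starts at a line beginning with $sStart and ends at the first following
; line beginning with $sEnd. Sets @error on read failure or when nothing is found.
Func _ExtractRecords($sFile, $sStart, $sEnd)
    Local $aLines = FileReadToArray($sFile)
    If @error Then Return SetError(1, 0, 0)

    ; There can never be more records than lines, so size the array up front
    Local $aRecords[UBound($aLines)]
    Local $iCount = 0
    Local $sRecord = ""
    Local $bInRecord = False

    For $i = 0 To UBound($aLines) - 1
        ; A start marker opens a new record (== is a case-sensitive comparison)
        If StringLeft($aLines[$i], StringLen($sStart)) == $sStart Then
            $bInRecord = True
            $sRecord = ""
        EndIf
        If Not $bInRecord Then ContinueLoop

        $sRecord &= $aLines[$i] & @CRLF

        ; The end marker closes the record and stores it
        If StringLeft($aLines[$i], StringLen($sEnd)) == $sEnd Then
            $aRecords[$iCount] = $sRecord
            $iCount += 1
            $bInRecord = False
        EndIf
    Next

    If $iCount = 0 Then Return SetError(2, 0, 0)
    ReDim $aRecords[$iCount]
    Return $aRecords
EndFunc

Local $aFound = _ExtractRecords("ExampleData.txt", "MSH|^~\&|Flexilab|CART", "FT1|")
If IsArray($aFound) Then _ArrayDisplay($aFound)
```

Because the markers are compared as plain strings with StringLeft, none of the special characters need escaping in this version.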
Nine Posted January 2, 2019

Agree with @BigDaddyO. It is very easy to extract, since the part you are searching for is at the beginning of the line. Also, I would test performance between FileReadToArray and a loop with FileReadLine. If there are millions of those lines, maybe processing directly from the file would be better...
TheXman Posted January 2, 2019 (edited)

Here's a very basic couple of regular expressions that will extract your records. Of course, there are numerous ways to do it with regular expressions. This is just one.

    #include <Constants.au3>
    #include <Array.au3>

    example()

    Func example()
        Local $sData
        Local $aResult

        $sData = FileRead("ExampleData.txt")

        ;Get MSH records
        $aResult = StringRegExp($sData, "(?m)(^MSH\|\^~\\&\|Flexilab\|CART.*)", $STR_REGEXPARRAYGLOBALMATCH)
        _ArrayDisplay($aResult)

        ;Get last FT1 record
        $aResult = StringRegExp($sData, "(?m).*(^FT1.*)", $STR_REGEXPARRAYMATCH)
        _ArrayDisplay($aResult)
    EndFunc

Edited January 2, 2019 by TheXman: removed unused variable ($sExtractedRecords)
mikell Posted January 2, 2019 (edited)

    #Include <Array.au3>

    $txt = FileRead("ExampleData.txt")
    $res = StringRegExp($txt, '(?ms)^MSH.*?FT1\|\N+', 3)
    _ArrayDisplay($res)

    For $i = 0 to UBound($res)-1
        $r = StringSplit($res[$i], @crlf, 3)
        _ArrayDisplay($r, $i)
    Next

Comments:

    (?m)   multiline mode
    ^      in this mode, ^ means "start of line"
    \|     pipe char, escaped
    \N+    one or more non-newline characters

Edited January 2, 2019 by mikell
xcaliber13 (Author) Posted January 2, 2019

Mikell,

Thank you! That is getting the last line of the record, but I am still having a hard time getting the beginning line:

    MSH|^~\&|Flexilab|CART

I tried this:

    $Msh = StringRegExp($txt, '(?m)^MSH\|\^~\\\&\|FLexilab\|CART\N+', 3)

But it does not pull the line. I believe it is because I am not escaping the escape characters correctly? And once I am able to get the start of the record and the end of the record, would I use StringBetween to get the whole record?
mikell Posted January 2, 2019 (edited)

    8 minutes ago, xcaliber13 said:
    That is getting the last line of the record (...) to get the whole record?

Sorry. I first misunderstood your question, so I edited my previous post.

Edited January 2, 2019 by mikell
xcaliber13 (Author) Posted January 2, 2019

Mikell,

That is just about what I need, except that this code

    $res = StringRegExp($txt, '(?ms)^MSH.*?FT1\|\N+', 3)

is pulling every record that begins with MSH. I just need the ones that begin with MSH|^~\&|Flexilab|CART. I cannot figure out how to write a regex for this pattern. Again, thank you for your help.
xcaliber13 (Author) Posted January 2, 2019

Thank you, everyone. With Mikell and sheer dumb luck I got it to work!

    $res = StringRegExp($txt, '(?ms)^MSH\|\^\~\\\&\|Flexilab\|CART.*?FT1\|\N+', 3)

I should be able to complete the script now. Again, thank you to all for the help.
mikell Posted January 2, 2019 (edited)

Oh, I see. Damn... I misunderstood again.

    #Include <Array.au3>

    $txt = FileRead("ExampleData.txt")
    $mark = "^MSH\|\^~\\&\|Flexilab\|CART"
    $res = StringRegExp($txt, '(?ms)' & $mark & '.*?FT1\|\N+', 3)
    _ArrayDisplay($res)

    For $i = 0 to UBound($res)-1
        $r = StringSplit($res[$i], @crlf, 3)
        _ArrayDisplay($r, $i)
    Next

Edit: Too late.

Edited January 2, 2019 by mikell
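As a possible alternative to escaping each special character by hand, the literal marker can be wrapped in PCRE's \Q...\E quoting, which StringRegExp supports. The sketch below is an untested variation on the pattern above, not code from the thread. The variable names are illustrative, and anchoring FT1 with ^ is an added assumption that the segment always starts its line. Note also that matching is case-sensitive by default, so a marker spelled FLexilab (as in the earlier attempt) would not match Flexilab, which may be why that attempt returned nothing.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

Local $sTxt = FileRead("ExampleData.txt")

; Literal start marker, copied as-is from the question.
; Between \Q and \E the regex engine treats every character as plain text,
; so |, ^, \ and & need no individual escapes.
; Caveat: this only works while the marker itself does not contain "\E".
Local $sMark = "MSH|^~\&|Flexilab|CART"

; (?m) lets ^ match at the start of each line; (?s) lets . cross line breaks.
; .*? is lazy, so each match stops at the first FT1| line after the marker.
Local $sPattern = '(?ms)^\Q' & $sMark & '\E.*?^FT1\|\N+'

Local $aRes = StringRegExp($sTxt, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
If @error Then
    ConsoleWrite("No match or bad pattern (@error = " & @error & ")" & @CRLF)
Else
    _ArrayDisplay($aRes)
EndIf
```

Keeping the marker as plain text means the string in the script can be compared character for character with the line in the data file, which makes typos in the marker easier to spot.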