adityaparakh Posted February 23, 2018

Hello,

I am reading a text file line by line and trying to fix errors in it. How do I find the position of a regex match?

Input file:

    659855424638 Michelle Heidt 978-240-0653 214-585-8297 michellemheidt@gustr.com Maxillofacial radiologist "Michelle Heidt, 2095 Pearlman Avenue, Franklin, Massachusetts, United States, 2038"
    659855424639 Emilee Akins 904-724-3260 502-463-3665 emileerakins@armyspy.com Forest and conservation worker "Emilee Akins, 2054 Boundary Street, Jacksonville, Florida, United States, 32211"
    659855424640 Lori Girouard 512-963-1160 413-772-3313 lorilgirouard@teleworm.us Agricultural and food science technician "Lori Girouard, 4603 Short Street, Austin, Texas, United States, 78741"
    659855424628 Samantha Richardson 407-856-8677 973-447-6977 samanthatrichardson@example.com Budget officer "Samantha Richardson, 4599 McDonald Avenue, McDonald, McDonald, United States, 12345"

The errors I need to fix:

1. CAPITALsmallsmallCAPITALsmallsmall: at the first occurrence of this pattern, insert a space before the second capital. Error example:

    659855424628 SamanthaRichardson 407-856-8677 973-447-6977 samanthatrichardson@example.com Budget officer "Samantha Richardson, 4599 McDonald Avenue, McDonald, McDonald, United States, 12345"

2. Letter followed directly by digits: insert a space before the number. Error example:

    659855424628 Samantha Richardson407-856-8677 973-447-6977 samanthatrichardson@example.com Budget officer "Samantha Richardson, 4599 McDonald Avenue, McDonald, McDonald, United States, 12345"

3. Quote touching a letter (note: this is the second quote from the right): insert a space before the quote. Error example:

    659855424628 Samantha Richardson 407-856-8677 973-447-6977 samanthatrichardson@example.com Budget officer"Samantha Richardson, 4599 McDonald Avenue, McDonald, McDonald, United States, 12345"

4. Email too long: insert a space after the .com. Error example:

    659855424628 Samantha Richardson 407-856-8677 973-447-6977 samanthatrichardson@example.comBudget officer "Samantha Richardson, 4599 McDonald Avenue, McDonald, McDonald, United States, 12345"

Can you please advise how to search with a regex pattern and get the position of the match?

StringRegExp - only returns true or false
StringInStr - requires a specific string and cannot search for a regex

I would also appreciate some guidance on how to read a line and overwrite it while keeping its position in the file. A brief snippet showing how to read the entire file into a 2D array would help.

    $segments = StringSplit($theLine, " ")

Examples of the regex checks I am using:

    If StringRegExp($segments[2], "[0-9]") Then
    If StringRegExp($segments[2], "([A-Z])\w+([A-Z])\w+") Then
    If StringRegExp($segments[4], "\d{3}-\d{3}-\d{4}") Then
    If StringRegExp($segments[6],'^[\_]*([a-z0-9]+(\.|\_*)?)+@([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$',0) = 1 Then

jguinch Posted February 23, 2018

I'm not sure I understand... maybe this works:

    Local $string = "abcdefghijklmnopqrstuvmxyz"
    MsgBox(0, "", _StringPos($string, "z"))

    ; Returns the 1-based position of $sSearch in $sString, or 0 if not found.
    Func _StringPos($sString, $sSearch)
        Local $aReplace = StringRegExp($sString, "(?s)(.*?)\Q" & $sSearch & "\E", 1)
        If @error Then Return 0
        Return StringLen($aReplace[0]) + 1
    EndFunc

jguinch Posted February 23, 2018

1 hour ago, adityaparakh said:
> StringRegExp - only returns true or false

No, it can return an array of matches.

mrflibblehat Posted February 23, 2018 (edited)

StringRegExp can return more than true or false. Check the optional flag parameter.

https://www.autoitscript.com/autoit3/docs/functions/StringRegExp.htm

If you look at the 4th parameter, "offset", this could be used in a For loop to read line by line:

    get lines in file
    for each line
        run match with line number in offset
        if match, print line or offset

Just an idea. There may be a more elegant way to do it.

Check Reply #2 from jguinch; I haven't used this function in a while.

Edited February 23, 2018 by mrflibblehat

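For reference, the offset parameter in the linked StringRegExp documentation is the character position in the subject string where matching starts (default 1), not a line number. Below is a minimal sketch of how flag 1 (return the first match as an array), the offset parameter, and @extended can be combined to walk through every match in one string and report where each one starts. The sample text and variable names are made up for illustration.

```autoit
; Sketch: list every run of digits in a string together with its start position.
; $sText, $iOffset and $aMatch are illustrative names, not from the thread.
Local $sText = "id 12 and id 345"
Local $iOffset = 1 ; 1-based character position where the next search starts
Local $aMatch

While 1
    ; Flag 1 returns an array holding the first match found at or after $iOffset.
    $aMatch = StringRegExp($sText, "\d+", 1, $iOffset)
    If @error Then ExitLoop ; no further match

    ; After a successful match, @extended holds the position just past the match.
    $iOffset = @extended
    ConsoleWrite($aMatch[0] & " starts at position " & ($iOffset - StringLen($aMatch[0])) & @CRLF)
WEnd
```

To process a file, the same loop would run once per line; the line number would come from the loop index over the lines, while the offset stays a position inside the current line.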
adityaparakh Posted February 23, 2018 (Author)

Hello mrflibblehat,

Thanks for the reply.

jguinch - I am trying to understand your function. Can you please explain the regular expression and what is being returned? Thanks for the assistance. I am a newbie and not very good with regex.

Can you please advise on these cases:

    $theLine = HoldstheLine
    $segments = StringSplit($theLine, " ")

1. SamanthaRichardson - two capitals in a single word. Insert a space before the second capital letter.
2. dson407 - word followed by numbers without a space. Insert a space before the number.
3. cer"S - no space before the double quote. Insert a space before the double quote.
4. ple.comBu - letters after .com. Insert a space after .com.

Please suggest how I can modify the segments and also the entire line.

jguinch Posted February 23, 2018

    (?s)      Single-line or DotAll: . matches anything including a newline sequence.
    .         Matches any single character except, by default, a newline sequence.
              Matches newlines as well when option (?s) is active.
    *?        0 or more, lazy (takes the smallest match).
    \Q...\E   Verbatim sequence: metacharacters lose their special meaning between
              \Q and \E (useful if you have special characters in your search string).

mikell Posted February 23, 2018

4 hours ago, adityaparakh said:
> How do I find the position of RegEx search match

Use @extended:

    Local $string = "abcdefghijklmnopqrstuvmxyz"
    MsgBox(0, "", _StringPos($string, "d"))

    Func _StringPos($sString, $sSearch)
        StringRegExp($sString, "\Q" & $sSearch & "\E", 1)
        ; @extended is the position just past the match, so subtract the match length
        Return @error ? 0 : @extended - StringLen($sSearch)
    EndFunc

jguinch Posted February 23, 2018 (edited)

I didn't remember this... (it's in the 1st example on the help page).

Edit: instead of using StringLen:

    Local $string = "abcdefghijklmnopqrstuvmxyz"
    MsgBox(0, "", _StringPos($string, "a"))

    Func _StringPos($sString, $sSearch)
        ; The look-ahead matches an empty string right before $sSearch,
        ; so @extended is already the start position.
        StringRegExp($sString, "(?=\Q" & $sSearch & "\E)", 1)
        Return @error ? 0 : @extended
    EndFunc

Edited February 23, 2018 by jguinch

adityaparakh Posted February 23, 2018 (Author)

I will try now. Thank you for the reply.

I am finding this one difficult, as locating the position of the second capital seems challenging. Can you please help? (Two uppercase letters in a single word.)

    $inputString = "MichaelJackson"
    $outputString = "Michael Jackson"

I will use the answer for this one and try it on the rest of the patterns. I am trying with "([A-Z])\w+([A-Z])\w+" but getting confused with the positioning.

adityaparakh Posted February 23, 2018 (Author)

This pattern has to be searched for in the entire text file and action taken.

jguinch Posted February 23, 2018

    Local $sInputstring = "MichaelJackson"
    Local $sOutputstring = StringRegExpReplace($sInputstring, "\w\K(?=[[:upper:]])", " ")
    ConsoleWrite($sOutputstring)

    \w          Matches any "word" character: any digit, any letter or underscore "_".
    \K          Resets the start of the match at the current point in the subject string
                (the character before the matching uppercase letter won't be part of the replacement).
    (?=X)       Positive look-ahead: matches when the subpattern X matches starting at the current position.
    [:upper:]   ASCII uppercase letters (same as [A-Z]).

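One caveat when the pattern above is applied to a whole line rather than a single word: it matches before every capital that follows a word character, so "McDonald" would also be split. A minimal sketch of limiting the replacement with the optional fourth parameter (count) of StringRegExpReplace follows; the sample line is shortened from the first post and the variable names are illustrative.

```autoit
; Sketch: compare replacing every match with replacing only the first one.
; $sLine is a shortened, made-up variant of the sample data in the first post.
Local $sLine = '659855424628 SamanthaRichardson 407-856-8677 "Samantha Richardson, 4599 McDonald Avenue"'

; count omitted (0): every capital that directly follows a word character gets a space,
; which also splits "McDonald".
Local $sAll = StringRegExpReplace($sLine, "\w\K(?=[[:upper:]])", " ")

; count = 1: only the first match in the line is replaced.
Local $sFirst = StringRegExpReplace($sLine, "\w\K(?=[[:upper:]])", " ", 1)

ConsoleWrite("all matches : " & $sAll & @CRLF)
ConsoleWrite("first match : " & $sFirst & @CRLF)
```

Note that count = 1 only helps when the first match in the line really is the error; on a line whose name field is already correct, the first match could be a legitimate name such as "McDonald".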
jchd Posted February 23, 2018

Try this:

    #include <Array.au3> ; for _ArrayToString

    ; read file with FileReadToArray
    Local $s = [ _
        '659855424638 Michelle Heidt 978-240-0653 214-585-8297 michellemheidt@gustr.com Maxillofacial radiologist "Michelle Heidt, 2095 Pearlman Avenue, Franklin, Massachusetts, United States, 2038"', _
        '659855424639 Emilee Akins 904-724-3260 502-463-3665 emileerakins@armyspy.com Forest and conservation worker "Emilee Akins, 2054 Boundary Street, Jacksonville, Florida, United States, 32211"', _
        '659855424640 Lori Girouard 512-963-1160 413-772-3313 lorilgirouard@teleworm.us Agricultural and food science technician "Lori Girouard, 4603 Short Street, Austin, Texas, United States, 78741"', _
        '659855424628 SamanthaRichardson407-856-8677 973-447-6977 samanthatrichardson@example.comBudget officer "Samantha Richardson, 4599 McDonald Avenue, McDonald, McDonald, United States, 12345"', _
        '659855424628 Samantha Richardson 407-856-8677 973-447-6977 samanthatrichardson@example.com Budget officer "Samantha Richardson, 4599 McDonald Avenue, McDonald, McDonald, United States, 12345"' _
    ]

    Local $a
    For $i = 0 To UBound($s) - 1
        ; capture the five parts of the record, with the separating spaces optional
        $a = StringRegExp($s[$i], '(\d+\s[A-Z][a-z]+)\s?([A-Z][a-z]+)\s?([\d -]+[^@]+@[-a-z_]+\.[-a-z_]+(?:\.[-a-z_]+)?)\s?([A-Z][^"]+[a-z])\s?(".*)', 3)
        ConsoleWrite(@error & ' ' & @extended & @LF)
        ; rebuild the line with exactly one space between the parts
        $s[$i] = _ArrayToString($a, ' ')
        ConsoleWrite($s[$i] & @LF)
    Next
    ; delete input file and write with FileWriteFromArray (be sure you have a safe copy of input first!)
    Exit

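The two comments in the code above (read with FileReadToArray, write back from the array) can be sketched as follows. This is only an outline under assumptions: the file paths are hypothetical, the per-line fix is a placeholder taken from one of the patterns in this thread, and the write step uses _FileWriteFromArray from File.au3. Writing to a separate output file keeps the original input untouched.

```autoit
#include <File.au3> ; for _FileWriteFromArray

; Hypothetical paths, for illustration only.
Local $sInFile = @ScriptDir & "\input.txt"
Local $sOutFile = @ScriptDir & "\output.txt"

; FileReadToArray returns a 0-based array with one element per line.
Local $aLines = FileReadToArray($sInFile)
If @error Then
    ConsoleWrite("Could not read " & $sInFile & @CRLF)
    Exit 1
EndIf

For $i = 0 To UBound($aLines) - 1
    ; Placeholder fix: insert a space between a letter and a following digit.
    ; Element $i stays at the same index, so the line keeps its position in the file.
    $aLines[$i] = StringRegExpReplace($aLines[$i], "[A-Za-z]\K(?=\d)", " ")
Next

; Write the corrected lines to a new file, one array element per line.
_FileWriteFromArray($sOutFile, $aLines)
If @error Then ConsoleWrite("Could not write " & $sOutFile & @CRLF)
```

A one-dimensional array of lines is enough here; each line can still be split into fields on demand with StringSplit or a capturing regex such as the one above.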
jguinch Posted February 23, 2018

    ;~ 1. SamanthaRichardson - Find this pattern, i.e. two capitals in a single word
    ;~    Insert space before second capital letter.
    $sInputstring = "SamanthaRichardson"
    $sOutputstring = StringRegExpReplace($sInputstring, "\w\K(?=[[:upper:]])", " ")
    ConsoleWrite($sInputstring & " => " & $sOutputstring & @CRLF)

    ;~ 2. dson407 - word continued by numbers without space
    ;~    Insert space before number
    $sInputstring = "dson407"
    $sOutputstring = StringRegExpReplace($sInputstring, "[A-Za-z]\K(?=\d)", " ")
    ConsoleWrite($sInputstring & " => " & $sOutputstring & @CRLF)

    ;~ 3. cer"S - No space before double quote
    ;~    Insert space before double quote
    $sInputstring = "cer""S"
    $sOutputstring = StringRegExpReplace($sInputstring, "[A-Za-z]\K(?="")", " ")
    ConsoleWrite($sInputstring & " => " & $sOutputstring & @CRLF)

    ;~ 4. ple.comBu - letters after .com
    ;~    Insert space after .com
    $sInputstring = "ple.comBu"
    $sOutputstring = StringRegExpReplace($sInputstring, "\.com\K(?=[A-Za-z])", " ")
    ConsoleWrite($sInputstring & " => " & $sOutputstring & @CRLF)

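The four replacements above work on isolated fragments. When they run against a complete record, the broad patterns can also hit valid text (for example "McDonald" in the address). The sketch below shows one possible way to narrow each pattern with surrounding context so all four can be applied to a full line. The function name _FixLine and the tightened patterns are assumptions made for illustration, based only on the sample records in this thread.

```autoit
; Sketch: apply the four fixes to one complete record.
; _FixLine and the context-anchored patterns are hypothetical, not from the thread.
Func _FixLine($sLine)
    ; 1. Name glued together: leading id, a space, a capitalised word, then another capital.
    $sLine = StringRegExpReplace($sLine, "^\d+\s[A-Z][a-z]+\K(?=[A-Z])", " ")
    ; 2. Letter touching a phone number of the form 123-456-7890.
    $sLine = StringRegExpReplace($sLine, "[A-Za-z]\K(?=\d{3}-\d{3}-\d{4})", " ")
    ; 3. Letter touching the opening quote (a quote that is followed by a capital).
    $sLine = StringRegExpReplace($sLine, '[A-Za-z]\K(?="[A-Z])', " ")
    ; 4. ".com" touching the capitalised job title.
    $sLine = StringRegExpReplace($sLine, "\.com\K(?=[A-Z])", " ")
    Return $sLine
EndFunc

; Broken record taken from jchd's sample data.
Local $sBroken = '659855424628 SamanthaRichardson407-856-8677 973-447-6977 samanthatrichardson@example.comBudget officer "Samantha Richardson, 4599 McDonald Avenue, McDonald, McDonald, United States, 12345"'
ConsoleWrite(_FixLine($sBroken) & @CRLF)
```

Fix 4 only covers addresses ending in .com, as in the original request; other top-level domains such as .us would need their own alternative in the pattern.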
adityaparakh Posted February 27, 2018 (Author)

Thank you @jguinch, it was really helpful and inspiring. Having the knowledge can save hours of work.

adityaparakh Posted March 1, 2018 (Author)

@jguinch

    $lineInput = 798678 168165 TANGOSOLINC T-1480240304 4 August 2004 Randy Johnston "May 9, 2004" 11:11:00 AM GREGGORY DAY offthewallsl@home.com LEONARD ALLENSTEIN 1935 N PROSPECT ST WALDORF VA 58575 (865) 932-7685 VGCC |47855444555| 2-3 4-4 1-3 5

The fields are:

    refId-InvoiceNumber-CompanyName-CourierNumber-CourierDate-PersonName-Date-Time-PersonName-Email-PersonName-Address-City-State-Zip-Phone-Group-GroupCode-ProductList

Can you please assist with the usage of StringRegExpReplace? I wish to enclose each of the following patterns with a pipe symbol | both before and after:

1. 10 July 2004 or 01 April 2014 or 1 May 2014
2. "June 17, 2004" or "November 30, 1999" (the quotes exist in the original data)
3. 05:16:00 PM or 07:44:00 AM or 04:11:00 PM
4. Email address
5. (856) 845-5184 or (860) 667-1874
6. 4-4 or 12-4 or 18-18 or 4-18
7. WALDORF VA 58575 (865) 932-7685 or FONTANA IN 59094 (859) 689-2150 or LONG BEACH ma 58886 (860) 741-0435 or SAN BERNARDINO fl 59138 (858) 488-3780
   This one is City | State | Zip | Phone. Once we find the phone, we can work from right to left: the first number is the zip, then the two-character state code. For the city I have a different approach: I am planning to keep a text file with a list of cities. If the first word to the left of the state matches, use it; otherwise try two words (e.g. Los Angeles, San Francisco) and use that. But to be able to use this, I need to segment it with "|".
8. E-1480240304 or T-1516958759 or W-055373 or W-055373373

I hope you can help. I have been trying various combinations but am really struggling. I will be thankful for your help.

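The thread ends without a reply to this request. As a starting point only, here is a minimal sketch of how StringRegExpReplace can wrap a match in pipes by referring to the whole match as $0 in the replacement string. It covers just two of the listed patterns (3, the time, and 5, the phone number); the patterns are guesses based on the examples given, and the sample string is a shortened excerpt of the line in the post.

```autoit
; Sketch: enclose time and phone patterns in pipe symbols.
; The patterns are assumptions derived from the examples in the post above.
Local $sLine = '"May 9, 2004" 11:11:00 AM GREGGORY DAY offthewallsl@home.com WALDORF VA 58575 (865) 932-7685 VGCC'

; Pattern 3: hh:mm:ss followed by AM or PM. $0 in the replacement is the whole match.
$sLine = StringRegExpReplace($sLine, "\d{2}:\d{2}:\d{2} [AP]M", "|$0|")

; Pattern 5: (ddd) ddd-dddd. Parentheses are escaped so they match literally.
$sLine = StringRegExpReplace($sLine, "\(\d{3}\) \d{3}-\d{4}", "|$0|")

ConsoleWrite($sLine & @CRLF)
```

The remaining patterns could follow the same shape, one StringRegExpReplace call per pattern, though the order of the calls matters once earlier replacements have inserted pipes into the line.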