zuladabef Posted June 7, 2021

I have a text file that I am reading line by line. I am trying to replace the blank spaces with a TAB character, but I only want to do that where one of the listed ethnicities is found. The number of spaces that occurs before the ethnicity is variable. Can RegEx accomplish this? If not, is there another solution? Thanks!

    ;~ ;Parse Ethnicity
    Local $iUboundOfFile = _FileCountLines($sFilePath_DataBuild)
    For $iCC = 1 To $iUboundOfFile -1
        Local $sFileRead_LineByLine = FileReadLine($hFileOpen, $iCC) ; Read the line of the file using the handle returned by FileOpen.

        ;Find Ethnicity
        Local $aEthnicities = ["BLA ", "WHI ", "AMI ", " OTH ", "CHI ", "MEX "]
        For $xCC = 0 To UBound($aEthnicities) - 1 ;Loop through the ethnicities, one by one. I may not need this, but not sure.
            ;If "BLA " is found, then replace all the leading spaces (between it and the string before it) with a TAB character.
        Next
    Next
TheXman Posted June 7, 2021

12 minutes ago, zuladabef said: Can RegEx accomplish this?

Yes

CryptoNG UDF: Cryptography API: Next Gen | jq UDF: Powerful and Flexible JSON Processor | jqPlayground: An Interactive JSON Processor | Xml2Json UDF: Transform XML to JSON | HttpApi UDF: HTTP Server API | Roku Remote: Example Script
About Me | How To Ask Good Questions On Technical And Scientific Forums (Detailed) | How to Ask Good Technical Questions (Brief)
"Any fool can know. The point is to understand." -Albert Einstein
"If you think you're a big fish, it's probably because you only swim in small ponds." ~TheXman
zuladabef Posted June 7, 2021 (Author)

    ;~ ;Parse Ethnicity
    Local $iUboundOfFile = _FileCountLines($sFilePath_DataBuild)
    For $iCC = 1 To $iUboundOfFile -1
        Local $sFileRead_LineByLine = FileReadLine($hFileOpen, $iCC) ; Read the line of the file using the handle returned by FileOpen.

        ;Find Ethnicity
        Local $aEthnicities = ["BLA ", "WHI ", "AMI ", " OTH ", "CHI ", "MEX "]
        For $xCC = 0 To UBound($aEthnicities) - 1 ;Loop through the ethnicities, one by one. I may not need this, but not sure.
            ;If "BLA " is found, then replace all the leading spaces (between it and the string before it) with a TAB character.
            Local $iPositionNumberOfEthnicity = StringInStr($sFileRead_LineByLine, $aEthnicities[$xCC], 0, 1)
            If $iPositionNumberOfEthnicity == 0 Then ExitLoop
            $sOutPut_AfterStep1 = StringRegExpReplace($sFileRead_LineByLine, "(^\s*" & $aEthnicities[$xCC] & ")" , $aEthnicities[$xCC] & @TAB, 1)
            ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $sOutPut_AfterStep1 = ' & $sOutPut_AfterStep1 & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console
        Next
    Next

I tried adding a few lines, but it is not getting me there. What do you suggest?
JockoDundee Posted June 7, 2021

1 hour ago, TheXman said: Yes

Not even a period!

Code hard, but don’t hard code...
TheXman Posted June 7, 2021 (edited) (Solution)

1 hour ago, zuladabef said: What do you suggest?

In the future, especially when dealing with regular expressions, I would suggest that you provide accurate test data that covers as many different scenarios as possible. I would also suggest that you show exactly what the expected result should look like for the data that you provide.

Since you provided no data, I used my own. You can adapt the regular expression as necessary. As with any language, including regular expressions, there are multiple ways to achieve the same result. The example below is just one very simple way to accomplish what you described. It is not case-sensitive and only replaces the defined ethnicities when they appear as whole words (not part of a bigger string of characters).

    Const $TEST_DATA = "Date: 2021-06-07 Ethnicity: BLA " & @CRLF & _
                       "Date: 2021-06-07 Ethnicity: BLACK " & @CRLF & _
                       "Date: 2021-06-07 Ethnicity: WHI" & @CRLF & _
                       "Date: 2021-06-07 Ethnicity: WHITE" & @CRLF & _
                       "Date: 2021-06-07 Ethnicity: AMI" & @CRLF & _
                       "Date: 2021-06-07 Ethnicity: Oth" & @CRLF & _
                       "Date: 2021-06-07 Ethnicity: MEX " & @CRLF & _
                       "Date: 2021-06-07 Ethnicity: CHI " & @CRLF & _
                       "Date: 2021-06-07 Ethnicity: ESK " & @CRLF

    example()

    Func example()
        ConsoleWrite(StringRegExpReplace($TEST_DATA, "(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b", @TAB & "\1") & @CRLF)
    EndFunc

Console Output (a TAB now precedes each matched code; BLACK, WHITE, and ESK are unchanged):

    Date: 2021-06-07 Ethnicity:	BLA
    Date: 2021-06-07 Ethnicity: BLACK
    Date: 2021-06-07 Ethnicity:	WHI
    Date: 2021-06-07 Ethnicity: WHITE
    Date: 2021-06-07 Ethnicity:	AMI
    Date: 2021-06-07 Ethnicity:	Oth
    Date: 2021-06-07 Ethnicity:	MEX
    Date: 2021-06-07 Ethnicity:	CHI
    Date: 2021-06-07 Ethnicity: ESK

Edited June 7, 2021 by TheXman
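As a rough sketch of how the pattern above might be dropped into the original line-by-line job, the snippet below builds the alternation from an array and applies it to each line of a file. The file path is hypothetical, the codes are the ones from the question with their padding spaces removed, and reading the file with FileReadToArray instead of FileReadLine is a substitution made for this sketch, not something taken from the thread.

```autoit
#include <Array.au3> ; provides _ArrayToString

; Hypothetical input path, used only for this sketch.
Local Const $sFilePath = @ScriptDir & "\data.txt"

; Codes from the original question, without the padding spaces.
Local $aEthnicities = ["BLA", "WHI", "AMI", "OTH", "CHI", "MEX"]

; Join the codes into an alternation: (?i)\h+\b(BLA|WHI|AMI|OTH|CHI|MEX)\b
Local $sPattern = "(?i)\h+\b(" & _ArrayToString($aEthnicities, "|") & ")\b"

; Read the whole file into a 0-based array, one element per line.
Local $aLines = FileReadToArray($sFilePath)
If @error Then
    ConsoleWrite("Could not read " & $sFilePath & @CRLF)
    Exit 1
EndIf

For $i = 0 To UBound($aLines) - 1
    ; Replace the run of horizontal whitespace before a whole-word code with one TAB.
    ConsoleWrite(StringRegExpReplace($aLines[$i], $sPattern, @TAB & "\1") & @CRLF)
Next
```

Because every code goes into a single alternation, the inner loop over the ethnicities is no longer needed. The codes here contain no regex metacharacters; if they did, they would need escaping before being joined.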
zuladabef Posted June 8, 2021 (Author)

@TheXman Great point about providing test data, I will make sure to include that next time.

I am trying to get a better understanding of the switches you so graciously provided. Let's see if I understand it correctly:

(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b", @TAB & "\1") & @CRLF)

(?i) - this makes it NOT case sensitive?
\h+ - 1 or more horizontal spaces?
\b - match the empty string at the beginning or end of a word. We need this at the start and end of the string(s) to signify to match the whole word?
\1 - I think this means to back reference whichever of the strings was found. If so, this is super cool.

Do these seem accurate?
TheXman Posted June 8, 2021 (edited)

1 hour ago, zuladabef said: Do these seem accurate?

Yes, for the most part.

1 hour ago, zuladabef said: \1 - I think this means to back reference whichever of the strings was found. If so, this is super cool.

\1 is a back reference to the first capture group, which in this case is: (BLA|WHI|AMI|OTH|MEX|CHI)

(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b

Use these options for the whole regular expression «(?i)»
    Case insensitive «i»
Match a single character that is a “horizontal whitespace character” (tab or any Unicode space separator) «\h+»
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert position at a word boundary (position preceded or followed, but not both, by an ASCII letter, digit, or underscore) «\b»
Match the regex below and capture its match into backreference number 1 «(BLA|WHI|AMI|OTH|MEX|CHI)»
    Match this alternative (attempting the next alternative only if this one fails) «BLA»
        Match the character string “BLA” literally (case insensitive) «BLA»
    Or match this alternative (attempting the next alternative only if this one fails) «WHI»
        Match the character string “WHI” literally (case insensitive) «WHI»
    Or match this alternative (attempting the next alternative only if this one fails) «AMI»
        Match the character string “AMI” literally (case insensitive) «AMI»
    Or match this alternative (attempting the next alternative only if this one fails) «OTH»
        Match the character string “OTH” literally (case insensitive) «OTH»
    Or match this alternative (attempting the next alternative only if this one fails) «MEX»
        Match the character string “MEX” literally (case insensitive) «MEX»
    Or match this alternative (the entire group fails if this one fails to match) «CHI»
        Match the character string “CHI” literally (case insensitive) «CHI»
Assert position at a word boundary (position preceded or followed, but not both, by an ASCII letter, digit, or underscore) «\b»

Created with RegexBuddy

Edited June 8, 2021 by TheXman
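To make the roles of \b and \1 in the breakdown above more concrete, here is a small sketch that runs the same kind of pattern with and without the trailing word boundary. The sample line is invented for illustration, the alternation is shortened to two codes, and the literal text <TAB> stands in for @TAB so the replacement is visible in the console.

```autoit
; Sample line invented for this sketch; it is not from the original thread.
Local Const $sLine = "A   BLA  B   BLACK  C   whi"

; With \b on both sides, only the whole words BLA and whi match; BLACK is left alone.
; \1 re-inserts whatever the capture group matched, so "whi" keeps its original case.
ConsoleWrite(StringRegExpReplace($sLine, "(?i)\h+\b(BLA|WHI)\b", "<TAB>\1") & @CRLF)
; Prints: A<TAB>BLA  B   BLACK  C<TAB>whi

; Without the trailing \b, the BLA at the start of BLACK matches as well.
ConsoleWrite(StringRegExpReplace($sLine, "(?i)\h+\b(BLA|WHI)", "<TAB>\1") & @CRLF)
; Prints: A<TAB>BLA  B<TAB>BLACK  C<TAB>whi
```

The second call shows why the trailing \b matters for data where a short code can be the prefix of a longer word.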
zuladabef Posted June 8, 2021 (Author)

@TheXman Okay, that makes sense, thank you! Now that I understand this a bit better, I think I should probably rewrite some of my previous code from the same script. Is it okay to post here, or should I create a new post? I'll make sure to add some test data this time too.
JockoDundee Posted June 8, 2021

1 hour ago, zuladabef said: Is it okay to post here, or should I create a new post?

Either way, don’t forget to mark the topic as solved and give proper credit.
zuladabef Posted June 8, 2021 (Author) (edited)

So how would I modify the Regex pattern if I want to look for a nine-character string where the first seven characters are digits and the last two are either "LP" or "UP"? For example:

    1018003LP
    1016001UP
    1031002UP
    1015004LP

I was thinking something like this would do the trick, but it's not working.

    $sOutPut_AfterStep1 = StringRegExpReplace($sFileRead_LineByLine, "(?i)\h+\b(\d{7}\w{2})\b", @TAB & "\1" & @TAB)

*EDIT: Actually, that does seem to be working now!

Edited June 8, 2021 by zuladabef (Solved it, I think)
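One thing worth noting about the pattern above: \w{2} accepts any two word characters, so it is looser than "LP or UP". A minimal sketch of a stricter variant follows, assuming the suffix really is limited to those two values. The first four samples reuse the values from the post, and the last two are invented counterexamples.

```autoit
; The first four values come from the post above; the last two are invented counterexamples.
Local $aSamples = ["x 1018003LP y", "x 1016001UP y", "x 1031002UP y", "x 1015004LP y", _
                   "x 1018003ZZ y", "x 101800399 y"]

; [LU]P accepts only LP or UP after the seven digits; \w{2} would also accept ZZ or 99.
; (?i) is kept from the original pattern, so lowercase lp and up are accepted too.
Local Const $sPattern = "(?i)\h+\b(\d{7}[LU]P)\b"

For $i = 0 To UBound($aSamples) - 1
    ; StringRegExp with the default flag returns 1 on a match and 0 otherwise.
    ConsoleWrite($aSamples[$i] & " -> " & StringRegExp($aSamples[$i], $sPattern) & @CRLF)
Next
```

The first four samples should report 1 and the last two 0. The same pattern can be passed to StringRegExpReplace with the @TAB & "\1" & @TAB replacement from the post.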