Pogi904 Posted April 20, 2014

Hello! I couldn't find this on the forums, so I would appreciate any help. I was wondering whether there is a way to StringSplit using multiple whole words as delimiters. For example:

    $rtext = "We advised you to clear your cache and cookies."
    $asentence = StringSplit($rtext, "We ", 1) ; <- How should this be written to split $rtext on "We " and "you " so that $asentence will be:
    ;asentence[0] = 3
    ;asentence[1] = ""
    ;asentence[2] = "advised "
    ;asentence[3] = "to clear your cache and cookies."

Let me know if this is possible or if I have to use other means of achieving this. Thanks in advance!

mikell Posted April 20, 2014

    #Include <Array.au3>

    $rtext = "We advised you to clear your cache and cookies."
    $asentence = StringRegExp($rtext, 'We\s*(\w+)\s*you\s*(.+)', 3)
    _ArrayDisplay($asentence)

?

Pogi904 (Author) Posted April 20, 2014

Well, that satisfies one example (mikell's pattern above). But I want the split to happen for any sentence pattern. For example, $rtext could be:

    $rtext = "You explained that we need to call you back at a later time." ; <- So asentence should be:
    ;asentence[0] = 3
    ;asentence[1] = ""
    ;asentence[2] = "explained that "
    ;asentence[3] = "need to call you back at a later time."

jguinch Posted April 20, 2014 (edited), marked as Solution

In your 2nd example, it should be (case-insensitive, of course)...

    $rtext = "You explained that we need to call you back at a later time." ; <- So asentence should be:
    ;asentence[0] = 4
    ;asentence[1] = ""
    ;asentence[2] = "explained that "
    ;asentence[3] = "need to call"
    ;asentence[4] = "back at a later time."

No? It can be something like this:

    #Include <Array.au3>

    Local $rtext = "We advised you to clear your cache and cookies."
    Local $aDelimiters[] = [ "you", "we" ]
    $asentense = _StringSplitMultiple($rtext, $aDelimiters)
    _ArrayDisplay($asentense)

    ; $iFlag = 0 : case-sensitive
    ; $iFlag = 1 : case-insensitive
    Func _StringSplitMultiple($sString, $aDelims, $iFlag = 1)
        ; Lazily capture everything up to the next delimiter (or the end of the string)
        Local $sPattern = "(.*?)(?:"
        If $iFlag Then $sPattern = "(?i)" & $sPattern
        ; Append each delimiter, followed by a word boundary, as one alternative
        For $i = 0 To UBound($aDelims) - 1
            $sPattern &= $aDelims[$i] & "\b|"
        Next
        $sPattern &= "$)"
        Local $aResult = StringRegExp($sString, $sPattern, 3)
        If IsArray($aResult) Then
            ; Shift the captures up by one so that index 0 can hold the count
            ; (this overwrites the last capture)
            For $i = UBound($aResult) - 1 To 1 Step -1
                $aResult[$i] = $aResult[$i - 1]
            Next
        Else
            Return SetError(1, 0, -1)
        EndIf
        $aResult[0] = UBound($aResult) - 1
        Return $aResult
    EndFunc

Edited April 20, 2014 by jguinch

Signature: Network configuration UDF, _DirGetSizeByExtension, _UninstallList, Firefox Configuration, Array multi-dimensions, Printer Management UDF
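
The pattern built by _StringSplitMultiple puts \b only after each delimiter and inserts the delimiters into the regular expression unescaped, so "we" can also match at the end of a word such as "owe", and a delimiter containing regex metacharacters would change the pattern. A minimal sketch of a stricter variant follows. It is not part of the original solution: the name _StringSplitWords is hypothetical, it assumes the delimiters are plain words that do not contain "\E", and it assumes that Chr(1) never occurs in the input.

```autoit
#include <Array.au3>

; Hypothetical variant: split on whole words only, treating each delimiter literally.
; Assumptions: delimiters are plain words without "\E"; Chr(1) does not occur in $sString.
Func _StringSplitWords($sString, $aDelims, $bCaseInsensitive = True)
    Local $sAlternation = ""
    For $i = 0 To UBound($aDelims) - 1
        If $i > 0 Then $sAlternation &= "|"
        ; \Q...\E makes PCRE read the delimiter as literal text
        $sAlternation &= "\Q" & $aDelims[$i] & "\E"
    Next
    ; \b on both sides, so "we" does not match inside "owe" or "went"
    Local $sPattern = "\b(?:" & $sAlternation & ")\b"
    If $bCaseInsensitive Then $sPattern = "(?i)" & $sPattern
    ; Replace every delimiter with a marker character, then split on that marker
    Local $sMarked = StringRegExpReplace($sString, $sPattern, Chr(1))
    ; StringSplit returns the number of parts in element [0]
    Return StringSplit($sMarked, Chr(1))
EndFunc

Local $aDelimiters[] = ["you", "we"]
Local $aParts = _StringSplitWords("You owe us what we agreed.", $aDelimiters)
_ArrayDisplay($aParts)
```

Unlike the function above, this sketch leaves the whitespace around each delimiter in the returned parts, and "owe" should stay intact because of the leading \b.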

jchd Posted April 20, 2014 (edited)

As often in general and with regular expressions in particular, the devil hides in the detail. If a "sentence" is assumed to be what I would call "well formed", i.e. without parasitic whitespaces or made-up pitfalls, then the job is already non-trivial for a single regexp. But if the beef is supposed to cook for any input string, then a more precise definition of "word" is needed.

Sample inputs (whitespaces matter):

    "we you we"
    " no magic word here "
    "we love you"
    " we love you, do we "
    "loving you shall we"
    "you-tube"

and so on. Also, is the solution allowed to produce inelegant empty strings?

EDIT in bold!

Edited April 20, 2014 by jchd

Signature: This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must-have in your bookmarks. Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here. RegExp tutorial: enough to get started. PCRE v8.33 regexp documentation: latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try. SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager. An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work. SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well). A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages! SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt).

jchd Posted April 20, 2014

Watch the effect of this (overly simple) pattern on some "sentences":

    #Include <Array.au3>

    Local $rtext[] = [ _
        "We Remember you we advised you to clear your cache and cookies, now we are not guilty: you are.", _
        "we you we", _
        " no magic word here ", _
        "we love you", _
        " we love you, do we ", _
        "loving you shall we", _
        "you-tube" _
    ]

    For $s In $rtext
        ; Capture each run of characters that does not start a whole-word "we" or "you"
        $asentence = StringRegExp($s, '(?i)\b((?:(?!\bwe\b|\byou\b).)+)', 3)
        _ArrayDisplay($asentence)
    Next

Still parasitic empty captures.

jchd Posted April 21, 2014

I apologize for following up to myself this way, but my findings deserve a separate post so that readers are aware of progress. I've hit a bug (or is that a mis-feature?) of our PCRE implementation, which caused me some headache.

Here is the one-liner able to split any sentence on a list of taboo words properly, while removing whitespaces around the taboo words. Run it and see for yourself whether it fits the bill. Also watch the second loop and observe that we get a parasitic empty capture corresponding to the pseudo-group named "taboo". The PCRE interface distinguishes between named definitions like the DEFINE for "taboo" and numbered groups, but StringRegExp considers the group as an effective part of the result.

    #Include <Array.au3>

    Local $rtext[] = [ _
        "We Remember you we advised you to clear your cache and cookies, now we are not guilty: you are.", _
        "we you wE", _
        "you owe us $100", _
        " no magic word here ", _
        "we love you", _
        " we love you, do we ", _
        "loving you shall we", _
        "you_tube", _
        "you-tube" _
    ]

    For $s In $rtext
        $asentence = StringRegExp($s, '(?ix) (?: \h* (?<!\pL) (?:we|you) (?!\pL) \h* )* ( (?: (?! \h* (?<!\pL) (?:we|you) (?!\pL) \h* ) \N)* ) (?: \h* (?<!\pL) (?:we|you) (?!\pL) \h* )* \K', 3)
        _ArrayDisplay($asentence)
    Next

    ; results should be identical using a DEFINE special condition, but our PCRE implementation returns an empty ghost capture for the DEFINE, which IMHO it shouldn't do.
    For $s In $rtext
        $asentence = StringRegExp($s, '(?ix) (?(DEFINE) (?<taboo> \h* (?<!\pL) (?:we|you) (?!\pL) \h*) ) (?: (?&taboo) )* ( (?: (?! (?&taboo) ) \N)* ) (?: (?&taboo) )* \K', 3)
        _ArrayDisplay($asentence)
    Next

\pL is a character property which is true for letters, same as [[:alpha:]] in non-Unicode mode. \w is not applicable since it regards the underscore as a word character.

jguinch Posted April 22, 2014

Really impressive, for my small brain.

The problem does not appear with an online regex test: http://regex101.com/r/yE0nU3

jchd Posted April 22, 2014 (edited)

Exactly. To be fair, regex101.com offers the PHP flavor of PCRE, which might behave somewhat differently from the genuine PCRE library. Perl also has a number of differing behaviors, all of them pointed out in the PCRE reference documents.

RegexBuddy v4 correctly displays more details than regex101.com: in full detail mode it lists group "taboo" and then the actual capture, but points out that group "taboo" did not participate in the match. In normal mode, it only shows the actual capturing group for each match.

I'll post a detailed bug ticket as soon as I have more information about the reason for this (minor but annoying) issue.

Simple code to demonstrate the issue, without even needing invocation of the DEFINEd subroutine:

    #Include <Array.au3>

    ; Optional group (a)? that does not participate in the match
    _ArrayDisplay(StringRegExp("bbb", "(?x) (a)? (b+)", 3))
    ; DEFINE group that is never invoked
    _ArrayDisplay(StringRegExp("bbb", "(?x) (?(DEFINE) (?<head> a)) (b+)", 3))

Edit: this is now Trac ticket #2696.

Edited April 23, 2014 by jchd

Pogi904 (Author) Posted April 22, 2014

In reply to jguinch's _StringSplitMultiple solution:

Jguinch, this seems to work well enough for my situation. Thank you! (You misspelled "sentence" in your example though, but don't worry about it haha =P)

In reply to jchd's question, "Also is the solution allowed to produce inelegant empty strings?":

I don't really mind the empty strings in my case since I can always just filter them out, but other people may prefer to avoid them. You have done a great job with that in your solution, Jchd. Very detailed with your regex, it's amazing.
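
Filtering the empty strings out afterwards can be sketched as follows. This is only an illustration, not code from the thread: the helper name _DropEmptyEntries is hypothetical, and it assumes a StringSplit-style array whose element [0] holds the count, as returned by _StringSplitMultiple.

```autoit
; Hypothetical helper: returns a copy of a StringSplit-style array without its empty entries.
; Assumes $aParts is a 1D array whose element [0] holds the count.
Func _DropEmptyEntries($aParts)
    Local $aKept[UBound($aParts)]
    Local $iCount = 0
    For $i = 1 To UBound($aParts) - 1
        ; Keep only entries that contain at least one character
        If StringLen($aParts[$i]) > 0 Then
            $iCount += 1
            $aKept[$iCount] = $aParts[$i]
        EndIf
    Next
    ; Trim the unused tail and store the new count in element [0]
    ReDim $aKept[$iCount + 1]
    $aKept[0] = $iCount
    Return $aKept
EndFunc

Local $aParts = StringSplit("|a||b|", "|")
Local $aClean = _DropEmptyEntries($aParts)
; $aClean[0] is the number of non-empty entries that were kept
```

Entries that contain only whitespace are kept by this sketch; trimming them first with StringStripWS would be a separate design choice.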

jchd Posted April 22, 2014

PCRE is amazing, I'm only a janitor.

AFAICT you only get an empty string as result when the "sentence" contains only taboo words or is itself empty. Of course it's trivial to use a longer list of splitting words.
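
One way to use a longer list is to build the alternation from an array and substitute it into the one-liner posted earlier in the thread. The sketch below makes several assumptions: the word list is made up for illustration, the words consist of letters only (no spaces or regex metacharacters, since the pattern runs in (?x) mode), and the variable names $aTaboo, $sWords, $sTaboo and $sPattern are hypothetical.

```autoit
#include <Array.au3>

; Hypothetical word list; assumed to contain plain letters only
Local $aTaboo[] = ["we", "you", "they", "us"]

; Join the words into an alternation: we|you|they|us
Local $sWords = _ArrayToString($aTaboo, "|")

; One taboo unit, as in the earlier one-liner: the word, not adjacent
; to a letter, plus any surrounding horizontal whitespace
Local $sTaboo = "\h* (?<!\pL) (?:" & $sWords & ") (?!\pL) \h*"

; Same structure as the earlier one-liner, with the word list substituted
Local $sPattern = "(?ix) (?: " & $sTaboo & " )* ( (?: (?! " & $sTaboo & " ) \N)* ) (?: " & $sTaboo & " )* \K"

Local $asentence = StringRegExp("They told us that we owe you $100", $sPattern, 3)
_ArrayDisplay($asentence)
```

Building the taboo unit once as a string avoids the DEFINE group, and with it the ghost capture described in ticket #2696.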