JockoDundee, posted December 30, 2020

Hi! I have a string of chess moves that I need to clean up before I can properly process them. Typically a computer chess move consists of a from-square and a to-square concatenated into a 4-character string, for example d2d4. A game, or game fragment, is just a series of these, like so:

e2e4 e7e5 g1f3 b8c6 f1b5 a7a6 b5a4 g8f6 e1g1 f6e4 d2d4 b7b5 a4b3 d7d5 d4e5 c8e6 c1e3 f8c5 d1d3

This makes processing them quick, since I can just use a fixed offset. Even though the vast majority of moves contain 4 characters, sometimes, when a pawn is promoted, a move carries a 5th character naming the new piece (q for a queen, n for a knight in the examples below). Notice in the three examples below how a few of the moves contain 5 characters. This 5th character is of no importance to me.

g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4
g1f3 h3h2 f1e2 h2h1n a1h1 h8h1 f4f1 h1h3
f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1q d4g1 h1e4 g1f2 g7g5

This obviously destroys my fixed offsets, so I have written code that forces each move to be limited to just four characters. However, it is rather inelegant: it loops through the move list and fixes each move separately. So I was thinking one of you geniuses must know a better way, possibly in a single search/replace statement?

tl;dr To be clear, the challenge is to truncate all 5-char strings to 4-char strings, i.e. change something like this:

g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4
g1f3 h3h2 f1e2 h2h1n a1h1 h8h1 f4f1 h1h3
f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1q d4g1 h1e4 g1f2 g7g5

into this:

g1f1 h3h2 f1e2 h2h1 a1h1 h8h1 f2f3 h1h4
g1f3 h3h2 f1e2 h2h1 a1h1 h8h1 f4f1 h1h3
f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1 d4g1 h1e4 g1f2 g7g5

in one RegEx statement if possible. Thanks!

Code hard, but don’t hard code...
AspirinJunkie, posted December 30, 2020

\b\w{4}\K\w

I have no AutoIt here, but the pattern should work. Replace the matches with an empty string.
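A minimal sketch of AspirinJunkie's pattern applied with AutoIt's StringRegExpReplace to the first sample line. The variable names are illustrative only; the comments describe what each piece of the PCRE pattern does, the key point being that \K drops the first four characters from the reported match, so only a fifth character gets replaced.

```autoit
; \b     - start at a word boundary (the first character of a move)
; \w{4}  - consume the four from/to characters
; \K     - forget everything matched so far, so it is not replaced
; \w     - the fifth word character, if any, is the only replaced text
Local $sMoves = "g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4"
Local $sFixed = StringRegExpReplace($sMoves, '\b\w{4}\K\w', "")
ConsoleWrite($sFixed & @CRLF) ; g1f1 h3h2 f1e2 h2h1 a1h1 h8h1 f2f3 h1h4
```

Four-character moves never match, because after \w{4} the next character is a space (or the end of the string), which \w cannot match.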
mikell, posted December 30, 2020

2 hours ago, JockoDundee said: the challenge is to truncate all 5 char strings to 4 char strings

Translated from regex into plain language, this could mean "remove a character if it is preceded by 4 word characters":

StringRegExpReplace($txt, '(?<=\w{4})\w', "")
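A short sketch of mikell's lookbehind version on the third sample line, with names chosen for illustration. The lookbehind only asserts what precedes the current position and consumes nothing, so the single \w that follows is the only text deleted. The last lines use a hypothetical 6-character token, not something from the thread, purely to show where the two patterns differ.

```autoit
; (?<=\w{4}) - assert that four word characters come right before here
; \w         - match (and, via the "" replacement, delete) this character
Local $sMoves = "f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1q d4g1 h1e4 g1f2 g7g5"
Local $sFixed = StringRegExpReplace($sMoves, '(?<=\w{4})\w', "")
ConsoleWrite($sFixed & @CRLF)

; Hypothetical over-long token: the lookbehind is checked against the
; original string, so every character after the 4th is removed, while the
; \b...\K form only removes the 5th (the 6th has no word boundary before it)
Local $sLong = "abcdef"
ConsoleWrite(StringRegExpReplace($sLong, '(?<=\w{4})\w', "") & @CRLF) ; abcd
ConsoleWrite(StringRegExpReplace($sLong, '\b\w{4}\K\w', "") & @CRLF)  ; abcdf
```

For move strings that never exceed five characters, both patterns give identical results, so the difference only matters if longer tokens can appear.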
JockoDundee (author), posted December 30, 2020

Thanks, guys! I'm not sure which one to use:

1: \b\w{4}\K\w
2: (?<=\w{4})\w

ORG: f4e5 d7d6 g1f3 d6e5 f3e5q f8d6 e5f3 g8f6 g2g3 f6g4 b1c3q h7h5 d2d4s h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6
----------------------------------------
1: f4e5 d7d6 g1f3 d6e5 f3e5 f8d6 e5f3 g8f6 g2g3 f6g4 b1c3 h7h5 d2d4 h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6
2: f4e5 d7d6 g1f3 d6e5 f3e5 f8d6 e5f3 g8f6 g2g3 f6g4 b1c3 h7h5 d2d4 h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6

ORG: d2d3 e5f4 c1f4q d7d5 e2e4 b8c6 b1c3 g8f6 g1f3
----------------------------------------
1: d2d3 e5f4 c1f4 d7d5 e2e4 b8c6 b1c3 g8f6 g1f3
2: d2d3 e5f4 c1f4 d7d5 e2e4 b8c6 b1c3 g8f6 g1f3

ORG: f4f5 d7d5 d2d3 c8f5r b1c3 g8f6t g1f3 b8c6
----------------------------------------
1: f4f5 d7d5 d2d3 c8f5 b1c3 g8f6 g1f3 b8c6
2: f4f5 d7d5 d2d3 c8f5 b1c3 g8f6 g1f3 b8c6

ORG: b2b4 e5f4 g1f3 f8b4 c2c3s b4e7 d1a4 g8f6
----------------------------------------
1: b2b4 e5f4 g1f3 f8b4 c2c3 b4e7 d1a4 g8f6
2: b2b4 e5f4 g1f3 f8b4 c2c3 b4e7 d1a4 g8f6

ORG: h2h4 f8e7d g1f3s e5e4 f3g5 d7d5 e2e3 h7h6
----------------------------------------
1: h2h4 f8e7 g1f3 e5e4 f3g5 d7d5 e2e3 h7h6
2: h2h4 f8e7 g1f3 e5e4 f3g5 d7d5 e2e3 h7h6

ORG: h2h3 d8h4 g2g3 h4g3
----------------------------------------
1: h2h3 d8h4 g2g3 h4g3
2: h2h3 d8h4 g2g3 h4g3

ORG: g2g4e d8h4
----------------------------------------
1: g2g4 d8h4
2: g2g4 d8h4
AspirinJunkie, posted December 30, 2020

Decide on criteria such as simplicity, length, performance or anything else. If there is no criterion other than getting the wanted result, roll a die.
pseakins, posted December 31, 2020

I don't understand why, in certain cases, anyone would want to mess around with regular expressions when a simple string expression is demonstrably faster and much easier to understand, especially for anyone trying to maintain someone else's code. RegEx may have its place, but not in this case.

$x = StringLeft($x, 4) ; this is simpler and WAY faster than
$x = StringRegExpReplace($x, '(?<=\w{4})\w', "") ; this

Faster? Yes: in a million iterations on a test array, the string expression comes in at 19.7 seconds while the RegEx takes 23.2 seconds.

Phil Seakins
JockoDundee (author), posted December 31, 2020

5 hours ago, pseakins said: I don't understand why, in certain cases, anyone would want to mess around with regular expressions when a simple string expression is demonstrably faster and much easier to understand...

Why indeed? Maybe because this:

$x = StringLeft($x, 4)

cuts off the whole move list except for the first move when used on the sample string:

f4e5 d7d6 g1f3 d6e5 f3e5q f8d6 e5f3 g8f6 g2g3 f6g4 b1c3q h7h5 d2d4s h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6

To your point, currently I am doing something like this:

$arrMoveList = StringSplit($sMoveList, " ", 2) ; flag 2: no element count in [0]
For $sMove In $arrMoveList
    $sFixedMoveList &= StringLeft($sMove, 4) & " " ; keep the first 4 chars of each move
Next

but that's not quite the same, right?

Anyway, I tried your test. There's good news and bad news. The good news: you're the fastest. The bad news: your output is off:

MoveList: f4e5 d7d6 g1f3 d6e5 f3e5q f8d6 e5f3 g8f6 g2g3 f6g4 b1c3q h7h5 d2d4s h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 d8d6 d1d2

Solutions:
1: \b\w{4}\K\w
2: (?<=\w{4})\w
3: My Loop Code
4: StringLeft($txt, 4)

Output:
1: f4e5 d7d6 g1f3 d6e5 f3e5 f8d6 e5f3 g8f6 g2g3 f6g4 b1c3 h7h5 d2d4 h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 d8d6 d1d2
2: f4e5 d7d6 g1f3 d6e5 f3e5 f8d6 e5f3 g8f6 g2g3 f6g4 b1c3 h7h5 d2d4 h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 d8d6 d1d2
3: f4e5 d7d6 g1f3 d6e5 f3e5 f8d6 e5f3 g8f6 g2g3 f6g4 b1c3 h7h5 d2d4 h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 d8d6 d1d2
4: f4e5

Timings for 1000000:
1: 1000000 Runs 5.061
2: 1000000 Runs 6.291
3: 1000000 Runs 37.46
4: 1000000 Runs 1.145

Anyway, let me have the rest of the code you had in mind, and I can plug it in....
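For comparison, a sketch of the split/truncate/join loop described above, with a StringTrimRight added so the result does not end in a stray space (the loop as posted leaves one). The input is a shortened, illustrative copy of the sample list; this is not the poster's actual code.

```autoit
; Split on single spaces; flag 2 ($STR_NOCOUNT) returns a 0-based array
; without the element count in [0]
Local $sMoveList = "f4e5 d7d6 g1f3 d6e5 f3e5q f8d6 e5f3 g8f6 g2g3 f6g4 b1c3q"
Local $aMoves = StringSplit($sMoveList, " ", 2)
Local $sFixedMoveList = ""
For $sMove In $aMoves
    ; StringLeft keeps at most the first 4 characters of each move
    $sFixedMoveList &= StringLeft($sMove, 4) & " "
Next
$sFixedMoveList = StringTrimRight($sFixedMoveList, 1) ; drop the final space
ConsoleWrite($sFixedMoveList & @CRLF)
```

This assumes moves are separated by single spaces only. On the three-line sample from the first post, a piece such as "h1h4" & @CRLF & "g1f3" would be cut down to "h1h4", silently losing a move, whereas the regex replacements work on the whole string regardless of the separators.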
mikell, posted December 31, 2020

On 12/30/2020 at 9:10 AM, JockoDundee said: I'm not sure which one to use

The most important thing is to understand how it works, so you can build your own next time.

@pseakins BTW, if an array is needed as the output, regex is indeed easier and faster:

#include <Array.au3>

$txt = "g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4" & @crlf & _
       "g1f3 h3h2 f1e2 h2h1n a1h1 h8h1 f4f1 h1h3" & @crlf & _
       "f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1q d4g1 h1e4 g1f2 g7g5"
$res = StringRegExp($txt, '\b\w{4}', 3)
_ArrayDisplay($res)
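A sketch of the array approach without the _ArrayDisplay window, so the result can be checked from the console. StringRegExp with flag 3 returns every match of '\b\w{4}' in a 0-based array, so a fifth character is simply never captured. Rejoining with _ArrayToString is an optional extra step added here for illustration, not part of mikell's post.

```autoit
#include <Array.au3>

; Flag 3 ($STR_REGEXPARRAYGLOBALMATCH): return all matches as an array
Local $sTxt = "g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4"
Local $aMoves = StringRegExp($sTxt, '\b\w{4}', 3)
ConsoleWrite(UBound($aMoves) & " moves" & @CRLF) ; 8 moves

; If one string is needed again, rejoin the array with single spaces
; (any line breaks in a multi-line input would be replaced by the delimiter)
Local $sJoined = _ArrayToString($aMoves, " ")
ConsoleWrite($sJoined & @CRLF) ; g1f1 h3h2 f1e2 h2h1 a1h1 h8h1 f2f3 h1h4
```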
AspirinJunkie, posted December 31, 2020

6 hours ago, pseakins said: I don't understand why, in certain cases, anyone would want to mess around with regular expressions when a simple string expression is demonstrably faster and much easier to understand, especially for anyone trying to maintain someone else's code. RegEx may have its place, but not in this case.

You don't understand because you modified the task. As JockoDundee stated, the task is to shorten all 5-character words in one large string to 4-character words. You changed the task so that the individual moves are already separated into an array. But that was not the task.
pseakins, posted December 31, 2020

1 hour ago, AspirinJunkie said: You don't understand because you modified the task

Agreed. I did not take up @JockoDundee's challenge and provide a solution. I was just trying to make the case that string expressions would generally be faster than regular expressions. I guess I picked the wrong battle. As @JockoDundee pointed out, my expression, when used on his short string, would give the wrong result, which of course is obvious.
pseakins, posted December 31, 2020 (edited)

16 hours ago, JockoDundee said: Anyway, let me have the rest of the code you had in mind, and I can plug it in

My code would either use a loop or, if the 5th character is always a known value, StringReplace(). I'm pretty sure this would be faster when working with a 10 MB string, but with your 127-byte string there is no contest: '(?<=\w{4})\w' is significantly faster.

EDIT: Wrong, I misspoke. '(?<=\w{4})\w' is the slower of the two regex expressions.

(Edited January 1, 2021 by pseakins: corrected text)
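One caution about the StringReplace() route, shown as a sketch on a hypothetical input that is not from the thread: StringReplace removes every occurrence of the letter anywhere in the string (case-insensitively by default). That works for the q and s in the test data, but assuming the usual UCI convention, promotion letters are q, r, b and n, and b is also a file letter. In "a7b8b" (a pawn capturing on b8 and promoting to a bishop) the square names get damaged too, while the regex only ever touches a 5th character.

```autoit
; Hypothetical move list containing a bishop promotion ("a7b8b")
Local $sMoves = "b1c3 a7b8b g1f3"

; Removes ALL "b" characters, including the file letters in b1 and b8
Local $sByReplace = StringReplace($sMoves, "b", "")

; Removes only a character that follows the first four of a move
Local $sByRegex = StringRegExpReplace($sMoves, '\b\w{4}\K\w', "")

ConsoleWrite($sByReplace & @CRLF) ; 1c3 a78 g1f3
ConsoleWrite($sByRegex & @CRLF)   ; b1c3 a7b8 g1f3
```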
pseakins, posted December 31, 2020 (edited)

10 hours ago, pseakins said: I'm pretty sure this would be faster when working with a 10 MB string

It's not. I just ran a test. The second regex is way in front. I should have done this 40 minutes ago; this is a bad start to the new year.

EDIT: Actually, the second regex is the slower of the two regex expressions.

(Edited January 1, 2021 by pseakins: corrected text)
AspirinJunkie, posted December 31, 2020

4 minutes ago, pseakins said: I should have done this 40 minutes ago; this is a bad start to the new year.

It is not: it is the end of 2020, a year everyone is trying hard to forget. So anything that happens right now doesn't count 😉 Or you live in Australia; then you're right indeed.
pseakins, posted December 31, 2020

1 hour ago, AspirinJunkie said: Or you live in Australia; then you're right indeed.

I do live in Australia.
JockoDundee (author), posted December 31, 2020

2 hours ago, pseakins said: It's not. I just ran a test. The second regex is way in front. I should have done this 40 minutes ago; this is a bad start to the new year.

Can you show your test?
AspirinJunkie, posted December 31, 2020

1 hour ago, pseakins said: I do live in Australia.

Happy new year! 🥳
pseakins, posted January 1, 2021

6 hours ago, JockoDundee said: Can you show your test?

My dyslexia may have misled you; I tend to reverse things as they come out of my head. '\b\w{4}\K\w' is the faster expression. Here's my test code:

$sMovelist = "f4e5 d7d6 g1f3 d6e5 f3e5q f8d6 e5f3 g8f6 g2g3 f6g4 b1c3q h7h5 d2d4s h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 d8d6 d1d2"
$iLoop = 50000

$hTimer = TimerInit()
For $i = 1 To $iLoop
    $sNewList = StringReplace($sMovelist, "q", "")
    $sNewList = StringReplace($sNewList, "s", "")
Next
$fDiff1 = TimerDiff($hTimer)

$hTimer = TimerInit()
For $i = 1 To $iLoop
    $sNewList = StringRegExpReplace($sMovelist, '\b\w{4}\K\w', "")
Next
$fDiff2 = TimerDiff($hTimer)

$hTimer = TimerInit()
For $i = 1 To $iLoop
    $sNewList = StringRegExpReplace($sMovelist, '(?<=\w{4})\w', "")
Next
$fDiff3 = TimerDiff($hTimer)
$sNewList = ""

ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff1=[' & $fDiff1 & '] Error code: ' & @error & @CRLF) ;### Debug Console
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff2=[' & $fDiff2 & '] Error code: ' & @error & @CRLF) ;### Debug Console
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff3=[' & $fDiff3 & '] Error code: ' & @error & @CRLF) ;### Debug Console

$x = ""
For $i = 1 To 78125 ; 1000000 / (127 + 1) = 78125
    $x &= $sMovelist & " " ; create 1 MB list
Next
$sMovelist = StringTrimRight($x, 1) ; remove trailing space
$x = ""

$hTimer = TimerInit()
$sNewList = StringReplace($sMovelist, "q", "")
$sNewList = StringReplace($sNewList, "s", "")
$fDiff1 = TimerDiff($hTimer)
$sNewList = "" ; just in case memory collection skews timing test

$hTimer = TimerInit()
$sNewList = StringRegExpReplace($sMovelist, '\b\w{4}\K\w', "")
$fDiff2 = TimerDiff($hTimer)
$sNewList = ""

$hTimer = TimerInit()
$sNewList = StringRegExpReplace($sMovelist, '(?<=\w{4})\w', "")
$fDiff3 = TimerDiff($hTimer)

ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff1=[' & $fDiff1 & '] Error code: ' & @error & @CRLF) ;### Debug Console
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff2=[' & $fDiff2 & '] Error code: ' & @error & @CRLF) ;### Debug Console
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff3=[' & $fDiff3 & '] Error code: ' & @error & @CRLF) ;### Debug Console
; Using a modified form of the ALT-D Debug Console insertions. The output fits on one line and the variable is delineated.

Console output:

@@ Debug(24) : $fDiff1=[2496.65864193368] Error code: 0
@@ Debug(25) : $fDiff2=[357.503320831953] Error code: 0
@@ Debug(26) : $fDiff3=[490.190716918503] Error code: 0
@@ Debug(50) : $fDiff1=[2966.93059827757] Error code: 0
@@ Debug(51) : $fDiff2=[387.96407343657] Error code: 0
@@ Debug(52) : $fDiff3=[562.891279040785] Error code: 0
iamtheky, posted January 1, 2021 (edited)

While "remove any character if it is preceded by 4 characters" is solid, is this task also "remove any letter that is followed by whitespace"? Maybe there is room to speed it up there:

$txt = "g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4" & @crlf & _
       "g1f3 h3h2 f1e2 h2h1n a1h1 h8h1 f4f1 h1h3" & @crlf & _
       "f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1q d4g1 h1e4 g1f2 g7g5"
MsgBox(0, '', StringRegExpReplace($txt, "\D\s", " "))

(Edited January 1, 2021 by iamtheky)
mikell, posted January 1, 2021

That doesn't work if the 5-character move is the last one in the string:

$txt = "f2f8 e7f8 e4f2 h3h2n e3e4 d5e4 f2e4 h2h1q"
MsgBox(0, '', StringRegExpReplace($txt, "\D\s", " "))

A suggestion using the "remove any letter that is followed by whitespace or the end of the string" concept:

StringRegExpReplace($txt, '[[:alpha:]](?=\s|$)', "")
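A sketch putting the two "letter before a separator" patterns side by side on mikell's test string, whose last move carries a 5th character. Variable names are illustrative. "\D\s" needs an actual whitespace character after the letter, so the trailing "q" survives; the lookahead (?=\s|$) also accepts the end of the string and consumes nothing, so the separator is left untouched and the replacement can be "".

```autoit
Local $sTxt = "f2f8 e7f8 e4f2 h3h2n e3e4 d5e4 f2e4 h2h1q"

; Non-digit followed by whitespace, replaced by a single space:
; the final "q" has no whitespace after it, so it is not matched
Local $sOld = StringRegExpReplace($sTxt, "\D\s", " ")

; A letter followed by whitespace OR end of string; the lookahead
; only checks the next position, so only the letter is deleted
Local $sNew = StringRegExpReplace($sTxt, '[[:alpha:]](?=\s|$)', "")

ConsoleWrite($sOld & @CRLF) ; f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1q
ConsoleWrite($sNew & @CRLF) ; f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1
```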