Burgaud (posted July 17, 2016):
I would like to OCR-scan JPG files and extract the numbers. This seems to do the trick:
StringRegExp($temp, '[0-9\.]+', 3)
However, I realized that the OCR app often reads a capital I and a lowercase L (l) as 1, so much so that words like "Will" are often OCRed and recognized as "11". I noticed that the real numbers are preceded by either +, - or a space. Thus I would like to match numbers only if they are preceded by one of these three characters [+- ], without having those characters as part of the regex result. How do I do that? Sorry, my regex knowledge is too basic to know how to do it.
Melba23 (Moderator, posted July 17, 2016):
Burgaud,
Use a "lookbehind" to check whether the match is preceded by one of those 3 characters:

Global $aList[] = ["+11", "i11", "-11", "a11", " 11"]
For $i = 0 To UBound($aList) - 1
    $fMatch = False
    If StringRegExp($aList[$i], "(?<=[ +-])(\d+)") Then
        $fMatch = True
    EndIf
    ConsoleWrite($aList[$i] & " : " & $fMatch & @CRLF)
Next

M23
Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.
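The script above only tests whether a match exists. Since the original goal is to extract the numbers, here is a minimal sketch that puts the same lookbehind in front of the question's own pattern and uses flag 3 (return all matches). The OCR string is made up for illustration, and keeping [0-9.] from the question (rather than \d) is an assumption so that decimals such as 12.50 come back whole.

```autoit
; Hypothetical OCR output: "Will" was misread as "W11", while the real
; numbers are preceded by "+", "-" or a space.
Local $sOcr = "W11 pay +12.50 and -3 or 7"

; Pattern from the question: every run of digits/dots is returned,
; including the "11" that came from the misread word.
Local $aNaive = StringRegExp($sOcr, '[0-9.]+', 3)

; Same pattern behind a lookbehind: the run must be preceded by a space,
; "+" or "-", but that character is only checked, not returned.
Local $aFiltered = StringRegExp($sOcr, '(?<=[ +-])[0-9.]+', 3)

For $i = 0 To UBound($aNaive) - 1
    ConsoleWrite("naive   : " & $aNaive[$i] & @CRLF)
Next
For $i = 0 To UBound($aFiltered) - 1
    ConsoleWrite("filtered: " & $aFiltered[$i] & @CRLF)
Next
```

Because the lookbehind only inspects the single preceding character, a misread word that happens to follow a space (" 11") would still be accepted.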
Burgaud (thread author, posted July 17, 2016):
(?<=[ +-])
This regex lookbehind has eluded me for many years. Thanks for your simple script; I finally understand this awesome feature. +1 for the education.
iamtheky (posted July 18, 2016):
What's the difference, and what are the benefits, between that regex and [\s\+\-](\d+) ?
mikell (posted July 18, 2016, edited the same day):
iamtheky,
Because the OP's particular sample is very simple, the answer is: none. Furthermore, regex101 says that the lookbehind takes more steps than the simple expression.
Edit: But I strongly suspect that Melba chose the lookbehind for teaching purposes, to introduce the concept, which is extremely powerful and useful in more complex situations.
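A quick way to see why the two patterns behave the same here: with flag 3, StringRegExp returns only the capture group, so the prefix consumed by [\s\+\-] never reaches the result. The sample string below is hypothetical.

```autoit
; Hypothetical sample in the spirit of the OCR question.
Local $sText = "W11 pay +12 and -3 or 7"

; Lookbehind: the prefix is checked but is not part of the match.
Local $aLook = StringRegExp($sText, '(?<=[ +-])(\d+)', 3)

; Consuming class: the prefix is part of the match, but flag 3 returns
; only the capture group, so the prefix is not in the result either.
Local $aClass = StringRegExp($sText, '[\s\+\-](\d+)', 3)

; Print both results side by side, one capture per line.
For $i = 0 To UBound($aLook) - 1
    ConsoleWrite($aLook[$i] & " | " & $aClass[$i] & @CRLF)
Next
```

One real difference remains: \s also accepts tabs and line breaks, while [ +-] accepts only a literal space, which can matter on multi-line OCR output.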
iamtheky (posted July 18, 2016):
Sure. Because I'm bad at regex, lookbehind never even enters my mind as a possibility. I can't figure out why or when to use it until I figure out exactly what it is.
jchd (posted July 18, 2016, edited the same day):
See http://www.pcre.org/original/doc/html/pcrepattern.html for more information about features and the gory details.
EDIT: I forgot to mention that AutoIt currently uses the "legacy" version of PCRE, now nicknamed PCRE1. As the PCRE main web page explains, PCRE has been substantially rewritten as PCRE2. While 99.9% of the regexp features are compatible, some corner cases have been fixed or changed; the main changes are in the library interface functions. So do not refer to the PCRE2 documentation until a version of AutoIt with explicit support for it is released.
jguinch (posted July 18, 2016):
@iamtheky: To understand how lookaround assertions work, here is an example. You have the string A123B456C789 and you want to capture each number enclosed by letters (123 and 456).
What comes to mind is probably a regex like [A-Z](\d+)[A-Z]. In this case you get only one result (123), because the regex consumes the matched characters (A123B) and then continues searching from the position after the first match. What remains of the string is "456C789": the B has been consumed, so 456 can no longer be considered enclosed by two letters.
To stop the regex from consuming characters, you can use a lookaround assertion: [A-Z](\d+)(?=[A-Z])
With this regex, the letter after the number is not consumed: it means "take one letter, then a number, then look ahead to check that a letter follows". A lookaround does not consume characters, so the first match consumes only A123. The next search starts at B, so the capture works and 456 is captured. The last search starts at C, and so on.
I could also have used a lookbehind for this job: (?<=[A-Z])(\d+)[A-Z] works as well.
I hope this helps you understand.
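The three patterns from this post can be compared directly on the same string. This is a minimal sketch; _JoinMatches is a hypothetical local helper defined here, not a library function.

```autoit
; Hypothetical helper: global match (flag 3), captures joined by commas,
; or "" when nothing matches.
Func _JoinMatches($sSubject, $sPattern)
    Local $aRes = StringRegExp($sSubject, $sPattern, 3)
    If @error Then Return ""
    Local $sOut = ""
    For $i = 0 To UBound($aRes) - 1
        If $i > 0 Then $sOut &= ","
        $sOut &= $aRes[$i]
    Next
    Return $sOut
EndFunc

Local $sData = "A123B456C789"

; Both letters consumed: the B is used up by the first match.
Local $sConsume = _JoinMatches($sData, '[A-Z](\d+)[A-Z]')
; Lookahead: the trailing letter is checked but not consumed.
Local $sAhead = _JoinMatches($sData, '[A-Z](\d+)(?=[A-Z])')
; Lookbehind: the leading letter is checked but not consumed.
Local $sBehind = _JoinMatches($sData, '(?<=[A-Z])(\d+)[A-Z]')

ConsoleWrite("consuming : " & $sConsume & @CRLF) ; 123
ConsoleWrite("lookahead : " & $sAhead & @CRLF)   ; 123,456
ConsoleWrite("lookbehind: " & $sBehind & @CRLF)  ; 123,456
```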
iamtheky Posted July 18, 2016 Share Posted July 18, 2016 (edited) But is there a larger thing you are solving for that would in your case not just use: \D?(\d+)\D Or is that a look around as well? Edited July 18, 2016 by iamtheky ,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-. |(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/ (_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_) | | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) ( | | | | |)| | \ / | | | | | |)| | `--. | |) \ | | `-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_| '-' '-' (__) (__) (_) (__) Link to comment Share on other sites More sharing options...
jguinch (posted July 18, 2016):
\D? matches any non-digit. If it matches, the non-digit is consumed, but because of the "?" it is optional, so the pattern also matches 123 even when there is no letter before it (which is not what I wanted in my example). Look at the difference between the two regexes here (the consumed characters appear in blue and green):
https://regex101.com/r/uG2lZ7/1
https://regex101.com/r/uG2lZ7/2
The first link works, the second does not. It's not easy to explain; I spent a lot of time doing tests before I understood it.
Another example: checking whether a string contains several required words. Suppose you have to check whether a string contains the words "iamtheky", "King" and "Regex", in any order:
^(?=.*iamtheky)(?=.*King)(?=.*Regex)
Each lookahead assertion keeps the current position (the beginning of the string) and searches to the right. First it looks for .*iamtheky and then comes back to the current position. If that word is found, the regex continues (it looks to the right for .*King and comes back again, and so on). If a word is not found, the regex fails. So it matches both
iamtheky is the future King of Regex
and
The future Regex's King is iamtheky
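The "contains all the words" check can be tried with flag 0, which makes StringRegExp return 1 or 0. The first two sentences come from the post; the third is a hypothetical counter-example that lacks "Regex".

```autoit
; Every lookahead starts from ^, so word order does not matter.
Local $sPattern = '^(?=.*iamtheky)(?=.*King)(?=.*Regex)'

Local $aTests[3] = ["iamtheky is the future King of Regex", _
        "The future Regex's King is iamtheky", _
        "iamtheky is the future King"]
Local $aHits[3]

For $i = 0 To 2
    ; Flag 0: 1 if all three lookaheads succeed, 0 otherwise.
    $aHits[$i] = StringRegExp($aTests[$i], $sPattern, 0)
    ConsoleWrite($aHits[$i] & " : " & $aTests[$i] & @CRLF)
Next
```

The lookaheads match substrings, so "Kingdom" would also satisfy (?=.*King); wrapping a word in \b, as in (?=.*\bKing\b), is one way to require whole words.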
iamtheky (posted July 18, 2016):
Ah, I didn't realize it made that unnecessary.