LoneWolf92 Posted October 8, 2012

Hi there David,

First, the reason you don't see what I did to the underline is that I didn't upload the change. Instead I described what I did so that others could change it if they needed to.

I noticed that, with the font I use, the code sometimes thinks that the lowest line of a capital I, for example, is an underline and removes it. That way it looks like this:

~~~~~~~~~
~~#####~~
~~~~#~~~~
~~~~#~~~~
~~~~#~~~~
~~~~#~~~~
~~~~#~~~~
~~~~#~~~~
~~~~~~~~~

instead of this:

~~~~~~~~~
~~#####~~
~~~~#~~~~
~~~~#~~~~
~~~~#~~~~
~~~~#~~~~
~~~~#~~~~
~~~~#~~~~
~~#####~~
~~~~~~~~~

which ended up causing a lot of errors, because most of the characters lost their uniqueness. To fix this I just made sure that the underline was always set to false, so the line wouldn't get deleted.

Hope this explains it better.

Best of luck,
Sebastian
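The fix Sebastian describes can be sketched roughly as follows. This is a minimal illustration, not code from OCR.au3: the function name _StripUnderline, the $bAllowUnderline flag and the "single unbroken run of ink" test for an underline are all assumptions made for the example. With the flag left at False, the bottom bar of a capital I is never erased.

```autoit
; Hypothetical helper, not taken from OCR.au3.
; $aRows is a 1D array of strings, one per pixel row ("#" = ink, "~" = background).
Func _StripUnderline(Const ByRef $aRows, $bAllowUnderline = False)
    Local $aOut = $aRows ; work on a copy so the caller's bitmap is untouched
    If Not $bAllowUnderline Then Return $aOut

    ; Find the lowest row that contains any ink.
    For $i = UBound($aOut) - 1 To 0 Step -1
        If StringInStr($aOut[$i], "#") Then
            ; Assumed heuristic: one unbroken run of ink looks like an underline.
            If StringRegExp($aOut[$i], "^~*#+~*$") Then
                $aOut[$i] = StringReplace($aOut[$i], "#", "~")
            EndIf
            ExitLoop
        EndIf
    Next
    Return $aOut
EndFunc

; Usage: a small capital I with a top and a bottom bar.
Local $aI[5] = ["~~###~~", "~~~#~~~", "~~~#~~~", "~~###~~", "~~~~~~~"]
Local $aKept = _StripUnderline($aI)       ; flag off: bottom bar survives
Local $aCut = _StripUnderline($aI, True)  ; flag on: bottom bar is erased
ConsoleWrite($aKept[3] & @CRLF & $aCut[3] & @CRLF)
```

The second call shows the failure mode from the post: once the bottom bar is gone, the I is much harder to tell apart from other narrow glyphs.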
dgm5555 Posted October 29, 2012 (edited December 7, 2012 by dgm5555)

EDIT: THE CODE HAS BEEN CONSIDERABLY UPDATED AND IMPROVED, SEE MY POST ON 7/12/12 (the following notes probably still apply, but you won't need the SciTE UDF by default).

This is a new, improved version of my code above.

It's roughly 2.8 times faster overall at character recognition.

It can grab the entire desktop within a few milliseconds, thus eliminating errors caused by the screen changing in the middle of a grab.

Once the screen is loaded, it takes only about 1 ms to recognise a 5x13 pixel block (approximately one character). Unfortunately, loading the array from the screen takes about 3.5 ms per block, of which half is spent returning a colour from the grabbed screen image.

It intrinsically manages multiple lines of text (but not tables or long vertical lines).

There are likely to be a few bugs in this code, but I'll fix them as I find them. Because of this, there is a reasonable amount of 'scrap' debug code which hasn't been taken out yet.

It #includes _PixelGetColor.au3 for the fast screen grab, and also the SciTE UDF for debugging (the latter could be commented out without loss of function).

Hope it's useful,
David

Attachments: OCR.au3, _PixelGetColor.au3, SciTE UDF.au3
jchd Posted November 1, 2012

I've had the need to OCR text on webpages presented as images (in an attempt to foil website automation!). I didn't go through the different snippets shown in this thread, but if I get the idea, an efficient way to OCR what is displayed on screen (the same as a generated image) is to have the OCR ask the user which character an unknown pattern represents.

Using this idea I coded an OCR module that uses an SQLite database for font storage and queries. The table holds successive vertical rows (columns) of grey-level pixels, indexed by frequency and by the few columns they have in common (e.g. B and D share the same first pixel column). This has worked like a charm, with the added bonus that the whole thing has no fixed bounds: it can hold and recognize any font, style and size at the expense of a slightly larger DB. In practice, the DB is loaded in memory for speed and character recognition is very fast. The delicate part is indeed the discrimination of spacing: inter-character gaps, one or more whitespaces, etc.

I don't have easy access to this code anymore, but I can dig it out some day if there is interest. I'll have to make it generic for public consumption, but the core is there.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
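The column-by-column lookup jchd describes might look roughly like the sketch below. It is not his code, which was never posted: it swaps the SQLite table for an in-memory Scripting.Dictionary, and the helper names _Learn and _Recognise, the "|"-separated column strings and the 0/1 pixel encoding are assumptions made for illustration. It also ignores the spacing problem he calls the delicate part.

```autoit
; In-memory stand-in for the font table: key = pixel columns joined by "|".
Global $g_oFont = ObjCreate("Scripting.Dictionary")

; Hypothetical helper: teach one glyph, e.g. _Learn("11111|10001|01110", "D").
Func _Learn($sCols, $sChar)
    $g_oFont.Item($sCols) = $sChar
EndFunc

; Hypothetical helper: walk the columns left to right, growing the key until
; it matches a stored glyph. Returns the recognised text; any trailing
; columns that matched nothing come back in $sUnknown so the caller can ask
; the user what they represent.
Func _Recognise($sAllCols, ByRef $sUnknown)
    Local $aCols = StringSplit($sAllCols, "|", 2) ; 2 = no count in element 0
    Local $sKey = "", $sText = ""
    For $i = 0 To UBound($aCols) - 1
        If $sKey <> "" Then $sKey &= "|"
        $sKey &= $aCols[$i]
        If $g_oFont.Exists($sKey) Then
            $sText &= $g_oFont.Item($sKey)
            $sKey = ""
        EndIf
    Next
    $sUnknown = $sKey
    Return $sText
EndFunc

; Usage: B and D share the same first column and are told apart by later ones.
_Learn("11111|10101|01010", "B")
_Learn("11111|10001|01110", "D")
Local $sLeftOver = ""
ConsoleWrite(_Recognise("11111|10001|01110|11111|10101|01010", $sLeftOver) & @CRLF)
```

Because a ligature such as 'fi' is stored as just another key, a multi-character value works here without any special handling, which is in the spirit of the user-guided approach described above.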
dgm5555 Posted November 2, 2012

Hi jchd,

Does your system of categorising and indexing characters by columns mean it can self-learn characters? In any case I'd be interested in seeing your code, as mine doesn't have any form of character index. It just loads the entire database and uses StringInStr to look for character matches, but my database is currently only 10 KB, so it is not yet big enough for this to be too slow.

Did you have the same problem as me, that the screen grab and conversion time is the bottleneck? If not, how did you get around it? I'm currently wondering about converting my script to FreeBasic in the hope that it would run faster, so any ways you found to speed things up would be great to hear (I have no idea how to make such a program integrate with AutoIt scripts, so that's probably just random musing).

You noted that it recognised any style. How did you manage italics, strikethrough and underline?

Cheers,
David
jchd Posted November 7, 2012

Sorry for the late reply.

No, it can't self-learn. That's why it should be classified as "user-guided pseudo-OCR".

In my application, the images were generated by and loaded from a website, so there was no issue with grabbing them off the screen.

Style is irrelevant to this code: A A A A etc. (the same letter in different styles) and size variants are all distinct graphs (not the same pixel matrix).

So each new variant (font, style, size, character) has its own entry. That means that each new variant will be shown to the user in a not-so-pretty fixed-font pixel matrix using ⎕ ░ ▒ ▓ █ (I worked with grey levels and mapped them to this gross representation) so that the user can decide which Unicode character(s) it represents.

Some fonts had no box spacing between some pairs (or even sequences) of characters, such as 'fi' or 'fl' and many others (depending on the font used). That's no problem with this approach: just tell the code which character [sequence] it stands for, and the next time it's found it will be handled automagically.

Just one thing (which was never an issue in my use cases): if the slant of italics is such that successive italic characters almost never fit in separate, individual rectangular boxes, then you end up recognizing and storing way too many character sequences, up to full words. That would make the approach completely impractical.
Rutger83 Posted November 9, 2012 (edited November 9, 2012 by Rutger83)

I once made this when I had just started with AutoIt. It's messy, partially in Dutch, and did I mention messy? http://www.sendspace.com/file/yqfhh5

I also used a DB. I've included an open-source SQLite editor in the rar file (that's why it's 6 MB).

It might give some people some ideas, but to be honest I am ashamed of the code and all, but blabla... it might be useful for some people.

It's not finished and has some issues:

1) It can't read a black font, because I used _FFKeepColor (which keeps the defined colours but makes all the undefined colours... black).
2) It has no colour tolerance for the pixel search (not quite sure why that is; there might just be no tolerance defined in the (did I mention shitty?) code).
3) It's in some sort of Dutch/English (I'll see if I can find some space to upload a few screenshots to explain the GUI).

Start with frame.au3 or frame.exe.
OleksiiSamodid Posted November 18, 2012

Hello guys! I've been using this library for a while, so I want to share some things about it:

1.

;$pixelColour = Not colourMatchBlack($pixelColour[0]) ; approx 20% faster than colourMatchWhite
;$pixelColour = Not colourMatchWhite($pixelColour[0])
$pixelColour = Not colourMatchBox($pixelColour[0])

It seems that colourMatchBlack and colourMatchWhite don't know anything about $bkgndColour, so I switched to colourMatchBox for processing white letters on a dark background.

2. The row-removing logic in the underline and strikethrough section sometimes does not work correctly when recognizing single characters like "2". It produces this result:

...#####...
..########.
.####.####.
####....###
###.....###
###.....###
###.....###
........###
.......####
.......###.
......####.
.....####..
....####...
....####...
...####....
..####.....
..###......
.####......
.###.......
dgm5555 Posted December 7, 2012 (edited November 12, 2013 by dgm5555)

EDIT: THE CODE HAS BEEN UPDATED AND IMPROVED, SEE MY POST ON 12/11/13

I've updated the code quite a bit since my 29/10/12 post. There are a few more options that can be selected, and I ultimately decided to get rid of colourMatchBlack/colourMatchWhite and go with a threshold routine which is nearly as fast. I'd still consider it a 'beta' (so I've left much of the debugging code in, but commented out).

The instructions, description etc. are at the top of the OCR file.

Cheers,
David

Attachments: OCR.au3, _PixelGetColor.au3
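For readers wondering what a threshold routine could look like, here is a minimal sketch. It is not the routine in OCR.au3, which is not shown in this thread: the function name _IsBright, the default threshold of 128 and the use of a plain average of the three channels are assumptions made for illustration.

```autoit
; Hypothetical helper, not the routine in OCR.au3: returns True when the
; average of the red, green and blue channels of a 0xRRGGBB colour is at
; least $iThreshold (0-255).
Func _IsBright($iColour, $iThreshold = 128)
    Local $iR = BitAND(BitShift($iColour, 16), 0xFF) ; positive shift = right
    Local $iG = BitAND(BitShift($iColour, 8), 0xFF)
    Local $iB = BitAND($iColour, 0xFF)
    Return ($iR + $iG + $iB) / 3 >= $iThreshold
EndFunc

; Usage: for dark text on a light background, "ink" is simply Not _IsBright().
ConsoleWrite(_IsBright(0xFFFFFF) & @CRLF) ; white
ConsoleWrite(_IsBright(0x202020) & @CRLF) ; dark grey
```

A single comparison like this handles both black-on-white and white-on-black text by flipping the test, which may be why one threshold can replace two separate colour-match functions.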
Wisenlucky Posted December 19, 2012

Hello everyone. I don't know if this is the right place to ask, but: is there any difference between the default options of the mouseocr() and _OCR functions? I taught the script the characters I want to recognize, and the mouseocr() function recognizes them 100% of the time. But _OCR (on the same area) cannot recognize the text line.

And by the way, thanks for a great piece of software!
dmob Posted January 15, 2013

@Wisenlucky I had the same problem. Try playing around with $searchColourVariation (the sixth option) of the _OCR function. Compare with mouseOCR.

Kudos to civilcalc, great code, works very well for me.
dgm5555 Posted November 12, 2013 (edited November 12, 2013 by dgm5555)

Over the past year of use I've made a few mods (but not many), and the core code is unchanged. I discovered that the function got slower over time, and was cursing Windows until I realised my font data file had been bloated horrifically by the logging of unrecognised characters. I thus added a cleaner function to remove them. I've also slightly altered the default options, with one set of defaults for initial training (when logging is useful) and another for once usage is stable (when it probably isn't), which also stops asking for user input. I thus thought an update was in order.

I'll claim much of the kudos offered by dmob, as civilcalc's code and the subsequent discussion were only a stub, but they did give me an idea of how to start with screen OCR (which is why my post appeared under hers in the first place; I was hoping for a bit of constructive input, but it never eventuated). Hopefully the 700 downloads of my code so far mean at least someone other than me is using it.

Cheers,
David

Attachments: OCR.au3, _PixelGetColor.au3
JotaPx Posted February 19, 2014

Hi there! How do us noobs install this wonder?

Best regards
silvercat Posted May 7, 2014

This OCR works exactly like I need it to, until I move some windows around. The script I've pasted still reads the absolute screen values of 321, 231 to 512, 242, not the pixels relative to my window (as I'd expect).

Opt("MouseCoordMode", 0)
Opt("PixelCoordMode", 0)
HotKeySet("{F7}", "GetString")

Func GetString()
    $string = _OCR(321, 231, 512, 242, 0xFFFFFF)
    MsgBox(0, "", $string)
EndFunc

I'm assuming this has to do with the _PixelGetColor.au3 script, which isn't compatible with this option (or overrides it?). Any idea how to fix this?
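One possible workaround, assuming (as the post above suggests) that _OCR always treats its rectangle as absolute screen coordinates: add the active window's position, obtained from WinGetPos, to the window-relative values. The include path is an assumption, and this sketch has not been run against the attached OCR.au3.

```autoit
#include "OCR.au3" ; assumed to be the file attached earlier in this thread

HotKeySet("{F7}", "GetString")

; Keep the script alive so the hotkey can fire.
While 1
    Sleep(100)
WEnd

Func GetString()
    ; WinGetPos returns [x, y, width, height] of the window in screen coordinates.
    Local $aWin = WinGetPos("[ACTIVE]")
    If @error Then Return

    ; Shift the window-relative rectangle to absolute screen coordinates.
    Local $sText = _OCR($aWin[0] + 321, $aWin[1] + 231, $aWin[0] + 512, $aWin[1] + 242, 0xFFFFFF)
    MsgBox(0, "", $sText)
EndFunc
```

The offsets here are measured from the window's outer top-left corner, which matches PixelCoordMode 0. Coordinates taken relative to the client area (mode 2) would need the client origin instead.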
georgia4ver Posted July 7, 2015

Hello dgm5555, I am not sure if you are still using this webpage, but there are some things I want to ask you. If you can help me, please write to me by email at cotnechimakadze@Gmail.com
Valnurat Posted August 4, 2015

Does this work? How do I use this?

Yours sincerely,
Kenneth
JohnOne Posted August 4, 2015

You read the thread from the beginning.

AutoIt Absolute Beginners | Require a serial | Pause Script | Video Tutorials by Morthawt | ipify
Monkey's are, like, natures humans.
lnuxunl Posted October 22, 2015

The best native AutoIt OCR I have ever used. Thanks!

I use it in some software, with credit to you, dgm5555.
fosil Posted October 13, 2016

When I try this I keep getting "0" as my output. What am I doing wrong?
fosil Posted October 13, 2016

42 minutes ago, fosil said: When I'm trying this i keep getting "0" as my output. What am I doing wrong?

Never mind, I was using an older build from a previous post. But now when I try it I get a GUI popping up that just loops through all the letters of the alphabet over and over. Can anyone help with a fix?
MaximusCZ Posted January 25, 2017

Yo dgm5555,

What I found here is an eeeeexcellent piece of code that fits my needs perfectly. I tried Tesseract and Sikuli, and searched for several days for a way to get the recognition rate up, but never succeeded. This works out of the box with 100% accuracy (thanks to the built-in training).

THANK YOU SO MUCH, give me your PayPal address so I can buy you a pack of beers!

PS: When I run OCR with $ocrOptDontSaveWindowWithFont, the program fails because it only sets $saveWindowForFont when that option is not set. Adding

    $saveWindowForFont = ""

just before

    If NOT BitAND($ocrOptions, $ocrOptDontSaveWindowWithFont) Then $saveWindowForFont = @TAB & WinGetTitle("[ACTIVE]")

works nicely.