Jump to content

UWPOCR - Windows Platform Optical character recognition API Implementation


Danyfirex
 Share

Recommended Posts

i am getting an unterminated string error. This is from Console:

C:\Users\drreh\Desktop\autoit\UWP_test.au3 (152) : ==> Unterminated string.: 
<meta name="optimizely-datafile" content="{&quot;version&quot;: &quot;4&quot;, &quot;rollouts&quot;: [], &quot;typedAudiences&quot;: [],
........
&quot;anonymizeIP&quot;: true, &quot;projectId&quot;: &quot;16737760170&quot;, &quot;variables&quot;: [], &quot;featureFlags&quot;: [], &quot;experiments&quot;: [{&quot;status&quot;: &quot;Running&quot;, &quot;audienceIds&quot;: [], &quot;variations&quot;: [{&quot;variables&quot;: [], uot;, &quot;key&quot;: &qu

I added the "........." in middle to abbreviate

Link to comment
Share on other sites

Link to comment
Share on other sites

omg. I have tried scanning this image with Tesseract with no luck. Now I am using UWPOCR. the 44 does NOT want to be recognized.

The sample image is attached. I have it open in Windows Snipping tool. any ideas?

#include <GUIConstantsEx.au3>
#include <ScreenCapture.au3>
#include <UWPOCR.au3>
WinActivate ("Snipping")
sleep(400)
    _GDIPlus_Startup()
    Local $hHBmp = _ScreenCapture_Capture("", 0, 0, 500, 500)
    Local $hBitmap = _GDIPlus_BitmapCreateFromHBITMAP($hHBmp)
    _WinAPI_DeleteObject($hHBmp)
    Local $sOCRTextResult = _UWPOCR_GetText($hBitmap, Default, True)
ConsoleWrite ($sOCRTextResult)
     Local $hGUI = GUICreate("GDI+ test", 800, 800, -1, -1)
    GUISetState()

    Local $hGraphics = _GDIPlus_GraphicsCreateFromHWND($hGUI)
    _GDIPlus_GraphicsDrawImage($hGraphics, $hBitmap, 0, 0)

    While 1
        Switch GUIGetMsg()
            Case $GUI_EVENT_CLOSE
                ExitLoop
        EndSwitch
    WEnd

    ;cleanup resources
    _GDIPlus_GraphicsDispose($hGraphics)
    _GDIPlus_BitmapDispose($hBitmap)
    _GDIPlus_Shutdown()
    GUIDelete($hGUI)

 

tesseract problem.PNG

Link to comment
Share on other sites

Danyfirex,

Great job as usual ! 👌

Now we can trash the old fashioned COM OCR library

https://support.microsoft.com/en-us/topic/install-modi-for-use-with-microsoft-office-2010-4fbd3076-6d01-9cb7-c574-3bbabc9eead9

Which I still use daily as an integrated component of of Greenshot...

https://audministrator.wordpress.com/2017/08/07/greenshot-adding-ocr/ 

Thanks for sharing !!

Link to comment
Share on other sites

Updated. Added new function to get the Bounding Rect of the words.

 

Saludos

Link to comment
Share on other sites

Hi,

 

strange, but this looks a bit wrong for me.

image.png.828db7642dc39a992b432112ef57e42a.png

 

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the Universe
trying to produce bigger and better idiots.
So far, the Universe is winning.

Link to comment
Share on other sites

1 hour ago, funkey said:

image.png.828db7642dc39a992b432112ef57e42a.png

 

@funkey is this your own demo ? Or is this part of  @Danyfirex UDF ?
 

Edited by mLipok

Signature beginning:
Please remember: "AutoIt"..... *  Wondering who uses AutoIt and what it can be used for ? * Forum Rules *
ADO.au3 UDF * POP3.au3 UDF * XML.au3 UDF * IE on Windows 11 * How to ask ChatGPT for AutoIt Codefor other useful stuff click the following button:

Spoiler

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind. 

My contribution (my own projects): * Debenu Quick PDF Library - UDF * Debenu PDF Viewer SDK - UDF * Acrobat Reader - ActiveX Viewer * UDF for PDFCreator v1.x.x * XZip - UDF * AppCompatFlags UDF * CrowdinAPI UDF * _WinMergeCompare2Files() * _JavaExceptionAdd() * _IsBeta() * Writing DPI Awareness App - workaround * _AutoIt_RequiredVersion() * Chilkatsoft.au3 UDF * TeamViewer.au3 UDF * JavaManagement UDF * VIES over SOAP * WinSCP UDF * GHAPI UDF - modest begining - comunication with GitHub REST APIErrorLog.au3 UDF - A logging Library * Include Dependency Tree (Tool for analyzing script relations) * Show_Macro_Values.au3 *

 

My contribution to others projects or UDF based on  others projects: * _sql.au3 UDF  * POP3.au3 UDF *  RTF Printer - UDF * XML.au3 UDF * ADO.au3 UDF SMTP Mailer UDF * Dual Monitor resolution detection * * 2GUI on Dual Monitor System * _SciLexer.au3 UDF * SciTE - Lexer for console pane

Useful links: * Forum Rules * Forum etiquette *  Forum Information and FAQs * How to post code on the forum * AutoIt Online Documentation * AutoIt Online Beta Documentation * SciTE4AutoIt3 getting started * Convert text blocks to AutoIt code * Games made in Autoit * Programming related sites * Polish AutoIt Tutorial * DllCall Code Generator * 

Wiki: Expand your knowledge - AutoIt Wiki * Collection of User Defined Functions * How to use HelpFile * Good coding practices in AutoIt * 

OpenOffice/LibreOffice/XLS Related: WriterDemo.au3 * XLS/MDB from scratch with ADOX

IE Related:  * How to use IE.au3  UDF with  AutoIt v3.3.14.x * Why isn't Autoit able to click a Javascript Dialog? * Clicking javascript button with no ID * IE document >> save as MHT file * IETab Switcher (by LarsJ ) * HTML Entities * _IEquerySelectorAll() (by uncommon) * IE in TaskSchedulerIE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) * PDF Related:How to get reference to PDF object embeded in IE * IE on Windows 11

I encourage you to read: * Global Vars * Best Coding Practices * Please explain code used in Help file for several File functions * OOP-like approach in AutoIt * UDF-Spec Questions *  EXAMPLE: How To Catch ConsoleWrite() output to a file or to CMD *

I also encourage you to check awesome @trancexx code:  * Create COM objects from modules without any demand on user to register anything. * Another COM object registering stuffOnHungApp handlerAvoid "AutoIt Error" message box in unknown errors  * HTML editor

winhttp.au3 related : * https://www.autoitscript.com/forum/topic/206771-winhttpau3-download-problem-youre-speaking-plain-http-to-an-ssl-enabled-server-port/

"Homo sum; humani nil a me alienum puto" - Publius Terentius Afer
"Program are meant to be read by humans and only incidentally for computers and execute" - Donald Knuth, "The Art of Computer Programming"
:naughty:  :ranting:, be  :) and       \\//_.

Anticipating Errors :  "Any program that accepts data from a user must include code to validate that data before sending it to the data store. You cannot rely on the data store, ...., or even your programming language to notify you of problems. You must check every byte entered by your users, making sure that data is the correct type for its field and that required fields are not empty."

Signature last update: 2023-04-24

Link to comment
Share on other sites

I have nothing changed. I just used the files from GitHub.

Maybe it is because german language?

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the Universe
trying to produce bigger and better idiots.
So far, the Universe is winning.

Link to comment
Share on other sites

Oh I see . I was not aware that  Examples/04 - Get OCR Words To 2DArray.au3 was added.

Signature beginning:
Please remember: "AutoIt"..... *  Wondering who uses AutoIt and what it can be used for ? * Forum Rules *
ADO.au3 UDF * POP3.au3 UDF * XML.au3 UDF * IE on Windows 11 * How to ask ChatGPT for AutoIt Codefor other useful stuff click the following button:

Spoiler

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind. 

My contribution (my own projects): * Debenu Quick PDF Library - UDF * Debenu PDF Viewer SDK - UDF * Acrobat Reader - ActiveX Viewer * UDF for PDFCreator v1.x.x * XZip - UDF * AppCompatFlags UDF * CrowdinAPI UDF * _WinMergeCompare2Files() * _JavaExceptionAdd() * _IsBeta() * Writing DPI Awareness App - workaround * _AutoIt_RequiredVersion() * Chilkatsoft.au3 UDF * TeamViewer.au3 UDF * JavaManagement UDF * VIES over SOAP * WinSCP UDF * GHAPI UDF - modest begining - comunication with GitHub REST APIErrorLog.au3 UDF - A logging Library * Include Dependency Tree (Tool for analyzing script relations) * Show_Macro_Values.au3 *

 

My contribution to others projects or UDF based on  others projects: * _sql.au3 UDF  * POP3.au3 UDF *  RTF Printer - UDF * XML.au3 UDF * ADO.au3 UDF SMTP Mailer UDF * Dual Monitor resolution detection * * 2GUI on Dual Monitor System * _SciLexer.au3 UDF * SciTE - Lexer for console pane

Useful links: * Forum Rules * Forum etiquette *  Forum Information and FAQs * How to post code on the forum * AutoIt Online Documentation * AutoIt Online Beta Documentation * SciTE4AutoIt3 getting started * Convert text blocks to AutoIt code * Games made in Autoit * Programming related sites * Polish AutoIt Tutorial * DllCall Code Generator * 

Wiki: Expand your knowledge - AutoIt Wiki * Collection of User Defined Functions * How to use HelpFile * Good coding practices in AutoIt * 

OpenOffice/LibreOffice/XLS Related: WriterDemo.au3 * XLS/MDB from scratch with ADOX

IE Related:  * How to use IE.au3  UDF with  AutoIt v3.3.14.x * Why isn't Autoit able to click a Javascript Dialog? * Clicking javascript button with no ID * IE document >> save as MHT file * IETab Switcher (by LarsJ ) * HTML Entities * _IEquerySelectorAll() (by uncommon) * IE in TaskSchedulerIE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) * PDF Related:How to get reference to PDF object embeded in IE * IE on Windows 11

I encourage you to read: * Global Vars * Best Coding Practices * Please explain code used in Help file for several File functions * OOP-like approach in AutoIt * UDF-Spec Questions *  EXAMPLE: How To Catch ConsoleWrite() output to a file or to CMD *

I also encourage you to check awesome @trancexx code:  * Create COM objects from modules without any demand on user to register anything. * Another COM object registering stuffOnHungApp handlerAvoid "AutoIt Error" message box in unknown errors  * HTML editor

winhttp.au3 related : * https://www.autoitscript.com/forum/topic/206771-winhttpau3-download-problem-youre-speaking-plain-http-to-an-ssl-enabled-server-port/

"Homo sum; humani nil a me alienum puto" - Publius Terentius Afer
"Program are meant to be read by humans and only incidentally for computers and execute" - Donald Knuth, "The Art of Computer Programming"
:naughty:  :ranting:, be  :) and       \\//_.

Anticipating Errors :  "Any program that accepts data from a user must include code to validate that data before sending it to the data store. You cannot rely on the data store, ...., or even your programming language to notify you of problems. You must check every byte entered by your users, making sure that data is the correct type for its field and that required fields are not empty."

Signature last update: 2023-04-24

Link to comment
Share on other sites

Check system settings: zoom and smoothing.

Do they affect the action?

Signature beginning:
Please remember: "AutoIt"..... *  Wondering who uses AutoIt and what it can be used for ? * Forum Rules *
ADO.au3 UDF * POP3.au3 UDF * XML.au3 UDF * IE on Windows 11 * How to ask ChatGPT for AutoIt Codefor other useful stuff click the following button:

Spoiler

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind. 

My contribution (my own projects): * Debenu Quick PDF Library - UDF * Debenu PDF Viewer SDK - UDF * Acrobat Reader - ActiveX Viewer * UDF for PDFCreator v1.x.x * XZip - UDF * AppCompatFlags UDF * CrowdinAPI UDF * _WinMergeCompare2Files() * _JavaExceptionAdd() * _IsBeta() * Writing DPI Awareness App - workaround * _AutoIt_RequiredVersion() * Chilkatsoft.au3 UDF * TeamViewer.au3 UDF * JavaManagement UDF * VIES over SOAP * WinSCP UDF * GHAPI UDF - modest begining - comunication with GitHub REST APIErrorLog.au3 UDF - A logging Library * Include Dependency Tree (Tool for analyzing script relations) * Show_Macro_Values.au3 *

 

My contribution to others projects or UDF based on  others projects: * _sql.au3 UDF  * POP3.au3 UDF *  RTF Printer - UDF * XML.au3 UDF * ADO.au3 UDF SMTP Mailer UDF * Dual Monitor resolution detection * * 2GUI on Dual Monitor System * _SciLexer.au3 UDF * SciTE - Lexer for console pane

Useful links: * Forum Rules * Forum etiquette *  Forum Information and FAQs * How to post code on the forum * AutoIt Online Documentation * AutoIt Online Beta Documentation * SciTE4AutoIt3 getting started * Convert text blocks to AutoIt code * Games made in Autoit * Programming related sites * Polish AutoIt Tutorial * DllCall Code Generator * 

Wiki: Expand your knowledge - AutoIt Wiki * Collection of User Defined Functions * How to use HelpFile * Good coding practices in AutoIt * 

OpenOffice/LibreOffice/XLS Related: WriterDemo.au3 * XLS/MDB from scratch with ADOX

IE Related:  * How to use IE.au3  UDF with  AutoIt v3.3.14.x * Why isn't Autoit able to click a Javascript Dialog? * Clicking javascript button with no ID * IE document >> save as MHT file * IETab Switcher (by LarsJ ) * HTML Entities * _IEquerySelectorAll() (by uncommon) * IE in TaskSchedulerIE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) * PDF Related:How to get reference to PDF object embeded in IE * IE on Windows 11

I encourage you to read: * Global Vars * Best Coding Practices * Please explain code used in Help file for several File functions * OOP-like approach in AutoIt * UDF-Spec Questions *  EXAMPLE: How To Catch ConsoleWrite() output to a file or to CMD *

I also encourage you to check awesome @trancexx code:  * Create COM objects from modules without any demand on user to register anything. * Another COM object registering stuffOnHungApp handlerAvoid "AutoIt Error" message box in unknown errors  * HTML editor

winhttp.au3 related : * https://www.autoitscript.com/forum/topic/206771-winhttpau3-download-problem-youre-speaking-plain-http-to-an-ssl-enabled-server-port/

"Homo sum; humani nil a me alienum puto" - Publius Terentius Afer
"Program are meant to be read by humans and only incidentally for computers and execute" - Donald Knuth, "The Art of Computer Programming"
:naughty:  :ranting:, be  :) and       \\//_.

Anticipating Errors :  "Any program that accepts data from a user must include code to validate that data before sending it to the data store. You cannot rely on the data store, ...., or even your programming language to notify you of problems. You must check every byte entered by your users, making sure that data is the correct type for its field and that required fields are not empty."

Signature last update: 2023-04-24

Link to comment
Share on other sites

@funkey I don't know what could be the issue. try with some German text and check if you get correct bounding box.

 

Saludos

Link to comment
Share on other sites

Hello, I tried a bit and then I changed the font and it works as expected. So it sems there is a problem with 'Comic Sans MS' on my PC. Most other 'easy readable' fonts work well with the bounding rects.

Thanks for your great UDF!!

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the Universe
trying to produce bigger and better idiots.
So far, the Universe is winning.

Link to comment
Share on other sites

Link to comment
Share on other sites

Hi @Danyfirex, I tried a bit with your fantastic UDF and wanted to use the angle. But it seems not to work.

I only get values like this: '1.37102911403396e-311'

 

I have no clue, what is going wrong, but I saw this can be a nullable double value. I don't know how to handle this. Maybe this is not a problem as yo declare $iAngle to 0, and if TextAngle is called with null in angle then it will not be assigned.

 

BTW: To keep the angle in @extended your wrapper functions should look like this:

Func _UWPOCR_GetText($sImageFilePathOrhBitmap, $sLanguageTagToUse = Default, $bUseOcrLine = False)
    Local $oErrorHandler = ObjEvent("AutoIt.Error", __UWPOCR_ErrorHandler)
    #forceref $oErrorHandler
    _UWPOCR_Log("_UWPOCR_GetText")
    Local $sRes = __UWPOCR_GetText($sImageFilePathOrhBitmap, $sLanguageTagToUse, $bUseOcrLine)
    Return SetError(@error, @extended, $sRes)
EndFunc   ;==>_UWPOCR_GetText

Func _UWPOCR_GetWordsRectTo2DArray($sImageFilePathOrhBitmap, $sLanguageTagToUse = Default)
    Local $oErrorHandler = ObjEvent("AutoIt.Error", "__UWPOCR_ErrorHandler")
    _UWPOCR_Log("_UWPOCR_GetWordsRectTo2DArray")
    #forceref $oErrorHandler
    Local $aRes = __UWPOCR_GetText($sImageFilePathOrhBitmap, $sLanguageTagToUse, False, True)
    Return SetError(@error, @extended, $aRes)
EndFunc   ;==>_UWPOCR_GetWordsRectTo2DArray

 

Edit: Forgot to say that @extended only can hold Int32 values, so you loose precision, when using it for the angle.

Edited by funkey

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the Universe
trying to produce bigger and better idiots.
So far, the Universe is winning.

Link to comment
Share on other sites

  • 1 month later...

I'm trying to use UWPOCR and I'm getting some odd results:

1) The phrase "XML" or "xml" is being ignored by the OCR as is "PDF".  It's almost like "hide known file extensions" is somehow happening.  I have it off in WindowsExplorer but maybe there's a more global setting?

2) A double underline "__" is causing a line break or something.  If I OCR a list of phrases containing that string, the returned text is jumbled.

I've attached a sample jpg and the resulting text.  I've put an Explorer window just below my target data so you an see the text is otherwise list.txtrecognized.

$imagefilepath1="./scrape.jpg"
$list = _UWPOCR_GetText($imagefilepath1)

 

I've toyed with all sorts of area values to eliminate the vertical line artifact, change size, etc. with nothing working.  Elsewhere I suffer from the "__" issue but I might be able to work around that if I can get "XML" to translate.

I'm hoping there's some parameter buried in the script that allows some configuration options?

scrape.jpg

Link to comment
Share on other sites

I've tried adjusting brightness/contrast with no improvement.

I created a Notepad image that's roughly the same and it translates fine, so there's something more diabilical.  The file extentions issue might be a red herring.

Link to comment
Share on other sites

It may need some image processing I think. Some threshold,medianBlur,dilate etc.

 

Saludos

Link to comment
Share on other sites

I did a Gimp2 "threshold" and messed with the brightness/contrast with no improvement.  I'll try some more.

Tried scaling as well.  The string "Autoit" is a lot thinner and dimmer and it is seen.

Edited by jimg
Link to comment
Share on other sites

  • 3 weeks later...

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...