Posted

I am getting an "Unterminated string" error. This is from the console:

C:\Users\drreh\Desktop\autoit\UWP_test.au3 (152) : ==> Unterminated string.: 
<meta name="optimizely-datafile" content="{&quot;version&quot;: &quot;4&quot;, &quot;rollouts&quot;: [], &quot;typedAudiences&quot;: [],
........
&quot;anonymizeIP&quot;: true, &quot;projectId&quot;: &quot;16737760170&quot;, &quot;variables&quot;: [], &quot;featureFlags&quot;: [], &quot;experiments&quot;: [{&quot;status&quot;: &quot;Running&quot;, &quot;audienceIds&quot;: [], &quot;variations&quot;: [{&quot;variables&quot;: [], uot;, &quot;key&quot;: &qu

I added the "........" in the middle to abbreviate the output.

Posted

omg. I have tried scanning this image with Tesseract with no luck. Now I am using UWPOCR, but the 44 does NOT want to be recognized.

The sample image is attached. I have it open in the Windows Snipping Tool. Any ideas?

#include <GUIConstantsEx.au3>
#include <ScreenCapture.au3>
#include <UWPOCR.au3>
WinActivate("Snipping")
Sleep(400)

_GDIPlus_Startup()
; Capture the screen region from (0, 0) to (500, 500) and wrap it in a GDI+ bitmap
Local $hHBmp = _ScreenCapture_Capture("", 0, 0, 500, 500)
Local $hBitmap = _GDIPlus_BitmapCreateFromHBITMAP($hHBmp)
_WinAPI_DeleteObject($hHBmp)

; Run OCR on the bitmap and print the recognized text
Local $sOCRTextResult = _UWPOCR_GetText($hBitmap, Default, True)
ConsoleWrite($sOCRTextResult)

; Show the captured bitmap in a window for a visual check
Local $hGUI = GUICreate("GDI+ test", 800, 800, -1, -1)
GUISetState()

Local $hGraphics = _GDIPlus_GraphicsCreateFromHWND($hGUI)
_GDIPlus_GraphicsDrawImage($hGraphics, $hBitmap, 0, 0)

While 1
    Switch GUIGetMsg()
        Case $GUI_EVENT_CLOSE
            ExitLoop
    EndSwitch
WEnd

; Clean up resources
_GDIPlus_GraphicsDispose($hGraphics)
_GDIPlus_BitmapDispose($hBitmap)
_GDIPlus_Shutdown()
GUIDelete($hGUI)

 

tesseract problem.PNG

Posted

Danyfirex,

Great job as usual ! 👌

Now we can trash the old fashioned COM OCR library

https://support.microsoft.com/en-us/topic/install-modi-for-use-with-microsoft-office-2010-4fbd3076-6d01-9cb7-c574-3bbabc9eead9

Which I still use daily as an integrated component of Greenshot...

https://audministrator.wordpress.com/2017/08/07/greenshot-adding-ocr/ 

Thanks for sharing !!

Posted

Updated. Added new function to get the Bounding Rect of the words.

 

Saludos

Posted

Hi,

 

Strange, but this looks a bit wrong to me.

image.png.828db7642dc39a992b432112ef57e42a.png

 

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the Universe
trying to produce bigger and better idiots.
So far, the Universe is winning.

Posted (edited)
  On 2/7/2022 at 6:35 AM, funkey said:

image.png.828db7642dc39a992b432112ef57e42a.png

 


@funkey, is this your own demo? Or is it part of @Danyfirex's UDF?
 

Edited by mLipok

Signature beginning:
Please remember: "AutoIt"..... *  Wondering who uses AutoIt and what it can be used for ? * Forum Rules *
ADO.au3 UDF * POP3.au3 UDF * XML.au3 UDF * IE on Windows 11 * How to ask ChatGPT for AutoIt Codefor other useful stuff click the following button:

  Reveal hidden contents

Signature last update: 2023-04-24

Posted

I haven't changed anything. I just used the files from GitHub.

Maybe it is because of the German language?


Posted

Oh, I see. I was not aware that Examples/04 - Get OCR Words To 2DArray.au3 had been added.


Posted

Check system settings: zoom and smoothing.

Do they affect the action?


Posted

@funkey I don't know what the issue could be. Try with some German text and check whether you get the correct bounding box.

 

Saludos

Posted

Hello, I experimented a bit, then changed the font, and now it works as expected. So it seems there is a problem with 'Comic Sans MS' on my PC. Most other easily readable fonts work well with the bounding rects.

Thanks for your great UDF!!


Posted (edited)

Hi @Danyfirex, I experimented a bit with your fantastic UDF and wanted to use the angle, but it does not seem to work.

I only get values like this: '1.37102911403396e-311'

 

I have no clue what is going wrong, but I saw that this can be a nullable double value, and I don't know how to handle that. Maybe this is not a problem, as you initialize $iAngle to 0, and if TextAngle is called with null in the angle, then nothing will be assigned.

 

BTW: To keep the angle in @extended your wrapper functions should look like this:

Func _UWPOCR_GetText($sImageFilePathOrhBitmap, $sLanguageTagToUse = Default, $bUseOcrLine = False)
    Local $oErrorHandler = ObjEvent("AutoIt.Error", __UWPOCR_ErrorHandler)
    #forceref $oErrorHandler
    _UWPOCR_Log("_UWPOCR_GetText")
    Local $sRes = __UWPOCR_GetText($sImageFilePathOrhBitmap, $sLanguageTagToUse, $bUseOcrLine)
    Return SetError(@error, @extended, $sRes)
EndFunc   ;==>_UWPOCR_GetText

Func _UWPOCR_GetWordsRectTo2DArray($sImageFilePathOrhBitmap, $sLanguageTagToUse = Default)
    Local $oErrorHandler = ObjEvent("AutoIt.Error", "__UWPOCR_ErrorHandler")
    _UWPOCR_Log("_UWPOCR_GetWordsRectTo2DArray")
    #forceref $oErrorHandler
    Local $aRes = __UWPOCR_GetText($sImageFilePathOrhBitmap, $sLanguageTagToUse, False, True)
    Return SetError(@error, @extended, $aRes)
EndFunc   ;==>_UWPOCR_GetWordsRectTo2DArray

 

Edit: Forgot to say that @extended can only hold Int32 values, so you lose precision when using it for the angle.
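
One way around the Int32 limit is to hand the angle back through a ByRef parameter instead of @extended. The snippet below is only a minimal sketch of that pattern: _Example_Ocr and its made-up angle value are hypothetical stand-ins, not part of UWPOCR.au3, and the real UDF would need to be changed internally to pass the value along.

```autoit
; Hypothetical stand-in for an OCR call that also measures a text angle.
; It is NOT a function from UWPOCR.au3; it only illustrates the ByRef pattern.
Func _Example_Ocr(ByRef $fAngle)
    ; Made-up value for illustration; a real implementation would read TextAngle here.
    $fAngle = 1.75
    ; @extended can only carry an integer, so only the whole part is passed this way.
    Return SetError(0, Int($fAngle), "recognized text")
EndFunc   ;==>_Example_Ocr

Local $fAngle = 0
Local $sText = _Example_Ocr($fAngle)
Local $iExtended = @extended ; save it right away, before any other call can change it

ConsoleWrite("Text:      " & $sText & @CRLF)
ConsoleWrite("@extended: " & $iExtended & @CRLF)
ConsoleWrite("ByRef:     " & $fAngle & @CRLF)
```

The ByRef variable keeps the value as a Double, while @extended stays available for an integer status or count.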

Edited by funkey


  • 1 month later...
Posted

I'm trying to use UWPOCR and I'm getting some odd results:

1) The text "XML" or "xml" is ignored by the OCR, as is "PDF". It's almost as if "hide known file extensions" were somehow in effect. I have it turned off in Windows Explorer, but maybe there's a more global setting?

2) A double underscore "__" is causing a line break or something similar. If I OCR a list of phrases containing that string, the returned text is jumbled.

I've attached a sample jpg and the resulting text (list.txt). I've put an Explorer window just below my target data so you can see the text is otherwise recognized.

$imagefilepath1="./scrape.jpg"
$list = _UWPOCR_GetText($imagefilepath1)

 

I've toyed with all sorts of area values to eliminate the vertical line artifact, change size, etc. with nothing working.  Elsewhere I suffer from the "__" issue but I might be able to work around that if I can get "XML" to translate.

I'm hoping there's some parameter buried in the script that allows some configuration options?

scrape.jpg

Posted

I've tried adjusting brightness/contrast with no improvement.

I created a Notepad image that's roughly the same and it translates fine, so there's something more diabolical going on. The file extensions issue might be a red herring.

Posted

I think it may need some image processing: threshold, medianBlur, dilate, etc.

 

Saludos

Posted (edited)

I did a Gimp2 "threshold" and adjusted the brightness/contrast with no improvement. I'll try some more.

I tried scaling as well. The string "Autoit" is a lot thinner and dimmer, yet it is recognized.

Edited by jimg
  • 3 weeks later...
Posted

The language tag for Farsi (a right-to-left language) is "fa". I give it a PNG file with Farsi text in a normal font, and it returns nothing, but English works perfectly. Any idea how to fix it?
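
One thing that may be worth checking first is whether Windows reports Farsi as an available OCR language at all on this PC. The sketch below assumes that UWPOCR.au3 exposes _UWPOCR_GetSupportedLanguages() and that it returns an array of the available languages; the exact column layout is not assumed, so the array is simply displayed.

```autoit
#include <Array.au3>
#include <UWPOCR.au3>

; Assumption: _UWPOCR_GetSupportedLanguages() exists in the UDF and returns
; an array describing the OCR languages available to Windows on this machine.
Local $aLanguages = _UWPOCR_GetSupportedLanguages()

If @error Or Not IsArray($aLanguages) Then
    ConsoleWrite("Could not read the list of OCR languages." & @CRLF)
Else
    ; Look for the "fa" tag in the displayed list.
    _ArrayDisplay($aLanguages, "OCR languages available on this PC")
EndIf
```

If "fa" is not in the list, the empty result would be expected, since the OCR engine can only use languages that Windows makes available to it.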
