codecog2578 Posted November 3, 2023

Good morning,

I'm hoping that someone with more experience using Tesseract can assist me with some challenges I'm facing. I'm new to AutoIt and OCR in general, but I'm a programmer by trade, so I'm not new to the concepts involved.

My current issue is that I'm really struggling to get accurate text extraction out of a program that I'm building automation for. I'm locked to this AutoIt/OCR approach for a variety of boring reasons that aren't really relevant here.

Here is my current code. It's messy because I've been experimenting with a variety of approaches, but it should shed some light on what I've attempted thus far. Apologies if the formatting is off; it went wonky when I pasted it in.

    #include <WinAPIProc.au3>
    #include <WinAPIRes.au3>
    #include <WinAPISys.au3>
    #include <Array.au3>
    #include <AutoItConstants.au3> ; $STDERR_CHILD
    #include <MsgBoxConstants.au3> ; $MB_ICONERROR, $MB_ICONINFORMATION
    #include <StringConstants.au3>
    #include <ScreenCapture.au3>
    #include <GDIPlus.au3>

    Global $oAutoIt = ObjCreate("AutoItX3.Control")
    Global $titleToFind = "C:\Windows\system32\cmd.exe"
    Global $outputFile = @ScriptDir & "\output.txt"
    Global $tesseractPath = "C:\Program Files\Tesseract-OCR\tesseract.exe"
    Global $errorLogFile = @ScriptDir & "\error_log.txt"
    Global $screenshotPath = @ScriptDir & "\screenshot.png"

    ; Saves a grayscale copy of $inputFilePath to $outputFilePath.
    Func ConvertToGrayscale($inputFilePath, $outputFilePath)
        _GDIPlus_Startup()
        Local $hBitmap = _GDIPlus_BitmapCreateFromFile($inputFilePath)
        If @error Then
            _GDIPlus_Shutdown()
            Return False
        EndIf
        Local $hImageAttributes = _GDIPlus_ImageAttributesCreate()
        Local $tColorMatrix = _GDIPlus_ColorMatrixCreateGrayScale()
        _GDIPlus_ImageAttributesSetColorMatrix($hImageAttributes, 0, True, $tColorMatrix)
        Local $iWidth = _GDIPlus_ImageGetWidth($hBitmap)
        Local $iHeight = _GDIPlus_ImageGetHeight($hBitmap)
        Local $hBitmapGray = _GDIPlus_BitmapCreateFromScan0($iWidth, $iHeight)
        Local $hGraphic = _GDIPlus_ImageGetGraphicsContext($hBitmapGray)
        _GDIPlus_GraphicsDrawImageRectRect($hGraphic, $hBitmap, 0, 0, $iWidth, $iHeight, 0, 0, $iWidth, $iHeight, $hImageAttributes)
        _GDIPlus_ImageSaveToFile($hBitmapGray, $outputFilePath)
        _GDIPlus_BitmapDispose($hBitmap)
        _GDIPlus_BitmapDispose($hBitmapGray)
        _GDIPlus_GraphicsDispose($hGraphic)
        _GDIPlus_ImageAttributesDispose($hImageAttributes)
        _GDIPlus_Shutdown()
        Return True
    EndFunc

    ; Attempts to save a 1 bpp (black and white) copy of $inputFilePath to $outputFilePath.
    Func BinarizeImage($inputFilePath, $outputFilePath)
        _GDIPlus_Startup()
        Local $dpi = 300
        Local $hBitmap = _GDIPlus_BitmapCreateFromFile($inputFilePath)
        If @error Then
            _GDIPlus_Shutdown()
            Return False
        EndIf
        Local $iWidth = _GDIPlus_ImageGetWidth($hBitmap)
        Local $iHeight = _GDIPlus_ImageGetHeight($hBitmap)
        _GDIPlus_BitmapSetResolution($hBitmap, $dpi, $dpi)
        Local $hGraphic = _GDIPlus_ImageGetGraphicsContext($hBitmap)
        Local $hBitmapBW = _GDIPlus_BitmapCreateFromScan0($iWidth, $iHeight)
        Local $hGraphicBW = _GDIPlus_ImageGetGraphicsContext($hBitmapBW)
        _GDIPlus_GraphicsDrawImageRectRect($hGraphicBW, $hBitmap, 0, 0, $iWidth, $iHeight, 0, 0, $iWidth, $iHeight)
        ; NOTE: GDI+ generally refuses to create a graphics context for indexed
        ; pixel formats, so this step may silently produce a blank image.
        Local $hBitmapBW2 = _GDIPlus_BitmapCreateFromScan0($iWidth, $iHeight, $GDIP_PXF01INDEXED)
        Local $hGraphicBW2 = _GDIPlus_ImageGetGraphicsContext($hBitmapBW2)
        _GDIPlus_GraphicsDrawImageRectRect($hGraphicBW2, $hBitmap, 0, 0, $iWidth, $iHeight, 0, 0, $iWidth, $iHeight)
        _GDIPlus_ImageSaveToFile($hBitmapBW2, $outputFilePath)
        _GDIPlus_BitmapDispose($hBitmap)
        _GDIPlus_BitmapDispose($hBitmapBW)
        _GDIPlus_BitmapDispose($hBitmapBW2)
        _GDIPlus_GraphicsDispose($hGraphic)
        _GDIPlus_GraphicsDispose($hGraphicBW)
        _GDIPlus_GraphicsDispose($hGraphicBW2)
        _GDIPlus_Shutdown()
        Return True
    EndFunc

    ; Returns the handle of the first visible window whose title contains $titleToFind, or 0.
    Func FindActiveCmdWindow()
        Local $aWindowsList = WinList()
        For $i = 1 To $aWindowsList[0][0]
            If StringInStr($aWindowsList[$i][0], $titleToFind) Then
                If _WinAPI_IsWindowVisible($aWindowsList[$i][1]) Then
                    Return $aWindowsList[$i][1]
                EndIf
            EndIf
        Next
        Return 0
    EndFunc

    ; Runs a command hidden, waits for it to finish, and returns its exit code.
    ; Whatever the process wrote to stderr is handed back through $sStdErr.
    ; (RunWait returns only the exit code, so stderr has to be read via Run + StderrRead.)
    Func RunAndCaptureStdErr($sCommand, ByRef $sStdErr)
        $sStdErr = ""
        Local $iPID = Run($sCommand, @ScriptDir, @SW_HIDE, $STDERR_CHILD)
        If @error Then Return -1
        ProcessWaitClose($iPID)
        Local $iExitCode = @extended
        $sStdErr = StderrRead($iPID)
        Return $iExitCode
    EndFunc

    Func ScrapeCmdWindow()
        Local $hCmdWindow = FindActiveCmdWindow()
        If $hCmdWindow Then
            Local $aWinPos = WinGetPos($hCmdWindow)
            Local $x = $aWinPos[0]
            Local $y = $aWinPos[1]
            Local $width = $aWinPos[2]
            Local $height = $aWinPos[3]
            _ScreenCapture_Capture($screenshotPath, $x, $y, $x + $width, $y + $height)

            Local $grayscalePath = @ScriptDir & "\grayscale_screenshot.png"
            If ConvertToGrayscale($screenshotPath, $grayscalePath) Then
                ConsoleWrite("Grayscale screenshot saved to: " & $grayscalePath & @CRLF)

                ; ImageMagick processing (overwrites the grayscale screenshot in place)
                Local $imagemagickStdErr = ""
                Local $imagemagickCommand = '"' & "C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\convert" & '" "' & $grayscalePath & '" -density 300 -contrast -normalize -resize 300% "' & @ScriptDir & '\grayscale_screenshot.png"'
                ConsoleWrite("Running ImageMagick with command: " & $imagemagickCommand & @CRLF)
                Local $exitCode = RunAndCaptureStdErr($imagemagickCommand, $imagemagickStdErr)
                If $exitCode = 0 Then
                    ConsoleWrite("ImageMagick processing completed." & @CRLF)
                Else
                    WriteErrorLog("ImageMagick processing encountered an error. Exit code: " & $exitCode)
                    WriteErrorLog("ImageMagick Error Output: " & $imagemagickStdErr)
                    MsgBox($MB_ICONERROR, "Error", "ImageMagick processing failed.")
                EndIf
                ; End ImageMagick processing

                ; Continue with Tesseract OCR as before
                Local $binarizedPath = @ScriptDir & "\binarized_screenshot.png"
                If BinarizeImage($grayscalePath, $binarizedPath) Then
                    ConsoleWrite("Binarized screenshot saved to: " & $binarizedPath & @CRLF)
                    Local $ocrOutput = ""
                    Local $tesseractStdErr = ""
                    ; OCR runs on the grayscale image, not the binarized one,
                    ; because binarizing loses the headers entirely.
                    Local $tesseractCommand = '"' & $tesseractPath & '" "' & $grayscalePath & '" "' & @ScriptDir & '\ocr_output" --psm 4'
                    ConsoleWrite("Running Tesseract with command: " & $tesseractCommand & @CRLF)
                    $exitCode = RunAndCaptureStdErr($tesseractCommand, $tesseractStdErr)
                    If $exitCode = 0 Then
                        If FileExists(@ScriptDir & "\ocr_output.txt") Then
                            $ocrOutput = FileRead(@ScriptDir & "\ocr_output.txt")
                            FileDelete(@ScriptDir & "\ocr_output.txt")
                        EndIf
                    Else
                        WriteErrorLog("Tesseract OCR encountered an error. Exit code: " & $exitCode)
                        WriteErrorLog("Tesseract Error Output: " & $tesseractStdErr)
                    EndIf

                    ; Clean up and delete the original, grayscale, and binarized screenshot files
                    If FileExists($screenshotPath) Then
                        FileDelete($screenshotPath)
                    EndIf
                    If FileExists($grayscalePath) Then
                        FileDelete($grayscalePath)
                    EndIf
                    If FileExists($binarizedPath) Then
                        FileDelete($binarizedPath)
                    EndIf
                    If FileExists($outputFile) Then
                        FileDelete($outputFile)
                    EndIf
                    FileWrite($outputFile, $ocrOutput)
                    MsgBox($MB_ICONINFORMATION, "Success", "Content saved to " & $outputFile)
                Else
                    WriteErrorLog("Binarization failed.")
                    MsgBox($MB_ICONERROR, "Error", "Binarization failed.")
                EndIf
            Else
                WriteErrorLog("Grayscale conversion failed.")
                MsgBox($MB_ICONERROR, "Error", "Grayscale conversion failed.")
            EndIf
        Else
            WriteErrorLog("No active cmd window found.")
            MsgBox($MB_ICONERROR, "Error", "No active cmd window found.")
        EndIf
    EndFunc

    ; Appends a timestamped message to the error log.
    Func WriteErrorLog($message)
        Local $errorFile = FileOpen($errorLogFile, 1)
        If $errorFile = -1 Then
            MsgBox($MB_ICONERROR, "Error", "Unable to open the error log file.")
            Exit
        EndIf
        FileWriteLine($errorFile, @YEAR & "/" & @MON & "/" & @MDAY & " " & @HOUR & ":" & @MIN & ":" & @SEC & " - " & $message)
        FileClose($errorFile)
    EndFunc

    ScrapeCmdWindow()

I've tried a variety of preprocessing techniques, but I think a big part of the issue is the original color scheme of the window I'm trying to extract text from. Please see the attached examples of the original window and of the grayscale version. Ignore the blocked-out bits; that's just to protect private data.

In the grayscale image, the green arrow indicates the body text, which I can extract reasonably well, but the red arrows indicate headers that Tesseract absolutely butchers. I can't binarize it or I lose the headers entirely, as the header text is the same color as the background.
Additionally, the vertical and horizontal lines are often interpreted by Tesseract as random characters, and my attempts to remove them have so far been unsuccessful.

So, the heart of my question is this: for all you Tesseract experts, how would YOU go about cleaning up the original horrid blue UI into a format that Tesseract can parse reasonably accurately? It's not critical that it's 100% accurate. I mostly need it to reliably extract the headers so my automations for this program know which sub menu the end user is sitting on.

I'm locked to using OCR because this is a remote application running inside a cmd window within another remote environment. It's a messy setup, and thus far I'm sitting at probably 50% accuracy on pulling things out. I've also been using ImageMagick to preprocess the image to a point where it's easy for Tesseract to deal with, but so far I've had no luck increasing the accuracy in any notable way.

If someone out there could assist me, that'd be appreciated! I don't expect code, but I'm a bit baffled about how to process the original image so Tesseract can extract everything I need, so I'm looking for some insight and direction. Thank you!
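Since the stated goal is only to know which sub menu is on screen, imperfect OCR output can be snapped to a fixed list of known header names instead of being trusted character by character. The following is a minimal sketch of that idea, not something from the original post: `EditDistance`, `MatchHeader`, the distance threshold, and the header names are all hypothetical and only for illustration.

```autoit
; Levenshtein edit distance between two strings, ignoring case.
Func EditDistance($sA, $sB)
    $sA = StringUpper($sA)
    $sB = StringUpper($sB)
    Local $iLenA = StringLen($sA), $iLenB = StringLen($sB)
    If $iLenA = 0 Then Return $iLenB
    If $iLenB = 0 Then Return $iLenA
    Local $aPrev[$iLenB + 1], $aCurr[$iLenB + 1]
    Local $iCost, $iBest
    For $j = 0 To $iLenB
        $aPrev[$j] = $j
    Next
    For $i = 1 To $iLenA
        $aCurr[0] = $i
        For $j = 1 To $iLenB
            $iCost = (StringMid($sA, $i, 1) == StringMid($sB, $j, 1)) ? 0 : 1
            $iBest = $aPrev[$j] + 1 ; deletion
            If $aCurr[$j - 1] + 1 < $iBest Then $iBest = $aCurr[$j - 1] + 1 ; insertion
            If $aPrev[$j - 1] + $iCost < $iBest Then $iBest = $aPrev[$j - 1] + $iCost ; substitution
            $aCurr[$j] = $iBest
        Next
        $aPrev = $aCurr ; arrays are copied by value
    Next
    Return $aPrev[$iLenB]
EndFunc

; Returns the entry of $aKnown closest to $sOcrText,
; or "" when nothing is within $iMaxDistance edits.
Func MatchHeader($sOcrText, $aKnown, $iMaxDistance = 3)
    Local $sBest = "", $iBestDist = $iMaxDistance + 1, $iDist
    Local $sClean = StringStripWS($sOcrText, 3) ; 3 = strip leading + trailing whitespace
    For $i = 0 To UBound($aKnown) - 1
        $iDist = EditDistance($sClean, $aKnown[$i])
        If $iDist < $iBestDist Then
            $iBestDist = $iDist
            $sBest = $aKnown[$i]
        EndIf
    Next
    Return $sBest
EndFunc

; Hypothetical header names, for illustration only.
Global $aHeaders = ["ACCOUNT SUMMARY", "PAYMENT HISTORY", "USER SETTINGS"]
ConsoleWrite(MatchHeader("ACC0UNT SUMMARV", $aHeaders) & @CRLF)
```

With this kind of matching, an OCR result that is two characters off still resolves to the intended header, and garbage produced by the box-drawing lines resolves to nothing. The threshold would need tuning against real output.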
BigDaddyO Posted November 3, 2023

I've had luck in the past with identifying the background color and then using GDI+ to remove that color from the entire image... BUT it looks like your background color is the same color as the actual text you are looking for, so that's not going to help.

I know you said you have to use OCR, but if all you're looking for is text off the screen somewhere, have you tried using the cmd window's Select All/Copy options and then looking in the clipboard text for the header text you want? Or see if the Au3Info object spy can read the text from the window, so you can use WinGetText("C:\Windows\system32\cmd.exe", "")
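For reference, the color-removal idea can be sketched roughly as below. It is most likely to help on a cropped strip, such as a single header bar, where the bar color and the text color differ even though the text matches the window background. This is an illustration under assumptions rather than BigDaddyO's actual code: `IsColorNear` and `KeyOutColor` are hypothetical names, the tolerance is a guess, and the pixel indexing assumes the locked 32 bpp buffer is tightly packed and stored top-down.

```autoit
#include <GDIPlus.au3>

; True if every RGB channel of $iPixel is within $iTolerance of $iTarget (alpha is ignored).
Func IsColorNear($iPixel, $iTarget, $iTolerance)
    Local $iA, $iB
    For $iShift = 0 To 16 Step 8
        $iA = BitAND(BitShift($iPixel, $iShift), 0xFF)
        $iB = BitAND(BitShift($iTarget, $iShift), 0xFF)
        If Abs($iA - $iB) > $iTolerance Then Return False
    Next
    Return True
EndFunc

; Hypothetical helper: writes a black-on-white copy of $sInPath to $sOutPath,
; where pixels close to $iBackground (0xRRGGBB) become white and all others black.
; $sOutPath should differ from $sInPath, since GDI+ keeps the source file open.
Func KeyOutColor($sInPath, $sOutPath, $iBackground, $iTolerance = 24)
    _GDIPlus_Startup()
    Local $hBitmap = _GDIPlus_BitmapCreateFromFile($sInPath)
    If @error Then
        _GDIPlus_Shutdown()
        Return False
    EndIf
    Local $iW = _GDIPlus_ImageGetWidth($hBitmap)
    Local $iH = _GDIPlus_ImageGetHeight($hBitmap)
    ; Lock the whole bitmap as 32 bpp ARGB so each pixel is one 32-bit integer.
    Local $tData = _GDIPlus_BitmapLockBits($hBitmap, 0, 0, $iW, $iH, BitOR($GDIP_ILMREAD, $GDIP_ILMWRITE), $GDIP_PXF32ARGB)
    Local $tPixels = DllStructCreate("int[" & $iW * $iH & "]", DllStructGetData($tData, "Scan0"))
    For $i = 1 To $iW * $iH
        If IsColorNear(DllStructGetData($tPixels, 1, $i), $iBackground, $iTolerance) Then
            DllStructSetData($tPixels, 1, 0xFFFFFFFF, $i) ; background -> opaque white
        Else
            DllStructSetData($tPixels, 1, 0xFF000000, $i) ; everything else -> opaque black
        EndIf
    Next
    _GDIPlus_BitmapUnlockBits($hBitmap, $tData)
    Local $bSaved = _GDIPlus_ImageSaveToFile($hBitmap, $sOutPath)
    _GDIPlus_BitmapDispose($hBitmap)
    _GDIPlus_Shutdown()
    Return $bSaved
EndFunc
```

A per-pixel loop like this is slow in AutoIt on a full-window capture, so it is better suited to small cropped regions.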
ioa747 Posted November 4, 2023

First of all, I agree with BigDaddyO: if you can Select All/Copy the text to the clipboard and then look for what you want, the result will be more accurate (especially when we have to deal with numbers).

If not, then:

First, make a copy of the cmd shortcut in your @ScriptDir
Then right-click it and select Properties
In the Shortcut tab, adjust the target to call your program
Then, in the Font tab, set the font to Lucida Console (where the zeros have no bisecting slash)
Then, in the Colors tab, adjust the colors
Then, once all of that is working, run some tests

Maybe it's better, instead of reading everything at once and then extracting the result, to let Tesseract read the fields one by one. E.g. you don't need to read the word DISALLOWED; just give it the coordinates where it should read the value.

Good luck

I know that I know nothing
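The "read them one by one" suggestion could look something like the sketch below: capture only the rectangle that holds one field and ask Tesseract to treat it as a single line with `--psm 7`. The helper names `BuildRegionOcrCommand` and `OcrRegion`, the temp file names, and the example coordinates are assumptions made for illustration; the real offsets would have to be measured from the actual window.

```autoit
#include <ScreenCapture.au3>

; Hypothetical helper: builds the Tesseract command line for one cropped image.
; --psm 7 tells Tesseract to treat the image as a single line of text.
Func BuildRegionOcrCommand($sTesseractExe, $sImagePath, $sOutputBase)
    Return '"' & $sTesseractExe & '" "' & $sImagePath & '" "' & $sOutputBase & '" --psm 7'
EndFunc

; Hypothetical helper: OCRs one rectangle, given relative to the window's top-left corner.
; Returns the recognized text, or "" with @error set on failure.
Func OcrRegion($hWnd, $sTesseractExe, $iRelLeft, $iRelTop, $iWidth, $iHeight)
    Local $aPos = WinGetPos($hWnd)
    If @error Then Return SetError(1, 0, "")
    Local $iLeft = $aPos[0] + $iRelLeft
    Local $iTop = $aPos[1] + $iRelTop
    Local $sImage = @TempDir & "\ocr_region.png"
    Local $sBase = @TempDir & "\ocr_region"
    ; Capture only the field of interest, without the mouse cursor.
    _ScreenCapture_Capture($sImage, $iLeft, $iTop, $iLeft + $iWidth - 1, $iTop + $iHeight - 1, False)
    If @error Then Return SetError(2, 0, "")
    Local $iExit = RunWait(BuildRegionOcrCommand($sTesseractExe, $sImage, $sBase), @TempDir, @SW_HIDE)
    If $iExit <> 0 Then Return SetError(3, $iExit, "")
    ; Tesseract appends ".txt" to the output base name.
    Local $sText = FileRead($sBase & ".txt")
    FileDelete($sBase & ".txt")
    FileDelete($sImage)
    Return StringStripWS($sText, 3) ; 3 = strip leading + trailing whitespace
EndFunc

; Example call with made-up coordinates (a header strip 400 x 20 px, 10 px from the left, 40 px from the top):
; Local $sHeader = OcrRegion($hCmdWindow, "C:\Program Files\Tesseract-OCR\tesseract.exe", 10, 40, 400, 20)
```

Cropping this way also keeps the box-drawing lines out of the image, so Tesseract has less opportunity to misread them as characters.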
codecog2578 Posted November 6, 2023 (Author)

Hey guys! Thanks for the replies thus far.

I should have mentioned this in the original post, but part of what makes this intensely frustrating is that what you see in the GUI I posted is actually NOT just a standard cmd window. It's an entire remote machine with the view truncated to show JUST the program I posted screencaps of, and the kicker is that I can't interface with the remote environment where the program lives beyond what I'm seeing in the pseudo cmd window. So there's no way I've discovered to scrape information out of the GUI with any sort of copying or reading like I'd usually do, hence the struggle with the OCR approach. And, as @BigDaddyO pointed out, the color modification is troublesome too, because the header text is the same color as the background.

So from here I'm uncertain where to go, really. The text extraction with any of the preprocessing I've done is just far too erratic to be useful, and the third party that owns the program I'm attempting to automate won't provide me with any meaningful access to read things in a way that, you know, would make sense.