ducphu Posted June 9, 2021

Hi all, I'm new to AutoIt and I have a question about how to work with PDF files. Let's say I have a PDF file (please see the attached example). I need to read the file line by line and highlight a line if a condition is met, e.g. if the score is 90 or above. Can this be achieved with AutoIt? Any guidance is very welcome. Thank you all.

Attachment: Score.pdf
Luke94 Posted June 9, 2021

After a quick search, I found this. Take a look at @jguinch's reply.
Luke94 Posted June 9, 2021

If you have Word, this will read the entire document and split each line into an array; however, the ID column is cut off. I assume it's because the PDF file "contains interactive features".

    #include <Array.au3>
    #include <Word.au3>

    _Func('...\Score.pdf')

    Func _Func($sFile)
        ; Start (or attach to) a Word instance
        Local $oWord = _Word_Create()
        If @error Then
            ConsoleWrite('Error: _Word_Create' & @CRLF)
            Exit
        EndIf

        ; Let Word open (and convert) the PDF
        Local $oDoc = _Word_DocOpen($oWord, $sFile)
        If @error Then
            _Word_Quit($oWord) ; _Word_Quit expects the application object, not a document
            ConsoleWrite('Error: _Word_DocOpen' & @CRLF)
            Exit
        EndIf

        ; Read the whole document as text and split it into lines
        Local $oRange = $oDoc.Range
        Local $sText = $oRange.Text
        ConsoleWrite($sText)

        Local $aLines = StringSplit($sText, @CRLF)
        _ArrayDisplay($aLines)

        _Word_Quit($oWord)
    EndFunc
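One thing to watch in the snippet above: by default `StringSplit` treats every character of the delimiter string as its own delimiter, and Word's `Range.Text` usually ends paragraphs with a lone carriage return. Below is a minimal sketch of splitting on `@CR` only. The sample string is made up and merely stands in for `$oRange.Text`; it is not the content of Score.pdf.

```autoit
#include <StringConstants.au3>

; Made-up stand-in for $oRange.Text (assumption: paragraphs end with @CR)
Local $sText = "ID Name Score" & @CR & "1 Alice 95" & @CR & "2 Bob 72" & @CR

; Drop the trailing paragraph mark so the last array element is not empty
If StringRight($sText, 1) = @CR Then $sText = StringTrimRight($sText, 1)

; $STR_NOCOUNT returns a 0-based array without the count in element [0]
Local $aLines = StringSplit($sText, @CR, $STR_NOCOUNT)

For $i = 0 To UBound($aLines) - 1
    ConsoleWrite($i & ": " & $aLines[$i] & @CRLF)
Next
```

Whether this helps with the cut-off ID column is a separate question; it only keeps the split from producing empty elements.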
ducphu (Author) Posted June 10, 2021

Hi, thanks for the reply. I came up with a general idea, as below:

1. Use Xpdf to export the PDF file to a text file
2. Use FileOpen to open the text file
3. Use _FileCountLines to get the number of lines
4. Loop over the lines, using FileReadLine to read each one
5. Use StringRegExp to check whether the line matches the expected format, and extract the values, e.g. the Score value
6. Determine whether the extracted values meet the criteria, e.g. score >= 90
7. If yes, highlight the line in the PDF file (PDF comment/highlight button)
8. Save the PDF and delete the text file

Is there anything I missed or got wrong here? I'm also concerned about step 7: how do I do it? And if the PDF file contains pictures or charts, will the line numbers in the text file and the PDF file still tally?
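Steps 5 and 6 of the plan above can be tried in isolation, without Xpdf or a real file. This is a minimal sketch, assuming each record looks like `ID  Name  Score` separated by whitespace; the sample lines and the helper name `_GetHighScore` are hypothetical and not taken from Score.pdf.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: returns the score if the line is a record with
; score >= $iMin, otherwise returns -1.
Func _GetHighScore($sLine, $iMin = 90)
    ; ID (digits), name (letters), score (digits), separated by whitespace
    Local $aMatch = StringRegExp($sLine, '^\s*[0-9]+\s+[A-Za-z]+\s+([0-9]+)', $STR_REGEXPARRAYMATCH)
    If @error Then Return -1 ; not a record line (header, blank line, ...)
    Local $iScore = Number($aMatch[0]) ; convert the captured text to a number
    If $iScore >= $iMin Then Return $iScore
    Return -1
EndFunc

; Made-up sample lines, for illustration only
Local $aSample[3] = ["ID   Name   Score", "1    Alice  95", "2    Bob    72"]

For $i = 0 To UBound($aSample) - 1
    If _GetHighScore($aSample[$i]) >= 0 Then
        ConsoleWrite("Highlight: " & $aSample[$i] & @CRLF)
    EndIf
Next
```

Converting the captured text with `Number()` before comparing keeps the `>= 90` test numeric rather than relying on implicit conversion. Names containing spaces or digits would need a looser pattern.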
Luke94 Posted June 10, 2021

Must it be a PDF file? There may be an easier alternative.
Luke94 Posted June 10, 2021

By the "input" file, I mean: do you create it yourself and then upload it or do whatever with it, and must it be a PDF?
JockoDundee Posted June 10, 2021

ducphu said: "Save the PDF and delete the text file"

Do the PDFs you are intending to use definitely allow editing? Can you edit them manually now? If so, using what product exactly?

Code hard, but don't hard code...
ducphu (Author) Posted June 10, 2021

OK, let me try to clarify. Basically I have an input file, which is a PDF. Currently, the user needs to open the file manually, check through the records and highlight the records which have Score >= 90. The example I attached is the OUTPUT. The INPUT is the same file but without the records highlighted. What I want to achieve is to get this process done automatically using AutoIt. And yes, the PDF file allows the user to edit it.
JockoDundee Posted June 10, 2021

ducphu said: "Currently, user need to open the file manually"

What editor does the user use to edit the file and re-save it?
Luke94 Posted June 10, 2021

Do you have access to Word and Excel? If so, does the function from my earlier post read all the columns of the input file (the one where no records have been highlighted)?

Luke94 said: "If you have Word, this will read the entire document and split each line into an array; however, the ID column is cut off. I assume it's because the PDF file "contains interactive features"."

What I'm thinking is to read the PDF file with that function, move the data into Excel, highlight the records as requested and save the result as a PDF file. It might be a long-winded way of going about it, but it would get you what you want. I would have thought there is an easier solution; I just don't know of one.
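The Excel half of that idea could look roughly like the sketch below. It is only a sketch under stated assumptions: the rows are made up (in practice they would come from the parsed PDF text), the output file name `Score_highlighted.pdf` is hypothetical, and Excel must be installed. The resulting PDF is a new document generated by Excel, not the original PDF with annotations added.

```autoit
#include <Excel.au3>

; Made-up rows standing in for the parsed PDF text: ID, Name, Score
Local $aRows[3][3] = [["ID", "Name", "Score"], [1, "Alice", 95], [2, "Bob", 72]]

Local $oExcel = _Excel_Open(False) ; False = keep Excel hidden
If @error Then
    ConsoleWrite("Error: _Excel_Open" & @CRLF)
    Exit
EndIf

Local $oBook = _Excel_BookNew($oExcel, 1)
If @error Then
    ConsoleWrite("Error: _Excel_BookNew" & @CRLF)
    _Excel_Close($oExcel, False)
    Exit
EndIf

; Write the whole table starting at A1 of the active sheet
_Excel_RangeWrite($oBook, Default, $aRows, "A1")

; Highlight every data row whose score is 90 or above (row 0 is the header)
For $i = 1 To UBound($aRows) - 1
    If $aRows[$i][2] >= 90 Then
        ; Excel colour values are BGR, so 0x00FFFF is yellow
        $oBook.ActiveSheet.Range("A" & ($i + 1) & ":C" & ($i + 1)).Interior.Color = 0x00FFFF
    EndIf
Next

; Export as PDF (the default export type), then close without saving the workbook
_Excel_Export($oExcel, $oBook, @ScriptDir & "\Score_highlighted.pdf")
_Excel_BookClose($oBook, False)
_Excel_Close($oExcel, False)
```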
ducphu (Author) Posted June 10, 2021

JockoDundee said: "what editor does the user use to edit the file and re-save it?"

Acrobat Reader
ducphu (Author) Posted June 10, 2021

OK, I wrote some code as below:

    #include <MsgBoxConstants.au3>
    #include <File.au3>

    ; Convert a PDF to a text file with Xpdf's pdftotext.exe (expected next to the script).
    ; Returns 1 on success, 0 on failure; @error = 1 (PDF missing) or 2 (pdftotext.exe missing).
    Func _XPDF_ToText($sPDFFile, $sTXTFile, $iFirstPage = 1, $iLastPage = 0, $bLayout = True)
        Local $sXPDFToText = @ScriptDir & "\pdftotext.exe"
        Local $sOptions = ""

        If Not FileExists($sPDFFile) Then Return SetError(1, 0, 0)
        If Not FileExists($sXPDFToText) Then Return SetError(2, 0, 0)

        If $iFirstPage <> 1 Then $sOptions &= " -f " & $iFirstPage
        If $iLastPage <> 0 Then $sOptions &= " -l " & $iLastPage
        If $bLayout = True Then $sOptions &= " -layout"

        Local $iReturn = ShellExecuteWait($sXPDFToText, $sOptions & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
        If $iReturn = 0 Then Return 1
        Return 0
    EndFunc

    Local $sPDFFile = "C:\Users\Duc Phu\Desktop\Score.pdf"
    Local $sTXTFile = "C:\Users\Duc Phu\Desktop\temp.txt"

    _XPDF_ToText($sPDFFile, $sTXTFile, 1, 0, True)

    ; Retrieve the number of lines in the temp file (pass the path, as documented)
    Local $iCountLines = _FileCountLines($sTXTFile)

    ; Open the temp text file for reading
    Local $hFileOpen = FileOpen($sTXTFile, 0)

    Local $ReadLine[$iCountLines]
    Local $ReadLineFull[$iCountLines]
    Local $ReadLineScore[$iCountLines]

    For $i = 1 To $iCountLines
        ; Read the next line through the handle instead of re-opening the file each time
        $ReadLine[$i - 1] = FileReadLine($hFileOpen)
        Local $RegResult = StringRegExp($ReadLine[$i - 1], '[0-9]+\s+[A-Za-z]+\s+([0-9]+)', 2)
        If Not @error Then ; the regex found a match
            $ReadLineScore[$i - 1] = $RegResult[1]
            ; If score >= 90, store the full match in $ReadLineFull.
            ; We need this array for PDF searching and highlighting later on.
            If Number($ReadLineScore[$i - 1]) >= 90 Then
                $ReadLineFull[$i - 1] = $RegResult[0]
            Else
                $ReadLineFull[$i - 1] = "-"
            EndIf
        Else ; no match
            $ReadLineFull[$i - 1] = "-"
            $ReadLineScore[$i - 1] = "-"
        EndIf
    Next

    ; Close the handle returned by FileOpen.
    FileClose($hFileOpen)

    ; Here, we have the $ReadLineFull array.
    ; We need to loop through the array; if a value <> "-" then we will need to:
    ;   1. Open the PDF file
    ;   2. Send Ctrl+F to open the Acrobat Reader search box
    ;   3. Send Ctrl+V to paste the value into the search box
    ;   4. Wait a second to ensure the search result is returned
    ;   5. Click on the full matched result. The line containing the result should be selected
    ;   6. Send a control click to the highlight button to highlight the line

Do you think the idea here is doable?
ducphu (Author) Posted June 10, 2021

I managed to complete this project. Below is the full code:

    #include <MsgBoxConstants.au3>
    #include <FileConstants.au3>
    #include <File.au3>

    ; Convert a PDF to a text file with Xpdf's pdftotext.exe (expected next to the script).
    ; Returns 1 on success, 0 on failure; @error = 1 (PDF missing) or 2 (pdftotext.exe missing).
    Func _XPDF_ToText($sPDFFile, $sTXTFile, $iFirstPage = 1, $iLastPage = 0, $bLayout = True)
        Local $sXPDFToText = @ScriptDir & "\pdftotext.exe"
        Local $sOptions = ""

        If Not FileExists($sPDFFile) Then Return SetError(1, 0, 0)
        If Not FileExists($sXPDFToText) Then Return SetError(2, 0, 0)

        If $iFirstPage <> 1 Then $sOptions &= " -f " & $iFirstPage
        If $iLastPage <> 0 Then $sOptions &= " -l " & $iLastPage
        If $bLayout = True Then $sOptions &= " -layout"

        Local $iReturn = ShellExecuteWait($sXPDFToText, $sOptions & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
        If $iReturn = 0 Then Return 1
        Return 0
    EndFunc

    ; Display an open dialog to select one or more PDF files and return the dialog result.
    Func FileSelection()
        Local $sFiles = FileOpenDialog("Select file(s)", @DesktopDir & "\", "Adobe PDF Files (*.pdf)", BitOR($FD_FILEMUSTEXIST, $FD_MULTISELECT))
        If @error Then
            MsgBox(0, "", "No file(s) were selected.")
            Exit
        EndIf
        Return $sFiles
    EndFunc

    Local $sFileOpenDialog = FileSelection()
    Local $FilesArr = StringSplit($sFileOpenDialog, "|")

    ; A multi-selection returns "directory|file1|file2|...";
    ; a single selection returns one full path without "|".
    Local $iFileCount = $FilesArr[0] - 1
    If $FilesArr[0] = 1 Then $iFileCount = 1

    Local $sTempFile = @ScriptDir & "\temp.txt"

    For $iFile = 1 To $iFileCount
        Local $CurrentFile = $FilesArr[1]
        If $FilesArr[0] > 1 Then $CurrentFile = $FilesArr[1] & "\" & $FilesArr[$iFile + 1]

        _XPDF_ToText($CurrentFile, $sTempFile, 1, 0, True)

        ; Retrieve the number of lines in the temp file (pass the path, as documented)
        Local $iCountLines = _FileCountLines($sTempFile)

        ; Open the temp text file for reading
        Local $hFileOpen = FileOpen($sTempFile, 0)

        Local $ReadLine[$iCountLines]
        Local $ReadLineFull[$iCountLines]
        Local $ReadLineScore[$iCountLines]

        For $iLine = 1 To $iCountLines
            ; Read the next line through the handle instead of re-opening the file each time
            $ReadLine[$iLine - 1] = FileReadLine($hFileOpen)
            Local $RegResult = StringRegExp($ReadLine[$iLine - 1], '[0-9]+\s+[A-Za-z]+\s+([0-9]+)', 2)
            If Not @error Then ; the regex found a match
                $ReadLineScore[$iLine - 1] = $RegResult[1]
                ; If score >= 90, store the whole line in $ReadLineFull.
                ; We need this array for PDF searching and highlighting later on.
                If Number($ReadLineScore[$iLine - 1]) >= 90 Then
                    $ReadLineFull[$iLine - 1] = $ReadLine[$iLine - 1]
                Else
                    $ReadLineFull[$iLine - 1] = "-"
                EndIf
            Else ; no match
                $ReadLineFull[$iLine - 1] = "-"
                $ReadLineScore[$iLine - 1] = "-"
            EndIf
        Next

        ; Close the handle returned by FileOpen.
        FileClose($hFileOpen)

        ; Here, we have the $ReadLineFull array.
        ; We loop through the array; for each value <> "-" we:
        ;   - send Ctrl+F to open the Acrobat Reader search box
        ;   - send Ctrl+V to paste the value into the search box
        ;   - wait a second to ensure the search result is returned
        ;   - send the Enter key
        ;   - send a control click to the highlight button to highlight the line

        ; First, open the PDF file
        ShellExecute($CurrentFile, "", "", "", @SW_MAXIMIZE)

        ; Wait up to 5 seconds for the Acrobat Reader window to become active
        Local $WinActive = WinWaitActive("[CLASS:AcrobatSDIWindow]", "", 5)
        If $WinActive = 0 Then
            MsgBox(0, "Error", "No Acrobat Reader window")
            Exit
        Else
            For $iLine = 1 To $iCountLines
                If $ReadLineFull[$iLine - 1] <> "-" Then
                    ClipPut($ReadLineFull[$iLine - 1])
                    Send("^f")
                    Sleep(1000)
                    Send("^v")
                    Sleep(1000)
                    Send("{ENTER}")
                    Sleep(1000)
                    ; The control instance (38) and click offset (65, 10) come from my machine
                    ; and may differ with another Reader version or window layout
                    ControlFocus("[CLASS:AcrobatSDIWindow]", "", "[CLASS:AVL_AVView; INSTANCE:38]")
                    Sleep(1000)
                    ControlClick("[CLASS:AcrobatSDIWindow]", "", "[CLASS:AVL_AVView; INSTANCE:38]", "left", 1, 65, 10)
                    Sleep(1000)
                EndIf
            Next
            Sleep(1000)
            Send("^s")
            Sleep(1000)
            WinClose("[CLASS:AcrobatSDIWindow]", "")
            Sleep(5000)
        EndIf
    Next

    MsgBox(0, "AutoProcess", "Done. File(s) saved and closed.", 5)
mLipok Posted June 11, 2021 (edited June 11, 2021 by mLipok)

On 6/10/2021 at 11:39 AM, ducphu said: "Acrobat reader"

No. Reader is not an editor. Have you thought about Acrobat Professional? If so, take a look here:
seadoggie01 Posted June 20, 2021

I've been out for a week (family vacation). It looks like this was printed from Word... could you edit it in Word before printing it to PDF? (The document properties say "Application: Acrobat PDFMaker 20 for Word".)