Kiran_L Posted September 22, 2017 Hi guys, I am trying to read a PDF file that contains unstructured data. I don't know how to handle PDF files in AutoIt. Can you point me to a UDF that can open a PDF and read the document? Thanks for your time.
KickStarter15 Posted September 22, 2017 Hi @Kiran_L, Try checking this old thread and this link. You might get an idea from them. Otherwise, can you post the code you have written so far so that others can help more easily?
mLipok Posted September 22, 2017 You can use mupdf https://mupdf.com/downloads/ or commercial solutions like QuickPDF (look in my signature for the QuickPDF.au3 UDF, listed there as "Debenu Quick PDF Library - UDF").
n3wbie Posted September 24, 2017 I would recommend using Xpdf. Just search the forum for "Xpdf"; someone posted it earlier. It will allow you to read the data from a PDF as text.
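As a rough illustration of the Xpdf suggestion above, here is a minimal sketch that calls `pdftotext.exe` directly and reads the resulting text file. It assumes that `pdftotext.exe` from the Xpdf tools sits next to the script; `sample.pdf` and `sample.txt` are hypothetical file names chosen only for this example.

```autoit
; Minimal sketch: convert a PDF to text with Xpdf's pdftotext.exe, then read the result.
; Assumptions: pdftotext.exe is in the script directory; "sample.pdf" is a hypothetical input file.
Local $sExe = @ScriptDir & "\pdftotext.exe"
Local $sPdf = @ScriptDir & "\sample.pdf"
Local $sTxt = @TempDir & "\sample.txt"

If Not FileExists($sExe) Or Not FileExists($sPdf) Then
    ConsoleWrite("pdftotext.exe or the PDF file is missing" & @CRLF)
    Exit 1
EndIf

; -layout keeps the physical layout of the page as far as possible.
; Every path is quoted so that directories containing spaces still work.
Local $iExit = RunWait('"' & $sExe & '" -layout "' & $sPdf & '" "' & $sTxt & '"', @ScriptDir, @SW_HIDE)
If $iExit <> 0 Then
    ConsoleWrite("pdftotext failed with exit code " & $iExit & @CRLF)
    Exit 1
EndIf

; The extracted text can now be processed like any other string.
Local $sText = FileRead($sTxt)
ConsoleWrite($sText & @CRLF)
```

Going through a temporary text file keeps the sketch short; reading the tool's output from stdout is an alternative when no file should be left on disk.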
jguinch Posted September 24, 2017 Using Xpdf tools:

; #FUNCTION# ====================================================================================================================
; Name...........: _XFDF_Info
; Description....: Retrieves information from a PDF file
; Syntax.........: _XFDF_Info ( "File" [, "Info"] )
; Parameters.....: File - PDF File.
;                  Info - The information to retrieve
; Return values..: Success - If the Info parameter is not empty, returns the desired information for the specified Info parameter
;                          - If the Info parameter is empty, returns an array with all available information
;                  Failure - 0, and sets @error to :
;                            1 - PDF File not found
;                            2 - Unable to find the external program
;                            3 - Unexpected pdfinfo output (labels and values do not pair up)
; Remarks........: The array returned is two-dimensional and is made up as follows:
;                  $array[0][0] = Label of the first information (title, author, pages...)
;                  $array[0][1] = value of the first information
;                  ...
; ===============================================================================================================================
Func _XFDF_Info($sPDFFile, $sInfo = "")
    Local $sXPDFInfo = @ScriptDir & "\pdfinfo.exe"

    If NOT FileExists($sPDFFile) Then Return SetError(1, 0, 0)
    If NOT FileExists($sXPDFInfo) Then Return SetError(2, 0, 0)

    ; Both paths are quoted so that spaces in them are handled; 2 = $STDOUT_CHILD
    Local $iPid = Run('"' & $sXPDFInfo & '" "' & $sPDFFile & '"', @ScriptDir, @SW_HIDE, 2)

    ; Collect the output until the stream is closed
    Local $sResult
    While 1
        $sResult &= StdoutRead($iPid)
        If @error Then ExitLoop
    WEnd

    ; Each "Label: value" line yields two consecutive array elements
    Local $aInfos = StringRegExp($sResult, "(?m)^(.*?): +(.*)$", 3)
    If Mod( UBound($aInfos, 1), 2) = 1 Then Return SetError(3, 0, 0)

    Local $aResult [ UBound($aInfos, 1) / 2][2]
    For $i = 0 To UBound($aInfos) - 1 Step 2
        If $sInfo <> "" AND $aInfos[$i] = $sInfo Then Return $aInfos[$i + 1]
        $aResult[$i / 2][0] = $aInfos[$i]
        $aResult[$i / 2][1] = $aInfos[$i + 1]
    Next

    If $sInfo <> "" Then Return ""
    Return $aResult
EndFunc ; ---> _XFDF_Info

; #FUNCTION# ====================================================================================================================
; Name...........: _XPDF_Search
; Description....: Searches for a string in a PDF file
; Syntax.........: _XPDF_Search ( "File" , "String" [, Case = 0 [, Flag = 0 [, FirstPage = 1 [, LastPage = 0]]]] )
; Parameters.....: File - PDF File.
;                  String - String to search for
;                  Case - If set to 1, search is case sensitive (default is 0)
;                  Flag - A number to indicate how the function behaves. See below for details. The default is 0.
;                  FirstPage - First page to search (default is 1)
;                  LastPage - Last page to search (default is 0 = last page of the document)
; Return values..: Success -
;                    Flag = 0 - Returns 1 if the search string was found, or 0 if not
;                    Flag = 1 - Returns the number of occurrences found in the whole PDF File
;                    Flag = 2 - Returns an array containing the number of occurrences found for each page
;                               (only pages containing the search string are returned)
;                               $array[0][0] - Number of matching pages
;                               $array[0][1] - Number of occurrences found in the whole PDF File
;                               $array[n][0] - Page number
;                               $array[n][1] - Number of occurrences found for the page
;                  Failure - 0, and sets @error to :
;                            1 - PDF File not found
;                            2 - Unable to find the external program
; ===============================================================================================================================
Func _XPDF_Search($sPDFFile, $sSearch, $iCase = 0, $iFlag = 0, $iStart = 1, $iEnd = 0)
    Local $sXPDFToText = @ScriptDir & "\pdftotext.exe"
    Local $sOptions = " -layout -f " & $iStart
    Local $iCount = 0, $aResult[1][2] = [[0, 0]], $aSearch, $sContent, $iPageOccCount

    If NOT FileExists($sPDFFile) Then Return SetError(1, 0, 0)
    If NOT FileExists($sXPDFToText) Then Return SetError(2, 0, 0)

    If $iEnd > 0 Then $sOptions &= " -l " & $iEnd

    ; The trailing "-" sends the text to stdout; 2 = $STDOUT_CHILD
    Local $iPid = Run('"' & $sXPDFToText & '"' & $sOptions & ' "' & $sPDFFile & '" -', @ScriptDir, @SW_HIDE, 2)

    While 1
        $sContent &= StdoutRead($iPid)
        If @error Then ExitLoop
    WEnd

    ; pdftotext separates pages with a form feed character (Chr(12))
    Local $aPages = StringSplit($sContent, chr(12) )

    For $i = 1 To $aPages[0]
        $iPageOccCount = 0
        While StringInStr($aPages[$i], $sSearch, $iCase, $iPageOccCount + 1)
            ; Flag 0: one hit is enough, no need to count
            If $iFlag <> 1 AND $iFlag <> 2 Then
                $aResult[0][1] = 1
                ExitLoop
            EndIf
            $iPageOccCount += 1
        WEnd

        If $iPageOccCount Then
            Redim $aResult[ UBound($aResult, 1) + 1][2]
            $aResult[0][1] += $iPageOccCount
            $aResult[0][0] = UBound($aResult) - 1
            $aResult[ UBound($aResult, 1) - 1 ][0] = $i + $iStart - 1
            $aResult[ UBound($aResult, 1) - 1 ][1] = $iPageOccCount
        EndIf
    Next

    If $iFlag = 2 Then Return $aResult
    Return $aResult[0][1]
EndFunc ; ---> _XPDF_Search

; #FUNCTION# ====================================================================================================================
; Name...........: _XPDF_ToText
; Description....: Converts a PDF file to plain text.
; Syntax.........: _XPDF_ToText ( "PDFFile" , "TxtFile" [ , FirstPage [, LastPage [, Layout ]]] )
; Parameters.....: PDFFile - PDF Input File.
;                  TxtFile - Plain text file to convert to
;                  FirstPage - First page to convert (default is 1)
;                  LastPage - Last page to convert (default is last page of the document)
;                  Layout - If true, maintains (as best as possible) the original physical layout of the text
;                           If false, the behavior is to 'undo' physical layout (columns, hyphenation, etc.)
;                           and output the text in reading order.
;                           Default is True
; Return values..: Success - 1
;                  Failure - 0, and sets @error to :
;                            1 - PDF File not found
;                            2 - Unable to find the external program
; ===============================================================================================================================
Func _XPDF_ToText($sPDFFile, $sTXTFile, $iFirstPage = 1, $iLastPage = 0, $bLayout = True)
    Local $sXPDFToText = @ScriptDir & "\pdftotext.exe"
    Local $sOptions

    If NOT FileExists($sPDFFile) Then Return SetError(1, 0, 0)
    If NOT FileExists($sXPDFToText) Then Return SetError(2, 0, 0)

    If $iFirstPage <> 1 Then $sOptions &= " -f " & $iFirstPage
    If $iLastPage <> 0 Then $sOptions &= " -l " & $iLastPage
    If $bLayout = True Then $sOptions &= " -layout"

    Local $iReturn = ShellExecuteWait ( $sXPDFToText , $sOptions & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
    If $iReturn = 0 Then Return 1
    Return 0
EndFunc ; ---> _XPDF_ToText
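A short usage sketch for the three functions posted above may help. It assumes they have been saved to a file named `XPDF.au3` (a hypothetical name) next to the calling script, together with `pdfinfo.exe` and `pdftotext.exe`. `sample.pdf` and the search term "invoice" are made up for the example, and "Pages" is one of the labels that pdfinfo normally prints.

```autoit
; Usage sketch for _XFDF_Info, _XPDF_Search and _XPDF_ToText.
; Assumptions: the functions live in "XPDF.au3" (hypothetical file name) in the script directory,
; pdfinfo.exe and pdftotext.exe are in the same directory, "sample.pdf" is a hypothetical input file.
#include "XPDF.au3"

Local $sPdf = @ScriptDir & "\sample.pdf"
Local $sTxt = @ScriptDir & "\sample.txt"

; Ask for a single piece of information; on failure the function returns the number 0.
Local $vPages = _XFDF_Info($sPdf, "Pages")
If IsString($vPages) And $vPages <> "" Then
    ConsoleWrite("Pages: " & $vPages & @CRLF)
Else
    ConsoleWrite("Could not read the page count" & @CRLF)
EndIf

; Flag = 2 returns an array with per-page occurrence counts.
Local $aHits = _XPDF_Search($sPdf, "invoice", 0, 2)
If IsArray($aHits) Then
    ConsoleWrite("Total occurrences: " & $aHits[0][1] & @CRLF)
    For $i = 1 To $aHits[0][0]
        ConsoleWrite("Page " & $aHits[$i][0] & ": " & $aHits[$i][1] & " occurrence(s)" & @CRLF)
    Next
EndIf

; Convert the whole document and print the extracted text.
If _XPDF_ToText($sPdf, $sTxt) Then
    ConsoleWrite(FileRead($sTxt) & @CRLF)
Else
    ConsoleWrite("Conversion failed" & @CRLF)
EndIf
```

The sketch checks the type of each return value because the functions return a plain 0 on failure.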
Moderators JLogan3o13 Posted May 30, 2018 @rsalois please don't resurrect old threads, especially just to say you have no idea how to answer the OP's question...