Hello All -
First, I would like to apologize for not responding to your messages - I am a member of other web forums and have gotten used to receiving an email whenever someone responds to a new post I made. Since I hadn't heard anything from the AutoIt forum, I assumed that no one had replied to my original post.
mLipok, thank you for contacting me ;-) The photo you found on the UT-CTR website is about 7 years old, so I have less hair and it is whiter now... I may have also lost some weight ;-)
But to the problem at hand...
Actually, we have tried optical character recognition (OCR) for other applications that involved extracting information from handwritten law enforcement crash records. That did not work well. However, I have not tried to extract data from a typewritten PDF using OCR - I somehow thought that I would be able to read the PDF image and extract the data directly, but apparently this is not possible. We have created an Excel tool for another application that extracts data from a truck data website and places it into an Excel database - I had hoped for something similar for this application.
To describe the process by which these files are initially created: a user accesses a website to create the document, which is then stored in a database within the website as a PDF file. TxDOT can access the file but cannot change it, because it is password protected. The document creator can download a copy for their own use, but the password protection prevents the creator (or anyone else, for that matter) from altering the document later. I have been given a large number of these files for use in our project. They were downloaded directly from the web database, so they might not be scanned images at all, but rather electronic copies (PDF files) of the documents as they were originally created.
I am using Adobe Acrobat X Pro to open the files - the document properties indicate they are in PDF version 1.5 (Adobe 6.x).
The PDF files are not searchable.
I will try using an OCR tool (Tesseract) to read the PDF and convert it to a text file or a similar searchable format. I assume that AutoIt does not include an OCR function; otherwise a separate program would not be necessary.
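For anyone following along, here is a rough sketch of how that OCR step might look. It is written in Python rather than AutoIt, and it rests on several assumptions that I have not verified against these files: that the Tesseract and Poppler (pdftoppm) command-line tools are installed and on the PATH, that the PDFs can be opened for viewing without entering a password, and that the text is in English. As far as I know, Tesseract takes image files rather than PDFs as input, so each page is first rendered to a PNG. The function names and the 300 dpi setting are only illustrative choices.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, image_prefix, dpi=300):
    """Build the Poppler command that renders each PDF page to a PNG image."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def tesseract_command(image_path, output_base, language="eng"):
    """Build the Tesseract command; Tesseract writes its text to <output_base>.txt."""
    return ["tesseract", str(image_path), str(output_base), "-l", language]


def ocr_pdf(pdf_path, work_dir):
    """Render every page of pdf_path to PNG, OCR each page, and return the text files."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # pdftoppm names its output <prefix>-<page number>.png
    prefix = work_dir / Path(pdf_path).stem
    subprocess.run(pdftoppm_command(pdf_path, prefix), check=True)

    text_files = []
    for image in sorted(work_dir.glob(prefix.name + "-*.png")):
        output_base = image.parent / image.stem
        subprocess.run(tesseract_command(image, output_base), check=True)
        text_files.append(image.parent / (image.stem + ".txt"))
    return text_files
```

Calling `ocr_pdf("report.pdf", "ocr_output")` (both names hypothetical) should leave one text file per page in the `ocr_output` folder, which could then be searched for the fields of interest. The quality of the recognized text will depend on how cleanly the pages render.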
If I am making ignorant statements (for example, assuming that OCR could be an AutoIt function), please keep in mind that I am in the learning phase.
Thanks very much for your comments. I'll check in on the forum to follow up.
Mike