supergg02 Posted October 1, 2005 (edited)
Hi! Is there a way to convert .doc files to .txt files? The solution must not use MS Word, because Word uses a lot of RAM and CPU when there are many files to convert. Thanks for your help. My goal is to compute statistics about punctuation, words, and paragraphs in a given book.
Edited October 1, 2005 by supergg02
BigDod Posted October 1, 2005
> Hi! Is there a way to convert .doc files to .txt files? The solution must not use MS Word, because Word uses a lot of RAM and CPU when there are many files to convert. Thanks for your help. My goal is to compute statistics about punctuation, words, and paragraphs in a given book.
Searching on Google came up with this, amongst many others.
Time you enjoyed wasting is not wasted time ......T.S. Elliot
Suspense is worse than disappointment................Robert Burns
God help the man who won't help himself, because no-one else will...........My Grandmother
Valuater Posted October 1, 2005
As I understand it, if you are in Word and just want to save as a text file, Word will prompt you saying "you will lose formatting", thus your goal is defeated. 8)
supergg02 Posted October 1, 2005 (Author)
> As I understand it, if you are in Word and just want to save as a text file, Word will prompt you saying "you will lose formatting", thus your goal is defeated. 8)
No, the .doc files are generated automatically by other OCR software, and I am looking for a way to convert them without opening them in Word or any other program.
BigDod Posted October 1, 2005
> No, the .doc files are generated automatically by other OCR software, and I am looking for a way to convert them without opening them in Word or any other program.
The solution that I gave you can be used from the command line and can therefore easily be scripted.
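To illustrate the "can easily be scripted" point, here is a minimal sketch of a batch run. It assumes the linked tool is Antiword (the name given later in the thread), that an `antiword` executable is on the PATH, and that it prints the extracted text to standard output when given a file name. The function names and the folder used in the usage note are made up for illustration.

```python
import subprocess
from pathlib import Path


def antiword_command(doc_path: Path) -> list[str]:
    """Build the command line; assumes an `antiword` executable is on the PATH."""
    return ["antiword", str(doc_path)]


def target_path(doc_path: Path) -> Path:
    """Name the output file after the input, with a .txt extension."""
    return doc_path.with_suffix(".txt")


def convert_folder(folder: Path) -> list[Path]:
    """Convert every .doc file in `folder` and return the .txt files written."""
    written = []
    for doc_path in sorted(folder.glob("*.doc")):
        # The converter writes the plain text to stdout; capture it as bytes.
        result = subprocess.run(
            antiword_command(doc_path), capture_output=True, check=True
        )
        out_path = target_path(doc_path)
        out_path.write_bytes(result.stdout)
        written.append(out_path)
    return written
```

Calling `convert_folder(Path("C:/ocr_output"))` (a hypothetical folder) would process the files one at a time, so Word itself is never started.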
Valuater Posted October 1, 2005
I think you're missing the point... .doc files include formatting. I tried
FileCopy("C:\Questions.doc", "C:\Questions.txt")
and the .txt file is gibberish... because of the .doc formatting. 8)
supergg02 Posted October 1, 2005 (Author)
> The solution that I gave you can be used from the command line and can therefore easily be scripted.
Thanks a lot! I will try it now...
water Posted October 1, 2005
I found an old mail about this subject:
> wvWare might help you out. It's a library (the one used in Abiword)
> and a set of command-line tools for reading and converting MS Word
> documents. The URL is http://wvware.sourceforge.net/ . Good luck.
HTH
BigDod Posted October 1, 2005
> I think you're missing the point... .doc files include formatting. I tried
> FileCopy("C:\Questions.doc", "C:\Questions.txt")
> and the .txt file is gibberish... because of the .doc formatting. 8)
If the .doc is run through a converter, the formatting is stripped and only the text remains. What you tried was just renaming the file.
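A small sketch of why the copied file is still gibberish: copying or renaming changes only the file name, not the bytes. This assumes the OCR output is a binary Word document, which is stored in an OLE2 compound file container that begins with a fixed 8-byte signature. The helper names are made up for illustration.

```python
# Signature at the start of an OLE2 compound file, the container used by
# binary Word .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole2_container(data: bytes) -> bool:
    """Return True if the bytes start with the OLE2 signature."""
    return data.startswith(OLE2_MAGIC)


def still_word_binary(path: str) -> bool:
    """Check a file on disk, whatever its extension says."""
    with open(path, "rb") as handle:
        return is_ole2_container(handle.read(len(OLE2_MAGIC)))
```

Under that assumption, `still_word_binary` would return the same answer for the original .doc and for its copy with a .txt extension, which is why a real converter is needed.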
jefhal Posted October 1, 2005 (edited)
> Searching on Google came up with this, amongst many others.
BigDod, I downloaded AntiWord for DOS and tested it on 3 .doc files. It works perfectly! Good call. (P.S. They "don't do Windows", but they do have a precompiled version for Windows if you want to spend a lot more time on it...) This is a sample of the output. It would let you count words, punctuation, paragraphs, etc.:
expires and you still need to use Outlook Web Access, refresh your browser
and log on again.
Supported browsers and operating systems
You can use Outlook Web Access with Microsoft Internet Explorer or Netscape
Navigator Web browsers from many UNIX, Apple Macintosh, or Microsoft
Windows-based computers. To use the complete set of features available with
Edited October 1, 2005 by jefhal
...by the way, it's pronounced: "JIF"... Bob Berry --- inventor of the GIF format
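Once the text is extracted, the counting itself is short. The following is a rough sketch, not a definitive method: it assumes paragraphs in the converted text are separated by blank lines, treats a word as a run of word characters (optionally joined by an apostrophe or hyphen), and counts only ASCII punctuation. The function name `text_stats` is made up for illustration.

```python
import re
import string


def text_stats(text: str) -> dict[str, int]:
    """Count paragraphs, words, and punctuation marks in plain text."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [block for block in re.split(r"\n\s*\n", text) if block.strip()]
    # A word is a run of word characters, optionally joined by ' or -.
    words = re.findall(r"\w+(?:['-]\w+)*", text)
    # string.punctuation covers ASCII punctuation only.
    punctuation = [ch for ch in text if ch in string.punctuation]
    return {
        "paragraphs": len(paragraphs),
        "words": len(words),
        "punctuation": len(punctuation),
    }
```

Typographic quotes or dashes produced by OCR would need to be added to the punctuation set separately.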
Valuater Posted October 1, 2005 (edited)
> If the .doc is run through a converter, the formatting is stripped and only the text remains. What you tried was just renaming the file.
Thx... I understand that. 8)
Edited October 1, 2005 by Valuater