drbobscureall Posted October 28, 2009
I am relatively new to AutoIt. Does anyone know how to extract text from HTML files (lots of them), analyse the string and then replace it with an appropriate alternative?

Essentially I am trying to typeset translated text into English HTML files. The translations are supplied in MS Excel spreadsheets, with the translated text in the cell immediately to the right of the English text. I want to do the following:
- Read through the HTMLs and extract the text from between the tags.
- Determine the appropriate translation by comparing the extracted text with the English source in the Excel sheet.
- Replace the original HTML text with the translation whilst retaining all the formatting.

If the script could automatically open, save and close the files too, that'd be even better. Can anyone help?

Andy
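For the first step, a rough sketch with a regular expression, assuming reasonably simple pages: it treats everything between a ">" and the next "<" as a text run. It does not understand comments, <script> blocks or a ">" inside an attribute value, and entities such as &amp; come back undecoded. The helper name _ExtractTextRuns is hypothetical.

```autoit
; Hypothetical helper: returns the non-blank text runs found between a ">" and
; the next "<" in an HTML string. Element 0 holds the count, the same
; convention _FileListToArray and _FileReadToArray use.
Func _ExtractTextRuns($sHtml)
    ; Flag 3 = return an array of every match of the capture group
    Local $aRuns = StringRegExp($sHtml, ">([^<]+)<", 3)
    If @error Then Return SetError(1, 0, 0) ; no text between tags at all
    Local $aOut[UBound($aRuns) + 1], $iCount = 0, $sRun
    For $i = 0 To UBound($aRuns) - 1
        $sRun = StringStripWS($aRuns[$i], 3) ; 3 = strip leading and trailing whitespace
        If $sRun <> "" Then
            $iCount += 1
            $aOut[$iCount] = $sRun
        EndIf
    Next
    ReDim $aOut[$iCount + 1] ; drop the unused slots left by blank runs
    $aOut[0] = $iCount
    Return $aOut
EndFunc
```

Each returned run can then be looked up in the English column of the spreadsheet. To see what was found in a page, assign the result to a variable and pass it to _ArrayDisplay (with #include <Array.au3>).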
exodius Posted October 28, 2009 (edited)
Welcome to the forums!

This will illustrate one approach that may work for you for the HTML file reading and modification portion of that; play around with it and see if you can get it to work for you.

This example only works for one phrase and its conversion at a time. To loop through all of the phrases you want to convert, look at the _Excel* functions in the Help File to read your Excel doc, or convert it to a .csv file and read it in using _FileReadToArray.

#include <Array.au3>
#include <File.au3>

$varExampleStringToLookFor = "¿Cómo te llamas?"
$varExampleStringToConvertTo = "What is your name?"

$aFiles = _FileListToArray (@ScriptDir, "*.html")
If Not @error Then
    _ArrayDisplay ($aFiles) ; Just so you can see that we found files
    For $x = 1 To $aFiles[0]
        $var = FileRead (@ScriptDir & "\" & $aFiles[$x])
        $var = StringReplace ($var, $varExampleStringToLookFor, $varExampleStringToConvertTo)
        FileWrite(@ScriptDir & "\Modified-" & $aFiles[$x], $var)
    Next
Else
    MsgBox (0, "No HTML Files!", "There aren't any HTML files in this folder!")
EndIf

Edited October 28, 2009 by exodius
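A sketch of the looping exodius describes, assuming the sheet has been saved from Excel as "Text (Tab delimited)" with the English phrase in column A and the translation in column B, one pair per line (tabs avoid the trouble that commas inside phrases cause in a .csv). _ApplyPairs and _TranslateFolder are hypothetical names. Two details differ from the single-phrase example: StringReplace is called case-sensitively (its default ignores case), and output is written through FileOpen mode 2 so a rerun overwrites the file instead of appending, which is what FileWrite does when given a file name.

```autoit
#include <File.au3>

; Applies each "source<TAB>translation" line in $aPairs (an array as filled by
; _FileReadToArray, count in element 0) to $sText.
Func _ApplyPairs($sText, ByRef $aPairs)
    Local $aPair
    For $i = 1 To $aPairs[0]
        $aPair = StringSplit($aPairs[$i], @TAB)
        If $aPair[0] <> 2 Or $aPair[1] = "" Then ContinueLoop ; skip malformed lines
        ; Last argument 1 = case-sensitive; StringReplace ignores case by default
        $sText = StringReplace($sText, $aPair[1], $aPair[2], 0, 1)
    Next
    Return $sText
EndFunc

; Translates every *.html file in $sDir and writes "Modified-<name>" next to it.
; Returns the number of files written.
Func _TranslateFolder($sDir, $sPairsFile)
    Local $aPairs, $aFiles, $sHtml, $hOut, $iDone = 0
    If Not _FileReadToArray($sPairsFile, $aPairs) Then Return SetError(1, 0, 0)
    $aFiles = _FileListToArray($sDir, "*.html", 1) ; 1 = files only
    If @error Then Return SetError(2, 0, 0)
    For $x = 1 To $aFiles[0]
        If StringLeft($aFiles[$x], 9) = "Modified-" Then ContinueLoop ; skip earlier output
        $sHtml = FileRead($sDir & "\" & $aFiles[$x])
        $sHtml = _ApplyPairs($sHtml, $aPairs)
        $hOut = FileOpen($sDir & "\Modified-" & $aFiles[$x], 2) ; 2 = erase previous contents
        FileWrite($hOut, $sHtml)
        FileClose($hOut)
        $iDone += 1
    Next
    Return $iDone
EndFunc
```

A call would look like _TranslateFolder(@ScriptDir, @ScriptDir & "\translations.txt"), where the file name is a placeholder. Because each pair is applied to the whole file, a phrase that also appears inside a tag attribute is replaced there too, and a short phrase should come after any longer phrase that contains it. If the pages are UTF-8, adding 128 to the FileOpen mode (2 + 128) writes UTF-8 with a BOM.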
drbobscureall (Author) Posted October 29, 2009
Thanks for that! I've got it working to some degree, but one thing is still causing trouble. I'm trying to use the '_clipboard*' functions to copy and paste from an ordered text file, but for some reason I can't get the text onto the clipboard: when I paste the clipboard contents, it just adds a blank line. Any idea what I might be doing wrong?
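A frequent cause of a blank paste is mixing up the roles of the built-in clipboard functions: ClipPut puts text on the clipboard, ClipGet only returns what is already there (it types nothing), and the paste itself has to be sent as Ctrl+V. _ArrayToClip from Array.au3 is meant for copying a whole array, not a single string. A minimal sketch of the put-then-paste pattern; both helper names are hypothetical.

```autoit
; Hypothetical helper: puts $sText on the clipboard and checks that it arrived.
Func _SetClipboard($sText)
    If Not ClipPut($sText) Then Return False ; ClipPut returns 0 on failure
    Return ClipGet() == $sText ; ClipGet only reads the clipboard, it pastes nothing
EndFunc

; Hypothetical helper: pastes $sText into whichever control has keyboard focus.
Func _PasteText($sText)
    If Not _SetClipboard($sText) Then Return SetError(1, 0, False)
    Send("^v") ; Ctrl+V does the actual paste
    Return True
EndFunc
```

After moving focus to a field (for example with Send("!n") in Word's Find and Replace dialog), _PasteText("What is your name?") fills it. Send($sText, 1) is an alternative that types the text raw without touching the clipboard, though it is slower for long strings.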
exodius Posted October 29, 2009
Wanna post your code so we can see what you're trying to do?
drbobscureall (Author) Posted October 30, 2009
Here's my code. Any help you can give me to get the clipboard function to work would be much appreciated.

Also, as this will inevitably be a very slow process when used for large amounts of text and documents (I'm talking 800 to 1000 lines of text and 50-60 documents), any advice you can give me to speed it up would be great!

Thanks!

#cs
================================= Description ======================================================
This script is intended to identify the contents of a directory and list the .doc files to an array.
The .doc files are then opened one at a time (I've not included the save and close bits yet cos I'm still
struggling with the core functionality) and the Find/Replace dialog opened.
Two external text files (English Source & Foreign Translation) with ordered lists of phrases are read to
arrays, which are then used as the source for filling the Find/Replace fields.
This loops until all the source array items have been read and used, then the next .doc file is opened
and the process repeated.
Please be gentle when criticizing my code. I'm only a newbie and don't know any better!
=======================================================================================================
#ce

#Include <Clipboard.au3>
#include <Array.au3>
#Include <File.au3>

dim $i, $x, $ENG, $Tran, $Index_New, $Index_Orig, $Source, $Translation, $var, $To, $From, $File

$aFiles = _FileListToArray (@ScriptDir, "*.doc") ; Lists .doc files in script folder to array
$Source = FileOpenDialog("Select English Source", @DesktopDir, "(*.txt)") ; Locate English source text file
$Translation = FileOpenDialog("Select Translation Source", @DesktopDir, "(*.txt)") ; Locate Translated source text file
_FileReadToArray($Source, $ENG) ; Read English source file to array
_FileReadToArray($Translation, $Tran) ; Read translation source file to array
_ArrayDisplay($ENG) ; Display English Array
_ArrayDisplay($Tran) ; Display Translated Array

;~ For $x = 1 To 2
For $x = 1 To $aFiles[0] ; Repeat for each .doc file in folder
    ShellExecute($aFiles[$x]) ; open .doc file
    WinWait($aFiles & " - Microsoft Word","") ; wait for Word to load
    If Not WinActive($aFiles & " - Microsoft Word","") Then WinActivate($aFiles & " - Microsoft Word","") ; check if Word is active
    WinWaitActive($aFiles & " - Microsoft Word","") ; Wait until Word is active before continuing
    send("^h") ; open Find/Replace dialog
    WinWaitActive("Find and Replace","") ; wait for dialog to be active
    for $i = 1 to $ENG[0] ; repeat for all elements in source array
        If Not WinActive("Find and Replace","") Then WinWaitActive("Find and Replace","") ; check if Find/Replace is active
        WinWaitActive("Find and Replace","") ; wait for Find/Replace to be active before continuing
        $length1 = StringLen($From) ; Check length of English source string
        $Length2 = StringLen($To) ; Check length of Foreign source string
        if $length1 > 254 Then ; If English string >254 characters then ...
            filewriteline(@ScriptDir & "\Errors.txt", "Line No - " & $i & " - Too Long - " & $From) ; ...list string in error report file
        ElseIf $length1 < 254 Then ; If English string <254 ...
            If $Length2 > 254 Then ; ... but foreign string is > 254 then ...
                filewriteline(@ScriptDir & "\Errors.txt", "Line No - " & $i & " - Too Long - " & $From) ; ...list string in error report file
            Else ; Otherwise ...
                _ArrayToClip($ENG[$i]) ; ... copy contents of English Source Array line to clipboard ...
                sleep(200)
                send("!n") ; ... Select the 'Find' field in the Find / Replace Dialog ...
                sleep(100)
;~              send("Foo") ; (Test Expression)
                ClipGet() ; ... Paste clipboard contents into field
                sleep(200)
                _ArrayToClip($Tran[$i]) ; Then copy equivalent data from Translation Source Array ...
                Sleep(200)
                Send("!i") ; ... Select the 'Replace' field in the Find / Replace Dialog ...
                sleep(100)
;~              send("Bar") ; (Test Expression)
                ClipGet() ; ... Paste clipboard contents into field
                sleep(200)
                send("!f") ; Find text
                sleep(200)
                send("!r") ; Replace text
                sleep(1000)
                send("{ENTER}") ; Close 'Completed' dialog box
                sleep(500)
            EndIf
        EndIf
    Next
Next
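On the speed question: most of the run time in a script like this goes into the fixed Sleep calls around each keystroke, repeated for every phrase in every document. A minimal sketch of a different technique, assuming Word is installed and scriptable through COM, is to drive Word's object model directly with ObjCreate("Word.Application") and Find.Execute using wdReplaceAll. That way no window focus, keystrokes or clipboard are involved, and the script also opens, saves and closes each file itself. The helper names are hypothetical; the argument order follows Word's Find.Execute method. Separately, three things in the posted loop are worth a look: $From and $To are never assigned, so both StringLen checks always measure empty strings; a string of exactly 254 characters matches neither the > 254 nor the < 254 branch; and the window titles are built from $aFiles (the whole array) rather than $aFiles[$x].

```autoit
#include <File.au3>

; Hypothetical helper: replaces every occurrence of $sFind with $sReplace in
; the main text of an open Word document via the Word object model.
Func _WordReplaceAll($oDoc, $sFind, $sReplace)
    Local $oFind = $oDoc.Content.Find
    $oFind.ClearFormatting()
    $oFind.Replacement.ClearFormatting()
    ; Execute(FindText, MatchCase, MatchWholeWord, MatchWildcards, MatchSoundsLike,
    ;         MatchAllWordForms, Forward, Wrap, Format, ReplaceWith, Replace)
    ; Wrap 1 = wdFindContinue, Replace 2 = wdReplaceAll; returns True if anything was found
    Return $oFind.Execute($sFind, True, False, False, False, False, True, 1, False, $sReplace, 2)
EndFunc

; Hypothetical helper: opens each .doc in $sDir invisibly, applies every pair
; $aEng[$i] -> $aTran[$i] (counts in element 0, as _FileReadToArray returns
; them), then saves and closes the document.
Func _TranslateDocs($sDir, ByRef $aEng, ByRef $aTran)
    If $aEng[0] <> $aTran[0] Then Return SetError(3, 0, 0) ; the two lists must line up
    Local $aFiles = _FileListToArray($sDir, "*.doc", 1) ; 1 = files only
    If @error Then Return SetError(1, 0, 0)
    Local $oWord = ObjCreate("Word.Application")
    If Not IsObj($oWord) Then Return SetError(2, 0, 0)
    $oWord.Visible = False
    Local $oDoc
    For $x = 1 To $aFiles[0]
        If StringLeft($aFiles[$x], 2) = "~$" Then ContinueLoop ; skip Word's lock files
        $oDoc = $oWord.Documents.Open($sDir & "\" & $aFiles[$x])
        For $i = 1 To $aEng[0]
            ; Word's Find and Replace accepts at most 255 characters per field
            If StringLen($aEng[$i]) > 255 Or StringLen($aTran[$i]) > 255 Then
                FileWriteLine($sDir & "\Errors.txt", $aFiles[$x] & " - Line No - " & $i & " - Too Long - " & $aEng[$i])
                ContinueLoop
            EndIf
            _WordReplaceAll($oDoc, $aEng[$i], $aTran[$i])
        Next
        $oDoc.Save()
        $oDoc.Close()
    Next
    $oWord.Quit()
    Return $aFiles[0]
EndFunc
```

With the arrays the original script already reads, the call would be _TranslateDocs(@ScriptDir, $ENG, $Tran). A failing COM call (for example a document that opens with a dialog) may stop the script unless a handler is registered with ObjEvent("AutoIt.Error", ...), which is worth considering for a 50-60 document run.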