crashdemons Posted April 21, 2008 (edited)

I decided to make this because I was bored. Later, I wished I hadn't started because it was giving me a headache. But now I have it working halfway decently.

This script contains functions for building or processing parts of an HTML document, concerning elements and the content contained within them (including lower-hierarchy elements and text).

It also has its own referencing expression for elements.
Example: HTML.Body.Table.TR,2.TD,6
Where: Parent,Occurrence.Child,Occurrence
(Note: Occurrence is 1 when not included)

Changes:
-v4 fixed a processing bug in _HTML_GetContent and _HTML_GetByExpression that returned the content of the wrong tag when a side-by-side occurrence followed a nested occurrence.
-v3 small changes
-Zipped both files
-Updated the test script (GUI resizing/size fixes + Open file + Open Remote file + some quick options for expressions)
-v2 Scripts can now check whether the selected TreeViewItem has been recently changed
-Added the files as attachments instead of code boxes (some of the code was being malformed when inside the code boxes)

The zip below contains both HTML_DOM.au3 and DOM_test.au3
HTML_DOM.zip

Edited May 6, 2008 by crashdemons

My Projects - WindowDarken (Darken except the active window) Yahsmosis Chat Client (Discontinued) StarShooter Game (Red alert! All hands to battlestations!) YMSG Protocol Support (Discontinued) Circular Keyboard and OSK example. (aka Iris KB) Target Screensaver Drive Toolbar Thingy Rollup Pro (Minimize-to-Titlebar & More!) 2D Launcher physics example Ascii Screenshot AutoIt3 Quine Example ("Is a Quine" is a Quine.) USB Lock (Another system keydrive - with a toast.)
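Editor's sketch: the Parent,Occurrence.Child,Occurrence expression from the post above can be read as a list of (tag, occurrence) steps. The snippet below is a minimal Python illustration of that reading, not code from HTML_DOM.au3; the function name parse_expression is hypothetical, and it assumes tag names contain no periods or commas.

```python
def parse_expression(expression):
    """Split 'Parent,Occurrence.Child,Occurrence' into (tag, occurrence) pairs."""
    steps = []
    for part in expression.split("."):
        # Everything before the first comma is the tag name, the rest the occurrence
        name, _, occurrence = part.partition(",")
        # Occurrence is 1 when not included
        steps.append((name.strip(), int(occurrence) if occurrence.strip() else 1))
    return steps


print(parse_expression("HTML.Body.Table.TR,2.TD,6"))
# [('HTML', 1), ('Body', 1), ('Table', 1), ('TR', 2), ('TD', 6)]
```

Each pair then names which child element to descend into at that level of the document.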
ptrex Posted April 22, 2008

@crashdemons

It is not running as it should on my side. Lots of errors coming up:

>Running AU3Check (1.54.9.0) params: from:C:\Program Files\AutoIt3
C:\_\Apps\AutoIT3\UDF's\HTML_DOM.au3(311,41) : ERROR: syntax error (illegal character)
$string=StringReplace($string,"'",'''
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
C:\_\Apps\AutoIT3\UDF's\HTML_DOM.au3(319,41) : ERROR: syntax error (illegal character)
$string=StringReplace($string,''',"'"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
C:\_\Apps\AutoIT3\UDF's\HTML_DOM.au3(319,42) : ERROR: StringReplace() [built-in] called with wrong number of args.
$string=StringReplace($string,''',"'")

Regards,
ptrex

Contributions :
Firewall Log Analyzer for XP - Creating COM objects without a need of DLL's - UPnP support in AU3
Crystal Reports Viewer - PDFCreator in AutoIT - Duplicate File Finder
SQLite3 Database functionality - USB Monitoring - Reading Excel using SQL
Run Au3 as a Windows Service - File Monitor - Embedded Flash Player
Dynamic Functions - Control Panel Applets - Digital Signing Code - Excel Grid In AutoIT - Constants for Special Folders in Windows
Read data from Any Windows Edit Control - SOAP and Web Services in AutoIT - Barcode Printing Using PS - AU3 on LightTD Webserver
MS LogParser SQL Engine in AutoIT - ImageMagick Image Processing - Converter @ Dec - Hex - Bin -
Email Address Encoder - MSI Editor - SNMP - MIB Protocol
Financial Functions UDF - Set ACL Permissions - Syntax HighLighter for AU3
ADOR.RecordSet approach - Real OCR - HTTP Disk - PDF Reader Personal Worldclock - MS Indexing Engine - Printing Controls
GuiListView - Navigation (break the 4000 Limit barrier) - Registration Free COM DLL Distribution - Update - WinRM SMART Analysis - COM Object Browser - Excel PivotTable Object - VLC Media Player - Windows LogOnOff Gui -
Extract Data from Outlook to Word & Excel - Analyze Event ID 4226 - DotNet Compiler Wrapper - Powershell_COM - New
AU3Newbie Posted April 22, 2008

Quote (ptrex): "@crashdemons It is not running as it should on my side. Lots of errors coming up. Regards, ptrex"

Only one error here:

testHTML-DOM.au3 (36) : ==> Unknown function name.:
_HTML_TreeAdd_Deep($_HTML_document,$TreeView1)
^ ERROR
>Exit code: 1 Time: 13.128

Maybe I need to include some files?
crashdemons (Author) Posted April 22, 2008

Quote (AU3Newbie): "Only one error here: testHTML-DOM.au3 (36) : ==> Unknown function name.: _HTML_TreeAdd_Deep($_HTML_document,$TreeView1) ^ ERROR >Exit code: 1 Time: 13.128 Maybe I need to include some files?"

In HTML_DOM.au3, line 101, the function is already defined:
Func _HTML_TreeAdd_Deep($content,$treeview,$firstentry=True)

In DOM Test.au3, line 8, HTML_DOM.au3 is already included:
#include <HTML_DOM.au3>

Make sure that HTML_DOM.au3 is correctly named and in the right folder (the Includes folder OR the same folder as the test script), and make sure the #include line mentioned above is there.
crashdemons (Author) Posted April 22, 2008

UPDATE: See the first post for a new copy of HTML_DOM.au3.

The AutoIt tags were changing some of the double quotes in the script to single quotes, which created problems. The file is now attached instead of being placed in a highlighted codebox.
DaleHohm Posted April 22, 2008

Impressive work, especially considering you're doing it by brute force (string parsing rather than using a DOM parser).

I assume the "Tag Expression" is your own invention, right? What rules do you use to construct it?

I find the example interface you constructed maddeningly tiny, but I like the output.

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl
MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model
Automate input type=file (Related)
Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded Better Better?
IE.au3 issues with Vista - Workarounds
SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y
Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?
Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble
crashdemons (Author) Posted April 22, 2008 (edited)

It was a pain.

The hierarchy used in the "tag expression" is based solely upon _HTML_GetContent, which took all but insanity to finish. (It gets the content between a beginning and ending tag, counting each identical opening tag until they are all matched with an ending tag or the ending-tag search returns 0 (no ending tag found). If there are no ending tags to begin with, then the tag isn't encapsulating and therefore has no content.)

The other main instrument was GetTLTags, which uses the GetContent processor to find all of the "Top-Level" tags in a string (tags that aren't already surrounded by another tag's content area, within the content specified)...

Using the two in conjunction, I was able to split a period-delimited string: the first element of the array is used to get the first Top-Level tag, the occurrence is used to determine which one (if there is more than one), and GetContent is used to get the content of this tag. Then this process is looped using the found content and the next element of the array. That is, the nth element is used on the (n-1)th element's content, looping until n is the last element, where the newest content is returned.

A similar process is used in GetByExpressionA, except that the attributes of the final tag are returned, not the content.

In simpler terms, seeing similar formats, I constructed this idea:
Using tag,2.tagb
you could get information about 'tagb', which resides inside the content of the 2nd 'tag' (which would be "Woohoo" in the example below):

<tag> <tag1> Yipee </tag1> </tag>
<tag> <tagb> Woohoo </tagb> </tag>

I guess you could write HTML.Body.Table as pseudocode:
GetContent(GetContent(GetContent($html_file_data, 'HTML'),'Body'),'Table')

However, this seems to take a bit of time if the HTML document is large.

Edited April 22, 2008 by crashdemons
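Editor's sketch: the counting approach described for _HTML_GetContent (match each identical opening tag with an ending tag until the count returns to zero) can be illustrated in a few lines of Python. This is an assumption-laden illustration, not the AutoIt implementation: the name get_content and the regular expression are hypothetical, matching is case-insensitive, and only the first occurrence of the tag is handled.

```python
import re


def get_content(html, tag):
    """Return the text between the first <tag ...> and its matching </tag>.

    Returns "" when no matching end tag is found (a non-encapsulating tag).
    """
    # Matches <tag>, <tag attr=...> and </tag>, but not <tag1> or <tagb>
    pattern = re.compile(r"<(/?)%s(?=[\s>])[^>]*>" % re.escape(tag), re.IGNORECASE)
    depth = 0     # number of identical opening tags still waiting for an end tag
    start = None  # index where the content of the first opening tag begins
    for match in pattern.finditer(html):
        is_closing = match.group(1) == "/"
        if start is None:
            if is_closing:
                continue  # stray end tag before any opening tag
            start = match.end()
            depth = 1
        elif is_closing:
            depth -= 1
            if depth == 0:
                return html[start:match.start()]
        else:
            depth += 1  # identical opening tag nested inside the element
    return ""


page = "<html><body><table><tr><td>cell</td></tr></table></body></html>"
# The pseudocode GetContent(GetContent(GetContent(data,'HTML'),'Body'),'Table')
print(get_content(get_content(get_content(page, "HTML"), "Body"), "Table"))
# <tr><td>cell</td></tr>
```

Because each call rescans its input from the start, chaining calls like this repeats work on large documents, which is consistent with the slowdown mentioned in the post.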
ptrex Posted April 23, 2008

@crashdemons

I finally got it to work after your second update. But I still had to add #include <StaticConstants.au3> to get rid of all the errors.

Anyhow, this is a good job!!

Same remark as DaleHohm made: the GUI is a bit tiny and should be made resizable. But that is easy to fix.

Regards,
ptrex
BrettF Posted April 23, 2008 (edited)

The check version thing did not seem to work for me... also, it would be nice if you could view the DOM for a remote page...

Edited April 23, 2008 by Bert

Visit my blog!
UDFs: Opens The Default Mail Client | _LoginBox | Convert Reg to AU3 | BASS.au3 (BASS.dll) (Includes various BASS Libraries) | MultiLang.au3 (Multi-Language GUIs!)
Example Scripts: Computer Info Telnet Server | "Secure" HTTP Server (Based on Manadar's Server)
Software: AAMP- Advanced AutoIt Media Player | WorldCam | AYTU - Youtube Uploader
Tutorials: Learning to Script with AutoIt V3
Projects (Hardware + AutoIt): Arduino
Useful Links: AutoIt 1-2-3 | The AutoIt Downloads Section: | SciTE4AutoIt3 Full Version!
crashdemons (Author) Posted April 23, 2008

I just updated the test script (viewer).

Quote (DaleHohm): "I find the example interface you constructed madeningly tiny, but I like the output."
- The GUI has been changed a bit and now allows resizing.

Quote (ptrex): "But still had to add #include <StaticConstants.au3> to get rid of all the errors."
- I didn't need it for some odd reason, but I added it and it still worked, so I left it in this time.

Quote (BrettF): "it would be nice if you could view the DOM for a remote page..."
- Added.

Full added list:
-GUI resizing and size changes
-File>Open and File>Exit menu options
-File>Open Remote File option
-Edit>Copy Content from expression (use the DOM-like tag expression to point to the content to copy)
-Edit>Copy Attributes from expression (use the DOM-like tag expression to point to a tag and copy its attributes)
-Both files are now zipped, as the test script is an eyesore in the codebox and the board won't let me upload more than one file.
BrettF Posted April 25, 2008 (edited)

Awesome! Just going to try it now!

Line 198 really wasn't necessary... Same with 151...

Edited April 25, 2008 by Bert
crashdemons (Author) Posted April 25, 2008

Quote (BrettF): "Awesome! Just going to try it now! Line 198 really wasn't necessary... Same with 151..."

Huh?

DOM_test.au3
151 ; given the byte value, Unit #, long/short setting, and Unit Set
198 #ce

Or did you mean HTML_DOM.au3?
151 $tlarr[$i]=StringReplace($tlarr[$i],'>','')
198 ;$content=$tagname <---- Okay, this one isn't really useful

Eh, maybe you were just looking at the old copy.
aGorilla Posted April 25, 2008

Great stuff! Any chance you could allow resizing of the tree panels? When you get deep into the tree, you end up having to do a lot of scrolling.

Thanks for posting it, hope the headache passes soon.

Search AutoItScript.com via Google
crashdemons (Author) Posted May 6, 2008 (edited)

Updated HTML_DOM.au3 to version 4.

A problem in the version 3 _HTML_GetContent caused _HTML_GetByExpression, _HTML_GetByExpressionA and possibly (but not likely) _HTML_GetTLTags to return the content of the wrong element.

The problem was that the program was not using the right method to locate the element using the occurrence parameter.

This has been solved, but it forces the script to do a little more work, because it must find the content of each previous element of the same name (on the same level) before continuing.

Edited May 6, 2008 by crashdemons
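Editor's sketch: one way to picture the version 4 behaviour is to step over each earlier same-named element, including anything nested inside it, before counting the next one. The Python below illustrates that idea under stated assumptions; it is not the UDF's AutoIt code. The names get_nth_content and get_by_expression are hypothetical, and the sketch only tracks tags with the requested name, so a same-named tag nested inside a differently named element would still be counted.

```python
import re


def get_nth_content(html, tag, occurrence=1):
    """Content of the occurrence-th side-by-side <tag> element."""
    pattern = re.compile(r"<(/?)%s(?=[\s>])[^>]*>" % re.escape(tag), re.IGNORECASE)
    depth = 0     # how many <tag> elements are currently open
    seen = 0      # how many side-by-side elements have been opened so far
    start = None  # where the content of the current outermost element begins
    for match in pattern.finditer(html):
        if match.group(1) == "/":
            if depth == 0:
                continue  # stray end tag, ignore it
            depth -= 1
            if depth == 0 and seen == occurrence:
                return html[start:match.start()]
        else:
            if depth == 0:
                seen += 1  # a new element on the same level
                start = match.end()
            depth += 1  # nested same-name tags only deepen, never count
    return ""


def get_by_expression(html, expression):
    """Walk a 'Parent,Occurrence.Child,Occurrence' expression step by step."""
    content = html
    for part in expression.split("."):
        name, _, occurrence = part.partition(",")
        content = get_nth_content(content, name, int(occurrence or 1))
    return content


nested = "<div>1<div>1a</div></div><div>2</div>"
print(get_nth_content(nested, "div", 2))  # 2

doc = "<tag> <tag1> Yipee </tag1> </tag> <tag> <tagb> Woohoo </tagb> </tag>"
print(get_by_expression(doc, "tag,2.tagb").strip())  # Woohoo
```

Simply taking the second opening tag in the first example would land on the nested element (1a); skipping the whole first element avoids that, at the cost of scanning through it.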
ratacat Posted April 17, 2013

Hi Crashdemons,

I've been using your UDF for one of my web scripting projects this winter, and it's excellent! It's saved me a bunch of time and hassle, thank you!!!

I found a weird little anomaly with it that took me a long time to understand, so I figured I'd share it here in case anyone else comes up against it.

Here is my test script + 3 example input files to compare.

test script

#include <HTML_DOM.au3>

$f = FileRead("example-c.html")
; FileRead reports failure through @error rather than by returning -1
If @error Then
    MsgBox(0, "FileRead Failed", @error)
EndIf
MsgBox(0, "Whole File", $f)

; example-a.html has 5 TD's in the second TR
; example-b.html is identical, except the contents of the 3rd and 4th
; TD have been modified so they are not the same.
; Note what happens when you run this script on the two.
For $j = 1 To 7
    $expr = "tr,2" & ".td," & $j
    $sContent = _HTML_GetByExpression($f, $expr)
    If @error Then
        MsgBox(0, "_HTML_GetByExpression", "Error: " & @error)
    EndIf
    MsgBox(0, $expr, $sContent)
Next

example input a

<TABLE>
<TBODY>
<TR> <TD>Job ID</TD> <TD>Employee</TD> <TD>Position</TD> <TD>Site</TD> <TD>Description</TD> </TR>
<TR> <TD>8394</TD> <TD>HOWELL, BETTY J.</TD> <TD>ITINERANT</TD> <TD>ITINERANT</TD> <TD>ALL DAY</TD> </TR>
</TBODY></TABLE>

example input b

<TABLE>
<TBODY>
<TR> <TD>Job ID</TD> <TD>Employee</TD> <TD>Position</TD> <TD>Site</TD> <TD>Description</TD> </TR>
<TR> <TD>8394</TD> <TD>HOWELL, BETTY J.</TD> <TD>ITINERANT</TD> <TD>ITINERANT1</TD> <TD>ALL DAY</TD> </TR>
</TBODY></TABLE>

example input c

<TABLE>
<TBODY>
<TR> <TD>Job ID</TD> <TD>Employee</TD> <TD>Position</TD> <TD>Site</TD> <TD>Description</TD> </TR>
<TR> <TD>8394</TD> <TD>HOWELL, BETTY J.</TD> <TD>ITINERANT1</TD> <TD>ITINERANT</TD> <TD>ALL DAY</TD> </TR>
</TBODY></TABLE>

If you run domtest.au3 against all three of those example HTML snippets, you will see what I mean. The TDs in question happen to be numbers 3 and 4 of the second TR (tr,2.td,3 + tr,2.td,4).

For some reason, on examples A and C the script gets 'stuck' reading tr,2.td,4 and mistakenly thinks the value of all following TDs (even ones that don't exist) is also 'ITINERANT'. However, in example B I've added a '1' onto tr,2.td,4 and it processes the entire file as you would expect.

I hope this helps someone in the future; it was driving me nuts =) I will just be dealing with it by pre-processing the values and modifying them so they are unique.

Attachments: example-a.html, example-b.html, example-c.html