
Is it possible to get DOM object without browser?



The IE library supplied with AutoIt gives access to the DOM object, and there are custom libraries for Chrome as well. But is it possible to get the DOM using only AutoIt's native functions? Maybe by means of AutoIt + Curl? What is the mechanism for requesting the DOM? How do browsers perform this operation? I know that Curl can pretend to be a browser, so maybe this approach could work?


9 hours ago, Nine said:

I wanna get to Mars without a space shuttle!!! Maybe explain what you want to achieve without those very basic technical goals...

Exactly. That is what I meant. E.g. Chrome driver or something like that. I mean a tool other than a browser.


Chrome Driver (WebDriver) is just another means to interact with a browser.
So what is the need to work without a browser? At least IE comes with the OS.



It depends on what you mean by "DOM". You can easily use the InetRead function to just get the HTML and then use something like an XML parser to get the "DOM".

EasyCodeIt - A cross-platform AutoIt implementation - Fund the development! (GitHub will double your donations for a limited time)

DcodingTheWeb Forum - Follow for updates and Join for discussion


7 hours ago, TheDcoder said:

It depends on what you mean by "DOM". You can easily use the InetRead function to just get the HTML and then use something like an XML parser to get the "DOM".

Yes, I can. And I can parse the page using AutoIt's String functions. But there are many more options provided by JavaScript once I get the DOM by means of

#include <IE.au3>
Local $oIE = _IECreate($URL, 0, 1)
Local $oDoc = $oIE.document

and from this moment you are able to use any of them. For example

$oHtml=$oDoc.GetElementsByTagName("html").item(0)

and a lot of others. The only problem is that in this case you are bound to the IE library supplied with AutoIt, or to the libraries that exist for other browsers. But I do not want to be bound to any browser. I'd like to know how to build the Document Object Model from the server response, whatever tool I use: WinHttp, InetGet or InetRead.


10 hours ago, Vitaliy4us said:

But there are many more options provided by JavaScript once I get the DOM by means of

10 hours ago, Vitaliy4us said:

The only problem is that in this case you are bound to the IE library supplied with AutoIt, or to the libraries that exist for other browsers. But I do not want to be bound to any browser.

The JavaScript interface is part of the browser, and all of the required processing is done by the browser, so it is hard to see how all of that could function without the DOM object being bound to the browser.

Anyway, I assume the real question is, why do you not want to use the browser? :)


4 hours ago, TheDcoder said:

 

Quote

The JavaScript interface is part of the browser, and all of the required processing is done by the browser, so it is hard to see how all of that could function without the DOM object being bound to the browser.

Yes, that was the case before Node.js came along.

Quote

Anyway, I assume the real question is, why do you not want to use the browser?

I'd like to be free of the browser type. And I believe that it would be much faster to get the server response as a string directly from the server, bypassing such a monster as a browser.


8 hours ago, Vitaliy4us said:

Yes, that was the case before Node.js came along.

Not to mention that Node.js is basically a wrapper around Chrome's V8 JavaScript engine :)

8 hours ago, Vitaliy4us said:

I'd like to be free of the browser type.

IE is installed in 99% of the Windows installations, so it is a bit universal in that regard, if that is what you meant by being "free of the browser type".

8 hours ago, Vitaliy4us said:

And I believe that it would be much faster to get the server response as a string directly from the server, bypassing such a monster as a browser.

No, you are mistaken. The browser itself is what makes the traditional JavaScript DOM interface tick... so, to put it more bluntly, you want the monster's abilities without its negative side effects.
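
The split between the JavaScript engine and the DOM can be checked directly under Node.js. The snippet below is only an illustrative sketch; `describeEnvironment` is a made-up helper name, not part of Node.js or any library. It reports whether the browser-provided globals `document` and `window` exist in the current runtime.

```javascript
// Hypothetical helper (not a library function): reports which
// browser-provided globals exist in the current JavaScript runtime.
function describeEnvironment() {
  return {
    // "document" is the DOM entry point; a browser supplies it, the JS engine does not.
    hasDocument: typeof document !== 'undefined',
    hasWindow: typeof window !== 'undefined',
    // process.versions.v8 is available when the script runs under Node.js.
    v8Version: typeof process !== 'undefined' ? process.versions.v8 : null,
  };
}

console.log(describeEnvironment());
```

Run with plain Node.js (no DOM library loaded), both `hasDocument` and `hasWindow` are `false`: the engine executes the language, but the DOM has to be supplied by something else.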


4 minutes ago, TheDcoder said:

Not to mention that Node.js is basically a wrapper around Chrome's V8 JavaScript engine :)

IE is installed in 99% of the Windows installations, so it is a bit universal in that regard, if that is what you meant by being "free of the browser type".

No, you are mistaken. The browser itself is what makes the traditional JavaScript DOM interface tick... so, to put it more bluntly, you want the monster's abilities without its negative side effects.

Yes, you are right. But anyway, the question is whether this can be done at all, not what it is for or how it could be done another way. What do you think about this https://www.codeproject.com/Articles/4586/Web-Data-Extraction-by-Crawling-using-WINHTTP-and ? It looks like the author offers a real solution?


37 minutes ago, Vitaliy4us said:

the question is whether this can be done at all

Well, it is theoretically possible to simulate the DOM Interface with your own implementation which does not depend on a browser. No one has done that yet as far as I know.
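
To make the idea of an own implementation concrete, here is a very small sketch in JavaScript that runs under Node.js with no browser involved. It is a toy under stated assumptions, not a real HTML parser: it expects reasonably well-formed markup, does not handle `>` inside attribute values, and ignores `<script>`/`<style>` content and JavaScript execution entirely. All names (`parseHtml`, `getElementsByTagName`, `textContent`) are illustrative and only mimic the familiar DOM method names.

```javascript
// Toy DOM-like tree built from an HTML string, with no browser involved.
// Assumptions: reasonably well-formed markup, no ">" inside attribute
// values, no <script>/<style> handling. Not a spec-compliant HTML parser.

const VOID_TAGS = new Set(['area', 'br', 'hr', 'img', 'input', 'link', 'meta']);

// Turn the raw text between the tag name and ">" into a name -> value map.
function parseAttributes(attrText) {
  const attributes = {};
  const attrRe = /([\w-]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;
  let a;
  while ((a = attrRe.exec(attrText)) !== null) {
    attributes[a[1].toLowerCase()] = a[2] ?? a[3] ?? a[4] ?? '';
  }
  return attributes;
}

function parseHtml(html) {
  const root = { tagName: '#document', attributes: {}, children: [], parent: null };
  let current = root;
  // Alternatives: comment | doctype | closing tag | opening tag | text run.
  const tokenRe =
    /<!--[\s\S]*?-->|<![^>]*>|<\/([a-zA-Z][\w-]*)\s*>|<([a-zA-Z][\w-]*)([^>]*?)(\/?)>|([^<]+)/g;
  let m;
  while ((m = tokenRe.exec(html)) !== null) {
    const [, closeTag, openTag, attrText, selfClose, text] = m;
    if (closeTag) {
      // Walk up to the nearest open element with that name, then step out of it.
      const name = closeTag.toLowerCase();
      let node = current;
      while (node !== root && node.tagName !== name) node = node.parent;
      if (node !== root) current = node.parent;
    } else if (openTag) {
      const element = {
        tagName: openTag.toLowerCase(),
        attributes: parseAttributes(attrText),
        children: [],
        parent: current,
      };
      current.children.push(element);
      // Void and self-closed elements never receive children.
      if (!selfClose && !VOID_TAGS.has(element.tagName)) current = element;
    } else if (text && text.trim() !== '') {
      current.children.push({ tagName: '#text', text, children: [], parent: current });
    }
    // Comments and doctype declarations fall through and are skipped.
  }
  return root;
}

// Depth-first search for descendant elements with the given tag name.
function getElementsByTagName(node, name) {
  const wanted = name.toLowerCase();
  const found = [];
  (function walk(n) {
    for (const child of n.children) {
      if (child.tagName === wanted || (wanted === '*' && child.tagName !== '#text')) {
        found.push(child);
      }
      walk(child);
    }
  })(node);
  return found;
}

// Concatenated text of a node and all of its descendants.
function textContent(node) {
  return node.tagName === '#text' ? node.text : node.children.map(textContent).join('');
}

const doc = parseHtml(
  '<!DOCTYPE html><html><body><p class="intro">Hello <b>DOM</b></p>' +
  '<a href="https://example.com/">link</a></body></html>'
);
const links = getElementsByTagName(doc, 'a');
console.log(links.length, links[0].attributes.href, textContent(doc));
```

The HTML string here is hard-coded; in practice it would be whatever the HTTP request returned. A tree like this covers simple queries on static markup only. Anything the page builds with its own scripts never appears in it, which is the part that a browser engine provides.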

40 minutes ago, Vitaliy4us said:

What do you think about this https://www.codeproject.com/Articles/4586/Web-Data-Extraction-by-Crawling-using-WINHTTP-and ? It looks like the author offers a real solution?

WinHTTP itself depends on IE, I think, and it is only used for sending and receiving HTTP requests, not for processing HTML and providing a DOM Interface :)


3 hours ago, TheDcoder said:

Well, it is theoretically possible to simulate the DOM Interface with your own implementation which does not depend on a browser. No one has done that yet as far as I know.

WinHTTP itself depends on IE, I think, and it is only used for sending and receiving HTTP requests, not for processing HTML and providing a DOM Interface :)

Not only WinHTTP. Here is a quotation from the article provided by the link:

Quote

When we are done with this, we have the HTML page as a string in the respage buffer. So now the aim is to get a DOM model of this, so that we can operate on the data programmatically like query the nodes, access particular elements and so on. The best way to do DOM manipulation is through the Microsoft provided interfaces IHTMLDocument, IHTMLDocument2, IHTMLDocument3 and IHTMLDocument4. The following code takes data from that buffer and makes an IHTMLDocument2 out of it. We can then use its various methods ( getBody, getInnerHTML, etc. ) to access the DOM or type cast it into a related interface like IHTMLDocument3 and query the nodes in the DOM tree.

The paragraph is followed by this code (I think it is C++, which I am not familiar with):

// Declare an IHTMLDocument structure
IHTMLDocument2Ptr myDocument; // Declared earlier in the code
HRESULT hr = CoCreateInstance(CLSID_HTMLDocument,NULL,
  CLSCTX_INPROC_SERVER,IID_IHTMLDocument2, (void **)&myDocument);
HRESULT hresult = S_OK;
VARIANT *param;
SAFEARRAY *tmpArray;

// Creates a new one-dimensional array
// for holding the webpage data
tmpArray = SafeArrayCreateVector(VT_VARIANT, 0, 1);
// Convert the buffer into binary string
bstr_t bsData = (LPCTSTR) respage;
hresult = SafeArrayAccessData(tmpArray,(LPVOID*) & param);
param->vt = VT_BSTR;
// The array element gets its own copy of the string
param->bstrVal = bsData.copy();
hresult = myDocument->write(tmpArray);
       // injected code in document structure
hresult = SafeArrayUnaccessData(tmpArray);
// bsData frees its own string; SafeArrayDestroy frees the copy
if (tmpArray != NULL) {
    SafeArrayDestroy(tmpArray);
}

 


I see. IHTMLDocument does look promising, but I am not familiar with it, and I do not know whether it is better than using the usual IE functions. In any case, it is still part of IE according to Microsoft's documentation. By the way, the code resembles C more than C++.


One more comment regarding Node.js. Of course I know that "Node.js is basically a wrapper around Chrome's V8 JavaScript engine". But what matters is not how it does it, but the result we get from this approach. And the result is the possibility of using JavaScript outside a browser. And here is one more interesting tool accessible from Node.js, namely the AutoIt module. Does anybody use it? I'm stuck at the installation stage.

The problem is that it depends on node-gyp, which must be installed first. That module uses Python, and it looks like another version of Python had already been installed, so a configuration problem is blocking the node-gyp installation. Once that problem is solved, the autoit module can be installed and used from Node:

var au = require('autoit');
 
au.Init();
au.Run("notepad.exe");
au.WinWait("[Class:Notepad]");
au.Send("Hello, autoit & nodejs!");

I hope somebody uses it and can share a way of solving the problem.


9 hours ago, Vitaliy4us said:

But what matters is not how it does it, but the result we get from this approach.

In that case, you can achieve the same by using IE, it will give a valid object in 99% of the Windows installations :)

9 hours ago, Vitaliy4us said:

And the result is the possibility of using JavaScript outside a browser.

There are many tools which allow for this, including a certain AutoIt UDF which controls Microsoft's "Chakra Core" JS engine:

By the way, now that I think about it, I guess you could use this UDF if Chakra Core comes with the required interfaces to parse an HTML document into a DOM.
