WebDriver

The W3C WebDriver API is a platform- and language-neutral interface and wire protocol that allows programs or scripts to control the behavior of a web browser.

Introduction

WebDriver API

WebDriver enables developers to create automated tests that simulate user interaction. This is different from JavaScript unit tests because WebDriver has access to functionality and information that JavaScript running in the browser doesn't, and it can more accurately simulate user events or OS-level events. WebDriver can also manage testing across multiple windows, tabs and webpages in a single test session.

WebDriver UDF

The WebDriver UDF lets you interact with any browser that supports the W3C WebDriver specification. Supporting multiple browsers from the same code base is now possible with just a few configuration settings.
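As a rough illustration of what "a few configuration settings" can look like, the sketch below switches browsers by changing only the driver executable, the port and the capabilities string. The helper name _ExampleSetup is hypothetical. The driver file names, the ports (4444 for geckodriver, 9515 for chromedriver and msedgedriver, assumed to be their usual defaults) and the capabilities strings are assumptions modelled on the examples elsewhere on this page, so adapt them to your setup.

```autoit
#include "wd_core.au3"

; Hypothetical helper: configures the UDF for one browser and returns the
; matching capabilities string. The rest of a script can stay unchanged.
Func _ExampleSetup($sBrowser)
	Local $sCapabilities = ''
	Switch StringLower($sBrowser)
		Case 'firefox'
			_WD_Option('Driver', 'geckodriver.exe')
			_WD_Option('Port', 4444)
			$sCapabilities = '{"capabilities": {"alwaysMatch": {"browserName": "firefox", "acceptInsecureCerts": true}}}'
		Case 'chrome'
			_WD_Option('Driver', 'chromedriver.exe')
			_WD_Option('Port', 9515)
			$sCapabilities = '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true}}}}'
		Case 'edge'
			_WD_Option('Driver', 'msedgedriver.exe')
			_WD_Option('Port', 9515)
			$sCapabilities = '{"capabilities": {"alwaysMatch": {"ms:edgeOptions": {}}}}'
		Case Else
			; Unknown browser name: report an error to the caller
			Return SetError(1, 0, '')
	EndSwitch
	Return $sCapabilities
EndFunc   ;==>_ExampleSetup

Global $sExampleCapabilities = _ExampleSetup('chrome')
If @error Then Exit 1
ConsoleWrite('Capabilities: ' & $sExampleCapabilities & @CRLF)
```

The returned string would then be passed to _WD_CreateSession once _WD_Startup has launched the driver.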

Requirements

(Last modified: 2022/01/25)

The following UDFs need to be installed, regardless of which browser you want to automate:

JSON UDF (AutoIt) by Ward and Jos
WinHTTP UDF (AutoIt) by trancexx or autoit-winhttp (GitHub)
WebDriver UDF (GitHub) by Danp2

One of the following drivers needs to be installed, depending on the type and version of the browser you want to automate:

Browser	Download Link	Comments
Chrome	Google	Follow this link to select the correct version depending on the Chrome version you run!
Edge	Microsoft
Firefox	GitHub	Firefox version ≥ 60 is recommended. Note: You must still have the Microsoft Visual Studio redistributable runtime installed on your system for the binary to run. This is a known bug in version 0.26 which the authors weren't able to fix for this release.
Opera	GitHub	The versioning of OperaDriver matches the Chromium version on which the Opera browser is based.

Limitations

(Last modified: 2020/01/28)
Not all WebDriver functions have been implemented by each browser. To check the status, go to the corresponding website below:

Chrome
Edge
Firefox
Opera: "OperaChromiumDriver is a WebDriver implementation derived from ChromeDriver and adapted by Opera", so it presumably has at least the same limitations as ChromeDriver.

Big Picture

How the browser-independent and browser-dependent parts fit together:

Technical terms

(Last modified: 2022/02/12)

You will encounter the following technical terms when working with the WebDriver UDF.
These terms are not unique to the UDF, so you will find just the general description in this section. How to use these terms with the WebDriver UDF can be found in the FAQ.

CDP (Chrome DevTools Protocol): A protocol that allows tools to instrument, inspect, debug and profile Chromium, Chrome and other Blink-based browsers.

CSS Selector: Please see Selector.
Locator Strategy: A locator strategy describes the method used to search for elements. The available strategies are link text, partial link text, tag name, CSS selector and XPath selector.

Marionette: Marionette is an automation driver for Mozilla’s Gecko engine. It can remotely control either the UI or the internal JavaScript of a Gecko platform, such as Firefox. It can control both the chrome (i.e. menus and functions) and the content (the webpage loaded inside the browsing context), giving a high level of control and the ability to replicate user actions. In addition to performing actions on the browser, Marionette can also read the properties and attributes of the DOM.
Marionette consists of two parts: a server which takes requests and executes them in Gecko (the Marionette server ships with Firefox), and a client (the Marionette client ships with the GeckoDriver exe). The client sends commands to the server and the server executes the command inside the browser.
For details please visit this site.

Selector: Selectors are patterns used to select the element(s) you want to process. They are used by
CSS (see this link for Beginners or this link for Advanced) and
XPath (see this link for Beginners and this link for Advanced).

ShadowRoot: The ShadowRoot interface of the Shadow DOM API is the root node of a DOM subtree that is rendered separately from a document's main DOM tree.
For details please visit this site.

XPath: XPath is a language for navigating in XML documents. XPath is a major element in the XSLT standard and includes over 200 built-in functions.

XQuery: XQuery is a language for querying XML documents.
For details please visit this site.

XSLT: XSLT is a language for transforming XML documents.
For details please visit this site.

Installation

(Last modified: 2022/02/02)

To automate your browser the following installation steps are needed:

Download the files listed in section "Requirements"
Move the UDFs to a directory where SciTE and AutoIt can find them:
- wd_core.au3, wd_helper.au3, wd_cdp.au3, wd_capabilities.au3 from the WebDriver UDF
- Json.au3 and BinaryCall.au3 from the JSON UDF
- WinHttp.au3 and WinHttpConstants.au3 from the WinHTTP UDF
Move the browser-dependent WebDriver executable to the same directory (with WD_Demo.au3):
- chromedriver.exe (Chrome)
- geckodriver.exe (Firefox)
- msedgedriver.exe (Edge - Chromium) or MicrosoftWebDriver.exe (Edge - EdgeHTML)
- or use the "Update" option by selecting it in the corresponding ComboBox in WD_Demo.au3
Run WD_Demo.au3 and select "DemoNavigation" to validate the installation.
The result (for Firefox) displayed in the DOS window should be similar to the following:

1577745813519   geckodriver     DEBUG   Listening on 127.0.0.1:4444
1577745813744   webdriver::server       DEBUG   -> POST /session {"capabilities": {"alwaysMatch": {"browserName": "firefox", "acceptInsecureCerts":true}}}
1577745813746   geckodriver::capabilities       DEBUG   Trying to read firefox version from ini files
1577745813747   geckodriver::capabilities       DEBUG   Found version 71.0
1577745813757   mozrunner::runner       INFO    Running command: "C:\\Program Files\\Mozilla Firefox\\firefox.exe" "-marionette" "-foreground" "-no-remote" "-profile" "C:\\ ...
1577745813783   geckodriver::marionette DEBUG   Waiting 60s to connect to browser on 127.0.0.1:55184
1577745817392   geckodriver::marionette DEBUG   Connection to Marionette established on 127.0.0.1:55184.
1577745817464   webdriver::server       DEBUG   <- 200 OK {"value":{"sessionId":"925641bf-6c5d-4fe2-a985-02de9b1c7c74","capabilities":"acceptInsecureCerts":true,"browserName":"firefox", ...
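If you would rather validate the setup from a script of your own, a minimal sketch along the following lines should produce similar log output. It assumes that geckodriver.exe sits next to the script and uses port 4444 (as in the log above). The URL is only a placeholder.

```autoit
#include "wd_core.au3"

; Assumption: geckodriver.exe is in the script directory and listens on port 4444
_WD_Option('Driver', 'geckodriver.exe')
_WD_Option('DriverParams', '--log trace')
_WD_Option('Port', 4444)

Global $sCapabilities = '{"capabilities": {"alwaysMatch": {"browserName": "firefox", "acceptInsecureCerts": true}}}'

; Launch the driver console
_WD_Startup()
If @error <> $_WD_ERROR_Success Then Exit 1

; Create the session; this is the "POST /session" request seen in the log
Global $sSession = _WD_CreateSession($sCapabilities)
If @error = $_WD_ERROR_Success Then
	ConsoleWrite('Session created: ' & $sSession & @CRLF)
	_WD_Navigate($sSession, 'https://www.autoitscript.com') ; placeholder URL
	_WD_DeleteSession($sSession)
EndIf

; Close the driver console
_WD_Shutdown()
```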

Function reference

(Last modified: 2021/12/09)

The functions are now documented in the CHM help file that comes with the UDF.

Capabilities

WebDriver capabilities are used to communicate the features supported by a given implementation. The local end may use capabilities to define which features it requires the remote end to satisfy when creating a new session. Likewise, the remote end uses capabilities to describe the full feature set for a session.
Details can be found at WebDriver Capabilities (sub page).
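As a hedged sketch of how a capabilities string might be assembled with the functions from wd_capabilities.au3 instead of being written by hand: the calls below follow the pattern used in the UDF's demo script as far as recalled here, so check the CHM help file for the exact keys supported by your UDF version.

```autoit
#include "wd_core.au3"
#include "wd_capabilities.au3"

; Start a fresh capabilities definition
_WD_CapabilitiesStartup()

; Request Chrome in the "alwaysMatch" section
_WD_CapabilitiesAdd('alwaysMatch', 'chrome')
_WD_CapabilitiesAdd('w3c', True)
_WD_CapabilitiesAdd('excludeSwitches', 'enable-automation')

; Retrieve the resulting JSON string, ready to be passed to _WD_CreateSession
Global $sCapabilities = _WD_CapabilitiesGet()
ConsoleWrite($sCapabilities & @CRLF)
```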

Browser related functionality

Google Chrome

ChromeDriver supports "Chrome DevTools Protocol" (CDP) commands (for an explanation of the term and related links please see the Technical terms section).
Details can be found in the CHM Help file that comes with the UDF.

Translate IE UDF to WebDriver

This page is still a work in progress.

Internet Explorer is no longer supported by Microsoft. Therefore, it may be necessary to rewrite existing scripts for another browser using the WebDriver UDF. Below you will find a mapping of the functions of the IE UDF to the functions of the WebDriver UDF.

IE function	WebDriver function	Comments
_IEAction	_WD_ElementAction + _WD_ElementActionEx
_IEAttach	_WD_Attach	Examples can be found here: https://www.autoitscript.com/forum/topic/201537-webdriver-example-scripts-collection/?do=findComment&comment=1495880
_IEBodyReadHTML	_WD_GetSource
_IEBodyReadText
_IEBodyWriteHTML
_IECreate	_WD_CreateSession
_IECreateEmbedded	N/A
_IEDocGetObj
_IEDocInsertHTML
_IEDocInsertText
_IEDocReadHTML	_WD_GetSource
_IEDocWriteHTML
_IEErrorNotify
_IEFormElementCheckBoxSelect	_WD_ElementActionEx($sSession, $sElement, "check")
_IEFormElementGetCollection	_WD_FindElement
_IEFormElementGetObjByName	_WD_GetElementByName
_IEFormElementGetValue	_WD_ElementSelectAction($sSession, $sSelectElement, "value")
_IEFormElementOptionSelect	_WD_ElementSelectAction + _WD_ElementOptionSelect
_IEFormElementRadioSelect	_WD_ElementSelectAction
_IEFormElementSetValue	_WD_SetElementValue
_IEFormGetCollection	_WD_FindElement
_IEFormGetObjByName	_WD_GetElementByName
_IEFormImageClick	_WD_FindElement + _WD_ElementAction($sSession, $sElement, "click")
_IEFormReset
_IEFormSubmit
_IEFrameGetCollection	_WD_FindElement + _WD_FrameEnter	You cannot get all frames at once. Find the frames available in the current document context first, then enter each frame and search it for sub-frames.
_IEFrameGetObjByName	_WD_GetElementByName + _WD_FrameEnter
_IEGetObjById	_WD_GetElementById
_IEGetObjByName	_WD_GetElementByName
_IEHeadInsertEventScript
_IEImgClick	_WD_ElementAction($sSession, $sElement, "click")
_IEImgGetCollection	_WD_FindElement
_IEIsFrameSet
_IELinkClickByIndex	_WD_ElementAction($sSession, $sElement, "click")
_IELinkClickByText	_WD_ElementAction($sSession, $sElement, "click")
_IELinkGetCollection	_WD_FindElement
_IELoadWait	_WD_LoadWait
_IELoadWaitTimeout	_WD_SetTimeouts
_IENavigate	_WD_Navigate
_IEPropertyGet	_WD_FindElement + _WD_ElementAction($sSession, $sElement, "property") ???
_IEPropertySet	_WD_FindElement + _WD_ElementAction($sSession, $sElement, "property") ???
_IEQuit	_WD_DeleteSession
_IETableGetCollection	_WD_FindElement
_IETableWriteToArray	_WD_GetTable
_IETagNameAllGetCollection	_WD_FindElement
_IETagNameGetCollection	_WD_FindElement
_IE_Example	N/A	Take a look at the wd_demo.au3 script
_IE_Introduction	N/A	Take a look at the wd_demo.au3 script
_IE_VersionInfo	$__WDVERSION
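To make the mapping more concrete, here is a small sketch of how a typical IE UDF sequence might be translated. It assumes chromedriver.exe next to the script on port 9515. The URL and the field name "q" are placeholders, not taken from a real page.

```autoit
#include "wd_core.au3"
#include "wd_helper.au3"

; IE UDF version, for comparison:
;   Local $oIE = _IECreate("https://www.example.com")
;   Local $oField = _IEGetObjByName($oIE, "q")
;   _IEFormElementSetValue($oField, "AutoIt")
;   _IEQuit($oIE)

; Assumption: chromedriver.exe is in the script directory
_WD_Option('Driver', 'chromedriver.exe')
_WD_Option('Port', 9515)
_WD_Startup()

; _IECreate -> _WD_CreateSession (navigation is a separate step)
Global $sSession = _WD_CreateSession('{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true}}}}')
If @error = $_WD_ERROR_Success Then
	_WD_Navigate($sSession, 'https://www.example.com') ; _IENavigate
	_WD_LoadWait($sSession) ; _IELoadWait

	; _IEGetObjByName -> _WD_GetElementByName ("q" is a placeholder name)
	Global $sElement = _WD_GetElementByName($sSession, 'q')
	; _IEFormElementSetValue -> _WD_SetElementValue
	If @error = $_WD_ERROR_Success Then _WD_SetElementValue($sSession, $sElement, 'AutoIt')

	_WD_DeleteSession($sSession) ; _IEQuit
EndIf
_WD_Shutdown()
```

Note that the IE UDF works with COM objects, whereas the WebDriver UDF passes a session ID and element IDs as strings.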

Troubleshooting

Debug the WebDriver setup

(Last modified: 2020/07/06)

WinHTTP UDF

Make sure that you are running at least version 1.6.4.2 (currently unreleased, but can be obtained here).

Chrome

Chrome does not start and the DOS window for chromedriver does not get displayed

Problem	Solution	Reference
When running WD_Demo.au3, Chrome does not start and the DOS window for chromedriver is not displayed. When you run chromedriver manually in a DOS window, you get the message "[0.023][SEVERE]: CreatePlatformSocket() returned an error: An invalid argument was supplied."	This can be caused by missing execution permission for the network drive. Ask your IT admin about "AppLocker" or "application directory whitelisting", or run chromedriver from a local drive and call _WD_Option to set the location of the WebDriver executable. Example: _WD_Option("Driver", "C:\Local\WebDriver\chromedriver.exe")	Stackoverflow

Firefox

Firefox does not start and the DOS window for geckodriver does not get displayed

Problem	Solution	Reference
When running WD_Demo.au3, Firefox does not start and the DOS window for geckodriver is not displayed. When you run geckodriver manually in a DOS window, you get the message "geckodriver: error: An invalid argument was supplied. (os error 10022)"	This can be caused by missing execution permission for the network drive. Ask your IT admin about "AppLocker" or "application directory whitelisting", or run geckodriver from a local drive and call _WD_Option to set the location of the WebDriver executable. Example: _WD_Option("Driver", "C:\Local\WebDriver\geckodriver.exe")	Stackoverflow

Debug your Script

FAQ

(Last modified: 2022/02/04)

1. How to connect to a running browser instance

Q: How can I connect to a running browser instance?
A: That's described (for Firefox, but it should work similarly for other browsers) in this post.

2. How to hide the webdriver console

Q: How can I hide the webdriver console?
A: The console can be completely hidden from the start by adding the following line near the beginning of your script:

$_WD_DEBUG = $_WD_DEBUG_None ; You could also use $_WD_DEBUG_Error

You can also control the visibility of the console with the function _WD_ConsoleVisible.

3. How to utilize an existing user profile

Q: Can I use an existing user profile instead of the default behavior of using a new one?
A: This is controlled by your "capabilities" declaration, and each browser implements it differently. Here are some examples:

Chrome

$sDesiredCapabilities = '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true, "args":["--user-data-dir=C:\\Users\\' & @UserName & '\\AppData\\Local\\Google\\Chrome\\User Data\\", "--profile-directory=Default"]}}}}'

MS Edge

$sDesiredCapabilities = '{"capabilities": {"alwaysMatch": {"ms:edgeOptions": {"args": ["user-data-dir=C:\\Users\\' & @UserName & '\\AppData\\Local\\Microsoft\\Edge\\User Data\\", "profile-directory=Default"]}}}}'

Firefox

$sDesiredCapabilities = '{"capabilities":{"alwaysMatch": {"moz:firefoxOptions": {"args": ["-profile", "' & GetDefaultFFProfile() & '"],"log": {"level": "trace"}}}}}'

Func GetDefaultFFProfile()
	Local $sDefault, $sProfilePath = ''

	Local $sProfilesPath = StringReplace(@AppDataDir, '\', '/') & "/Mozilla/Firefox/"
	Local $sFilename = $sProfilesPath & "profiles.ini"
	Local $aSections = IniReadSectionNames ($sFilename)

	If Not @error Then
		For $i = 1 To $aSections[0]
			$sDefault = IniRead($sFilename, $aSections[$i], 'Default', '0')

			If $sDefault = '1' Then
				$sProfilePath = $sProfilesPath & IniRead($sFilename, $aSections[$i], "Path", "")
				ExitLoop
			EndIf
		Next
	EndIf

	Return $sProfilePath
EndFunc

You will also likely need to specify the marionette port:

_WD_Option('DriverParams', '--marionette-port 2828')

4. How to specify location of browser executable

Q: Is it possible to launch a browser installed in a non-standard location?
A: This is controlled by your "capabilities" declaration. Here are some examples:

Chrome

$sDesiredCapabilities = '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true, "binary":"C:\\Path\\To\\Alternate\\Browser\\chrome.exe" }}}}'

Firefox

$sDesiredCapabilities = '{"capabilities": {"alwaysMatch": {"acceptInsecureCerts": true, "moz:firefoxOptions": {"binary": "C:\\Path\\To\\Alternate\\Browser\\firefox.exe"}}}}'

Alternate Firefox method:

_WD_Option('DriverParams', '--binary "C:\Program Files\Mozilla Firefox\firefox.exe" --log trace ')

5. How to maximize the browser window

Q: Is it possible to maximize the browser window?
A: Simply call the following function:

_WD_Window($sSession, "Maximize")

Make sure to call _WD_Window after the session has been created with _WD_CreateSession.

6. How to specify location of WebDriver executable

Q: Is it possible to launch the WebDriver executable from a specific location?
A: This is controlled by function "_WD_Option". Example:

_WD_Option("Driver", "C:\local\WebDriver\WebDriver.exe")

7. How to retrieve the values of a drop-down list

Q: How to retrieve the values of a drop-down list (<Select> tag)?
A1: Here's a simple way to do it:

$sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//select[@name='xxx']")
$sText = _WD_ElementAction($sSession, $sElement, 'property', 'innerText')
$aOptions = StringSplit ( $sText, @LF,  $STR_NOCOUNT)
_ArrayDisplay($aOptions)

'xxx' is the name of the drop-down list.

A2: This can now be accomplished using the function _WD_ElementSelectAction:

$sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//select[@name='xxx']")
$aOptions = _WD_ElementSelectAction ($sSession, $sElement, 'options')
_ArrayDisplay($aOptions)

8. How to run the browser in headless mode (hidden mode)

Q: How do I run the browser in "headless" mode?
A: This is controlled by the Capabilities string that is passed to _WD_CreateSession. Example:

$sDesiredCapabilities = '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true, "args": ["--headless", "--allow-running-insecure-content"] }}}}'
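The example above targets Chrome. As a sketch of how the same idea might carry over to other browsers, the hypothetical helper below returns a headless capabilities string per browser. It assumes that Edge (Chromium) accepts the same "--headless" argument as Chrome and that Firefox uses "-headless".

```autoit
; Hypothetical helper: returns a capabilities string that starts the given
; browser in headless mode, or sets @error for an unknown browser name.
Func _ExampleHeadlessCapabilities($sBrowser)
	Switch StringLower($sBrowser)
		Case 'chrome'
			Return '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true, "args": ["--headless"]}}}}'
		Case 'edge'
			Return '{"capabilities": {"alwaysMatch": {"ms:edgeOptions": {"args": ["--headless"]}}}}'
		Case 'firefox'
			Return '{"capabilities": {"alwaysMatch": {"moz:firefoxOptions": {"args": ["-headless"]}}}}'
	EndSwitch
	Return SetError(1, 0, '')
EndFunc   ;==>_ExampleHeadlessCapabilities

ConsoleWrite(_ExampleHeadlessCapabilities('firefox') & @CRLF)
```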

9. How to configure the UDF to call a user-defined Sleep function

Q: How to configure the UDF to call a user-defined Sleep function, and interact with _WD_WaitElement() and _WD_LoadWait() to make the script more responsive?
A: Use _WD_Option("Sleep", ...) to register your own function. Example:

#include <ButtonConstants.au3>
#include <GuiComboBoxEx.au3>
#include <GUIConstantsEx.au3>
#include <MsgBoxConstants.au3>
#include <WindowsConstants.au3>
#include "wd_helper.au3"

Global $idAbortTest
Global $WD_SESSION
_Example()

Func _Example()
	SetupChrome()

	; Create a GUI with various controls.
	Local $hGUI = GUICreate("Example")
	Local $idTest = GUICtrlCreateButton("Test", 10, 370, 85, 25)
	$idAbortTest = GUICtrlCreateButton("Abort", 150, 370, 85, 25)

	; Display the GUI.
	GUISetState(@SW_SHOW, $hGUI)

	ConsoleWrite("- TESTING" & @CRLF)

	Local $sFilePath = _WriteTestHtml()

	; Loop until the user exits.
	While 1
		Switch GUIGetMsg()
			Case $idTest
				_WD_Navigate($WD_SESSION, $sFilePath)
				_WD_WaitElement($WD_SESSION, $_WD_LOCATOR_ByXPath, '//a[contains(text(),"TEST")]', 100, 30 * 1000) ; timeout = 30 seconds
				ConsoleWrite("---> @error=" & @error & "  @extended=" & @extended & _
						" : after _WD_WaitElement()" & @CRLF)

			Case $GUI_EVENT_CLOSE
				ExitLoop

		EndSwitch
	WEnd

	; Delete the previous GUI and all controls.
	GUIDelete($hGUI)

EndFunc   ;==>_Example

Func _My_Sleep($iDelay)
	Local $hTimer = TimerInit() ; Begin the timer and store the handle in a variable.
	Do
		Switch GUIGetMsg()
			Case $GUI_EVENT_CLOSE
				ConsoleWrite("! USER EXIT" & @CRLF)
				Exit
			Case $idAbortTest
				Return SetError($_WD_ERROR_UserAbort)
		EndSwitch
	Until TimerDiff($hTimer) > $iDelay
EndFunc   ;==>_My_Sleep

Func _WriteTestHtml($sFilePath = @ScriptDir & "\TestFile.html")
	FileDelete($sFilePath)
	Local Const $sHtml = _
			"<html lang='en'>" & @CRLF & _
			"    <head>" & @CRLF & _
			"        <meta charset='utf-8'>" & @CRLF & _
			"        <title>TESTING</title>" & @CRLF & _
			"    </head>" & @CRLF & _
			"    <body>" & @CRLF & _
			"        <div id='MyLink'>Waiting</div>" & @CRLF & _
			"    </body>" & @CRLF & _
			"    <script type='text/javascript'>" & @CRLF & _
			"    setTimeout(function()" & @CRLF & _
			"    {" & @CRLF & _
			"        // Delayed code in here" & @CRLF & _
			"        document.getElementById('MyLink').innerHTML='<a>TESTING</a>';" & @CRLF & _
			"    }, 20000); // 20000 = 20 seconds" & @CRLF & _
			"    </script>" & @CRLF & _
			"</html>"
	FileWrite($sFilePath, $sHtml)
	Return "file:///" & StringReplace($sFilePath, "\", "/")
EndFunc   ;==>_WriteTestHtml

Func SetupChrome()
	_WD_Startup()
	_WD_Option('Driver', 'chromedriver.exe')
	_WD_Option('Port', 9515)
	_WD_Option('HTTPTimeouts', True)
	_WD_Option('DefaultTimeout', 40001)
	_WD_Option('DriverParams', '--verbose --log-path="' & @ScriptDir & '\chrome.log"')
	_WD_Option("Sleep", _My_Sleep)

	Local $sCapabilities = '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true, "excludeSwitches": [ "enable-automation"]}}}}'
	$WD_SESSION = _WD_CreateSession($sCapabilities)
	_WD_Timeouts($WD_SESSION, 40002)
EndFunc   ;==>SetupChrome

10. How to keep my WebDriver environment up-to-date

Q: How can I keep my WebDriver environment up-to-date?

A: You have to check the following components:

WebDriver UDF: Function _WD_IsLatestRelease compares the local UDF version to the latest release on GitHub. It returns True if the local UDF version is the latest, otherwise False. If you need to update the UDF, you have to download it manually.
WebDriver Exe: Function _WD_UpdateDriver checks whether a newer version of the WebDriver executable is available and, if so, can update it.
Browser: Function _WD_GetBrowserVersion returns the version number of the specified browser. If you need to update the browser, you have to download and install it by hand.
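The three checks could be combined roughly as follows. This is only a sketch: the parameter lists are written as recalled from the UDF's help file (in particular the install directory passed to _WD_UpdateDriver), so verify them against the CHM file of your UDF version. All three calls may need internet access or an installed browser.

```autoit
#include "wd_core.au3"
#include "wd_helper.au3"

; UDF: compare the local version with the latest release on GitHub
Global $bLatest = _WD_IsLatestRelease()
If @error = $_WD_ERROR_Success And Not $bLatest Then
	ConsoleWrite('A newer UDF release is available (local version: ' & $__WDVERSION & ')' & @CRLF)
EndIf

; Browser: read the installed version (updating it remains a manual step)
Global $sBrowserVersion = _WD_GetBrowserVersion('chrome')
If @error = $_WD_ERROR_Success Then
	ConsoleWrite('Installed Chrome version: ' & $sBrowserVersion & @CRLF)
EndIf

; Driver: update chromedriver.exe in the script directory if a newer one exists
; (assumption: the second parameter is the directory that holds the driver)
Global $bUpdated = _WD_UpdateDriver('chrome', @ScriptDir)
If @error = $_WD_ERROR_Success Then
	ConsoleWrite('Driver updated: ' & $bUpdated & @CRLF)
EndIf
```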

11. How to use "Locator strategy" and "Selectors"?

Q: How to use "Locator strategies" and how to find "Selectors"?
A: This UDF supports all locators defined in the WebDriver specification. Below is a listing of the predefined constants:

Locator strategy	How to use the selector
$_WD_LOCATOR_ByCSSSelector	CSS Selector string (see this site). In CSS, pattern matching rules determine which style rules apply to elements in the HTML DOM document tree.
$_WD_LOCATOR_ByXPath	XPath string (see this site). XPath is a language for addressing parts of an XML document.
$_WD_LOCATOR_ByLinkText	String with exact text of <a> element, which should be used to locate the proper <a> element.
$_WD_LOCATOR_ByPartialLinkText	String with partial text of <a> element, which should be used to locate the proper <a> element.
$_WD_LOCATOR_ByTagName	String that matches the desired element tag name, for example "button" is tag name of this element: <button name="ClickMe">
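The following sketch shows how these constants might be passed to _WD_FindElement. It assumes chromedriver.exe next to the script and a UDF version in which the optional parameters of _WD_FindElement accept Default. The URL and all selectors are placeholders for your own page.

```autoit
#include "wd_core.au3"

; Assumption: chromedriver.exe is in the script directory
_WD_Option('Driver', 'chromedriver.exe')
_WD_Option('Port', 9515)
_WD_Startup()

Global $sSession = _WD_CreateSession('{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true}}}}')
If @error = $_WD_ERROR_Success Then
	_WD_Navigate($sSession, 'https://www.example.com') ; placeholder URL

	; A single element, located by CSS selector
	Global $sHeading = _WD_FindElement($sSession, $_WD_LOCATOR_ByCSSSelector, 'h1')
	; The same element, located by XPath
	$sHeading = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, '//h1')
	; An <a> element whose text contains the given string
	Global $sLink = _WD_FindElement($sSession, $_WD_LOCATOR_ByPartialLinkText, 'More')

	; All <p> elements: the fifth parameter requests an array of element IDs
	Global $aParagraphs = _WD_FindElement($sSession, $_WD_LOCATOR_ByTagName, 'p', Default, True)
	If @error = $_WD_ERROR_Success Then
		ConsoleWrite('Paragraphs found: ' & UBound($aParagraphs) & @CRLF)
	EndIf

	_WD_DeleteSession($sSession)
EndIf
_WD_Shutdown()
```

Each call sets @error when nothing matches, so in a real script every result should be checked before the element ID is used.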

Q: How can I check XPath and CSS selectors in the browser?
A: (Work in progress.) Take a look at this link: How to search by Xpath/Css in Chrome Developer Tools?

12. How to download PDF file automatically?

Q: How to avoid browsers asking what to do with a file after clicking the "Download" button on a website?
A: In Firefox, add the following additional capabilities settings:

_WD_CapabilitiesAdd("prefs", "pdfjs.disabled", True)
_WD_CapabilitiesAdd("prefs", "browser.download.folderList", 2)
_WD_CapabilitiesAdd("prefs", "browser.download.dir", $s_Download_dir)
_WD_CapabilitiesAdd("prefs", "browser.helperApps.neverAsk.saveToDisk", "application/pdf,application/octet-stream")
_WD_CapabilitiesAdd("prefs", "browser.helperApps.neverAsk.openFile", "application/pdf,application/octet-stream")
_WD_CapabilitiesAdd("prefs", "browser.download.useDownloadDir", True)
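For Chrome, a comparable effect can probably be achieved through the "prefs" entry of goog:chromeOptions. The sketch below is an assumption, not part of the UDF: the helper name is hypothetical, and the three preference names are Chrome settings commonly used for this purpose, so check them against your Chrome version.

```autoit
; Hypothetical helper: builds a Chrome capabilities string that saves PDF files
; to $sDownloadDir without prompting or opening the built-in viewer.
Func _ExampleChromeDownloadCapabilities($sDownloadDir)
	; Backslashes must be doubled inside a JSON string
	Local $sJsonDir = StringReplace($sDownloadDir, '\', '\\')
	Return '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true, "prefs": {' & _
			'"download.default_directory": "' & $sJsonDir & '", ' & _
			'"download.prompt_for_download": false, ' & _
			'"plugins.always_open_pdf_externally": true}}}}}'
EndFunc   ;==>_ExampleChromeDownloadCapabilities

ConsoleWrite(_ExampleChromeDownloadCapabilities('C:\Temp\Downloads') & @CRLF)
```

The resulting string would be passed to _WD_CreateSession in the same way as the other capabilities strings on this page.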

Tools

The following tools will help you to automate your browser:

ChroPath plugin: Makes finding an element by XPath, ID or CSS incredibly easy (Chrome, Firefox, Opera)
SelectorsHub plugin: Next Gen XPath tool to generate, write and verify the XPath and cssSelectors (All browsers)

References