Seminko Posted February 7, 2020

I'm doing a GET request and the data returned is HTML-encoded. I need it decoded to a readable string. I found dozens of topics, but none that work well for all the characters. Example of what I found:

    Func DecodeHTMLChars($s)
        $t = Execute("'" & StringRegExpReplace($s, "(&#)(\d+)(;)", "' & ChrW($2) & '") & "'")
        Return $t
    EndFunc

This only works for the basic entities. However, I have entities like these:

’ é — á í Č ä – ï … š č ô ý Ú ě ž ů “ ” ñ Ş è ¡ ² ı ̇ Ç ü ó ö ‘ ã ​ ¿ ğ İ ş 반 드 시 잡 는 다 - ê ± â € ™ ¯ ́ & Ł ł ś ę õ

Have I missed a UDF? Can anyone point me in the right direction?

Nine Posted February 7, 2020

From what I see, each entity that is not in the &#xYYYY; form maps to a single character. You simply need to make a Select...Case covering all the possibilities.
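
A rough sketch of that lookup-table idea, combined with numeric handling in the spirit of the regex from the first post. It is written in Python rather than AutoIt (Python is what the thread ends up using), it borrows the standard library's `html.entities.name2codepoint` table as the list of "all the possibilities", and the name `decode_entities` is made up for illustration:

```python
import re
from html.entities import name2codepoint  # stdlib table: entity name -> code point

# Matches &#123; (decimal), &#x1F600; (hex) or &name; (named) references.
_ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Hypothetical helper: replace the entity references it recognises."""
    def _replace(match):
        body = match.group(1)
        try:
            if body[:2] in ("#x", "#X"):
                return chr(int(body[2:], 16))    # hexadecimal reference
            if body.startswith("#"):
                return chr(int(body[1:]))        # decimal reference
            return chr(name2codepoint[body])     # named reference (table lookup)
        except (KeyError, ValueError, OverflowError):
            return match.group(0)                # unknown or out of range: keep as-is
    return _ENTITY_RE.sub(_replace, text)


print(decode_entities("caf&eacute; &#269; &#x10D; &bogus;"))
```

This table only holds the HTML 4 names, so a name such as `&Ccaron;` is left untouched here. The larger `html.entities.html5` table in the same module covers the HTML5 names.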

Danp2 Posted February 7, 2020

These appear to be Unicode characters. Maybe this will help --

Seminko (Author) Posted February 7, 2020

37 minutes ago, Danp2 said: "These appear to be unicode characters. Maybe this will help --"

I've checked this one. It only works for decoding URL-encoded chars like 'ka%C5%A1tan'.

39 minutes ago, Nine said: "For what I see, each non &#xYYYY; matches a single char. You simply need to make a select case for all the possibilities."

I was worried that was the only solution. TBH it surprises me, though, since this operation is more than common. Thank you both.
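
To make the difference between the two encodings concrete, here is a small Python sketch (not AutoIt). The percent-encoded string is the one from the post above; the entity form `ka&scaron;tan` is an assumed equivalent, added only for comparison:

```python
from html import unescape
from urllib.parse import unquote

url_encoded = "ka%C5%A1tan"      # percent-encoded UTF-8 bytes (from the post)
html_encoded = "ka&scaron;tan"   # assumed entity form of the same word

print(unquote(url_encoded))      # percent-decoding handles the first form
print(unescape(html_encoded))    # entity decoding handles the second form
print(unquote(html_encoded))     # percent-decoding leaves entities alone
print(unescape(url_encoded))     # entity decoding leaves percent escapes alone
```

A URL decoder and an entity decoder each ignore the other's syntax, which is why a URL-decoding UDF does not help with entity-encoded text.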

Nine Posted February 7, 2020

3 minutes ago, Seminko said: "since this operation is more than common"

I agree with you. On the other hand, since it is quite an easy (but tedious) task, nobody felt it was worth a UDF. It would be kind of nice of you if you could post the solution you are creating here or in the Examples section.

Gianni Posted February 8, 2020

Just a quick and wild test using the browser control as a decoder. It seems to work and the returned string is OK, but why are @CR and @LF lost when placing that string into the Edit control using GUICtrlSetData? To test, paste an entity string in the upper input and click the button to decode it.

    #include <GUIConstantsEx.au3>
    #include <EditConstants.au3>

    Global $oIE_Server

    _Example()

    Func _Example()
        Local $hW = GUICreate("Entity decoder", 470, 445, 230, 134)
        ; upper edit: entity string to decode
        Local $Edit1 = GUICtrlCreateEdit("", 8, 8, 450, 200, BitOR($ES_MULTILINE, $ES_WANTRETURN, $ES_AUTOVSCROLL))
        GUICtrlSetData(-1, '𝒜𝓊𝓉ℴℐ𝓉' & @CRLF & _
                '☎ ♀ ♂ ♠ ♣ ♥ ♦ ' & @CRLF & _
                '𝒜𝓊𝓉ℴℐ𝓉 ☎ ♀ ♂ ♠ ♣ ♥ ♦' & @CRLF & @CRLF)
        ; lower edit: decoded result
        Local $Edit2 = GUICtrlCreateEdit("", 8, 238, 450, 200, BitOR($ES_MULTILINE, $ES_WANTRETURN, $ES_AUTOVSCROLL))
        GUICtrlSetFont(-1, 12)
        $Button1 = GUICtrlCreateButton("Decode Entity", 8, 210, 450, 25)
        GUISetState(@SW_SHOW)

        _SetupDecoder()
        ; reference to the hidden <textarea> that does the decoding
        Local $hDecoder = $oIE_Server.document.parentwindow.d3c0d3r

        While 1
            $nMsg = GUIGetMsg()
            Switch $nMsg
                Case $GUI_EVENT_CLOSE
                    Exit
                Case $Button1
                    $hDecoder.innerHTML = GUICtrlRead($Edit1)
                    $sstr = $hDecoder.value
                    MsgBox(0, 0, $sstr)
                    ; ??? using GUICtrlSetData @cr are lost ???
                    GUICtrlSetData($Edit2, $sstr); $hDecoder.value)
            EndSwitch
        WEnd
    EndFunc   ;==>_Example

    Func _SetupDecoder()
        $oIE_Server = ObjCreate("Shell.Explorer.2")
        GUICtrlCreateObj($oIE_Server, -10, -10, 5, 5) ; placed off-screen
        Sleep(3000)
        $oIE_Server.navigate("about:blank")
        Local $sHTML = _
                '<!DOCTYPE html>' & @CRLF & _
                '<html>' & @CRLF & _
                ' <head>' & @CRLF & _
                ' <meta http-equiv="X-UA-Compatible" content="IE=edge">' & @CRLF & _
                ' </head>' & @CRLF & _
                ' <body>' & @CRLF & _
                '<textarea id="d3c0d3r" cols="10" wrap="hard"> </textarea>' & @CRLF & _
                ' </body>' & @CRLF & _
                '</html>'
        $oIE_Server.document.Write($sHTML) ; inject listing directly to the HTML document
        $oIE_Server.document.close() ; close the write stream
        $oIE_Server.document.execCommand("Refresh")
    EndFunc   ;==>_SetupDecoder

argumentum Posted February 8, 2020

How about https://www.autoitscript.com/forum/topic/51084-html-entity-udf/ ?

Seminko (Author) Posted February 9, 2020

I actually started creating a function for this, only to find out that the 27k HTML entities I scraped were not enough to decode everything. After a couple of hours, I decided to save myself the hassle, and more importantly the time, which is in short supply now more than ever, and used Python. Solved with one line of code...

Thank you all for the ideas, though!
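
The post does not say which line of Python was used. A likely candidate, and this is an assumption, is `html.unescape` from the standard library, which decodes named, decimal, and hexadecimal references in one call. The sample input below is made up for illustration:

```python
import html

# Made-up sample mixing named, decimal and hexadecimal references.
encoded = "&rsquo;&eacute;&mdash; &#269; &#xBC18;&#xB4DC;&#xC2DC; &amp;"

decoded = html.unescape(encoded)  # the "one line" that does the work
print(decoded)
```

Text that contains no references passes through unchanged, so the call can be applied to the whole response body as it comes back from the GET request.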