andrewz, posted December 8, 2014 (edited)

Hey ;P

Dunno how to start off, so here is an explanation of what I want to automate: exporting data from a website called immobilienscout24.de (a site where people offer properties), for example the name of the owner, where the property is located, and how much it costs. The data is ALWAYS stored at the same place in the page. For instance:

    <div class="margin-bottom font-line-l">
    <span data-qa="contactName" class="font-bold">Herr Thomas und Uschi Westhoff</span>

Is there any function in AutoIt to export this kind of data? In this case it would be the name "Herr Thomas und Uschi Westhoff" (yeah, German names, haha). I can't seem to find one. By exporting I mean just saving it into a variable or to the clipboard.

Here is the link I used for the example: http://www.immobilienscout24.de/expose/78279770

I would be sooooo thankful if anyone could give me an idea of how to start, as it takes ages to copy and paste all the included data into Excel by hand.

Thanks in advance & best regards
Andrewz

MikahS, posted December 8, 2014 (edited)

Have you had a chance to look at the _IE functions reference?

EDIT: By the way, welcome to the AutoIt forum!

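A minimal sketch of the _IE route MikahS points to, assuming the page still marks the owner's name with `<span data-qa="contactName">` as in the opening post and that Internet Explorer is available on the machine. The variable names are illustrative only; the listing URL is the one from the question.

```autoit
#include <IE.au3>

; Assumption: the name sits in <span data-qa="contactName">, as quoted in the first post.
Local $sUrl = "http://www.immobilienscout24.de/expose/78279770"
Local $sName = ""

; Open the page in a hidden IE window and wait until it has loaded.
Local $oIE = _IECreate($sUrl, 0, 0, 1)
If @error Then Exit 1

; Walk all <span> elements and pick the one carrying the data-qa marker.
Local $oSpans = _IETagNameGetCollection($oIE, "span")
For $oSpan In $oSpans
    ; == forces a case-sensitive string comparison, so a missing attribute never matches.
    If $oSpan.getAttribute("data-qa") == "contactName" Then
        $sName = $oSpan.innerText
        ExitLoop
    EndIf
Next

_IEQuit($oIE)

ClipPut($sName) ; copy the result to the clipboard
MsgBox(0, "Contact", $sName)
```

If no matching span is found, `$sName` simply stays empty, so the script does not abort on listings that omit the contact name.
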
andrewz (author), posted December 8, 2014

Thank you for the welcome! And another thanks for the _IE functions; I had only heard a bit about them and hadn't really looked into them yet. I will for sure try my best to use them, and if I can't, I'll ask ^^

best regards,
Andrewz

jdelaney, posted December 8, 2014

Example of an XPath to use with the IEbyXPATH script in my sig:

    $xpath = "//span[@data-qa='contactName']"

Sig: IEbyXPATH - Grab IE DOM objects by XPATH

mikell, posted December 8, 2014

Sort of...

    ; download the page source (option 1 = force a reload from the remote site)
    $txt = BinaryToString(InetRead("http://www.immobilienscout24.de/expose/78279770", 1))
    ; keep only the text between the tag that contains "contactname" and the next "<"
    $name = StringRegExpReplace($txt, '(?is).*contactname.*?>([^<]+).*', "$1")
    MsgBox(0, "", $name)

andrewz (author), posted December 9, 2014 (edited), marked as solution

Thank you both so much! I guess I couldn't have figured that out on my own since I am still a beginner.

@mikell, that works perfectly. I'm going to make a full application out of it to grab all the required data from these properties and save it into a CSV table; that should be easy. I'm currently doing a school internship at an estate agency (I didn't have a choice, I would have gone for any IT company straight away, lol), and they always export hundreds of properties into Excel by copy and paste, which of course takes ages.

best regards

andrewz (author), posted December 9, 2014 (edited)

I almost have it now, but there is an error that I don't know how to work around or fix. Currently the script only works if all the data is present on the website; if one item is missing because the owner didn't include it, the script doesn't write anything and exits. So here it is:

    #include <Inet.au3>
    #include <Array.au3>
    #include <String.au3>

    If FileExists("Immobilien.csv") = False Then
        FileWrite("Immobilien.csv", "Name;Adresse;Tel;Objekt;Ort;Baujahr;Zi;frei/vermie.;Wfl./ qm;Kaltmiete;Warmmietpreis;Scout- ID" & @CRLF)
    EndIf

    Global $mobil_A = "0"
    Global $telefon_A = "0"
    Global $url = InputBox("ScoutID", "Enter the Scout-ID")
    Global $content = _INetGetSource($url)
    Global $name_A = _StringBetween($content, '<span data-qa="contactName" class="font-bold">', '</span>')
    Global $preis_A = _StringBetween($content, ' "offerPrice": "', '",')
    Global $strase_A = _StringBetween($content, '<strong class="font-standard">', '</strong><br/>')
    Global $telefon_A = _StringBetween($content, '<div class="is24-phone-number hide">', '</div>')
    Global $objekttyp_A = _StringBetween($content, '<dd class="is24qa-wohnungstyp">', '</dd>')
    Global $ort_A = _StringBetween($content, '</strong><br/>', '<br/>')
    Global $baujahr_A = _StringBetween($content, '<dd class="is24qa-baujahr">', '</dd>')
    Global $zimmer_A = _StringBetween($content, '<dd class="is24qa-zimmer">', '</dd>')
    Global $bezugsfrei_A = _StringBetween($content, '<dd class="is24qa-bezugsfrei-ab">', '</dd>')
    Global $wohnflache_A = _StringBetween($content, '<dd class="is24qa-wohnflaeche-ca">', '</dd>')
    Global $preiswarm_A = _StringBetween($content, '<strong class="is24qa-gesamtmiete">', '</strong>')

    $aio = $name_A[0] & ";" & $strase_A[0] & ";" & $telefon_A[0] & ";" & $objekttyp_A[0] & ";" & $ort_A[0] & ";" & $baujahr_A[0] & ";" & $zimmer_A[0] & ";" & $bezugsfrei_A[0] & ";" & $wohnflache_A[0] & ";" & $preis_A[0] & ",00" & ";" & $preiswarm_A[0] & ";" & $url

    $sString1 = StringReplace($aio, " ", "") ;removing spaces -to format it later to csv
    $sString2 = StringReplace($sString1, "<p>", "") ;removing <p> -useless
    $sString3 = StringReplace($sString2, "<span>Mobil:</span>", "") ;removing <span>Mobil:</span> -useless
    $sString4 = StringReplace($sString3, "</p>", "") ;removing </p> -useless
    $sString5 = StringReplace($sString4, "Â", "") ;removing Â from m²
    $sString6 = StringReplace($sString5, '<spanclass="is24-operator">=</span>', "") ;removing <spanclass="is24-operator">=</span> -useless
    $sString7 = StringReplace($sString6, "EUR", "") ;removing EUR -useless cuz we will format it later in excel
    $sString8 = StringReplace($sString7, "<span>Telefon:</span>", "") ;removing <span>Telefon:</span> -useless
    $sStringfinal = StringReplace($sString8, @CRLF, "") ;finally removing @CRLF to get a csv format

    FileWrite("Immobilien.csv", $sStringfinal & @CRLF)

I did it a bit differently because it was easier for me this way.

With this link it works perfectly: http://www.immobilienscout24.de/expose/78294144

BUT with this link it exits: http://www.immobilienscout24.de/expose/78295011

That is because it can't find the address, for example, which is given in the first listing as "Grasserstr. 5" but is missing from the second one. Is there any way to skip that variable or set it to 0 if it can't be found?

Thanks in advance!

somdcomputerguy, posted December 9, 2014 (edited)

This is a bit of code that I use in a script of mine. If any or all of the fields from the fourth one onward are blank, the script continues; it doesn't exit. Note that I am using some of the _IE* functions.

    $oForm = _IEFormGetObjByName($oIE, "cpform")
    $Spammer[0] = _IEFormElementGetObjByName($oForm, "user[username]")
    $Spammer[0] = _IEFormElementGetValue($Spammer[0])
    $Spammer[1] = _IEFormElementGetObjByName($oForm, "user[email]")
    $Spammer[1] = _IEFormElementGetValue($Spammer[1])
    $Spammer[2] = _IEFormElementGetObjByName($oForm, "user[ipaddress]")
    $Spammer[2] = _IEFormElementGetValue($Spammer[2])
    $Spammer[3] = _IEFormElementGetObjByName($oForm, "user[homepage]")
    $Spammer[3] = _IEFormElementGetValue($Spammer[3])
    $Spammer[4] = _IEFormElementGetObjByName($oForm, "profile[field1]") ;Biography
    $Spammer[4] = _IEFormElementGetValue($Spammer[4])
    $Spammer[5] = _IEFormElementGetObjByName($oForm, "profile[field2]") ;Location
    $Spammer[5] = _IEFormElementGetValue($Spammer[5])
    $Spammer[6] = _IEFormElementGetObjByName($oForm, "profile[field3]") ;Interests
    $Spammer[6] = _IEFormElementGetValue($Spammer[6])
    $Spammer[7] = _IEFormElementGetObjByName($oForm, "profile[field4]") ;Occupation
    $Spammer[7] = _IEFormElementGetValue($Spammer[7])

This is a bit of code from another script that I have written. Note that it uses the Inet UDF function '_INetGetSource'. If any of the array elements don't exist, the script does not quit. I don't know if either of these 'code bits' will help you, but good luck with your project!

    Global $Banyan_Calico[5] = ["Registering", "Activating", "Modifying", "Viewing User Profile", "Viewing User Control Panel"], $Quatrain

    ; Whoson() and Timer() are functions defined elsewhere in the script (not shown)
    While 1
        Local $Source = _INetGetSource("http://forum.powweb.com/online.php?who=members")
        If StringInStr($Source, "The server is too busy at the moment.") <> 0 Then MsgBox(48 + 4096, "Oh No!!", "Busy server.", 3) ;If text does exist
        For $a = 0 To UBound($Banyan_Calico) - 1
            If StringInStr($Source, $Banyan_Calico[$a], 1) <> 0 Then ;If text does exist
                SoundPlay(@ScriptDir & "\foghorn.mp3")
                MsgBox(48 + 4096, @ScriptName, $Banyan_Calico[$a], 3)
                Whoson()
            EndIf
        Next
        TraySetIcon("hourglass.ico")
        Timer()
    WEnd

- Bruce

andrewz (author), posted December 9, 2014 (replying to somdcomputerguy's post, which was quoted in full)

Spammer... Anyway, thanks a lot! Let's see if I can get this working now...

best regards,
Andrewz

somdcomputerguy, posted December 9, 2014

Ya, $Spammer[]. I chose that variable name since I use the script to get info from another forum that I moderate. That way I don't have to cut and paste all the necessary info individually, which takes quite a long time.

BTW, you don't need to quote any or all of my post(s); I know what I have written. A partial quote may help someone else see what you are replying to, but again, it's not really necessary.

- Bruce

andrewz (author), posted December 9, 2014

Hmm, I can't figure out how to solve this error... The main question is solved anyway.

mikell, posted December 9, 2014

One way to solve the error problem is error checking, obviously:

    $preis_A = _StringBetween($content, ' "offerPrice": "', '",')
    $preis = (IsArray($preis_A) = 1) ? $preis_A[0] : "not found"

In this small example, if _StringBetween fails, the result is "not found" instead of nothing.

andrewz (author), posted December 10, 2014 (edited)

Hey, thanks, that will work too. I did it this way:

    If IsArray($preis_A) Then
        $preis_B = $preis_A[0]
    Else
        $preis_B = "not found"
    EndIf

and later I use $preis_B to output the data. Which way do you think is better (maybe in terms of resource consumption)? The one I use or the one you provided? Yours looks shorter, so maybe it is better, but I don't know anything about this...

mikell, posted December 10, 2014

The two ways are exactly the same; have a look at 'ternary operator' in the help file for details. But one way uses 1 line of code and the other uses 5 lines.

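The same check can be wrapped once and reused for every field of andrewz's script. The sketch below is only an illustration: `_FirstBetween` is a hypothetical helper name that does not appear in the thread, and the "0" default follows andrewz's wish to get 0 for a missing value.

```autoit
#include <Inet.au3>
#include <String.au3>

; Hypothetical helper (not from the thread): first _StringBetween hit, or a default.
Func _FirstBetween($sText, $sStart, $sEnd, $sDefault = "0")
    Local $aHits = _StringBetween($sText, $sStart, $sEnd)
    ; On failure _StringBetween sets @error and does not return an array.
    If @error Or Not IsArray($aHits) Then Return $sDefault
    Return $aHits[0]
EndFunc   ;==>_FirstBetween

; The listing that made the original script exit (no street address given).
Local $sContent = _INetGetSource("http://www.immobilienscout24.de/expose/78295011")

Local $sName = _FirstBetween($sContent, '<span data-qa="contactName" class="font-bold">', '</span>')
Local $sStreet = _FirstBetween($sContent, '<strong class="font-standard">', '</strong><br/>')
Local $sPrice = _FirstBetween($sContent, ' "offerPrice": "', '",')

; A missing field now yields "0" instead of indexing a non-array and aborting.
ConsoleWrite($sName & ";" & $sStreet & ";" & $sPrice & @CRLF)
```

Keeping the check in one function means each of the dozen fields stays a single line, and the default can be changed in one place.
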
Kap, posted January 12, 2015

Hi all,

I've been struggling with something similar for the last couple of days (I've been browsing the forums for a possible solution; arrays and such are still kind of new to me). The solution above seemed like a great one for me too, but it doesn't do exactly what it is supposed to do. It does create a .csv, and every time I run the script it adds another line, but it doesn't seem to find the info in the HTML/website (all it gives are 0's). So I suspect that the script doesn't read the site or doesn't find the info that I want. I've been racking my brain over it all weekend, but I can't find where I went wrong.

Here is the script I use to test it, followed by the HTML I test it with:

    HotKeySet("{ESC}", "Terminate")
    Opt("WinTextMatchMode", 2) ;1=complete, 2=quick
    Opt("WinTitleMatchMode", 1) ;1=start, 2=subStr, 3=exact, 4=advanced, -1 to -4=Nocase
    AutoItSetOption("MouseCoordMode", 0)
    Opt("SendKeyDelay", 90)
    Opt("WinWaitDelay", 35)
    Opt("TrayIconDebug", 1)

    #include <IE.au3>
    #include <Inet.au3>
    #include <Array.au3>
    #include <String.au3>
    #include <MsgBoxConstants.au3>

    If FileExists("C:\Data\Auto ITs\check\check.csv") = False Then
        FileWrite("C:\Data\Auto ITs\check\check.csv", "Actief;Lidstaat;nummer;Tijdstip waarop de aanvraag werd ontvangen;Naam;Adres;Cnummer" & @CRLF)
    EndIf

    $content = _INetGetSource("C:\Data\Auto ITs\check\Test.htm")
    $Status = _StringBetween($content, '<span class="validStyle">', "</span></b></td>")
    $Lidstaat = _StringBetween($content, '<td class="labelStyle">Lidstaat</td> <td>', '</td>')
    $nr = _StringBetween($content, '<td class="labelStyle">nummer</td> <td>', '</td>')
    $Tijd = _StringBetween($content, '<td class="labelStyle">Tijdstip waarop de aanvraag werd ontvangen</td> <td>', '</td>')
    $Naam = _StringBetween($content, '<td class="labelStyle">Naam</td> <td>', '</td>')
    $Adres = _StringBetween($content, '<td class="labelStyle">Adres</td> <td>', '</td>')
    $Cnummer = _StringBetween($content, '<td class="labelStyle">Cnummer</td> <td>', '</td>')

    $aio = $Status & ";" & $Lidstaat & ";" & $nr & ";" & $Tijd & ";" & $Naam & ";" & $Adres & ";" & $Cnummer

    $sString1 = StringReplace($aio, " ", "") ;removing spaces -to format it later to csv
    $sString2 = StringReplace($sString1, "<p>", "") ;removing <p> -useless
    $sString3 = StringReplace($sString2, "<span>Mobil:</span>", "") ;removing <span>Mobil:</span> -useless
    $sString4 = StringReplace($sString3, "</p>", "") ;removing </p> -useless
    $sString5 = StringReplace($sString4, "Â", "") ;removing Â from m²
    $sString6 = StringReplace($sString5, '<spanclass="is24-operator">=</span>', "") ;removing <spanclass="is24-operator">=</span> -useless
    $sString7 = StringReplace($sString6, "EUR", "") ;removing EUR -useless cuz we will format it later in excel
    $sString8 = StringReplace($sString7, "<span>Telefon:</span>", "") ;removing <span>Telefon:</span> -useless
    $sStringfinal = StringReplace($sString8, @CRLF, "") ;finally removing @CRLF to get a csv format

    FileWrite("check.csv", $sStringfinal & @CRLF)

    Func Terminate()
        Exit 0
    EndFunc

The HTML test page:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title>Test</title>
    </head>
    <body>
    <a id="top-page" name="top-page"></a>
    <div id="layout" class="layout">
    <div id="header">
    <h2>Info</h2>
    <fieldset>
    <table id="vatResponseFormTable">
    <tr> <td class="labelLeft" colspan="3"><b><span class="validStyle">Ja, correct</span></b></td> </tr>
    <tr> <td><br /></td> </tr>
    <tr> <td class="labelStyle">Lidstaat</td> <td>NL</td> <td class="errorFormStyle"></td> </tr>
    <tr> <td class="labelStyle">nummer</td> <td>820471616gdwsg01</td> </tr>
    <tr> <td class="labelStyle">Tijdstip waarop de aanvraag werd ontvangen</td> <td>2015/01/12 12:28:03</td> </tr>
    <tr> <td class="labelStyle">Naam</td> <td>T. Est </td> </tr>
    <tr> <td class="labelStyle">Adres</td> <td><br />Straat 00189<br />1234AA Stad<br /> </td> </tr>
    <tr> <td class="labelStyle">Cnummer</td> <td></td> </tr>
    </table>
    <br />
    <p><a href="backtest.html">Back</a></p>
    </fieldset>
    </div>
    </div>
    </div>
    </div>
    </div>
    </div>
    </body>
    </html>

If somebody could point out where I went wrong or send me in the right direction, it would be greatly appreciated.

Thanks in advance!
-Kap

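Kap's question went unanswered in the thread. A few things in the posted script stand out: the test page is a local file, for which FileRead is the more direct loader; the results of _StringBetween are arrays and are concatenated without an index; the header goes to the full path while the data row goes to a relative "check.csv"; and the delimiters assume exactly one space between `</td>` and `<td>`, which would not match if the real file has line breaks or indentation there. The sketch below addresses those points under these assumptions. `_CellAfterLabel` is a hypothetical helper, not something from the thread, and only a few of the fields are shown.

```autoit
; Hypothetical helper: text of the <td> that follows a given label cell, or a default.
Func _CellAfterLabel($sHtml, $sLabel, $sDefault = "")
    ; \Q...\E treats the label literally; \s* tolerates line breaks between the two cells.
    Local $sPattern = '(?s)<td class="labelStyle">\Q' & $sLabel & '\E</td>\s*<td>(.*?)</td>'
    Local $aHit = StringRegExp($sHtml, $sPattern, 1)
    If @error Or Not IsArray($aHit) Then Return $sDefault
    ; Replace <br /> tags with spaces, then trim both ends (flag 3 = leading + trailing).
    Return StringStripWS(StringRegExpReplace($aHit[0], '<br\s*/?>', ' '), 3)
EndFunc   ;==>_CellAfterLabel

Local $sCsv = "C:\Data\Auto ITs\check\check.csv"

; Read the local test page directly from disk.
Local $sHtml = FileRead("C:\Data\Auto ITs\check\Test.htm")
If @error Then Exit 1

Local $sLine = _CellAfterLabel($sHtml, "Lidstaat") & ";" & _CellAfterLabel($sHtml, "nummer") & ";" & _CellAfterLabel($sHtml, "Naam") & ";" & _CellAfterLabel($sHtml, "Adres")

; Append the row to the same file that received the header.
FileWrite($sCsv, $sLine & @CRLF)
```

The regular expression is a swapped-in technique here: it replaces the literal _StringBetween delimiters so that whitespace differences between the cells no longer matter.
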
Claire, posted May 24, 2018

Hi,

I have a question on this topic too. I am very new to XPath, and I am trying to import the address from an exposé on ImmobilienScout into Google Sheets. I am using this URL as an example: https://www.immobilienscout24.de/expose/104781577

With the function

    =importxml(URL;//div[@class="address-block"]/div)

I get the address, but since the address appears twice on the page, I get it twice. I would like to get it only once. I tried many ways of specifying precisely where one of the two versions of the address is, but they don't work... Any idea?

Best,
Claire

Claire, posted May 24, 2018

Well... writing my question inspired me to find the answer! I focused on the second version of the address and wrote this:

    //div[@class="grid-item automatic-width padding-right"]/div

and it works! But I am still interested in learning a way to get data from a precise place in the document, using the first version of the address as the example. I think that would be more advanced than the solution I found (?).

Claire, posted May 26, 2018

Hi,

Me again! I have a more important question that has been blocking me for quite some time already. Still on the same ImmobilienScout page (https://www.immobilienscout24.de/expose/104781577), I would like to extract all the images of the flat shown there. All their URLs look alike:

    https://pictures.immobilienscout24.de/listings/d7f76101-0cd8-4d6e-8591-fbcf72a38339-1210531259.jpg/ORIG/resize/1106x830>/format/jpg/quality/80
    https://pictures.immobilienscout24.de/listings/c82ba552-672f-4d89-bf71-0a54dcae3254-1210531261.jpg/ORIG/resize/1106x830>/format/jpg/quality/80

There are 11 of them in this particular case. I tried all kinds of XPaths (with the IMPORTXML function in Google Sheets) to extract all these image URLs automatically, but it doesn't work. Sometimes I get 3 URLs while there are 11, no idea why! Any idea?

mikell, posted May 26, 2018 (edited)

Not so difficult using a regular expression on the source code of the page:

    #include <Array.au3>

    $txt = BinaryToString(InetRead("https://www.immobilienscout24.de/expose/104781577"))

    ; get all img
    ; $img = StringRegExp($txt, 'https://pictures.immobilienscout24.de/listings/[^"]+', 3)

    ; specific size
    $img = StringRegExp($txt, '(https://pictures.immobilienscout24.de/listings/[^"]+?.jpg)[^"]+?1106x830[^"]+', 3)
    _ArrayDisplay($img)

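A small variation on mikell's approach, as a sketch only: the dots in the pattern are escaped so they match literally, a failed match is checked before the array is used, and repeated URLs are collapsed. It assumes the same image can be referenced more than once in the page source, which the thread does not confirm.

```autoit
#include <Array.au3>

Local $sHtml = BinaryToString(InetRead("https://www.immobilienscout24.de/expose/104781577"))

; Base URL of every listing picture, up to and including ".jpg" (dots escaped).
Local $aImg = StringRegExp($sHtml, 'https://pictures\.immobilienscout24\.de/listings/[^"]+?\.jpg', 3)
If @error Then Exit 1 ; no image URLs found, or the download failed

; _ArrayUnique drops duplicates; element [0] of the result holds the count.
Local $aUnique = _ArrayUnique($aImg)
_ArrayDisplay($aUnique)
```
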