jackyyll Posted April 11, 2006

Okay, so I have a website whose HTML I need to parse. I need to find:

1. <b>- TEXT -</b>
The text between the two tags. I think I already have this one down with (<b>- )(.*)( -</b>), but I don't know if multiple <b>'s will affect it. I don't think they will.

2. URLs:
<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>
<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>

I'm having a lot of trouble with the URLs, because I need to find the room=*, the h=* (different on every link), the id=*, the lastroom=*, and the link text. I tried this:

<a href="mob.php\?id=(.*)">(.*)</a>

and it just gives me this:

0 => 1723&h=3419142165fc215ce0250faa75b35b01">Bob</a> <a href="mob.php?id=6157&h=a71ec30f6b3b50ecd15be71b3ef1270e">Man in dark gray</a> <a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1
1 => East

Any ideas? :/
Moderators big_daddy Posted April 11, 2006

These seem to work:

$a = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>'
$b = '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'

; Each value starts right after its "name=" and runs up to the next "&" or the closing '">'.
MsgBox(0, "$a", "id: " & StringMid($a, StringInStr($a, 'id=')+3, StringInStr($a, '&')-(StringInStr($a, 'id=')+3)))
MsgBox(0, "$a", "h: " & StringMid($a, StringInStr($a, 'h=')+2, StringInStr($a, '">')-(StringInStr($a, 'h=')+2)))
MsgBox(0, "$b", "room: " & StringMid($b, StringInStr($b, 'room=')+5, StringInStr($b, '&')-(StringInStr($b, 'room=')+5)))
; In $b the h value ends at the second "&" (casesense 0, occurrence 2).
MsgBox(0, "$b", "h: " & StringMid($b, StringInStr($b, 'h=')+2, StringInStr($b, '&', 0, 2)-(StringInStr($b, 'h=')+2)))
MsgBox(0, "$b", "lastroom: " & StringMid($b, StringInStr($b, 'lastroom=')+9, StringInStr($b, '">')-(StringInStr($b, 'lastroom=')+9)))
Moderators SmOke_N Posted April 11, 2006

I use something like this for my RSS readers:

; Returns an array of every piece of text found between $s_Start and $s_End, or 0 if nothing matched.
; Note: $s_Start and $s_End go into the pattern as-is, so regex metacharacters in them are not escaped.
Func _StringBetweenCodeTags($s_String, $s_Start, $s_End)
    Local $a_Array = StringRegExp($s_String, '(?:' & $s_Start & ')(.*?)(?:' & $s_End & ')', 3)
    If @error = 0 Then Return $a_Array
    Return 0
EndFunc

I use FileRead() to get all the info originally, but you could do it a different way. It just needs a string.
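For the first part of the question (the text inside <b>- ... -</b>), here is a minimal sketch of how the function above could be called. The sample string and the variable names are made up for illustration, and it assumes an AutoIt release whose StringRegExp accepts PCRE-style patterns. The lazy (.*?) stops at the first closing delimiter, whereas a greedy (.*) runs on to the last one, which is why the link pattern in the question swallowed several links at once.

```autoit
; Hypothetical sample with two bold sections.
Local $sHtml = '<b>- First -</b> some text <b>- Second -</b>'

Local $aText = _StringBetweenCodeTags($sHtml, '<b>- ', ' -</b>')
If IsArray($aText) Then
    ; One element per <b>- ... -</b> section.
    For $i = 0 To UBound($aText) - 1
        ConsoleWrite($aText[$i] & @CRLF)
    Next
EndIf

Func _StringBetweenCodeTags($s_String, $s_Start, $s_End)
    Local $a_Array = StringRegExp($s_String, '(?:' & $s_Start & ')(.*?)(?:' & $s_End & ')', 3)
    If @error = 0 Then Return $a_Array
    Return 0
EndFunc
```

With more than one <b> section in the string, each section ends up in its own array element.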
Stumpii Posted April 12, 2006

Something like this:

room=([^&]+)&h=([^&]+)&lastroom=([^"]+)">([^<]+)<
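As a rough sketch of how that pattern might be used, assuming an AutoIt release whose StringRegExp accepts PCRE-style patterns (the variable names here are only for illustration):

```autoit
; Sample link taken from the original question.
Local $sHtml = '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'

; Flag 3 returns the capture groups of every match in one flat array.
Local $aParts = StringRegExp($sHtml, 'room=([^&]+)&h=([^&]+)&lastroom=([^"]+)">([^<]+)<', 3)
If @error Then
    MsgBox(0, "No match", "The pattern did not match.")
Else
    ; $aParts[0] = room, [1] = h, [2] = lastroom, [3] = link text.
    ; StringStripWS with flag 3 trims the leading space in " East".
    MsgBox(0, "Parsed link", "room: " & $aParts[0] & @CRLF & _
            "h: " & $aParts[1] & @CRLF & _
            "lastroom: " & $aParts[2] & @CRLF & _
            "text: " & StringStripWS($aParts[3], 3))
EndIf
```

The negated classes such as [^&]+ stop at the next delimiter, so a match cannot spill over into the following link the way a greedy (.*) does. The mob.php links would need a similar pattern built around id= and h=.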
neogia Posted April 12, 2006

This puts all the records into an array ("var"="value") where each element alternates either "var" or "value", and each url is separated by an element containing "**end of url**":

#include <Array.au3>
$string = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a><a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'
Dim $infoArr[1]
While StringInStr($string, "?")
    $results = StringRegExp($string, '(?:\?)(.*?)(\#)(?:=)', 1)
    If @extended == 1 Then
        _ArrayAdd($infoArr, $results[0])
        $string = StringTrimLeft($string, $results[1])
    Else
        ExitLoop
    EndIf
    While 1
        $results = StringRegExp($string, '(?:=)(.*?)(\#)(?:&)', 1)
        If @extended == 1 Then
            If StringInStr($results[0], ">") == 0 Then
                _ArrayAdd($infoArr, $results[0])
                $string = StringTrimLeft($string, $results[1])
            Else
                ExitLoop
            EndIf
        Else
            ExitLoop
        EndIf
        $results = StringRegExp($string, '(?:&)(.*?)(\#)(?:=)', 1)
        If @extended == 1 Then
            _ArrayAdd($infoArr, $results[0])
            $string = StringTrimLeft($string, $results[1])
        Else
            ExitLoop
        EndIf
    WEnd
    $results = StringRegExp($string, '(?:=)(.*?)(\#)(?:")', 1)
    If @extended == 1 Then
        _ArrayAdd($infoArr, $results[0])
        $string = StringTrimLeft($string, $results[1])
    Else
        ExitLoop
    EndIf
    $infoArr[0] = "**beginning of html**"
    _ArrayAdd($infoArr, "**end of url**")
WEnd
_ArrayDisplay($infoArr, "")

Hope this helps.
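The script above was written against the StringRegExp available in April 2006. As a rough sketch of the same idea for a current AutoIt release, assuming its PCRE-style StringRegExp and the standard Array.au3 UDFs, the query string of each link could be captured in one pass and then split on "&" and "=". The variable names are only illustrative, and this is a swapped-in approach rather than a port of the loop above.

```autoit
#include <Array.au3>

Local $sHtml = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>' & _
        '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'

; One capture per link: everything between "?" and the closing quote of href.
Local $aQueries = StringRegExp($sHtml, '<a href="[^"?]+\?([^"]*)"', 3)
If @error Then Exit

Local $aInfo[1] = ["**beginning of html**"]
For $i = 0 To UBound($aQueries) - 1
    ; Splitting on both "&" and "=" gives alternating var and value elements.
    ; Element [0] of the StringSplit result holds the count.
    Local $aPairs = StringSplit($aQueries[$i], "&=")
    For $j = 1 To $aPairs[0]
        _ArrayAdd($aInfo, $aPairs[$j])
    Next
    _ArrayAdd($aInfo, "**end of url**")
Next

_ArrayDisplay($aInfo, "")
```

This keeps the same layout as before: var, value, var, value, then an "**end of url**" marker after each link. It assumes the values never contain "&", "=", or "|" (recent versions of _ArrayAdd treat "|" as an item delimiter by default).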