litlmike Posted July 23, 2007 (edited)

Any help that you can provide would be appreciated. I would like to open a webpage and create a "ranking" system for the words that appear on that page. For instance, if we went to www.CNN.com, the script would create a unique list of all the words on the page, and every time the same word appears again, its score goes up by 1. So if the word "George" appeared on the page 5 times and the word "Bush" appeared 10 times, the script would output:

"Bush 10"
"George 5"

followed by the other words that appeared on the page and their scores. Can you tell me how to start working towards a solution? I am not sure how to make the words unique, nor how to add +1 per occurrence. Thanks in advance.

#include <IE.au3>

$s_Url = "http://www.CNN.com/"
$oIE = _IECreate($s_Url, 1)          ; open the page (1 = try to attach to an existing IE window first)
$oText = _IEBodyReadText($oIE)       ; all visible text of the page body
$aText = StringSplit($oText, " ")    ; $aText[0] holds the number of pieces
For $iCC = 1 To UBound($aText) - 1
    MsgBox(0, "", $aText[$iCC], 1)   ; show each piece for 1 second
Next

P.S. I can't get the au3 forum code tags to appear; I can only get "[ code ] [/ code ]" to work.

Edited July 24, 2007 by litlmike

_ArrayPermute() | _ArrayUnique() | Excel.au3 UDF
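One way to get both the unique list and the +1 per occurrence in a single pass is a lookup table keyed by word. This is a minimal sketch, assuming the Windows Scripting.Dictionary COM object is available (it does not appear anywhere in the thread). A made-up sample string stands in for the output of _IEBodyReadText(), and treating [A-Za-z0-9]+ as "a word" is a simplifying assumption:

```autoit
; Sketch: count how often each word appears, using the Windows
; Scripting.Dictionary COM object as a word -> count lookup table.
; The sample text stands in for the output of _IEBodyReadText().
Local $sText = "Bush met George. George met Bush, and Bush left."

; Assumption: a "word" is any run of letters or digits.
Local $aWords = StringRegExp($sText, "[A-Za-z0-9]+", 3)

Local $oCounts = ObjCreate("Scripting.Dictionary")
$oCounts.CompareMode = 1 ; 1 = text compare: "Bush" and "bush" share one entry (set before adding)

For $i = 0 To UBound($aWords) - 1
    If $oCounts.Exists($aWords[$i]) Then
        $oCounts.Item($aWords[$i]) = $oCounts.Item($aWords[$i]) + 1 ; seen before: +1
    Else
        $oCounts.Add($aWords[$i], 1) ; first occurrence starts at 1
    EndIf
Next

; Print every unique word with its score, in order of first appearance.
For $sWord In $oCounts.Keys()
    ConsoleWrite($sWord & " " & $oCounts.Item($sWord) & @CRLF)
Next
```

The dictionary makes the "is this word new?" check a single lookup instead of a scan over every word stored so far; sorting by score is a separate step.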
mikehunt114 Posted July 23, 2007

A bit difficult to do; in particular, I noticed StringSplit won't always give you each individual word (words can also be separated by line breaks and punctuation). However, if you figure that out, you should be able to count and add unique strings by doing something like this:

#include <Array.au3>
#include <IE.au3>

$s_Url = "http://www.CNN.com/"
$oIE = _IECreate($s_Url, 1)
$oText = _IEBodyReadText($oIE)

Dim $output[1][2]                     ; [n][0] = word, [n][1] = count
$k = 0                                ; number of unique words stored so far
$aText = StringSplit($oText, " ")
For $i = 1 To $aText[0]               ; element 0 is the piece count, not a word
    $match = False                    ; reset once per word, not once per comparison
    For $j = 0 To $k - 1              ; compare only against words already stored
        If $aText[$i] = $output[$j][0] Then   ; = is case-insensitive for strings
            $output[$j][1] += 1       ; seen before: +1
            $match = True
            ExitLoop
        EndIf
    Next
    If $match = False Then            ; new word: grow the array and store it
        ReDim $output[$k + 1][2]
        $output[$k][0] = $aText[$i]
        $output[$k][1] = 1
        $k += 1
    EndIf
Next
_ArrayDisplay($output)

PS. Are you using [ autoit ] and [ /autoit ], without the spaces?

IE Dev Toolbar | MSDN: InternetExplorer Object | MSDN: HTML/DHTML Reference Guide
"It is surprising what a man can do when he has to, and how little most men will do when they don't have to." - Walter Linn
Post a reproducer with less than 100 lines of code.
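To illustrate the StringSplit problem: splitting on a single space misses words separated by line breaks or tabs, and it produces empty strings wherever two separators sit next to each other. A minimal sketch with an invented sample string that passes several delimiter characters at once and skips the empty pieces:

```autoit
; Sketch: split on several whitespace characters at once and skip the
; empty strings that appear between consecutive delimiters.
; The sample string is made up for illustration.
Local $sText = "George  Bush" & @CRLF & "Bush" & @TAB & "George"

; With flag 0 (the default), every character in the delimiter string
; is a delimiter on its own, so this splits on space, tab, CR and LF.
Local $aParts = StringSplit($sText, " " & @TAB & @CR & @LF)

Local $sWords = ""
Local $iWords = 0
For $i = 1 To $aParts[0]                  ; $aParts[0] holds the element count
    If $aParts[$i] = "" Then ContinueLoop ; produced by "  " or by @CR followed by @LF
    $sWords &= "[" & $aParts[$i] & "]"
    $iWords += 1
Next
ConsoleWrite($iWords & " words: " & $sWords & @CRLF)
```

Here the split yields 6 pieces, 2 of them empty, and 4 real words are kept. Punctuation attached to words ("George.") is still not handled by this approach.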
litlmike (Author) Posted July 24, 2007

Thanks for that, it really helps. The _ArrayDisplay would not work because we are using a 2D array, so I added the 2D version made by big_daddy (I think). I was able to get all the words individually; I just had to replace the @CRLF line breaks with spaces. Below is the updated code. Can anyone think of a way to produce the display in descending order, from the highest number to the lowest? I am open to using Excel instead of the array display, and it may be better in the long run.

#include <IE.au3>

$s_Url = "http://www.CNN.com/"
$oIE = _IECreate($s_Url, 1)
$oText = _IEBodyReadText($oIE)

Global $sTitle
Global $iBase
Global $sToConsole
Dim $output[1][2]
$k = 0
$oString = StringReplace($oText, @CRLF, " ")   ; turn line breaks into spaces
$aText = StringSplit($oString, " ")
For $i = 1 To $aText[0]
    $match = False
    For $j = 0 To $k - 1
        If $aText[$i] = $output[$j][0] Then
            $output[$j][1] += 1
            $match = True
            ExitLoop
        EndIf
    Next
    If $match = False Then
        ReDim $output[$k + 1][2]
        $output[$k][0] = $aText[$i]
        $output[$k][1] = 1
        $k += 1
    EndIf
Next

_ArrayDisplay2D($output) ; base at 0 to get the [0][0]

Func _ArrayDisplay2D($aArray, $sTitle = 'Array Display 2Dim', $iBase = 0, $sToConsole = 1) ; base at 0 to get the [0][0]
    ; If $aArray is not an array then 'Return' and set error... Wish I knew that IsArray was a function about 3 weeks ago!
    ; Where does $aArray come from? Where is it previously declared?
    If Not IsArray($aArray) Then Return SetError(1, 0, 0)
    Local $sHold = 'Dimension 1 Has: ' & UBound($aArray, 1) & ' Element(s)' & @LF & _
            'Dimension 2 Has: ' & UBound($aArray, 2) & ' Element(s)' & @LF & @LF
    ; Loop through the 1st dimension of $aArray
    For $iCC = $iBase To UBound($aArray, 1) - 1
        ; Loop through the 2nd dimension of $aArray (up to the UBound of the 2nd dimension - 1)
        For $xCC = 0 To UBound($aArray, 2) - 1
            ; I think $iCC and $xCC correlate to the keys, and $aArray must be $aValues?
            $sHold &= '[' & $iCC & '][' & $xCC & '] = ' & $aArray[$iCC][$xCC] & @LF
        Next
    Next
    ; With the default $sToConsole = 1 the result goes to the console, not a MsgBox
    If $sToConsole Then Return ConsoleWrite(@LF & $sHold)
    ; Display results
    Return MsgBox(262144, $sTitle, StringTrimRight($sHold, 1))
EndFunc   ;==>_ArrayDisplay2D

Quoting mikehunt114: "PS. Are you using [ autoit ] and [ /autoit ], without the spaces?"
mikehunt114 Posted July 24, 2007

You should be able to get it to work by doing something like this:

#include <IE.au3>
#include <Array.au3>

$s_Url = "http://www.CNN.com/"
$oIE = _IE_Example()                  ; IE.au3 test page; use _IECreate($s_Url, 1) for the real site
$oText = _IEBodyReadText($oIE)

Global $sTitle
Global $iBase
Global $sToConsole
Dim $output[1][2]
$k = 0
$oString = StringReplace($oText, @CRLF, " ")
$aText = StringSplit($oString, " ")
For $i = 1 To $aText[0]
    $match = False
    For $j = 0 To $k - 1
        If $aText[$i] = $output[$j][0] Then
            $output[$j][1] += 1
            $match = True
            ExitLoop
        EndIf
    Next
    If $match = False Then
        ReDim $output[$k + 1][2]
        $output[$k][0] = $aText[$i]
        $output[$k][1] = 1
        $k += 1
    EndIf
Next
_ArrayDisplay($output)

; Selection sort: repeatedly copy the entry with the highest remaining
; count into $newArray, then blank it out so it is not picked again.
Dim $newArray[1][2]
$max = ""
$j = 0
Do
    For $i = 0 To UBound($output) - 1
        If $output[$i][1] > $max Then     ; blanked ("") entries never beat a real count
            $max = $output[$i][1]
            $index = $i
        EndIf
    Next
    ReDim $newArray[$j + 1][2]
    $newArray[$j][0] = $output[$index][0]
    $newArray[$j][1] = $output[$index][1]
    $output[$index][0] = ""
    $output[$index][1] = ""
    $j += 1
    $max = ""
Until $j = UBound($output)
_ArrayDisplay($newArray)
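In current AutoIt releases the selection loop above can be replaced by the library sort. This is a sketch, assuming AutoIt 3.3.x, where Array.au3's _ArraySort() takes ($aArray, $iDescending, $iStart, $iEnd, $iSubItem) and sorts a 2D array on the given column; the word/count pairs are made up. Older Array.au3 versions (such as the 2007 ones used in this thread) had a different parameter list, so check the help file for your version:

```autoit
#include <Array.au3>

; Sketch, assuming AutoIt 3.3.x: _ArraySort($aArray, $iDescending, $iStart, $iEnd, $iSubItem).
; The word/count pairs below are made-up sample data.
Local $aCounts[4][2] = [["George", 5], ["the", 12], ["Bush", 10], ["CNN", 1]]

; Sort descending (1), over the whole array (0, 0), keyed on column 1 (the counts).
_ArraySort($aCounts, 1, 0, 0, 1)

For $i = 0 To UBound($aCounts) - 1
    ConsoleWrite($aCounts[$i][0] & " " & $aCounts[$i][1] & @CRLF)
Next
```

_ArraySort works in place (the array is passed ByRef), so copy the array first if you still need the original order.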
litlmike (Author) Posted July 25, 2007

Quoting mikehunt114: "You should be able to get it to work by doing something like this:"

I have made some more updates to the script and it is working nicely. However, the first several hundred elements are blank, and because I am still trying to fully comprehend how your array does what it does, I am finding it hard to manipulate. If you run the script exactly as I have posted it here, approximately the first 300 elements will be null/blank. Can you figure out where those null elements might come from and how to eliminate them? Thanks.
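Since the modified script was not posted, the cause can only be guessed. One likely source of blank elements is the splitting itself: after StringReplace(@CRLF -> " ") followed by StringSplit(" "), every run of two or more separators yields an empty string. This sketch uses invented sample text to compare that naive split with one that first collapses whitespace runs (StringRegExpReplace with \s+) and trims both ends (StringStripWS flag 3 = strip leading + trailing):

```autoit
; Sketch: why a space split can produce blank elements, and one way to avoid them.
; The sample text is invented for illustration.
Local $sText = "  Top stories" & @CRLF & @CRLF & "World   news  "

; Naive approach: line breaks become spaces, then split on single spaces.
Local $aNaive = StringSplit(StringReplace($sText, @CRLF, " "), " ")

; Collapse each run of whitespace (spaces, tabs, CR, LF) into one space,
; then strip leading and trailing whitespace (flag 3 = 1 + 2).
Local $sClean = StringStripWS(StringRegExpReplace($sText, "\s+", " "), 3)
Local $aClean = StringSplit($sClean, " ")

ConsoleWrite("naive: " & $aNaive[0] & " elements, clean: " & $aClean[0] & " elements" & @CRLF)
```

In this sample, 7 of the 11 naive elements are empty strings, while the cleaned split returns only the 4 words.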
SmOke_N (Moderator) Posted July 25, 2007 (edited)

If I've read this all right, this seems to be what you are looking for. (You might have to work on the punctuation part in the regexps; I didn't add them all.)

#include <Array.au3>

$sString = "How many words do you think a person can count or think of?"
$aArray = _StringCountInstances($sString)
If Not @error Then
    MsgBox(0, 'Word Count', $aArray[0])   ; number of unique words
    _ArrayDisplay($aArray)
EndIf

; $iCase = 1 (default): case-insensitive; $iCase = 0: case-sensitive
Func _StringCountInstances($sString, $iCase = 1)
    ; collect every word (letters, digits, hyphen, underscore)
    Local $aArray = StringRegExp($sString, "[\s\.:;,\?\!]*([a-zA-Z0-9-_]+)[\s\.:;,\?\!]*", 3)
    If Not IsArray($aArray) Then Return SetError(1, 0, 0)
    ; StringInStr() casesense: 0 = insensitive, 1 = sensitive
    Local $iCaseSense = 1
    If $iCase Then $iCaseSense = 0
    _ArrayUniqueStr($aArray, '', 0, $iCaseSense)   ; result is 1-based, [0] = count
    Local $aReturn[UBound($aArray)]
    $aReturn[0] = $aArray[0]
    Local $sFlag = ''                              ; never concatenate the number 0 into the pattern
    If $iCase Then $sFlag = '(?i)'
    For $iCC = 1 To UBound($aArray) - 1
        ; @extended = number of replacements = number of occurrences; the trailing
        ; delimiter is a lookahead, so back-to-back repeats ("think think") all count
        StringRegExpReplace($sString, $sFlag & '(?m)(?:^|[\s.:;,?!])' & $aArray[$iCC] & '(?=$|[\s.:;,?!])', '')
        $aReturn[$iCC] = $aArray[$iCC] & ' ' & @extended
    Next
    Return $aReturn
EndFunc

; Renamed from _ArrayUnique so it does not clash with the _ArrayUnique() in later Array.au3 versions
Func _ArrayUniqueStr(ByRef $aArray, $vDelim = '', $iBase = 1, $iCase = 0)
    If Not IsArray($aArray) Then Return SetError(1, 0, 0)
    If $vDelim = '' Then $vDelim = Chr(1)
    Local $sHold
    For $iCC = $iBase To UBound($aArray) - 1
        ; keep the word only if it is not already in the delimited list
        If Not StringInStr($vDelim & $sHold, $vDelim & $aArray[$iCC] & $vDelim, $iCase) Then _
                $sHold &= $aArray[$iCC] & $vDelim
    Next
    If $sHold Then
        $aArray = StringSplit(StringTrimRight($sHold, StringLen($vDelim)), $vDelim)
        Return SetError(0, 0, 0)
    EndIf
    Return SetError(2, 0, 0)
EndFunc

Edit: had to fix the $iCase... If you don't want a case-sensitive search, just leave the parameter blank.

Edited July 25, 2007 by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.
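The counting step in _StringCountInstances() relies on StringRegExpReplace() setting @extended to the number of replacements it made, so replacing every match with "" and reading @extended counts the matches. Below is a stripped-down sketch of that trick. The helper name _CountWord is hypothetical. The lookbehind/lookahead pair is a swapped-in way of making sure the match is a whole word without consuming the characters around it, and the word must not contain regex metacharacters:

```autoit
; Sketch of the @extended counting trick. _CountWord is a hypothetical helper.
; Assumption: $sWord contains only letters, digits, "_" or "-" (no regex metacharacters).
Func _CountWord($sText, $sWord)
    ; (?i) = case-insensitive; the lookarounds require that no word character
    ; touches either end of the match, so "can" inside "scan" is not counted.
    StringRegExpReplace($sText, "(?i)(?<![A-Za-z0-9_-])" & $sWord & "(?![A-Za-z0-9_-])", "")
    Return @extended ; number of replacements = number of occurrences
EndFunc

Local $sSample = "How many words do you think a person can count or think of? Think think."
ConsoleWrite("think: " & _CountWord($sSample, "think") & @CRLF)
ConsoleWrite("a: " & _CountWord($sSample, "a") & @CRLF)
```

Because lookarounds do not consume characters, the adjacent "Think think" pair is counted twice, and the "a" inside "many" or "can" is not counted.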
mikehunt114 Posted July 26, 2007

Can you post the code you modified, litlmike? That example I posted works well for me.
litlmike (Author) Posted August 1, 2007

Quoting SmOke_N: "If I've read this all right... this seems to be what you are looking for... (Might have to work on the punctuation part in the regexp's, I didn't add them all)"

Lol... man, this makes me fully realize how poor a coder I am... Not only is it condensed, it completes the task so much faster! I plan to use this long term, but until I fully understand how your script accomplishes the task, I will finish mine first and then return to yours. The reason is that to modify your script into the final format I want, I will have to comprehend how you do the same thing in so much less code! Haha! Very well done.

Can someone explain the following line of code? I don't grasp its relevance yet.

Local $aArray = StringRegExp($sString, "[\s\.:;,\?\!]*([a-zA-Z0-9-_]+)[\s\.:;,\?\!]*", 3)
SmOke_N (Moderator) Posted August 1, 2007

Local $aArray = StringRegExp($sString, "[\s\.:;,\?\!]*([a-zA-Z0-9-_]+)[\s\.:;,\?\!]*", 3)

Words are not separated only by spaces; they may have punctuation before or after them, so most of the examples you were given would not return accurate results.

[\s\.:;,\?\!]* matches zero or more whitespace characters, periods, colons, semicolons, commas, question marks, or exclamation marks before the start of:

([a-zA-Z0-9-_]+) which matches one or more characters that are a letter A to Z (upper or lower case), a digit 0 through 9 (I didn't know if you wanted numbers too), a hyphen, or an underscore (as they are considered legal word characters by some). Because it is surrounded by parentheses, whatever is found here becomes part of the return.

[\s\.:;,\?\!]* matches zero or more of the same characters after the word: whitespace, periods, colons, semicolons, commas, question marks, or exclamation marks. Because of the *, neither this class nor the leading one is required; they simply consume the punctuation around each word.

The ", 3" flag says to return an array of all matches found (the captured word from each match).

Hope you understand now.
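To see what the pattern actually returns, here is a short sketch that runs it on an invented sample string and joins the results. With flag 3, StringRegExp() returns one array element per match, holding the text of the capture group:

```autoit
; Sketch: what the extraction pattern returns for a short, invented sample.
; Flag 3 = return an array with the capture group of every match.
Local $aWords = StringRegExp("Hello, world! Re-run: test_1?", "[\s\.:;,\?\!]*([a-zA-Z0-9-_]+)[\s\.:;,\?\!]*", 3)

Local $sList = ""
For $i = 0 To UBound($aWords) - 1
    $sList &= "[" & $aWords[$i] & "]"
Next
ConsoleWrite($sList & @CRLF)
```

The punctuation around each word is matched but left out of the results, while the hyphen in "Re-run" and the underscore in "test_1" stay inside the words because both are in the middle character class.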