bleh Posted July 4, 2013
Hello. The title basically says it: how can I count the number of words in a text?
Melba23 (Moderators) Posted July 4, 2013
bleh,
Just one of the threads that appeared when I searched the forum.
M23
P.S. The "Search" facility is at top right.
Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.
water Posted July 4, 2013
What do you mean by "text"? A string with space-separated words, a Word document ...?
bleh (Author) Posted July 4, 2013
Sorry, Melba23.
Quoting water: What do you mean by "text"? A string with space separated words, a Word document ...?
It can be a string or a text file or whatever. The text is a normal text with punctuation etc. Just like a Wikipedia entry, for example.
water Posted July 4, 2013
Search the forum for one of the word count examples, as Melba suggested. Then decide which characters should denote a new word (space, hyphen, @CRLF, @LF ...).
UEZ Posted July 4, 2013
I don't know how accurate this code is:

    $sText = "GDI + allows you to easily manipulate 2D, without having to select from the dc, the pen, the font, so the brush can restore the DC to its original state before returning (it's hard enough with GDI). We can manipulate images, scale, rotate, translate, shear, or mix these functions quite easily. The pen has several functions to make indents. You can define preset tips for lines such as the tips of arrows or creating custom tips. The brushes are quite numerous and diverse, we can make very easily degraded by passing the color of departure and arrival."
    MsgBox(0, "Test", $sText & @CRLF & @CRLF & "Number of words: " & CountWords($sText))

    ; Count words as runs of \w characters (letters, digits, underscore).
    Func CountWords($sText)
        ; Flag 3 returns an array holding every global match.
        Local $aResult = StringRegExp($sText, "(\w+)", 3)
        If @error Then Return SetError(1, 0, 0) ; no match found
        Return UBound($aResult)
    EndFunc

Br,
UEZ
water Posted July 4, 2013
According to the help file of StringRegExp, when using "(\w+)" a word is a consecutive run of these characters: a-z, A-Z, 0-9 or underscore (_).
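To make the effect of that definition concrete, here is a small sketch. The helper name _CountWordRuns is made up for illustration; only StringRegExp and UBound are AutoIt built-ins. Since neither the apostrophe nor the hyphen belongs to \w, a contraction or a hyphenated compound is split into two matches.

```autoit
; Hypothetical helper: counts runs of \w characters (a-z, A-Z, 0-9, _).
Func _CountWordRuns($sText)
    ; Flag 3 returns an array holding every global match.
    Local $aMatches = StringRegExp($sText, "\w+", 3)
    If @error Then Return 0 ; no match at all
    Return UBound($aMatches)
EndFunc

ConsoleWrite(_CountWordRuns("One, two and three.") & @CRLF) ; 4
ConsoleWrite(_CountWordRuns("it's well-formed") & @CRLF)    ; 4: it, s, well, formed
```

Whether "it's" should count as one word or two is exactly the kind of decision the pattern forces you to make.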
UEZ Posted July 4, 2013
Are these 3 words: " * * *."? What is the definition of a word?
Br,
UEZ
water Posted July 4, 2013
Let's see how the OP defines "word".
mikell Posted July 4, 2013 (edited)
Quoting UEZ: Are these 3 words: " * * *."?
As there is neither a space nor an apostrophe I would say one...
What about StringRegExp($sText, "[^\s']+", 3)?
Edit: assuming there is no typo or grammatical mistake in the text.
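A short sketch of this suggestion, assuming the intended pattern is "[^\s']+" (runs of characters that are neither whitespace nor an apostrophe). The helper name _CountTokens is hypothetical. Note that punctuation stays attached to its token, and a lone punctuation mark counts as a token of its own.

```autoit
; Hypothetical helper: counts runs of characters that are neither
; whitespace nor an apostrophe.
Func _CountTokens($sText)
    Local $aMatches = StringRegExp($sText, "[^\s']+", 3)
    If @error Then Return 0 ; no match at all
    Return UBound($aMatches)
EndFunc

ConsoleWrite(_CountTokens("One, two and three.") & @CRLF) ; 4
ConsoleWrite(_CountTokens(" * * *.") & @CRLF)             ; 3: "*", "*", "*."
ConsoleWrite(_CountTokens("it's") & @CRLF)                ; 2: "it", "s"
```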
UEZ Posted July 4, 2013
@mikell: it has spaces (shown here as underscores) -> "_*_*_*."
Br,
UEZ
jchd Posted July 4, 2013 (edited)
We didn't have a chance to see what "word" means to the OP.
OTOH the current PCRE implementation is compiled without PCRE_UCP, sadly (yes, I do heavily insist on that). As a bad consequence, the easy way, \b, can't be used in the general case. Fortunately PCRE is kind enough to provide us with Unicode-wide \h and \v (and their negations, \H and \V respectively) to match horizontal and vertical "spaces", but the problem now shifts to detecting punctuation, which would require UCP support to be of general use.
So once again we're stuck with half-baked solutions which work for (?i)[a-z] English only, despite AutoIt claiming to support Unicode (which it doesn't really).
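As a rough illustration of the \h and \v route, the sketch below counts runs of characters that are neither horizontal nor vertical whitespace. The helper name _CountBySpaces is hypothetical, and the sketch assumes the PCRE build used by StringRegExp accepts \h and \v inside a character class, as PCRE documents. It does nothing about punctuation, which is the limitation described above.

```autoit
; Hypothetical helper: counts runs of characters that are neither
; horizontal (\h) nor vertical (\v) whitespace.
Func _CountBySpaces($sText)
    Local $aMatches = StringRegExp($sText, "[^\h\v]+", 3)
    If @error Then Return 0 ; no match at all
    Return UBound($aMatches)
EndFunc

; Words separated by a space, a tab and a line break.
Local $sSample = "One, two" & @TAB & "and" & @CRLF & "three."
ConsoleWrite(_CountBySpaces($sSample) & @CRLF) ; 4
```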
bleh (Author) Posted July 5, 2013 (edited)
Quoting water: Let's see how the OP defines "word".
"A single distinct meaningful element of speech or writing, used with others (or sometimes alone) to form a sentence"
I don't know what's unclear. I want to count the number of words in texts using AutoIt. For example: the quote above is seven words. The punctuation (" and .) would not count as a word.
The link from Melba23 seems to work fine, btw. Thanks to you all.
orbs Posted July 5, 2013
How about StringSplit, with the delimiter string being a combination of common separators, like space & @TAB & @CR & @CRLF & @LF and whatever?
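A minimal sketch of the StringSplit idea, with a made-up helper name (_CountWordsBySplit). By default StringSplit treats every character of the delimiter string as a separate delimiter, so consecutive separators (for example @CR followed by @LF, or a double space) produce empty elements; the loop skips those.

```autoit
; Hypothetical helper: splits on space, tab, CR and LF, then counts
; the non-empty pieces.
Func _CountWordsBySplit($sText)
    ; Element [0] of the returned array holds the number of pieces.
    Local $aParts = StringSplit($sText, " " & @TAB & @CR & @LF)
    Local $iCount = 0
    For $i = 1 To $aParts[0]
        If StringLen($aParts[$i]) > 0 Then $iCount += 1
    Next
    Return $iCount
EndFunc

ConsoleWrite(_CountWordsBySplit("One, two and three.") & @CRLF)           ; 4
ConsoleWrite(_CountWordsBySplit("One," & @CRLF & "two  and three.") & @CRLF) ; 4
```

As with any separator-based approach, punctuation stays glued to the neighbouring word, and a free-standing punctuation mark is counted as a word.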
jchd Posted July 5, 2013
Quoting bleh: "A single distinct meaningful element of speech or writing, used with others (or sometimes alone) to form a sentence" I don't know whats unclear. I want to count the number of words in texts using AutoIt.
If that's the definition you're going to use, then you're in deeper trouble than you seem to think.
Please apply your own word count definition to the following translations of the very same (modulo Google Translate accuracy!) sentence, all of them matching your definition.
"This is a well-formed sentence that meets the definition." (English)
"Voilà une phrase bien formée qui répond à la définition." (French)
"這是一個結構完整的句子,符合定義。" (Traditional Chinese)
"זהו משפט בנוי היטב שעונה על ההגדרה." (Hebrew)
"นี่คือประโยคที่ดีขึ้นที่ตรงตามคำนิยาม" (Thai)
"これは、定義を満たす整形文です。" (Japanese)
"இந்த வரையறையை சந்திக்கிறது என்று ஒரு நன்கு வடிவமைக்கப்பட்ட சொற்றொடர் உள்ளது." (Tamil)
... and so many others I won't bother to list, but you get the idea.
bleh (Author) Posted July 5, 2013
Seriously? Why pick it apart? What's the point? Here you go: I want to count the words of formatted, punctuated and well-written texts based on the Latin script.
What's the "Latin script"? http://en.wikipedia.org/wiki/Latin_script
What's a "word"? https://en.wikipedia.org/wiki/Word
What's "punctuation"? http://en.wikipedia.org/wiki/Punctuation
"What does a text with words look like?" Like this: One, two and three.
"But how many words would that be?" 4.
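One way to read that specification in code, offered as a sketch only: a word is a run of letters, optionally joined by single apostrophes or hyphens, and punctuation is ignored. The helper name _CountLatinWords is hypothetical, and the pattern covers plain a-z and A-Z only, so accented Latin letters (as in "Voilà") would break a word apart and are not handled here.

```autoit
; Hypothetical helper: a word is a run of a-z letters (case-insensitive),
; optionally joined by single apostrophes or hyphens, e.g. "it's", "well-formed".
Func _CountLatinWords($sText)
    Local $aMatches = StringRegExp($sText, "(?i)[a-z]+(?:['-][a-z]+)*", 3)
    If @error Then Return 0 ; no match at all
    Return UBound($aMatches)
EndFunc

ConsoleWrite(_CountLatinWords("One, two and three.") & @CRLF) ; 4
ConsoleWrite(_CountLatinWords("This is a well-formed sentence that meets the definition.") & @CRLF) ; 9
```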
Melba23 (Moderators) Posted July 5, 2013
bleh,
Quoting bleh: Why pick it apart? What's the point?
Because we get so many people who ask a question and then complain about proposed solutions because they do not meet the special cases that they omitted to mention at the beginning.
Thank you for explaining clearly what it is you are asking. Does the thread to which I linked you in post #2 not do what you need?
M23
bleh (Author) Posted July 5, 2013 (edited)
Quoting Melba23: Does the thread to which I linked you in post #2 not do what you need?
From my earlier reply (#13): The link of Melba23 seems to work fine, btw. Thanks to you all.
jchd Posted July 5, 2013
Well, if by Latin script you mean a-z and A-Z, then yes. But your ironic answer is the first time "Latin" was used. So much for precision. Please recognize that your previous wordings were overly ambiguous. I wasn't being picky, just trying to read your mind at a distance.
This forum receives posts from users worldwide, so unless a precise context is clear, answers should be as general as possible.
bleh (Author) Posted July 5, 2013 (edited)
Well, maybe you should read before you reply, since anyone with common sense could understand what I meant by words and text after reading what I wrote, especially the example in #13, and even more so after I wrote that Melba's example worked.
Please recognize that it's also safe to assume that I would have said so if I wanted to count words in traditional Chinese, Egyptian hieroglyphs or Martian.