I have an AutoIt program that extracts the text from all text files in a folder and saves the extracted words to a word list in a text file.
I need to add an ignore option, something like a blacklist of words or single characters. Also, I'm not sure whether the program detects word fragments and spacing in Chinese text. It has to detect the spacing in Chinese text so that it doesn't extract entire phrases.
Here's the code:
#include <File.au3>
#include <Array.au3>
#include <MsgBoxConstants.au3>
Local $oDictionary = ObjCreate("Scripting.Dictionary")
Local $mypath = @ScriptDir
; Flag 1 = files only; last parameter 1 = return full paths
Local $aFiles = _FileListToArray($mypath, "*.txt", 1, 1)
If @error Then
    MsgBox($MB_SYSTEMMODAL, "Error", "No files found")
    Exit
Else
    MsgBox($MB_SYSTEMMODAL, "Found", $aFiles[0] & " files")
EndIf
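Local $aWords
For $i = 1 To $aFiles[0]
    ; Flag 3 returns every match as an array; the pattern splits on whitespace only
    $aWords = StringRegExp(FileRead($aFiles[$i]), "[^\s]+", 3) ; change pattern to fit your definition of "word"
    Local $iError = @error
    If $iError = 0 Then
        For $Word In $aWords
            ; Add raises a COM error on a duplicate key, so check for the word first
            If Not $oDictionary.Exists($Word) Then $oDictionary.Add($Word, $Word)
        Next
    Else
        MsgBox($MB_SYSTEMMODAL, "Error", $aFiles[$i] & " - " & $i & @CRLF & "error: " & $iError)
    EndIf
Next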
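$aWords = $oDictionary.Items
; Create the "saved" folder if it is missing and overwrite any previous word list
Local $hOut = FileOpen("saved\words.txt", $FO_OVERWRITE + $FO_CREATEPATH)
FileWrite($hOut, _ArrayToString($aWords, @CRLF))
FileClose($hOut)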
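
For the ignore option, something like the following is roughly what I have in mind. It is only a minimal sketch: `$aIgnore`, `$oIgnore`, `$oKept` and the sample text are placeholder names and data I made up for illustration, not part of the script above, and it only drops whole tokens that match the list exactly (ignoring case). It does nothing about splitting Chinese text that has no spaces.

```autoit
#include <Array.au3>

; Hypothetical ignore list (example entries only)
Local $aIgnore[4] = ["the", "a", "on", ","]

; Load the ignore list into a dictionary for fast lookups
Local $oIgnore = ObjCreate("Scripting.Dictionary")
$oIgnore.CompareMode = 1 ; 1 = text compare, so "The" matches "the"
For $sEntry In $aIgnore
    If Not $oIgnore.Exists($sEntry) Then $oIgnore.Add($sEntry, True)
Next

; Sample text standing in for the contents of one file
Local $sText = "The cat sat on a mat , the end"
Local $aTokens = StringRegExp($sText, "\S+", 3)

; Keep each token once, skipping anything on the ignore list
Local $oKept = ObjCreate("Scripting.Dictionary")
For $sWord In $aTokens
    If $oIgnore.Exists($sWord) Then ContinueLoop
    If Not $oKept.Exists($sWord) Then $oKept.Add($sWord, $sWord)
Next

Local $aKept = $oKept.Items
ConsoleWrite(_ArrayToString($aKept, @CRLF) & @CRLF)
```

In the real script the ignore list would presumably be read from its own text file, and the `$oIgnore.Exists` check would go inside the existing `For $Word In $aWords` loop.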