qwert Posted November 24, 2020

I've worked with RTF files and rich edit controls for years. But recently I needed to simply extract the text from an RTF file (i.e., without any of the formatting). I found a few suggested methods, but none were as simple as I would like. A short investigation revealed that although all the necessary pieces are in the standard Au3 function set, there is no single high-level "Extract Text" function. What I'm providing below is a working tool for ferreting out and tuning any subtle aspects of whatever processing you might need.

THE IMPORTANT THING TO KNOW IS THIS: the file's ENCODING is key to everything. If you're certain of the file's encoding, then just specify it in your _GUICtrlRichEdit_StreamFromFile() call. If you don't know it, use FileGetEncoding() and use the result in your StreamFrom call.

BUT HERE'S THE CAVEAT: determining a file's encoding is tricky. There's a wide range of programs writing RTFs ... and the specification(s) for a file's encoding can be rather loosely implemented. As a result, there's a note in the Au3 function that it will return Binary (code = 16) if the encoding isn't clear. But you can never specify Binary in your StreamFrom call or you will get gibberish. A fairly reliable "rule" is that if you don't get a clear encoding indication (like UTF-8 or UTF-16), then it's pretty safe to assume ANSI for an RTF file on a Windows PC ... so replace any code=16 with code=512.

Feel free to suggest alternatives ... or ways to make the process more robust.
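The fallback rule described above could be wrapped in a small helper, roughly as sketched here. The function name `_ResolveRtfEncoding` is hypothetical (it is not part of the standard UDFs), and the check for a return value of -1 is an addition that assumes FileGetEncoding() reports failure that way, as its documentation states.

```autoit
#include <FileConstants.au3>

; Hypothetical helper (not a standard UDF): pick an encoding value that is
; safe to pass to _GUICtrlRichEdit_StreamFromFile().
Func _ResolveRtfEncoding($sPath)
    Local $iEncoding = FileGetEncoding($sPath)

    ; Assumption: -1 means the file could not be opened or read.
    If $iEncoding = -1 Then Return SetError(1, 0, -1)

    ; Rule from the post: an unclear result ($FO_BINARY = 16) is treated
    ; as ANSI ($FO_ANSI = 512), since Binary must never be streamed.
    If $iEncoding = $FO_BINARY Then $iEncoding = $FO_ANSI

    Return $iEncoding
EndFunc   ;==>_ResolveRtfEncoding
```

A caller would check @error before streaming, for example `Local $iEnc = _ResolveRtfEncoding($watch)` followed by `If @error Then Exit`.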
For anyone who's interested, I found this related discussion on StackExchange: link

```autoit
#include <GUIConstantsEx.au3>
#include <GuiRichEdit.au3>
#include <WindowsConstants.au3>
#include <WinAPISysWin.au3>
#include <String.au3>
#include <Array.au3>
#include <File.au3>

Global $watch = "C:\path\to\file.rtf" ; path to RTF file

$hGui = GUICreate("Extract text from RTF", 660, 320, -1, -1)
$lblMask = GUICtrlCreateLabel("", 10, 10, 300, 220)
GUICtrlSetBkColor($lblMask, $GUI_BKCOLOR_TRANSPARENT)
$hRichEdit = _GUICtrlRichEdit_Create($hGui, "This is a test.", 10, 20, 300, 220, BitOR($ES_MULTILINE, $WS_VSCROLL))
$normal = GUICtrlCreateEdit("initial text", 330, 20, 320, 240)
$cButton = GUICtrlCreateButton("Process the file", 80, 270, 180, 30)
$eButton = GUICtrlCreateButton("Examine first 500", 400, 270, 180, 30)
GUICtrlSetState($cButton, $GUI_FOCUS)
GUISetState(@SW_SHOW)

While True
    $iMsg = GUIGetMsg()
    Select
        Case $iMsg = $GUI_EVENT_CLOSE
            _GUICtrlRichEdit_Destroy($hRichEdit) ; needed unless script crashes
            Exit
        Case $iMsg = $cButton
            $encoding = FileGetEncoding($watch)
            If $encoding = 16 Then $encoding = 512
            ; MsgBox(0, "Encoding is ", $encoding)
            _GUICtrlRichEdit_StreamFromFile($hRichEdit, $watch, $encoding)
            GUICtrlSetData($normal, _GUICtrlRichEdit_GetText($hRichEdit, True))
            ConsoleWrite("Processed" & @CRLF)
        Case $iMsg = $eButton
            $readText = StringLeft(GUICtrlRead($normal), 500)
            MsgBox(0, "2: ", $readText & @CRLF & _StringRepeat("-", 80) & @CRLF & _StringToHex($readText))
    EndSelect
WEnd
```
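If the interactive window isn't needed, the same calls can probably be folded into one function that hosts the rich edit control in a window that is never shown. This is only a sketch under that assumption: `_RtfFileToText` is a hypothetical name, and it has not been verified that streaming into a never-shown control behaves identically on every Windows version.

```autoit
#include <FileConstants.au3>
#include <GuiRichEdit.au3>
#include <WindowsConstants.au3>

; Hypothetical wrapper (not a standard UDF): returns the plain text of an RTF file.
Func _RtfFileToText($sPath)
    Local $iEncoding = FileGetEncoding($sPath)
    If $iEncoding = -1 Then Return SetError(1, 0, "") ; file could not be read
    If $iEncoding = $FO_BINARY Then $iEncoding = $FO_ANSI ; same fallback as in the script above

    ; The parent window is created but never shown; it only hosts the control.
    Local $hGui = GUICreate("", 100, 100)
    Local $hRichEdit = _GUICtrlRichEdit_Create($hGui, "", 0, 0, 100, 100, BitOR($ES_MULTILINE, $WS_VSCROLL))

    _GUICtrlRichEdit_StreamFromFile($hRichEdit, $sPath, $iEncoding)
    Local $iErr = @error ; capture before the next call resets @error

    Local $sText = _GUICtrlRichEdit_GetText($hRichEdit, True)

    ; Clean up the control and its host window before returning.
    _GUICtrlRichEdit_Destroy($hRichEdit)
    GUIDelete($hGui)

    If $iErr Then Return SetError(2, $iErr, "")
    Return $sText
EndFunc   ;==>_RtfFileToText
```

Usage would look like `Local $sText = _RtfFileToText("C:\path\to\file.rtf")`, with @error checked afterwards (1 = encoding could not be determined, 2 = streaming failed, with the original error code in @extended).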
Tony4219 Posted December 28, 2024

Thanks for posting this. I haven't been able to get plaintext out of the RichEdit control ... yet. This full example (WORKS!) is helping me understand that my apparent problem involves 'encoding'. The snippets I had found so far didn't work for me.