subuddhi Posted July 2, 2021 (edited)

In short: I copied some text from a PDF, and this is the result:

[image]

As you can see, there are many stray symbols in it, especially this one [I can't copy that symbol, so check it here]:

[image]

My question: this symbol is not selectable, and if I copy those sentences from MS Word to here, they appear like this:

Needless to say, after that he collected and relished everything he could find in the Gos[1]vami’s books that would enhance his bhajana and he became very deeply engrossed in fol[1]lowing the raganuga-marga. He remained absorbed in bhajana almost all the time. His extraor[1]dinary immersion in prema became obvious when he was engaged in sravana-kirtana. Dur[1]ing lila-sravana-kirtana, tears, mucous and saliva would stream from his face, and two Vaisnavas wiping them away could not stop the flow. Once, while sitting on the bank of the Manasa-ganga in deep trance, he fell into the water and remained there for three days. On the fourth day he floated to the surface. When his followers found him and pulled him out of the water, they saw that he was still alive. After they loudly sang nama-kirtana for a long time, he finally returned to external consciousness. From this time on, he was known as “Siddha[1]baba.”

So it is remarkable: the AutoIt forum renders the symbol as "[1]" and Word shows it as that symbol, but Notepad (plain text) cannot render it. I attached the MS Word file. You may try to copy the symbol, but nothing is copied except a "space", and if you paste into Notepad, again only a space appears.

How can I find that symbol? I am working on transliteration and I want to remove it, so I first need to find it.

This is my current script for replacing some Romanian letters. I just need a way to detect that symbol and then remove it without leaving any space. As you can see from the image above, the goal is to remove the symbol and make everything look nice.

```autoit
#include <File.au3>
#include <FileConstants.au3>
#include <MsgBoxConstants.au3>

Func convert($file)
    Local $srch = "¸õΩ•@~µ∫˙√†‰∂ˇî®ßÃĀåḌḤĪïùàḶḸṂṆñìṄṚṜṢŚṬāḍḥīḷḹṃṁṇṅṛṝṣśṭūäÇéüöëò"
    Local $repl = "Sns-aamnhntrdtirsnaadhinhmllmnnnnrrsstadhiiimmnnrrsstuasiutnd"
    Local $check = FileGetAttrib($file)
    If StringInStr($check, "D") Then
        ConsoleWrite("Skipping the directory " & $file & @CRLF)
        Return
    Else
        ConsoleWrite("Parsing file: " & $file & @CRLF)
    EndIf

    ; load file content into memory
    Local $filereader = FileOpen($file)
    Local $content = FileRead($filereader)
    FileClose($filereader)

    ; change all characters in memory
    For $i = 1 To StringLen($srch)
        $content = StringReplace($content, StringMid($srch, $i, 1), StringMid($repl, $i, 1))
    Next

    ; write back file to disk
    Local $filewriter = FileOpen($file, 2)
    FileWrite($filewriter, $content)
    FileClose($filewriter)
EndFunc   ;==>convert

; Display an open dialog to select a list of file(s).
Local $sFileOpenDialog = FileOpenDialog("Hold down Ctrl or Shift to choose multiple files.", @ScriptDir & "\", "Au3 (*.au3)", BitOR($FD_FILEMUSTEXIST, $FD_MULTISELECT))
If @error Then Exit MsgBox($MB_SYSTEMMODAL, "", "No file(s) were selected.")

; split up the selected files into an array
$sFileOpenDialog = StringSplit($sFileOpenDialog, "|")

; walk through the array, convert one file after another
For $file = 1 To $sFileOpenDialog[0]
    convert($sFileOpenDialog[$file])
Next
```

That's my question, please help, guys.

Attachment: example word file.docx

Edited July 2, 2021 by subuddhi
Luke94 Posted July 2, 2021

Copy the text from the PDF file and run this script. Then paste it wherever you like after running the script.

```autoit
#include <MsgBoxConstants.au3>

Global $g_sClipboard
Global $g_sReplaceWith = '[1]'

$g_sClipboard = ClipGet() ; Retrieve text from the clipboard
$g_sReplacedString = StringReplace($g_sClipboard, Chr(2), $g_sReplaceWith) ; Replace the symbol with $g_sReplaceWith
MsgBox($MB_OK, @ScriptName, $g_sReplacedString) ; Display the new text
ClipPut($g_sReplacedString) ; Write the new text to the clipboard
```

You can change $g_sReplaceWith to whatever you want.
subuddhi Posted July 2, 2021 (Author)

This is what happens when I use your script above:

[image]

Then, after running the script and modifying the clipboard:

[image]

I want to ask: is the script intended to change that symbol to [1]? It seems to remove the symbol and leave a space there. Removing it is fine, but why is there still a "space"?
Luke94 Posted July 2, 2021

How does the text appear in the PDF file before copying it?
Marc Posted July 2, 2021

If the special characters only need to be removed, try adding this line after line 26 (below the For loop which changes all characters in memory):

```autoit
$content = StringRegExpReplace($content, '[^[:print:]]', '')
```

Just a guess, but it should simply strip all non-printable characters from the text.

Any of my own codes posted on the forum are free for use by others without any restriction of any kind. (WTFPL)
Nine Posted July 2, 2021

I would suggest displaying the content of the clipboard in binary. This way you can see exactly which value is used to represent that character (or sequence of characters). You can use the Binary() function for this. Once you know the value, just replace it with "".

“They did not know it was impossible, so they did it” ― Mark Twain
subuddhi Posted July 2, 2021 (Author, edited)

1 hour ago, Marc said:
If the special characters only need to get removed, try adding this line after line 26 (below the for loop which changes all characters in memory): $content = StringRegExpReplace($content, '[^[:print:]]', '') Just a guess, but it should simply kill all non-printable characters out of the text.

This removes all formatting: the paragraph spacing is gone, and every line break made with "Enter" is gone as well. Removing the symbol is fine, but this removes too much; the paragraph breaks should be kept. Here is the result:

Needless to say, after that he collected and relished everything he could find in the Gosvamis books that would enhance his bhajana and he became very deeply engrossed in following the raganuga-marga. He remained absorbed in bhajana almost all the time. His extraordinary immersion in prema became obvious when he was engaged in sravana-kirtana. During lila-sravana-kirtana, tears, mucous and saliva would stream from his face, and twoVaisnavas wiping them away could not stop the flow. Once, while sitting on the bank of theManasa-ganga in deep trance, he fell into the water and remained there for three days. Onthe fourth day he floated to the surface. When his followers found him and pulled him out ofthe water, they saw that he was still alive. After they loudly sang nama-kirtana for a long time,he finally returned to external consciousness. From this time on, he was known as Siddhababa.

What do you think?

Edited July 2, 2021 by subuddhi
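One way to keep the paragraph breaks with Marc's regex approach is to narrow the character class so that it removes only control characters and leaves tab, line feed, and carriage return alone. The following is a minimal sketch, not tested against the attached file. The sample string is made up, and it assumes the unwanted symbols are control characters in the range Chr(1) to Chr(31), as Luke94's Chr(2) suggests.

```autoit
; Hypothetical sample: two words split by Chr(2), on separate lines.
Local $sSample = "Gos" & Chr(2) & "vami" & @CRLF & "fol" & Chr(2) & "lowing"

; Remove control characters 0x01-0x1F except tab (0x09), LF (0x0A) and CR (0x0D).
Local $sClean = StringRegExpReplace($sSample, '[\x01-\x08\x0B\x0C\x0E-\x1F]', '')

ConsoleWrite($sClean & @CRLF)
```

Unlike '[^[:print:]]', this class lists only the characters to delete, so line breaks and non-ASCII characters such as ’ and “ are left untouched.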
Marc Posted July 3, 2021

Then you should go the way @Nine suggested.
subuddhi Posted July 3, 2021 (Author, edited)

I guess it is like this:

```autoit
#include <MsgBoxConstants.au3>
#include <StringConstants.au3>

Example()

Func Example()
    ; Define the string that will be converted later.
    ; NOTE: This string may show up as ?? in the help file and even in some editors.
    ; This example is saved as UTF-8 with BOM. It should display correctly in editors
    ; which support changing code pages based on BOMs.
    Local Const $sString = "Hello - 你好"

    ; Temporary variables used to store conversion results. $dBinary will hold
    ; the original string in binary form and $sConverted will hold the result
    ; after it's been transformed back to the original format.
    Local $dBinary = Binary(""), $sConverted = ""

    ; Convert the original UTF-8 string to an ANSI compatible binary string.
    $dBinary = StringToBinary($sString)

    ; Convert the ANSI compatible binary string back into a string.
    $sConverted = BinaryToString($dBinary)

    ; Display the results. Note that the last two characters will appear
    ; as ?? since they cannot be represented in ANSI.
    DisplayResults($sString, $dBinary, $sConverted, "ANSI")

    ; Convert the original UTF-8 string to an UTF16-LE binary string.
    $dBinary = StringToBinary($sString, $SB_UTF16LE)

    ; Convert the UTF16-LE binary string back into a string.
    $sConverted = BinaryToString($dBinary, $SB_UTF16LE)

    ; Display the results.
    DisplayResults($sString, $dBinary, $sConverted, "UTF16-LE")

    ; Convert the original UTF-8 string to an UTF16-BE binary string.
    $dBinary = StringToBinary($sString, $SB_UTF16BE)

    ; Convert the UTF16-BE binary string back into a string.
    $sConverted = BinaryToString($dBinary, $SB_UTF16BE)

    ; Display the results.
    DisplayResults($sString, $dBinary, $sConverted, "UTF16-BE")

    ; Convert the original UTF-8 string to an UTF-8 binary string.
    $dBinary = StringToBinary($sString, $SB_UTF8)

    ; Convert the UTF8 binary string back into a string.
    $sConverted = BinaryToString($dBinary, $SB_UTF8)

    ; Display the results.
    DisplayResults($sString, $dBinary, $sConverted, "UTF8")
EndFunc   ;==>Example

; Helper function which formats the message for display. It takes the following parameters:
; $sOriginal - The original string before conversions.
; $dBinary - The original string after it has been converted to binary.
; $sConverted - The string after it has been converted to binary and then back to a string.
; $sConversionType - A human friendly name for the encoding type used for the conversion.
Func DisplayResults($sOriginal, $dBinary, $sConverted, $sConversionType)
    MsgBox($MB_SYSTEMMODAL, "", "Original:" & @CRLF & $sOriginal & @CRLF & @CRLF & "Binary:" & @CRLF & $dBinary & @CRLF & @CRLF & $sConversionType & ":" & @CRLF & $sConverted)
EndFunc   ;==>DisplayResults
```

I see that it converts a string to binary, but how can I tell which part is the binary of the symbol and which part belongs to the other words? It just appears as one long combination of digits. Could you give an example, please? I want to detect any symbol in a sentence, remove it, and then convert the result back to a string.

Edited July 3, 2021 by subuddhi
Nine Posted July 3, 2021

Let's make it simple. Copy a single word containing the symbol you want to get rid of into the clipboard. Then run the following script:

```autoit
ConsoleWrite(ClipGet() & @CRLF)
Local $dData = Binary(ClipGet())
ConsoleWrite($dData & @CRLF)
```

What do you get in the console?
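To tell which part of the output belongs to the symbol, another option is to print the code of each suspicious character together with its position instead of reading one long hex string. This is only a sketch under assumptions: it treats everything outside printable ASCII (apart from tab, LF and CR) as suspicious, which will also flag legitimate accented letters.

```autoit
; List every clipboard character that is not printable ASCII, with its position.
Local $sText = ClipGet()

For $i = 1 To StringLen($sText)
    Local $iCode = AscW(StringMid($sText, $i, 1)) ; Unicode code of this character

    ; Skip tab (9), LF (10) and CR (13); report everything else outside 32-126.
    If $iCode = 9 Or $iCode = 10 Or $iCode = 13 Then ContinueLoop
    If $iCode < 32 Or $iCode > 126 Then
        ConsoleWrite("Position " & $i & ": U+" & Hex($iCode, 4) & @CRLF)
    EndIf
Next
```

A reported code can then be passed to Chr() or ChrW() in a StringReplace call.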
subuddhi Posted July 3, 2021 (Author, edited)

Actually, every time I try to test ConsoleWrite, it fails. Please see this:

[image]

As you can see, I selected that symbol in Notepad and pressed Ctrl+C, then went to SciTE and pressed F5, but an error notification says it couldn't open the input file.

Edited July 3, 2021 by subuddhi
subuddhi Posted July 3, 2021 (Author, edited)

OK, I got it working somehow. The result is 0x74657374. No, wrong, it is this one: 0x02

Edited July 3, 2021 by subuddhi
subuddhi Posted July 3, 2021 (Author)

OK, problem solved. 0x02 is what @Luke94 meant with this:

On 7/2/2021 at 5:42 PM, Luke94 said:

```autoit
$g_sReplacedString = StringReplace($g_sClipboard, Chr(2), $g_sReplaceWith) ; Replace the symbol with $g_sReplaceWith
```

So I just changed it to this:

```autoit
$g_sReplacedString = StringReplace($g_sClipboard, Chr(2), '') ; Remove the symbol
```

and added it to my script above. Problem solved, haha. I just realized I was too lazy. Thanks Nine, I also learned about Binary(), and thanks Marc, and especially Luke.
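For use inside a file-based script such as convert(), the removal can be wrapped in a small helper that also reports how many symbols were removed. This is a sketch only; the function name _StripCharCode is hypothetical, and it relies on StringReplace storing the number of replacements in @extended.

```autoit
; Hypothetical helper: remove every occurrence of one character code from a string.
; Returns the cleaned string and sets @extended to the number of removals.
Func _StripCharCode($sText, $iCode = 2)
    Local $sClean = StringReplace($sText, Chr($iCode), "")
    Local $iCount = @extended ; StringReplace stores the replacement count here
    Return SetExtended($iCount, $sClean)
EndFunc   ;==>_StripCharCode

Local $sResult = _StripCharCode("Siddha" & Chr(2) & "baba")
ConsoleWrite($sResult & " (" & @extended & " removed)" & @CRLF)
```

In convert(), the same call could be applied to $content after the For loop and before the file is written back.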
subuddhi Posted July 5, 2021 (Author, edited)

I am facing a new problem.

[image]

As you can see, between "ki" and "ora" there should be an "s" (ki"s"ora), but the letter is gone. Using Binary() I found that it is 0x8D, which is 141 in decimal, so it is Chr(141). Everything is detected, and I can remove it or replace it with any other letter.

The problem is that when I copy the text in the picture above to Notepad or to this AutoIt forum, that Chr(141) disappears: "of nava Yugala-kiora". As you can see it is gone, and there is not even a space, so how can I detect that character and replace it with another one?

Only Microsoft Word seems able to show that letter as a "blank space"; Notepad and this forum cannot. How can I find that letter and replace it using StringReplace, given that so far my code works on Notepad text? Is there any other way?

Edited July 5, 2021 by subuddhi
Nine Posted July 5, 2021

Not sure I fully understand your issue, but I believe the problem is that the target you are transferring the text to uses UTF-8 encoding, where any character above Chr(127) requires a second byte. You need to set up your receiver as ANSI to see the Chr(141).
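The difference Nine describes can be made visible by converting the same character to bytes under both encodings. This is a small sketch using StringToBinary with the $SB_ANSI and $SB_UTF8 flags from StringConstants.au3; ChrW(141) is used here as a stand-in for the character in question, which is an assumption about how it arrives in the string.

```autoit
#include <StringConstants.au3>

; ChrW(141) is the character with Unicode code point U+008D.
Local $sChar = ChrW(141)

; ANSI: the result depends on the system code page.
ConsoleWrite("ANSI:  " & StringToBinary($sChar, $SB_ANSI) & @CRLF)

; UTF-8: code points above 127 take at least two bytes.
ConsoleWrite("UTF-8: " & StringToBinary($sChar, $SB_UTF8) & @CRLF)
```

If the UTF-8 line shows two bytes where the ANSI line shows one, a search for the single byte 8D will only line up with ANSI data.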
subuddhi Posted July 5, 2021 (Author, edited)

OK, I managed to solve it. I replace the letters in memory, from the clipboard, and I don't use Notepad:

```autoit
#include <MsgBoxConstants.au3>

Global $g_sClipboard, $dBinary, $g_sFind, $g_sReplaceWith, $g_sReplacedString

; handle every letter separately
; first letter
$g_sFind = "A00A"
$g_sReplaceWith = "74"
$g_sClipboard = ClipGet() ; Retrieve text from the clipboard
$dBinary = Binary($g_sClipboard)
Global $string = String($dBinary)
;MsgBox($MB_OK, "1", $g_sFind) ; Display the search value
$g_sReplacedString = StringReplace($string, $g_sFind, $g_sReplaceWith) ; Replace the symbol with $g_sReplaceWith

; second letter
$g_sFind = "8D"
;MsgBox($MB_OK, "2", $g_sFind) ; Display the search value
$g_sReplaceWith = "73"
$g_sReplacedString = StringReplace($g_sReplacedString, $g_sFind, $g_sReplaceWith) ; Replace the symbol with $g_sReplaceWith

; third letter
$g_sFind = "9420"
;MsgBox($MB_OK, "3", $g_sFind) ; Display the search value
$g_sReplaceWith = "6920"
$g_sReplacedString = StringReplace($g_sReplacedString, $g_sFind, $g_sReplaceWith) ; Replace the symbol with $g_sReplaceWith

; convert back to string
$g_sReplacedString = BinaryToString($g_sReplacedString)

; display the result and put it back on the clipboard
MsgBox($MB_OK, @ScriptName, $g_sReplacedString) ; Display the new text
ClipPut($g_sReplacedString) ; Write the new text to the clipboard
```

But I don't know how to make it efficient; I just rewrite the same lines again and again. Is it possible to make it like this?

```autoit
Local $srch = "Ÿ§¨‚Œ¸õΩ•@~µ∫˙√†‰∂ˇî®ßÃĀåḌḤĪïùàḶḸṂṆñìṄṚṜṢŚṬāḍḥīḷḹṃṁṇṅṛṝṣśṭūäÇéüöëò"
Local $repl = "usrSaSns-aamnhntrdtirsnaadhinhmllmnnnnrrsstadhiiimmnnrrsstuasiutnd"
$content = StringReplace($content, StringMid($srch, $i, 1), StringMid($repl, $i, 1))
```

That is, I would write all the binary values I want to search for in one line and all the replacement values in another line, something like that. Some of the binary values are four hex characters long and some are two, which confuses me a little.

Edited July 5, 2021 by subuddhi
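One possible way to avoid repeating the same lines is a two-column array of search and replacement values processed in a loop. The sketch below is not from the thread and is untested. The array $aMap and the function _ReplaceBytes are hypothetical names, the three pairs are the ones from the post above, and the patterns are assumed to describe whole bytes. It also steps through the hex text two digits at a time, because a plain StringReplace on the hex text can match a pattern that straddles two neighbouring bytes.

```autoit
; Hypothetical mapping table taken from the three replacements in the post above:
; column 0 = bytes to find (hex), column 1 = bytes to write instead (hex).
Local $aMap[3][2] = [["A00A", "74"], ["8D", "73"], ["9420", "6920"]]

; Hypothetical helper: walk the hex text one byte (two hex digits) at a time,
; so a pattern can never match across the boundary between two bytes.
Func _ReplaceBytes($dData, $aMap)
    Local $sHex = StringTrimLeft(String($dData), 2) ; drop the leading "0x"
    Local $iLen = StringLen($sHex)
    If $iLen = 0 Then Return $dData ; nothing to do for empty input
    Local $sOut = "", $i = 1, $bHit

    While $i <= $iLen
        $bHit = False
        For $j = 0 To UBound($aMap) - 1
            If StringMid($sHex, $i, StringLen($aMap[$j][0])) = $aMap[$j][0] Then
                $sOut &= $aMap[$j][1]
                $i += StringLen($aMap[$j][0])
                $bHit = True
                ExitLoop
            EndIf
        Next
        If Not $bHit Then
            $sOut &= StringMid($sHex, $i, 2) ; copy the byte unchanged
            $i += 2
        EndIf
    WEnd

    Return Binary("0x" & $sOut)
EndFunc   ;==>_ReplaceBytes

Local $dIn = StringToBinary(ClipGet()) ; ANSI bytes of the clipboard text
Local $sResult = BinaryToString(_ReplaceBytes($dIn, $aMap)) ; back to a string (ANSI)
ClipPut($sResult)
```

Because each row stores its own pattern, two-digit and four-digit values can sit in the same table, and adding a replacement means adding one row and increasing the first array dimension.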