asgarcymed, posted December 6, 2007:

Many files downloaded with eMule (ed2k/Kad) have Unicode characters in their names (Chinese, Japanese, Korean, Arabic, Hebrew, Russian), which the English version of Windows XP's explorer.exe treats as "illegal characters". This causes serious trouble when managing such files, so I would like a script that automatically deletes these characters from the file names, to avoid problems when trying to access them.

PS: even an eBook written entirely in English often comes with such Unicode/illegal characters in its file name.

Thanks. Regards.
weaponx, posted December 6, 2007:

Can you paste an example of a string you need modified?
asgarcymed, posted December 6, 2007:

In English Windows XP, every one of these Unicode/illegal characters appears as a "square" or a "?". Only someone who has the MUI (Multilingual User Interface) installed will see the correct Chinese/Japanese/Korean/Arabic/Hebrew/Russian characters.

?????? ????? ?????????? => this was my attempt at a copy-paste. This forum does not support Chinese/Japanese/Korean/Arabic/Hebrew/Russian characters either.
asgarcymed, posted December 6, 2007:

PS: if you want to see such characters, look at Wikipedia in any of these (esoteric) languages.

Regards.
weaponx, posted December 6, 2007:

I'm gonna go out on a limb here... maybe:

StringRegExpReplace("titlestringwithforeigncharacters", "[\x10-\x1F\x21-\x2F\x3A-\x40\x5B-\x60\x80-\xFF]", "")
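As a rough check of what this character class matches, here is a small sketch in Python rather than AutoIt. It assumes a Unicode-aware regex engine in which `\xNN` means the code point U+00NN, which is how Python's `re` module treats `str` patterns; AutoIt may behave differently. The sample file names are made up for illustration.

```python
import re

# weaponx's character class, copied verbatim from the post above.
BYTE_CLASS = re.compile(r"[\x10-\x1F\x21-\x2F\x3A-\x40\x5B-\x60\x80-\xFF]")


def strip_with_byte_class(name: str) -> str:
    """Delete every character that the class matches."""
    return BYTE_CLASS.sub("", name)


# Hypothetical file name: two Chinese characters (U+4E2D, U+6587),
# followed by an ordinary ASCII title and extension.
sample = "\u4e2d\u6587 book-v1.pdf"
cleaned = strip_with_byte_class(sample)

# The Chinese characters lie above U+00FF, so the class never sees them,
# while '-' (0x2D) and '.' (0x2E) fall inside \x21-\x2F and are deleted.
print(cleaned)
```

Under these assumptions the class leaves the foreign characters in place and removes ASCII punctuation instead, including the dot before the extension and the underscore (0x5F), so it probably needs adjusting before being used for renaming.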
therks, posted December 6, 2007:

weaponx wrote: "I'm gonna go out on a limb here... maybe: StringRegExpReplace("titlestringwithforeigncharacters", "[\x10-\x1F\x21-\x2F\x3A-\x40\x5B-\x60\x80-\xFF]", "")"

I think you should probably have them replaced with something, even if just dashes or underscores. Otherwise you could get an error trying to rename a file to nothing (in the case that all the characters are unacceptable).
weaponx, posted December 6, 2007:

I'm not even sure if that's what the OP is after. If it works, I will leave the replacement to his discretion.
asgarcymed, posted December 6, 2007:

Thanks to all of you for replying! Saunders, you are correct: Chinese/Japanese/Korean/Arabic/Hebrew/Russian characters MUST be replaced by underscores, for exactly the reason you gave. I still need help with the StringRegExpReplace call. Maybe someone from one of those countries, who has to deal with both languages, can help.

Thanks. Regards.
asgarcymed, posted December 6, 2007:

A "MUST"!... Look at: http://www.isthisthingon.org/unicode/allchars1.php

All characters are there! My problem is that I do not know how to write the RegExp. Please help!

Thank you! Regards.
asgarcymed, posted December 6, 2007:

I need to allow all English/German and Latin (Portuguese/Spanish/French/Italian) letters, lower and upper case [A..Z; À; Ã; É; Ê; Í; Ì; Ó; Ò; Õ; Ñ; Ç], AND !; ""; #; $; %; &; @; £; §; {; }; '; «; » [American and European keyboards].

I very urgently need to remove ALL Chinese, Japanese, Korean, Arabic, Hebrew and Russian characters (all of those letters are "crazy").

Could this be it?

StringRegExpReplace("", "[^\u0000-\u024F]+", "_")

I am trying to keep these blocks:

\p{InBasic_Latin}: U+0000..U+007F
\p{InLatin-1_Supplement}: U+0080..U+00FF
\p{InLatin_Extended-A}: U+0100..U+017F
\p{InLatin_Extended-B}: U+0180..U+024F

Is there any Unicode and RegEx expert here?

Thanks. Regards.
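To see how the proposed allow-list behaves, here is a minimal sketch in Python (not AutoIt) that applies the same character class. The function name `replace_non_latin` and the sample names are hypothetical, and the `\uXXXX` escape is Python's syntax; a PCRE-based engine such as AutoIt's may expect a different escape form (for example `\x{024F}`), which is an assumption to verify against its documentation.

```python
import re

# Any run of characters outside U+0000..U+024F, i.e. outside Basic Latin,
# Latin-1 Supplement, Latin Extended-A and Latin Extended-B.
NON_LATIN_RUN = re.compile(r"[^\u0000-\u024F]+")


def replace_non_latin(name: str, filler: str = "_") -> str:
    """Replace each run of non-Latin characters with a single filler."""
    return NON_LATIN_RUN.sub(filler, name)


samples = [
    # Two Chinese characters, then accented Latin text that must survive.
    "\u4e2d\u6587 \u00c9mile r\u00e9sum\u00e9.pdf",
    # Three Cyrillic words separated by spaces.
    "\u0412\u043e\u0439\u043d\u0430 \u0438 \u043c\u0438\u0440.txt",
]

for sample in samples:
    print(replace_non_latin(sample))
```

Because of the `+`, a whole run of foreign characters collapses into one underscore, so the second sample becomes `_ _ _.txt`. Note that the range U+0000..U+024F also keeps control characters and the characters Windows itself forbids in file names (such as `?`, `*` and `:`), so a stricter class may be wanted in practice.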
weaponx, posted December 6, 2007:

It looks correct to me. What is the problem?
asgarcymed, posted December 6, 2007:

I am now using RegexBuddy, a superb Win32 app for working with and learning about regular expressions. Using Google, I found a txt file (inside the attached zip) that has many, many Chinese characters and a few English ones. I opened it in RegexBuddy and tested both regular expressions:

[\x10-\x1F\x21-\x2F\x3A-\x40\x5B-\x60\x80-\xFF]

and

[^\u0000-\u024F]+

But the results of the test/debug were very confusing.

What is more, I got the Windows XP MUI (Multilingual User Interface) and installed all the languages I mentioned (Chinese/Japanese/Korean/Arabic/Hebrew/Russian). Now I am even more confused: some apps load the Chinese characters correctly, but most apps still cannot handle them. They show "squares", "???????????", or distorted characters, like when you open a binary file in a text editor.

So my questions are:

Must I have MUI installed?
What is the best RegEx to remove such characters from file names?
If I have MUI installed, do I still need such a regex/script?
What should I do to solve this once and for all?

Is there any Chinese/Japanese/Korean/Arabic/Hebrew/Russian person here? If so, how do you manage the character conflicts between your native language and English?

Help is very much appreciated! Thanks in advance. Regards.

Attachment: CHINES.zip
asgarcymed, posted December 6, 2007:

PS: if you have problems with the attached file, please see: http://www.xys.org/xys/netters/others/net/wiki2.txt

Thanks. Regards.
Bowmore, posted December 6, 2007:

asgarcymed wrote: "I need to allow: All English/German and Latin (Portuguese/Spanish/French/Italian) letters, lower and upper case [...] I very urgently need to kill ALL Chinese, Japanese, Korean, Arabic, Hebraic, Russian characters (all letters are "crazy")... Could this be?: StringRegExpReplace ("", "[^\u0000-\u024F]+", "_")"

[The body of this reply, including its code, is garbled in the archived page and cannot be recovered.]
asgarcymed, posted December 6, 2007:

Bowmore, thank you for replying! To solve this once and for all, could you please post the complete script (with the correct sequence of the different regular expressions)? Please note that today is the first day in my life that I have dealt with regular expressions. If you help me, you can be sure I will study this important subject, which I had been missing out on, starting from your help.

Thank you! Regards.
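For readers looking for the overall shape of such a script, here is a minimal sketch in Python (not AutoIt, and not Bowmore's script). It assumes the `[^\u0000-\u024F]+` class proposed earlier in the thread; the helper names `plan_renames` and `rename_in_folder` are hypothetical. It cleans only the part of the name before the extension, and adds a counter when two files would otherwise end up with the same name, which can happen once every foreign run becomes an underscore.

```python
import os
import re

# Same allow-list as proposed in the thread: keep U+0000..U+024F.
NON_LATIN_RUN = re.compile(r"[^\u0000-\u024F]+")


def plan_renames(names):
    """Map each name that needs cleaning to a new, unused name."""
    taken = set(names)  # names already present in the folder
    plan = {}
    for old in names:
        stem, ext = os.path.splitext(old)  # the extension is left untouched
        new_stem = NON_LATIN_RUN.sub("_", stem)
        if new_stem == stem:
            continue  # nothing to clean in this name
        candidate = new_stem + ext
        counter = 1
        while candidate in taken:  # avoid clobbering another file
            candidate = f"{new_stem}_{counter}{ext}"
            counter += 1
        taken.add(candidate)
        plan[old] = candidate
    return plan


def rename_in_folder(folder):
    """Apply the plan to the files directly inside `folder`."""
    names = [n for n in os.listdir(folder)
             if os.path.isfile(os.path.join(folder, n))]
    for old, new in plan_renames(names).items():
        os.rename(os.path.join(folder, old), os.path.join(folder, new))
```

Calling `plan_renames` on its own first makes it possible to review the proposed names before anything is renamed. The collision check here is case-sensitive, so on a case-insensitive file system such as NTFS it would likely need to compare lowercased names instead.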
Confuzzled, posted December 15, 2007:

In reply to asgarcymed's original post: try the 'cleanup' button in the Mass Rename function in eMule.