orichec (posted March 5, 2016):

Hello guys,

I built an AutoIt script that saves the source code of a given URL, then uses _StringBetween to extract the content I want from the saved source code text file. Everything works the way I want so far, but the text extracted from the source still contains HTML tags and URLs, such as <img alt="image" src="https://en.wiki***.org/*****.jpg"><br><br><b> and similar tags like <div></div>, <p></p>, etc.

I want to save only the text after removing all the URLs and other HTML tags, but I don't want to remove tags like <b>, because I want to keep the text's original line breaks and these tags will help me mark where the breaks go. Waiting for a kind response. Thanks in advance.

The sample text returned by _StringBetween looks like this:

    <img alt="image" src="//imghost.com/id_rk3FyZ.jpg"><br><br><b>Philosophy of Education and Its Importance</b><br><br>Behind every school and every teacher is a set of related beliefs--a philosophy of education--that influences what and how students are taught.<br><br>A philosophy of education represents answers to questions about the purpose of schooling, a teacher's role, and what should be taught and by what methods.<br> <br><span style="color: red;"><b>Note</b></span></b></span><br><br>As an academic field, philosophy of education is "the philosophical study of education and its problems...its central subject matter is education, and its methods are those of philosophy". "The philosophy of education may be either the philosophy of the process of education or the philosophy of the discipline of education.<br><br><a class="ajaxLink"href="https://en.wikipe***.org/wiki/Philosophy_of_edu****" rel="nofollow">Homepage</a></div>

The code that generates this data from the source code file is given below:

    ; $oWorkbook1 and $file are set up earlier in the script (not shown here).
    ; Fetch the page source from a given URL (the URL is stored in the Excel sheet at cell "C1").
    $IE = _IECreate(_Excel_RangeRead($oWorkbook1, Default, "C1"), 0, 0)
    $source = _IEDocReadHTML($IE)
    FileWrite($file, $source)
    ; Extract the text between 'desc">' and '<h2>' and save it in a separate file, "text01.txt".
    $target_source = _StringBetween($source, 'desc">', '<h2>')
    If Not @error Then FileWrite(@ScriptDir & "\text01.txt", $target_source[0])

mikell (posted March 5, 2016):

Could something like this be correct?

    $txt = '<img alt="image" src="//imghost.com/id_rk3FyZ.jpg"><br><br><b>Philosophy of Education and Its Importance</b><br><br>Behind every school and every teacher is a set of related beliefs--a philosophy of education--that influences what and how students are taught.<br><br>A philosophy of education represents answers to questions about the purpose of schooling, a teacher''s role, and what should be taught and by what methods.<br> <br><span style="color: red;"><b>Note</b></span></b></span><br><br>As an academic field, philosophy of education is "the philosophical study of education and its problems...its central subject matter is education, and its methods are those of philosophy". "The philosophy of education may be either the philosophy of the process of education or the philosophy of the discipline of education.<br><br><a class="ajaxLink"href="https://en.wikipe***.org/wiki/Philosophy_of_edu****" rel="nofollow">Homepage</a></div>'

    ; Turn every <br> into a real line break.
    $txt = StringReplace($txt, "<br>", @CRLF)
    ; Remove whole links (<a ...>...</a>), then every remaining tag.
    $txt = StringRegExpReplace($txt, '(?s)<a.*?</a>|<.*?>', "")
    MsgBox(0, "", $txt)

orichec (posted March 6, 2016):

9 hours ago, mikell said: "Could something like this be correct?"

Thanks mate, it's working super fine. The result shows up correctly as text in the message box, but when I tried to save it with FileWrite, the saved file shows the text with a lot of tab-based spaces.

One more thing: I want to remove everything from my text, such as URLs and the <b>, <div>, and <img> tags, except <br>. Whenever I use the saved text in HTML source, it does not show where a line ends. So please guide me on how to remove all HTML tags and URLs and keep only the <br> tag wherever it appears in my text, plus how to remove the tab-based spaces from the saved text.

The saved text file with the extra spaces, as written by the code above, is attached for reference: html.txt

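The tab problem described above is not addressed by a code example elsewhere in the thread, so here is a minimal sketch of one way to handle it. It assumes the extra whitespace is literal tab characters carried over from the page source. The variable names and the sample string are hypothetical; the output file name is taken from the original script.

```autoit
; Sketch only: $sText stands in for the tag-stripped text (hypothetical sample data).
Local $sText = "Line one" & @TAB & @TAB & "continued" & @CRLF & @TAB & "Line two"

; Collapse each run of tabs and spaces into a single space.
$sText = StringRegExpReplace($sText, "[ \t]+", " ")

; Remove spaces left at the start of each line ((?m) lets ^ match after a line break).
$sText = StringRegExpReplace($sText, "(?m)^ +", "")

; FileWrite with a file name appends to an existing file, so repeated runs pile up text.
; Opening the file in overwrite mode (2) first avoids that.
Local $hFile = FileOpen(@ScriptDir & "\text01.txt", 2)
If $hFile <> -1 Then
    FileWrite($hFile, $sText)
    FileClose($hFile)
EndIf
```

If the whitespace turns out to be something other than tabs and spaces (for example non-breaking spaces), the character class in the first pattern would need to be widened accordingly.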
AutoBert (posted March 6, 2016):

mikell's solution works fine.

orichec (posted March 6, 2016):

40 minutes ago, AutoBert said: "mikell's solution works fine."

Yeah, I know that the code shared by mikell works fine. My question is about the end result: the text saved to the file contains several tab spaces, and it is saved as plain text only. I'm looking for text in HTML format that keeps only the <br> tags, so that new lines can easily be recognized when the text is used in HTML source. Please read my post above carefully; I think that will make the whole scenario and my point of view clear.

AutoBert (posted March 6, 2016):

And where is the difficulty in replacing the @CRLF back with <br>?

    $txt = '<img alt="image" src="//imghost.com/id_rk3FyZ.jpg"><br><br><b>Philosophy of Education and Its Importance</b><br><br>Behind every school and every teacher is a set of related beliefs--a philosophy of education--that influences what and how students are taught.<br><br>A philosophy of education represents answers to questions about the purpose of schooling, a teacher''s role, and what should be taught and by what methods.<br> <br><span style="color: red;"><b>Note</b></span></b></span><br><br>As an academic field, philosophy of education is "the philosophical study of education and its problems...its central subject matter is education, and its methods are those of philosophy". "The philosophy of education may be either the philosophy of the process of education or the philosophy of the discipline of education.<br><br><a class="ajaxLink"href="https://en.wikipe***.org/wiki/Philosophy_of_edu****" rel="nofollow">Homepage</a></div>'

    ; Protect the line breaks by converting <br> to @CRLF before stripping tags.
    $txt = StringReplace($txt, "<br>", @CRLF)
    ; Remove whole links, then every remaining tag.
    $txt = StringRegExpReplace($txt, '(?s)<a.*?</a>|<.*?>', "")
    ; Convert the line breaks back to <br>.
    $txt = StringReplace($txt, @CRLF, "<br>")
    MsgBox(0, "replaced back", $txt)

orichec (posted March 6, 2016):

2 hours ago, AutoBert said: "And where is the difficulty in replacing the @CRLF back with <br>?"

Everything is working fine. Thank you so much, everyone.

mikell (posted March 6, 2016):

This (removing links and all tags except <br>) could be done using a single regex:

    $txt = '<img alt="image" src="//imghost.com/id_rk3FyZ.jpg"><br><br><b>Philosophy of Education and Its Importance</b><br><br>Behind every school and every teacher is a set of related beliefs--a philosophy of education--that influences what and how students are taught.<br><br>A philosophy of education represents answers to questions about the purpose of schooling, a teacher''s role, and what should be taught and by what methods.<br> <br><span style="color: red;"><b>Note</b></span></b></span><br><br>As an academic field, philosophy of education is "the philosophical study of education and its problems...its central subject matter is education, and its methods are those of philosophy". "The philosophy of education may be either the philosophy of the process of education or the philosophy of the discipline of education.<br><br><a class="ajaxLink"href="https://en.wikipe***.org/wiki/Philosophy_of_edu****" rel="nofollow">Homepage</a></div>'

    ; <br> is captured in group 1 and written back via "$1"; links and all other tags are replaced with nothing.
    $txt = StringRegExpReplace($txt, '(?s)(<br>)|<a.*?</a>|<.*?>', "$1")
    MsgBox(0, "", $txt)

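The single-regex approach above matches only the exact lowercase string <br>. As a hedged variation, not something posted in the thread, the sketch below also keeps <BR> and <br /> by matching case-insensitively and allowing an optional slash. It adds \b after <a so that tags which merely start with "a" (such as <abbr>) are not treated as the start of a link. The sample input and variable names are hypothetical.

```autoit
; Sketch only: a small hypothetical sample with mixed <br> spellings.
Local $sHtml = '<b>Bold</b><br/>Text<BR><abbr>abbr</abbr><a href="x">Link</a>'

; (?i)      case-insensitive, so <BR> is matched as well as <br>
; (?s)      lets . span line breaks, so tags or links split across lines still match
; group 1   any <br>, <br/> or <br /> variant, written back through "$1"
; <a\b...   a whole link from <a ...> to </a>, removed together with its text
; <.*?>     any other single tag, removed
Local $sClean = StringRegExpReplace($sHtml, '(?is)(<br\s*/?>)|<a\b.*?</a>|<.*?>', "$1")

MsgBox(0, "Only <br> variants kept", $sClean)
```

The order of the alternatives matters: the <br> branch has to come first, otherwise the generic <.*?> branch would consume the line-break tags before they can be captured.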