Mat Posted October 4, 2009 (edited)

"I was wondering if someone has code for beautifying plain HTML and is willing to share it. I'm not interested in additional software, tools, whatever, just plain AutoIt code. Indentation is what this is about. Would be much obliged."

So I made one. It deals only with indentation, and presumes you are inputting a valid HTML file. And that means VALID: no "<br>"s, they will stuff it up. "<br />"s only. The same applies to any other tag that does not have a separate closing tag. HTML is generally a messy language, but this is very much "garbage in, garbage out" software. Not literally, you'll just have ridiculous tabs in there.

Command lines:
-i : installs it into the context menu for ".html" and ".htm" files. Use install.bat if you're lazy.
-u : uninstalls the above (or click uninstall.bat).
Filename [tab] : the input file, then the tab width in spaces (default = -1 = @TAB).

Haven't tried it with a huge HTML file yet... It has not been tested on anything other than WinXP SP3 either, although I see no reason it would behave differently elsewhere.

source + exe: http://code.google.com/p/m-a-t/downloads/detail?name=HTMLTidy.zip

Mat

Updated: Fixed problem with comments not following the <!-- convention. Tested on a large Wikipedia article, worked perfectly.
Update2: Significant performance gain. Parsed the same wiki article, completed VERY fast. Should also give a better effect?
Update3: Anchor tags stay on the same line. Much easier to read blocks of text now. Will deal with scripts next.
Update4: Another big update. Can now recursively tidy entire directories of HTML files (very useful). I have also finally managed to sort out problems with which tags are indented.

Edited June 24, 2010 by Mat
MerkurAlex Posted October 4, 2009
Hey, works great and very fast. Good work!
Mat Posted October 4, 2009 (edited) Author
Thanks... I was about to test it on a big HTML file to see how fast it actually was, but realised that I didn't have a big web page to test with. I used: http://en.wikipedia.org/wiki/Maths
They hadn't closed their opening DOCTYPE tag with a "/>" so it messed up the layout. It also showed up an error in my code. I always thought that <!-- was necessary for a comment, although it appears this is more a convention, and only <! is required. I have updated now! The <! solves the doctype too; that 912-line Wikipedia page tidied perfectly (scrolling down to the bottom showed a nice ending anyhow). It was a lot faster than I expected though! Still slow on reading the file.
Mat
Edited October 4, 2009 by Mat
Mat Posted October 4, 2009 (edited) Author
Updated again, with no loop for reading the file and a bit more StringRegExp. It is significantly faster. I have also noticed another fault: the anchor tag is treated like any other tag, which leads to unreadable text in some cases. I will look for a solution.
Mat
Edit: It also completely messes up any embedded scripts or styles, such as CSS. Use external files or tidy those yourself.
Edited October 4, 2009 by Mat
Mat Posted October 6, 2009 Author
Big update. Can now recursively tidy entire directories of HTML files using the "-d" command line, or simply by passing a directory as the first argument ($CmdLine[1]; in AutoIt $CmdLine[0] holds the argument count). I managed to sort out the problem trancexx brought up with indenting after tags such as "input" or "base", and I also added tag correction.
Mat
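For readers who want to see the shape of the recursive directory mode, here is a minimal sketch in Python rather than the AutoIt in HTMLTidy.zip. The names find_html_files and tidy_directory are hypothetical, and the sketch assumes the tidy step is any function that takes the file text and returns the tidied text.

```python
import os

# Extensions treated as HTML, matching the ".html" and ".htm" files the tool registers for.
HTML_EXTENSIONS = (".html", ".htm")

def find_html_files(root):
    """Return every HTML file under root, including subdirectories, in sorted order."""
    found = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(HTML_EXTENSIONS):
                found.append(os.path.join(folder, name))
    return sorted(found)

def tidy_directory(root, tidy):
    """Rewrite each HTML file under root in place using tidy (a str -> str function)."""
    for path in find_html_files(root):
        with open(path, encoding="utf-8") as handle:
            source = handle.read()
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(tidy(source))
```

The file list is built before anything is rewritten, so a failure while walking the tree leaves every file untouched.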
James Posted July 9, 2010
Mat,
Old topic, but since it's an interesting piece of code, it deserves to be re-opened. You know how your code only works with XHTML source, due to the <br /> being required? Why not use a regular expression to allow either <br> or <br />?
James
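James's suggestion could look roughly like the following. This is a Python sketch, not the AutoIt from the script. The tag list in VOID_TAGS and the function name normalize_void_tags are assumptions made for illustration, and the pattern does not try to handle a ">" inside an attribute value.

```python
import re

# Assumed list of tags that never take a separate closing tag; extend as needed.
VOID_TAGS = ("br", "hr", "img", "input", "base", "meta", "link")

# Matches <br>, <br/> and <br />, with or without attributes, case-insensitively.
# The \b stops "base" from matching the start of a longer tag name.
VOID_TAG_RE = re.compile(
    r"<(" + "|".join(VOID_TAGS) + r")\b([^<>]*?)\s*/?>",
    re.IGNORECASE,
)

def normalize_void_tags(html):
    """Rewrite every void tag into the XHTML form, for example <br> becomes <br />."""
    return VOID_TAG_RE.sub(lambda m: "<" + m.group(1) + m.group(2) + " />", html)
```

Running this before indenting would let an indenter that only understands "<br />" accept plain "<br>" as well.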
Mat Posted July 9, 2010 Author
Quote (James): "Old topic, but since it's an interesting piece of code, it deserves to be re-opened. You know how your code only works with XHTML source, due to the <br /> being required? Why not use a regular expression to allow either <br> or <br />?"
It does at the moment (I think). There is supposed to be tag correction in there as well; I think it's broken though (I got a PM about something).
I have updated the script with code corrections working.
Mat
James Posted July 9, 2010
Mat,
I'm going to have a play with this some time. I think you could make it into a neat little app if you added enough features.
James
Mat Posted July 9, 2010 Author
Quote (James): "I'm going to have a play with this some time. I think you could make it into a neat little app if you added enough features."
I know, it's been a while, but I would like to make a full check-and-tidy tool, with support for different levels such as XHTML 1.0 Strict, Frameset and Transitional, XHTML 1.1, HTML 4 and 5, and also general XML.
The best way to go would be a big database with tag, description, supported versions, supported browsers, self-closing... Do you know if there are any out there already?
stoyan Posted October 1, 2010
Quote (Mat): "They hadn't closed their opening DOCTYPE tag with a "/>" so it messed up the layout. It also showed up an error in my code. I always thought that <!-- was necessary for a comment, although it appears this is more a convention, and only <! is required."
You totally got it wrong. Semantically and structurally, '<!' has nothing in common with '<!--'. The '<!DOCTYPE ... >' is the whole tag and is the HTML analogue of a (pre)processing instruction in programming.
As for comments, the opening is '<!--' and the closing is '-->', with no '-'s allowed in between.
P.S. Sorry for grumbling about this one-year-old post of yours.
Mat Posted October 1, 2010 Author
Quote (stoyan): "You totally got it wrong. Semantically and structurally, '<!' has nothing in common with '<!--'. The '<!DOCTYPE ... >' is the whole tag and is the HTML analogue of a (pre)processing instruction in programming. As for comments, the opening is '<!--' and the closing is '-->', with no '-'s allowed in between. P.S. Sorry for grumbling about this one-year-old post of yours."
I'll defend the me of a year ago by saying that it was the lack of a closing tag on the doctype, not the lack of hyphens, that was the first error I fixed. As for comments: <! test -- 123 -- 123 123> does not show in the browser; it is simply taken to be a preprocessor instruction, as you said. This is a simple tidier. As far as it's concerned, it's a comment node, so it does not start a block. That's all I care about. But yes, you are right that I would never write my comments like that, and more complex programs would throw an error.
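The point about '<!' nodes not starting a block can be illustrated with a small indenter. This is a Python sketch under stated assumptions, not Mat's AutoIt code: the token pattern, the classify and indent_html names, and the rule that anything starting with "<!" is a non-block node are all illustrative choices.

```python
import re

# A token is a full <!-- ... --> comment, any other <...> tag, or a run of text.
TOKEN_RE = re.compile(r"<!--.*?-->|<[^<>]+>|[^<]+", re.DOTALL)

def classify(token):
    """Decide how a token affects the indentation depth."""
    if not token.startswith("<"):
        return "text"
    if token.startswith("<!"):
        return "declaration"  # comments and DOCTYPE: never open a block
    if token.startswith("</"):
        return "close"
    if token.endswith("/>"):
        return "selfclosing"
    return "open"

def indent_html(html, tab="\t"):
    """Put each token on its own line, indented by its nesting depth."""
    depth = 0
    lines = []
    for token in TOKEN_RE.findall(html):
        token = token.strip()
        if not token:
            continue
        kind = classify(token)
        if kind == "close":
            depth = max(depth - 1, 0)  # dedent before printing the closing tag
        lines.append(tab * depth + token)
        if kind == "open":
            depth += 1  # only real opening tags start a block
    return "\n".join(lines)
```

With this rule, both the DOCTYPE and <! test -- 123 -- 123 123> sit at the current depth without indenting what follows. A bare <br> would still be classified as "open" and push everything after it one level too deep, which is the "garbage in, garbage out" behaviour described in the first post.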