Anyone did the HTML Tidy function?

Post by gnolen »

Hi there everyone!

I found the htmltidy function for Smarty the other day, and I am wondering whether anyone has got it working with the Tidy binary in the site's root folder rather than as a PHP extension on the server. We had that with phpwcms and it worked really well. It cleaned up the output nicely!

This is how marcus did it, so I am thinking of implementing it for CMS Made Simple. Would it be really hard, or has someone done it already? No use doing it again then!

Take care / Gnolen

Code: Select all

First, TIDY is a free validator that checks your site and transforms it to valid XHTML.

Oliver integrated this to validate the PHPWCMS output on the fly.

1. Go to http://tidy.sourceforge.net/#binaries and download the binary.
If your site runs under Linux choose LINUX/x86, otherwise the build for your OS.

2. Upload tidy to any folder you like inside your phpwcms directory.

3. Open index.php, go to line 140 and uncomment:
Code:
// EXPERIMENTAL
// now try to cleanup html code with tidy
   if(isset($phpwcms['tidy']) && $phpwcms['tidy']) {

      require_once ("include/inc_front/utf8.func.inc.php");

      $tmp_tidy_file = PHPWCMS_ROOT.'/content/tmp/'.time().'_tidy_cleanup.html';
      $tidy_written = entities_to_utf8($content['page_start'].$content["all"].$content['page_end']);
      $tidy_written = write_textfile($tmp_tidy_file, $tidy_written);
      $tidy_page_end = $content['page_end'];
      if($tidy_written && filesize($tmp_tidy_file)){
         
         //echo '<!-- '.$phpwcms['tidy_command'].' "'.$tmp_tidy_file.'" //-->';
         
         @exec($phpwcms['tidy_command'].' "'.$tmp_tidy_file.'"', $tidy_return);
         if(!isset($tidy_return[0])) {
            
            if(filesize($tmp_tidy_file) && ($content['page_end'] = file_get_contents($tmp_tidy_file))) {
               
               echo $content['page_start'] = '';
               echo $content["all"] = '';
            
            } else {
            
               $content['page_end'] = $tidy_page_end;
            }
            
         }
         unlink($tmp_tidy_file);
      }
   }

4. Open conf.inc.php. You should find (or add):
Code:
$phpwcms["tidy"]              = 1;        //use tidy to cleanup rendered document
$phpwcms["tidy_command"]      = '/www/your_root/tidy/tidy -cqm -utf8 -asxhtml -clean -numeric --wrap 0 --tidy-mark 0 --alt-text 1';        //path to the tidy binary plus its options

Change the path in the second variable, $phpwcms["tidy_command"]. On Windows it could be C:\webroot\tidy\tidy.exe

The options follow the filename:
-cqm (short for -c -q -m: clean up presentational markup, run quietly, modify the file in place)
-utf8 (use UTF-8 for input and output)
-asxhtml (convert the HTML to well-formed XHTML)

Many more settings are available as long options, for example:
--tidy-mark 0 (tidy does not add a <meta name="generator" content="HTML Tidy ..." /> tag)
--wrap 0 (do not wrap the source code; the default wraps after 68 characters)

There are many more options; you will find descriptions here:
http://tidy.sourceforge.net/docs/quickref.html
http://www.perpetualpc.net/srtd_tidy.html
http://jnpassieux.chez.tiscali.fr/info/Tidy.php

Sorry for my bad English, I hope you understand ;)
If you know more about this (performance, commands etc.), please tell us.

greetings marcus
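
The phpwcms code above glues the command and the file name together by hand. A minimal sketch of building the same command with escapeshellarg(), so that paths containing spaces survive the shell, could look like this. The function name build_tidy_command and the paths in the usage line are hypothetical; only the flags come from the post above.

```php
<?php
// Hypothetical helper, not part of phpwcms or CMSMS.
// Builds a shell command that runs tidy in place (-m) on one file.
function build_tidy_command(string $binary, string $file, array $options = array()): string
{
    // Short flags taken from the phpwcms example above.
    $flags = array('-c', '-q', '-m', '-utf8', '-asxhtml', '-numeric');

    // Long options are passed as "--name value" pairs; caller values win
    // over these defaults because += keeps existing keys.
    $options += array('wrap' => 0, 'tidy-mark' => 0, 'alt-text' => 1);

    $parts = array(escapeshellarg($binary));
    foreach ($flags as $flag) {
        $parts[] = $flag;
    }
    foreach ($options as $name => $value) {
        $parts[] = '--' . $name . ' ' . escapeshellarg((string) $value);
    }
    $parts[] = escapeshellarg($file);

    return implode(' ', $parts);
}
```

Usage would be something like exec(build_tidy_command('/www/your_root/tidy/tidy', $tmp_tidy_file), $tidy_return); the temp file is then read back because -m makes tidy rewrite it in place.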
Ted
Power Poster
Posts: 3329
Joined: Fri Jun 11, 2004 6:58 pm
Location: Fairless Hills, Pa USA

Re: Anyone did the HTML Tidy function?

Post by Ted »

As far as I know, this hasn't been tried yet. I would definitely like to see someone give it a shot.
gnolen

Re: Anyone did the HTML Tidy function?

Post by gnolen »

I can give it a shot, but as I am not so good at PHP it could be hard. This is a great feature to have: it works really nicely, and if there is a problem it is easy to set different command options. Because some scripts and Tiny sometimes don't create fully valid code, it would be a fantastic function. There is a Smarty version of this, but it only works with Tidy as a PHP extension, which most of us would rather avoid.

Wishy, what do you think? Would it be hard to pull off? Maybe I am heading out into deep PHP water that I can't handle, but I am willing to give it a try! The code is not that hard.

/ gnolen
mip

Re: Anyone did the HTML Tidy function?

Post by mip »

A very quick hack for tidying the CMS' output:

In index.php replace

Code: Select all

echo $html;
with this

Code: Select all

// HTML Output Cleaner - usually calling "tidy"
// http://tidy.sourceforge.net/
function tidy($webpage) {
	
	// Path to tidy and its commandline args
	$tidyprg = 'tidy -i -q -wrap 0 --tidy-mark 0';

	$descriptorspec = array(
		0 => array("pipe", "r"),  // stdin is a pipe that the child will read from
		1 => array("pipe", "w"),  // stdout is a pipe that the child will write to
		// 2 => array("file", "/tmp/error-output.txt", "a") // stderr is a file to write to
	);

	$process = proc_open($tidyprg, $descriptorspec, $pipes);
	if (is_resource($process)) {
		// $pipes now looks like this:
		// 0 => writeable handle connected to child stdin
		// 1 => readable handle connected to child stdout
		// stderr is not redirected (the descriptor above is commented out)

		fwrite($pipes[0], $webpage);
		fclose($pipes[0]);

		// Read everything tidy wrote to stdout
		$output = stream_get_contents($pipes[1]);

		fclose($pipes[1]);

		// It is important that you close any pipes before calling
		// proc_close in order to avoid a deadlock
		proc_close($process);

		// If tidy produced nothing, fall back to the original input
		return ($output === false || trim($output) === '') ? $webpage : $output;
	} else {
		// Calling tidy prog failed - return original input
		return $webpage;
	}
}
echo tidy($html);
Caveat: calling tidy on every page request is a poor solution and may cause high server load!
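
One way to soften that is to cache the cleaned output, so tidy only runs when the page text actually changes. This is a minimal sketch under stated assumptions: tidy_cached and the cache directory are hypothetical names, and $cleaner stands for any callable that takes HTML and returns cleaned HTML, such as the tidy() function above.

```php
<?php
// Hypothetical cache wrapper, not part of CMSMS.
function tidy_cached(string $html, callable $cleaner, string $cacheDir): string
{
    // The cache key depends only on the input, so an edited page misses
    // the cache and is cleaned again.
    $cacheFile = rtrim($cacheDir, '/\\') . DIRECTORY_SEPARATOR . md5($html) . '.html';

    if (is_file($cacheFile)) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached;
        }
    }

    $clean = $cleaner($html);

    // LOCK_EX keeps concurrent requests from interleaving their writes.
    file_put_contents($cacheFile, $clean, LOCK_EX);

    return $clean;
}
```

In index.php this would replace the last line with something like echo tidy_cached($html, 'tidy', '/path/to/cache'); Pages whose output changes on every request would never hit the cache and would keep adding files, so this only helps for mostly static content.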
Ted

Re: Anyone did the HTML Tidy function?

Post by Ted »

Yeah, the trick to this is putting it into a module and running it when you save content...  or at the very minimum, running it through prerender so that it gets cached.  However, I have no idea how htmltidy would handle the {cms_module} and assorted plugins...

Anyone interested in testing this?
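
For the {cms_module} question, one possible approach is to hide the template tags behind plain-text placeholders before tidy sees the content and to put them back afterwards. This is only a sketch: protect_tags, restore_tags and the @@CMSTAG placeholder format are made up for illustration, and whether tidy leaves such a token exactly where it was (for example outside of the body) has not been tested here.

```php
<?php
// Hypothetical helpers, not part of CMSMS or Smarty.
// The regex is deliberately naive: it treats any single-line {...} as a
// template tag, so it would also catch literal braces in inline CSS or
// JavaScript.
function protect_tags(string $content, array &$saved): string
{
    $saved = array();

    return preg_replace_callback('/\{[^{}\r\n]+\}/', function ($m) use (&$saved) {
        // Plain-text token, so it can sit in text or in an attribute value.
        $key = '@@CMSTAG' . count($saved) . '@@';
        $saved[$key] = $m[0];
        return $key;
    }, $content);
}

function restore_tags(string $content, array $saved): string
{
    // strtr() replaces every placeholder with the tag it stands for.
    return strtr($content, $saved);
}
```

A save-time module would call protect_tags() first, run the result through tidy, and then call restore_tags() on tidy's output before storing it.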
mip

Re: Anyone did the HTML Tidy function?

Post by mip »

Tidy as a prerender module:
New file: modules/HTMLTidy/HTMLTidy.module.php

Code: Select all

<?php

// $Id$

class HTMLTidy extends CMSModule
{
	function GetName()
	{
		return 'HTMLTidy';
	}

	function IsPluginModule()
	{
		return true;
	}

	function GetVersion()
	{
		return '0.1';
	}

	function GetHelp($lang='en_US')
	{
		return "
		<h3>What does this do?</h3>
		<p>HTMLTidy is a module for tidying up HTML content.</p>
		";
	}

	function GetAuthor()
	{
		return 'The M.I.P.';
	}

	function GetAuthorEmail()
	{
		return 'egroups_mip@gmx.fr';
	}

	function GetChangeLog()
	{
		?>
		<ul>
		<li>
		<p>Version: 0.1</p>
		<p>First test.</p>
		</li>
		</ul>
		<?php
	}

    function FriendlyName()
	{
		return 'HTMLTidy';
	}
	
	function ContentPreRender(&$content)
	{
		if (extension_loaded('tidy')) {
			// Use PHP Tidy extension
			$tidy = tidy_parse_string($content, array('indent' => TRUE, 'output-xhtml' => TRUE, 'wrap' => 0));
			tidy_clean_repair($tidy);
			$content = tidy_get_output($tidy);
	
		} else {
			// Call Tidy directly
			// http://tidy.sourceforge.net/
		
			// Path to tidy and its commandline args - use absolute path names for security reasons
			$tidyprg = 'tidy -i -q -wrap 0 --tidy-mark 0';
		
			$descriptorspec = array(
				0 => array("pipe", "r"),  // stdin is a pipe that the child will read from
				1 => array("pipe", "w"),  // stdout is a pipe that the child will write to
				// 2 => array("file", "/tmp/error-output.txt", "a") // stderr is a file to write to
			);
		
			$process = proc_open($tidyprg, $descriptorspec, $pipes);
			if (is_resource($process)) {
				// $pipes now looks like this:
				// 0 => writeable handle connected to child stdin
				// 1 => readable handle connected to child stdout
				// stderr is not redirected (the descriptor above is commented out)
		
				fwrite($pipes[0], $content);
				fclose($pipes[0]);
		
				// Read everything tidy wrote to stdout
				$output = stream_get_contents($pipes[1]);
		
				fclose($pipes[1]);
		
				// It is important that you close any pipes before calling
				// proc_close in order to avoid a deadlock
				proc_close($process);
		
				// If tidy produced nothing, keep the original content
				if ($output !== false && trim($output) !== '') {
					$content = $output;
				}
			}
		}
	}
}

# vim:ts=4 sw=4 noet
?>

Or: Tidy as a Smarty output filter:
(basically taken from "Source code of file outputfilter.tidyrepairhtml.php")
New file: lib/smarty/plugins/outputfilter.tidyrepairhtml.php

Code: Select all

<?php
/*
* Smarty plugin
* -------------------------------------------------------------
* File:     outputfilter.tidyrepairhtml.php
* Type:     outputfilter
* Name:     tidyrepairhtml
* Version:  1.1
* Date:     2005-04-06
* Purpose:  Uses the tidy extension to repair a malformed HTML
*           template before displaying it
* Install:  Drop into the plugin directory, call
*           $smarty->load_filter('output','tidyrepairhtml');
*           from application.
* Authors:  John Coggeshall <john@php.net>, The M.I.P.
* -------------------------------------------------------------
*/

function smarty_outputfilter_tidyrepairhtml ($source, &$smarty)
{
	if (extension_loaded('tidy')) {
		// Use PHP Tidy extension
		$tidy = tidy_parse_string($source, array('indent' => TRUE, 'output-xhtml' => TRUE, 'wrap' => 0));
		tidy_clean_repair($tidy);
		return tidy_get_output($tidy);

	} else {
		// Call Tidy directly
		// http://tidy.sourceforge.net/
	
		// Path to tidy and its commandline args - use absolute path names for security reasons
		$tidyprg = 'tidy -i -q -wrap 0 --tidy-mark 0';
	
		$descriptorspec = array(
			0 => array("pipe", "r"),  // stdin is a pipe that the child will read from
			1 => array("pipe", "w"),  // stdout is a pipe that the child will write to
			// 2 => array("file", "/tmp/error-output.txt", "a") // stderr is a file to write to
		);
	
		$process = proc_open($tidyprg, $descriptorspec, $pipes);
		if (is_resource($process)) {
			// $pipes now looks like this:
			// 0 => writeable handle connected to child stdin
			// 1 => readable handle connected to child stdout
			// stderr is not redirected (the descriptor above is commented out)
	
			fwrite($pipes[0], $source);
			fclose($pipes[0]);
	
			// Read everything tidy wrote to stdout
			$output = stream_get_contents($pipes[1]);
	
			fclose($pipes[1]);
	
			// It is important that you close any pipes before calling
			// proc_close in order to avoid a deadlock
			proc_close($process);
	
			// If tidy produced nothing, keep the original source
			if ($output !== false && trim($output) !== '') {
				$source = $output;
			}
		} else {
			// Calling tidy prog failed - return original input
			return $source;
		}
	}

	return $source;
}

?>
In lib/smarty/Smarty.class.php search for

Code: Select all

var $autoload_filters = array();
and replace with

Code: Select all

var $autoload_filters = array('output' => array('tidyrepairhtml'));

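These are only quick and dirty hacks. In particular, the Smarty hack is really bad, as the output filter would be hardcoded in Smarty.class.php; calling $smarty->load_filter('output','tidyrepairhtml'); from the application, as the plugin header describes, would avoid editing the library.
But maybe someone is interested and takes this as a starting point for a better solution.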
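
One such starting point: the shell fallback in the module and the filter above does not check tidy's exit status (0 for success, 1 for warnings, 2 for errors), and it writes all input before reading any output. The sketch below checks the exit status and sends stdout to a temp file, so a filter that starts writing before it has read everything cannot stall on a full pipe. The name run_filter and the binary path in the usage line are hypothetical, and passing the command as an array assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: pipe $input through an external command and return
// its stdout, or null if the command failed or produced nothing.
function run_filter(array $command, string $input, int $maxExitCode = 1): ?string
{
    $tmp = tempnam(sys_get_temp_dir(), 'tdy');
    if ($tmp === false) {
        return null;
    }

    $descriptorspec = array(
        0 => array('pipe', 'r'),       // child reads the page from stdin
        1 => array('file', $tmp, 'w'), // child writes its result to a temp file
    );

    $process = proc_open($command, $descriptorspec, $pipes);
    if (!is_resource($process)) {
        unlink($tmp);
        return null;
    }

    fwrite($pipes[0], $input);
    fclose($pipes[0]);

    // proc_close() waits for the child and returns its exit status.
    $exit = proc_close($process);
    $output = file_get_contents($tmp);
    unlink($tmp);

    if ($exit < 0 || $exit > $maxExitCode || $output === false || $output === '') {
        return null;
    }
    return $output;
}
```

A caller would then do something like $clean = run_filter(array('/usr/bin/tidy', '-i', '-q', '-wrap', '0', '--tidy-mark', '0'), $content); and keep the original $content whenever the result is null.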
Last edited by mip on Wed Apr 06, 2005 1:16 am, edited 1 time in total.
Ted

Re: Anyone did the HTML Tidy function?

Post by Ted »

lol.  Wow, that was quick!

Cheers!
nils73
Power Poster
Posts: 520
Joined: Wed Sep 08, 2004 3:32 pm

Re: Anyone did the HTML Tidy function?

Post by nils73 »

Hi everyone,

This thread is fairly old and we now have 1.x at hand, but still no Tidy in sight. I have followed the discussion here about the new WYSIWYG editor for 1.x, and since it will be TinyMCE I will most likely need Tidy in future, because TinyMCE's code output is sometimes very poor. Maybe someone has picked up the prerender idea somewhere that I could not find.

Regards,
Nils