XML validation & query urls (& failure)

I tried to make my website XML-valid so that it could be loaded as XML by browsers that accept it (currently Firefox 3, Chrome and Opera) and be parsed faster.
Moreover, I simply wanted my website to be W3C-valid with an XHTML header.

Problems arise when your page contains URLs with query arguments: validation fails on the ‘&’ variable separator, because a bare ‘&’ is not allowed inside an XML attribute value.

At first, I simply HTML-encoded the separator, replacing ‘&’ with ‘&amp;’. This works fine for content validation, but it breaks server-side code if the raw URL is pasted directly into the browser, because PHP does not treat ‘&amp;’ as the separator. (Normally, browsers decode ‘&amp;’ automatically, so when a user clicks the link the page is served correctly and the error shown below should almost never occur.)

Example:

<a href="myurl?key1=value1&amp;key2=value2">my_link</a>

// server-side script for the requested page (raw URL pasted into the address bar)
var_dump($_GET); // var_dump() prints by itself, no echo needed
// output:
array(2) {
["key1"]=>
string(6) "value1"
["amp;key2"]=>
string(6) "value2"
}
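
The same behaviour can be reproduced without a browser. The sketch below is only an illustration: it assumes the default arg_separator.input (‘&’), which parse_str() also relies on, and the variable names are mine.

```php
<?php
// Raw query string, as the application would generate it.
$query = 'key1=value1&key2=value2';

// Encode it for use inside an XML/XHTML attribute: '&' becomes '&amp;'.
$encoded = htmlspecialchars($query);

// Parsing the *encoded* string (what PHP receives when the raw markup is
// pasted into the address bar) splits on '&' and leaves 'amp;' glued to the key.
parse_str($encoded, $broken);

// Decoding first (what the browser does when the link is clicked) fixes it.
parse_str(htmlspecialchars_decode($encoded), $ok);

print_r($broken); // keys: 'key1' and 'amp;key2'
print_r($ok);     // keys: 'key1' and 'key2'
```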

Quick solution

Personally, I find it pretty annoying to have to encode ‘&’ as ‘&amp;’ each time a URL is generated, and I also don’t like depending on the browser to decode ‘&amp;’ automatically when it appears in a link.

A much cleaner solution is to change the default variable separator to an XML-compliant character that can also be URL-encoded (so that the values of your query variables won’t interfere with it).

‘;’ meets these criteria (note: the W3C recommends that servers accept this character as an alternative to ‘&’, which is a big plus).

In PHP, you simply need to edit php.ini and change the ‘arg_separator.input’ value from ‘&’ to ‘;’ (don’t forget to restart your server for the change to take effect).
Be aware that a multi-character value is not treated as one long separator: each character is treated as a separator per se.
We can use this behavior to our advantage, so that our code does not break if we forget to use ‘;’ as the variable separator, by setting the following list of separators:

arg_separator.input = ";&"
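
To illustrate what "each character is a separator" means, here is a rough userland emulation. split_query() is a hypothetical helper written for this post, not PHP's own parser, and it ignores array syntax such as key[]=value.

```php
<?php
// Hypothetical helper: split a query string on ANY character listed in
// $separators, roughly mimicking arg_separator.input = ";&".
function split_query(string $query, string $separators = ';&'): array
{
    $pattern = '/[' . preg_quote($separators, '/') . ']/';
    $vars = [];
    foreach (preg_split($pattern, $query, -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        // Split each pair on the first '=' only; a missing value becomes ''.
        [$key, $value] = array_pad(explode('=', $pair, 2), 2, '');
        $vars[urldecode($key)] = urldecode($value);
    }
    return $vars;
}

// Mixed separators still yield the expected variables, and a URL-encoded
// ';' (%3B) inside a value is not mistaken for a separator.
$vars = split_query('key1=value1;key2=value2&key3=a%3Bb');
print_r($vars);
```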

To implement these changes, do the following:

  • locate the php.ini file (/etc/php5/apache2/php.ini on my server)
  • open it, find the line with ‘arg_separator.input’, uncomment it by removing the ‘;’ character at the beginning of the line, and set its value
  • do the same with ‘arg_separator.output’ if you need to
  • save your changes
  • restart your server (“sudo /etc/init.d/apache2 restart”)
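
Once the server has restarted, a quick way to check that the new values were picked up is to read them back with ini_get(). This is a minimal sketch; run it through the web server rather than the command line, since the CLI may load a different php.ini.

```php
<?php
// Read the effective values as seen by the running PHP process.
$input  = ini_get('arg_separator.input');
$output = ini_get('arg_separator.output');

echo 'arg_separator.input  = ', var_export($input, true), "\n";
echo 'arg_separator.output = ', var_export($output, true), "\n";
```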

That’s it! (you can start using your new xml-compliant urls)

Warning

If you implement this solution, keep in mind that it might NOT be an optimal choice for search-engine optimization. Search engines might expect the standard ‘&’ character in order to parse your request URI and index it better.
In my case, this was not a consideration because my website is almost entirely private (login required), so it is of no use to search engines.

A more search-engine-friendly solution would be to use ‘/’ as a separator. This may not require any change to PHP per se if your app is built on a recent framework (such as Zend Framework) that natively handles variable extraction (but you won’t get the values in the $_GET superglobal; check your framework documentation for more details).
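
If you are not using a framework, the idea of ‘/’-separated variables can be sketched by hand. parse_path_params() below is a hypothetical helper and the URL layout (/key/value/key/value) is an assumption of mine; it is not how Zend Framework or any other framework implements routing.

```php
<?php
// Hypothetical helper: turn "/key1/value1/key2/value2" into an array.
function parse_path_params(string $path): array
{
    // Drop empty segments produced by leading, trailing or doubled slashes.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    $params = [];
    // Consume segments two at a time; a trailing key without a value is ignored.
    for ($i = 0; $i + 1 < count($segments); $i += 2) {
        $params[rawurldecode($segments[$i])] = rawurldecode($segments[$i + 1]);
    }
    return $params;
}

$params = parse_path_params('/key1/value1/key2/value2');
print_r($params);
```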

Note 1: you might also want to set arg_separator.output to the same value if you use PHP functions to build/output URLs.
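
For URLs built with http_build_query(), the separator can also be passed explicitly as the third argument, which overrides arg_separator.output for that call only. A small sketch, with example data of my own:

```php
<?php
$data = ['key1' => 'value1', 'key2' => 'a;b'];

// Third argument: separator to use instead of arg_separator.output.
// Values are URL-encoded, so the ';' inside 'a;b' becomes %3B.
$url = 'myurl?' . http_build_query($data, '', ';');

echo $url, "\n";
```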

Note 2: if you don’t have access to the php.ini file, don’t worry: both settings can also be changed in httpd.conf or .htaccess files using the following directives: “php_value arg_separator.output &amp;” and “php_value arg_separator.input ;&”
