Anatomy of an umbraco XSLT file

Saturday, April 11, 2009
Filed under: xslt - by Douglas Robar

XSLT (eXtensible Stylesheet Language Transformations) is used to transform XML into other formats, such as (x)HTML. In this series of articles you'll learn the basics of using XSLT for creating umbraco macros. This is the first post in that series.

When you create a clean XSLT macro in umbraco 4 you see the following template code (some indenting added for increased clarity):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#x00A0;"> ]>
<xsl:stylesheet 
    version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:msxml="urn:schemas-microsoft-com:xslt"
    xmlns:umbraco.library="urn:umbraco.library"
    exclude-result-prefixes="msxml umbraco.library">
    
    <xsl:output method="xml" omit-xml-declaration="yes"/>
    
    <xsl:param name="currentPage"/>
    
    <xsl:template match="/">
        <!-- start writing XSLT -->
    </xsl:template>
    
</xsl:stylesheet>

Before diving in and starting to write an XSLT macro, let's take a moment to understand the anatomy of an umbraco XSLT file. Though you won't often need to change any of the default values, understanding the various parts will be invaluable as we proceed, and we'll refer back to this article often.

Line 1: XML version and encoding

<?xml version="1.0" encoding="UTF-8"?>

Umbraco provides full Unicode support for websites, which means you aren't limited to only ASCII or HTML-encoded characters. For instance, the following Unicode characters might be used: ä ¿ § ¡ ò مرحبًا καλημέρα müjde. To ensure your XSLT macro honors Unicode characters you must always specify the XML encoding.

There is no reason to change this value when using umbraco.

Line 2: !DOCTYPE and !ENTITY

<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#x00A0;"> ]>

XSLT follows all the formatting rules of XML, such as proper nesting and closing of all tags. Though (x)HTML and XML share many similarities they are not exactly the same. For instance, <br> is valid HTML but <br /> is required in XML/XHTML.

There are five character sequences (or "entities") built into the XML specification, and therefore available in every XSLT file. They are: &lt; &gt; &apos; &quot; and &amp; The !DOCTYPE is only required if you wish to specify additional entities.

Besides the five built-in entities, HTML makes use of another character sequence very frequently, the non-breaking space (&nbsp;). Line 2 adds this entity so that you may include non-breaking spaces in your XSLT macros.

If the entity had not been added you would have to write the numeric character reference &#160; (or &#x00A0;) rather than the more familiar &nbsp; in your macros.

TIP: Remember to use &amp; rather than just the & symbol in your XSLT code. Similarly, when comparing values in XSLT you cannot simply use "5 < 10" but must use "5 &lt; 10". This is because the built-in entities resolve the problem of trying to use reserved characters in your XSLT macros. We'll see many examples of this in future articles.

As with line 1, this line rarely needs any modification, though in extreme cases you might wish to add additional entities.
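As a minimal sketch of the points above, the stylesheet below declares one extra entity next to nbsp and uses the built-in entities inside an XPath test and in text. The copy entity and the sample text are illustrative assumptions; neither is part of the default umbraco template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
    <!ENTITY nbsp "&#x00A0;">
    <!-- hypothetical extra entity, added only for illustration -->
    <!ENTITY copy "&#x00A9;">
]>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <xsl:template match="/">
        <!-- &lt; stands in for the reserved < character inside the test -->
        <xsl:if test="5 &lt; 10">
            <!-- &amp; stands in for a literal ampersand in the text -->
            <p>&copy;&nbsp;2009 Fish &amp; Chips Ltd</p>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```

Removing either !ENTITY line would make the matching reference an undeclared entity, and the file would no longer parse as XML.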

Lines 3 and 18: <xsl:stylesheet> </xsl:stylesheet>

<xsl:stylesheet 
    ...
    ...
</xsl:stylesheet>

These tags enclose the stylesheet instructions themselves. Apart from the header information in the two lines above, all of your XSLT will appear inside these tags.

Lines 4 and 5: XSLT version and namespace

    version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

Within the opening <xsl:stylesheet … > tag there are a number of attributes specifying details about this stylesheet.

Umbraco uses Microsoft's .NET XSLT processor, which follows the XSLT 1.0 specification. That specification defines a standard namespace, which is shown on line 5 and needs no modification.

Notice that all XSLT instructions (such as xsl:template and xsl:param) are prefixed with the xsl: label to indicate that they belong to this core XSLT namespace.

Note: The XSLT 1.0 specification does not support all the XSLT functions noted at http://www.w3schools.com/Xpath/xpath_functions.asp (a great reference, by the way!), which lists the full set of functions available in the newer XSLT 2.0 specification. I'll discuss which XSLT functions are available in XSLT 1.0 and also explain how you can work around the limitations of XSLT 1.0 to get the same effect as the functions in XSLT 2.0 in another article.
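To give a flavor of such a workaround, here is a small sketch that imitates the XSLT 2.0 upper-case() function with the translate() function from XPath 1.0. The variable names are my own, and the approach as written only covers the unaccented letters a to z.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <xsl:param name="currentPage"/>

    <!-- illustrative helper variables: the two alphabets translate() maps between -->
    <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
    <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

    <xsl:template match="/">
        <!-- replaces each lowercase letter with the uppercase letter at the same position -->
        <h1>
            <xsl:value-of select="translate($currentPage/@nodeName, $lower, $upper)"/>
        </h1>
    </xsl:template>

</xsl:stylesheet>
```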

Lines 6 and 7: Additional namespaces

    xmlns:msxml="urn:schemas-microsoft-com:xslt"
    xmlns:umbraco.library="urn:umbraco.library"

You can specify additional namespaces and gain access to functions beyond those built into the core XSL namespace. To access these extra functions you need to add namespaces to the xsl:stylesheet declaration. This is similar to the "include" or "using" feature of other languages.

Umbraco references two additional namespaces by default, as shown in these lines: one for the MSXML functions (such as node-set()), and the other for the umbraco.library API functions (such as NiceUrl()). We'll discuss the umbraco.library functions in detail in another post.

There is no need to change these lines, though you may wish to include more namespaces to allow you to reference the functions contained in other XSLT extensions, including those you create yourself (a topic we'll talk more about in another article).
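As a rough sketch of how the prefixes are used, the stylesheet below calls one function from each of the two default namespaces. The colors variable and its contents are made up for illustration; NiceUrl() and node-set() are the functions mentioned above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxml="urn:schemas-microsoft-com:xslt"
    xmlns:umbraco.library="urn:umbraco.library"
    exclude-result-prefixes="msxml umbraco.library">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <xsl:param name="currentPage"/>

    <xsl:template match="/">
        <!-- the umbraco.library: prefix selects a function from the umbraco API namespace -->
        <a href="{umbraco.library:NiceUrl($currentPage/@id)}">
            <xsl:value-of select="$currentPage/@nodeName"/>
        </a>

        <!-- hypothetical data: a variable built from markup is a result tree fragment in XSLT 1.0 -->
        <xsl:variable name="colors">
            <color>red</color>
            <color>green</color>
        </xsl:variable>

        <!-- msxml:node-set() turns the fragment into a node-set that XPath can walk -->
        <xsl:for-each select="msxml:node-set($colors)/color">
            <span><xsl:value-of select="."/></span>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

The prefix in front of the colon must match a namespace declared on xsl:stylesheet; otherwise the processor cannot tell which library the function belongs to.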

Note: Beginning with umbraco 4.0.1, the EXSLT library namespace and functions are also referenced by default, making it easier to use those extended functions within your umbraco macros. The full xsl:stylesheet declaration is shown below (some indenting and line breaks added for clarity):

<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxml="urn:schemas-microsoft-com:xslt"
    xmlns:umbraco.library="urn:umbraco.library" 
    xmlns:Exslt.ExsltCommon="urn:Exslt.ExsltCommon" 
    xmlns:Exslt.ExsltDatesAndTimes="urn:Exslt.ExsltDatesAndTimes" 
    xmlns:Exslt.ExsltMath="urn:Exslt.ExsltMath" 
    xmlns:Exslt.ExsltRegularExpressions="urn:Exslt.ExsltRegularExpressions" 
    xmlns:Exslt.ExsltStrings="urn:Exslt.ExsltStrings" 
    xmlns:Exslt.ExsltSets="urn:Exslt.ExsltSets"
    exclude-result-prefixes="msxml
        umbraco.library 
        Exslt.ExsltCommon 
        Exslt.ExsltDatesAndTimes 
        Exslt.ExsltMath 
        Exslt.ExsltRegularExpressions 
        Exslt.ExsltStrings 
        Exslt.ExsltSets"
    >

Line 8: Excluding prefixes in the output

    exclude-result-prefixes="msxml umbraco.library">

When you include additional namespaces you should also list them in the exclude-result-prefixes list. Simply list all namespaces you've included (use the prefix after the xmlns:) with a space between each prefix.

You only need to modify this line if you include additional namespaces.
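For example, suppose you added an extension of your own. In the sketch below, the my.helpers prefix and its urn:my.helpers URI are hypothetical names chosen for illustration; the point is that the same prefix appears both in the xmlns: declaration and in exclude-result-prefixes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxml="urn:schemas-microsoft-com:xslt"
    xmlns:umbraco.library="urn:umbraco.library"
    xmlns:my.helpers="urn:my.helpers"
    exclude-result-prefixes="msxml umbraco.library my.helpers">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <xsl:param name="currentPage"/>

    <xsl:template match="/">
        <!-- without the exclusions above, this div would be output
             with an xmlns:... declaration for each extra namespace -->
        <div class="intro">Hello</div>
    </xsl:template>

</xsl:stylesheet>
```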

Line 10: XSL output method

    <xsl:output method="xml" omit-xml-declaration="yes"/>

This is the line you are most likely to want to change.

By default, umbraco specifies the output method as "xml". This is fine in most cases. But there is one situation that you may encounter in which you might prefer different behavior.

That situation is when you have an empty tag in the output. When the output method is set to "xml" the tag will be simplified or collapsed.

For instance, if your XSLT contained the following:

<div></div>

The output sent to the browser with method="xml" would be:

<div />

Most browsers will not handle this gracefully: your CSS styles will not be applied as expected, and your site may not display correctly.

There are two solutions. One is to include a test in the XSLT to ensure you never output an empty tag that will collapse. The other is to change the XSL output method to "html", which will preserve the expanded tags and not collapse them.

The output sent to the browser with method="html" would be:

<div></div>

Do be aware that the "html" output method relaxes the requirements on the output somewhat. Some tags are left un-closed, which is appropriate for HTML 4.0 but not XHTML, and this could cause your webpages to fail XHTML validation. The XSLT specification at http://www.w3.org/TR/xslt#section-HTML-Output-Method states:

The html output method should not output an end-tag for empty elements. For HTML 4.0, the empty elements are: area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta, and param. For example, an element written as <br /> or <br></br> in the stylesheet should be output as <br>

A more interesting situation is that the img tag will not have a closing slash with method="html".

You could set all your XSLT macros to use method="html" and reserve method="xml" for special cases, such as when creating RSS feeds that must be in valid XML format with collapsed tags.

However, because some tags do not get closed when using method="html", I recommend the use of method="xml" unless you have a specific reason to use method="html". Just be sure to avoid empty tags that would collapse and potentially break your site's CSS rendering.
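One way to avoid an empty tag is to test the value before writing the element at all. This is only a sketch: the heading variable and the pageTitle class are names I've picked for illustration, and @nodeName simply stands in for whatever value your macro outputs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <xsl:param name="currentPage"/>

    <xsl:template match="/">
        <!-- normalize-space() trims the value, so whitespace-only text counts as empty -->
        <xsl:variable name="heading" select="normalize-space($currentPage/@nodeName)"/>

        <!-- the div is written only when there is something to put inside it,
             so method="xml" never gets the chance to collapse it -->
        <xsl:if test="$heading != ''">
            <div class="pageTitle">
                <xsl:value-of select="$heading"/>
            </div>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```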

Only in very rare and special cases would you want to include the xml declaration (<?xml...?>) in the output, and never for output used in (x)HTML pages no matter what setting you've used for the output method.

Line 12: The currentPage parameter

    <xsl:param name="currentPage"/>

Unlike traditional XSLT in which the entire content of a single file is processed and transformed, a website must include some sense of context.

What page is the website visitor currently viewing? That is what the currentPage parameter is used for. Of all the pages in your site, currentPage will tell you which one is being viewed.

The currentPage parameter is automatically populated by umbraco. It is referenced as $currentPage in your XSLT and you will use it a lot, as we'll see in future articles.
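A minimal sketch of reading from the parameter might look like this. It relies only on the @id and @nodeName attributes that umbraco stores on each page node; the surrounding markup is illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <!-- filled in by umbraco with the node for the page being viewed -->
    <xsl:param name="currentPage"/>

    <xsl:template match="/">
        <p>
            <!-- the leading $ marks a reference to the parameter declared above -->
            You are viewing <xsl:value-of select="$currentPage/@nodeName"/>
            (page id <xsl:value-of select="$currentPage/@id"/>)
        </p>
    </xsl:template>

</xsl:stylesheet>
```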

Lines 14 to 16: Template match="/"

    <xsl:template match="/">
        <!-- start writing XSLT -->
    </xsl:template>

Templates are a lot like functions in other languages. They can be referenced by name (named templates) or they can run automatically when the xml being processed matches a certain condition (match templates).

All xml files will have a root node and will match the root or "/" xpath query. Thus, a match template with match="/" will always be executed, and executed first because the root node in the XML is always the first node.

Or, putting it another way, template match="/" is a lot like the main() or Page_Load() functions in other languages.

We'll discuss xpath as well as match and named templates in detail in other articles. For now, the important thing to understand is that any XSLT code you put in the match="/" template will always be executed when the macro runs.
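To make the distinction concrete, here is a small sketch with one match template and one named template. The renderHeading template and its text parameter are hypothetical names used only for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <xsl:param name="currentPage"/>

    <!-- match template: runs automatically because every document has a root -->
    <xsl:template match="/">
        <xsl:call-template name="renderHeading">
            <xsl:with-param name="text" select="$currentPage/@nodeName"/>
        </xsl:call-template>
    </xsl:template>

    <!-- hypothetical named template: runs only when called by name -->
    <xsl:template name="renderHeading">
        <xsl:param name="text"/>
        <h2><xsl:value-of select="$text"/></h2>
    </xsl:template>

</xsl:stylesheet>
```

Nothing in the named template runs unless the match="/" template calls it, which is why the match="/" template plays the role of the entry point.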

Conclusion

The pre-defined XSLT templates provided by umbraco can be used "as-is" and few people bother to worry about what all those lines are doing. You, too, might often do that, focusing your efforts on line 15 and writing your own XSLT code.

But understanding each of the lines that make up the structure or "anatomy" of an XSLT file will give you confidence and guidance as you develop your XSLT skills through this series of articles. We will look back to these settings in many of the articles to come.

4 comments for “Anatomy of an umbraco XSLT file”

  1. Søren Sprogø says:
    Oh gosh, what a great guide!

    Didn't know you could exclude stuff with ENTITY, that'll come in handy.

    Can't wait for the next article!
  2. David Conlisk says:
    Nice work Doug and great to see your blog is up and running - you've been added to my blogroll of course!

    Keep the articles coming...
  3. Thomas Höhler says:
    Great article,

    very clear and step by step. Keep on your great work.

    Thomas
  4. Douglas Robar says:
    Thanks to Sebastiaan's observations about the behavior of output method="html" I have updated the text of this post.

    See our.umbraco.org/forum/ourumb-dev-forum/bugs/3178-End-of-xslt-menu-doesn%27t-stay-on-level-one,-it-appears-in-the-sub-menu for the full discussion.

    cheers,
    doug.
