Overview
A wise programmer once said, "The one constant in computing is change." There couldn't be a truer
statement. This article is about such change, specifically moving from HTML to the next generation, XHTML
(Extensible Hypertext Markup Language).
This article includes the following sections:
An Introduction to XHTML
Implementing XHTML Today
Changing HTML to XHTML
Conclusion
Additional XHTML Resources and Facts
The analysis is from a server-side perspective, meaning it applies equally well to ASP, JSP, PHP or other
server-side driven projects.
An Introduction to XHTML
XHTML (now in version 1.1) is the merger of HTML 4 and XML. It represents such an important advancement
that the World Wide Web Consortium (W3C), the international standards body for the web, is replacing HTML
with XHTML as the standard tool for creating web pages.
XHTML is built to open doors to other formats. For example, XHTML can be used to format content for
pagers, whereas HTML cannot. XHTML is expected to replace WML, the markup language of WAP, along with other
device-specific markup languages. It is a cornerstone of the revolutionary change in thinking beginning to
occur in web site design. Instead of treating a web site as a stand-alone data island, XHTML will expand web
applications, allowing web sites to control and send information that drives countless devices,
presentation styles and other web sites. XHTML is the starting point for this tremendous change in how we
use the web.
Using XHTML has several advantages over using HTML. Because XHTML documents must be well formed, they can
be handled by quicker and smaller parsers. Such parsers waste less time verifying markup and sorting out the
logic required for hodgepodge HTML documents. While these speed gains are not available yet, expect
improved performance from the next generation of XHTML-based browsers.
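To make the parsing argument concrete, here is a minimal Python sketch of why well-formed markup is cheap to process. It is an illustration only, not how any browser works: because every element is closed and properly nested, matching tags needs nothing more than a single stack. The `first_nesting_error` name and the regular expression are assumptions made for this example, and the regex does not understand comments, CDATA sections or attribute values that contain `>`.

```python
import re
from typing import List, Optional

# Start tags, end tags and self-closed tags such as <br/>. Declarations
# (<!DOCTYPE ...>) and processing instructions (<?xml ...?>) are skipped
# because the character after "<" is not a letter or "/".
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>")


def first_nesting_error(markup: str) -> Optional[str]:
    """Return a description of the first nesting error, or None if all tags match."""
    open_tags: List[str] = []  # stack of currently open element names
    for match in TAG.finditer(markup):
        closing, name, self_closed = match.groups()
        if self_closed:
            continue  # <br/> opens and closes in one step
        if not closing:
            open_tags.append(name)
        elif not open_tags:
            return f"</{name}> has no matching start tag"
        elif open_tags[-1] != name:
            return f"</{name}> closes <{open_tags[-1]}> out of order"
        else:
            open_tags.pop()
    if open_tags:
        return f"<{open_tags[-1]}> is never closed"
    return None


crossed = '<form action="test.htm"><table><tr><td>hi</form></td></tr></table>'
nested = '<form action="test.htm"><table><tr><td>hi</td></tr></table></form>'
print(first_nesting_error(crossed))  # </form> closes <td> out of order
print(first_nesting_error(nested))   # None
```

A single pass with one stack is enough only because well-formedness is guaranteed; an HTML parser has to guess where omitted or crossed tags were meant to end.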
The architecture of XHTML allows tags, attributes and document types to be defined by the users of XHTML,
so HTML's fixed set of tags no longer applies. Over time, this will allow for the development of
industry-specific and project-specific XHTML documents. To explore this idea more fully, see the W3C page.
A significant limitation of HTML today is the form field. The W3C established special task groups to
expand the functionality of XHTML, and one of these is working to improve form field usage. The
XHTML/XForms specifications are still under development, but when finished they will dramatically change
the way we use forms. Some of the great features XForms will add include:
Pre-built functions reduce the need to rely on JavaScript as heavily as in the past. This will be a great
boon for small devices where JavaScript may not be available.
Elements are device independent, allowing flexibility to add voice or other input methods.
Data is transmitted from the form in XML format.
Data types are predefined.
Forms will be separated into 3 distinct layers: presentation, logic and data. Splitting forms into these
logical partitions will make it easy for forms to work on different kinds of browsers and devices while
maintaining a standard back end.
What other advancements does the future hold for forms? Only the final specifications will tell the full
story. The draft specifications for XForms were released in April 2000, and the final specifications are
expected by year end. XForms will likely be one of the driving forces behind upgrading to XHTML. For more
information on XForms, see the W3C and W3Schools sites.
Another advantage of XHTML is that it is an XML-based system. XML is a great technology, and it is being
used in many exciting ways. While programmers would like to use XML in a variety of applications, it still
isn't practical for many projects. XHTML changes this because it makes XML easy to use with any project.
Learning XHTML means expanding XML knowledge and skills; it means learning to think in XML. XHTML enables
sites to use XML conveniently in day-to-day web business. It is the stepping stone that will finally give
everyone easy access to the power and convenience of XML.
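Because an XHTML page is also an XML document, generic XML tools can read it without any HTML-specific parsing. As a small illustration (the page content and the `extract_links` name are made up for this example), Python's standard `xml.etree.ElementTree` module can pull every link out of a page. Note that XHTML elements live in the `http://www.w3.org/1999/xhtml` namespace, so the search has to be namespace-qualified.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Links</title></head>
<body>
<div>
<a href="http://www.w3.org/">W3C</a>
<a href="http://validator.w3.org/check/referrer">validate</a>
</div>
</body>
</html>"""


def extract_links(xhtml: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (href, link text) pairs for every <a> element in the page."""
    root = ET.fromstring(xhtml)  # raises ET.ParseError if the page is not well formed
    return [(a.get("href"), a.text) for a in root.iterfind(".//x:a", XHTML_NS)]


for href, text in extract_links(PAGE):
    print(text, "->", href)
```

The same page fed through this function as sloppy HTML would simply fail to parse, which is exactly why well-formedness is the price of admission to the XML toolset.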
Implementing XHTML Today
How soon does XHTML need to be implemented? That depends on a number of factors, many of them
infrastructure related. The current generation of tools, such as editors and browsers, needs updating to
use XHTML efficiently and smoothly. Then these updated tools need to make their way into common use.
Furthermore, some of the standards, like XForms, are still under development, and once developed will
likely change (much like any new software) soon after the first full release. Addressing these
infrastructure issues will likely take from one to four years.
Nothing, however, is stopping conversion from beginning now, and, in fact, it's a good idea to start
learning the basics of XHTML, incorporating it into current projects and planning for it in new projects.
It's a good time to begin changing programming habits to enable a smooth transition in the future. This is
possible for a few reasons.
XHTML content that doesn't match standard HTML will usually still work with HTML parsers, because those
parsers ignore most errors: when a parser encounters something in the page that isn't quite right, it
usually won't fail. This isn't always true (scripting, discussed below, is an exception), but it is
possible to make at least most end pages of a project completely XHTML compliant.
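The gap between today's forgiving HTML parsers and a strict XML parser is easy to demonstrate with Python's standard library. In this sketch (the `TagCounter` class and the helper names are illustrative), `html.parser` quietly processes markup with an unquoted attribute, an upper-case tag and crossed tags, while `xml.etree.ElementTree` refuses the same input; the corrected markup satisfies both.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SLOPPY = '<FORM action=test.htm><table><tr><td>hi</form></td></tr></table>'
CLEAN = '<form action="test.htm"><table><tr><td>hi</td></tr></table></form>'


class TagCounter(HTMLParser):
    """Counts start tags; HTMLParser does not raise on crossed tags or unquoted values."""

    def __init__(self) -> None:
        super().__init__()
        self.start_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1


def count_tags_leniently(markup: str) -> int:
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return counter.start_tags


def is_well_formed(markup: str) -> bool:
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(count_tags_leniently(SLOPPY), is_well_formed(SLOPPY))  # 4 False
print(count_tags_leniently(CLEAN), is_well_formed(CLEAN))    # 4 True
```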
As another example, because XHTML is case sensitive, tags are written in lower case. While this may seem
like a relatively minor change, it is one that can be implemented immediately, creating a good programming
habit. Similarly, XHTML's strict nesting rules can be followed in HTML today to ingrain good programming
habits. Both of these topics are discussed in more detail below.
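Finding upper-case tags in existing templates is a simple first step toward the lower-case habit. The following Python sketch scans template text with a regular expression; the `uppercase_tag_names` helper is hypothetical, checks element names only (not attribute names), and skips anything that is not a tag, such as `<!DOCTYPE>` or ASP/JSP `<% %>` blocks.

```python
import re
from typing import List

# "<" or "</" followed by a tag name. "<!DOCTYPE", "<?xml" and "<%" are not
# matched because the character after "<" is not a letter.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")


def uppercase_tag_names(template: str) -> List[str]:
    """Return the distinct tag names that are not already all lower case."""
    return sorted({name for name in TAG_NAME.findall(template) if name != name.lower()})


print(uppercase_tag_names("<TABLE><tr><Td>x</Td></tr></TABLE>"))  # ['TABLE', 'Td']
```

Running a scan like this over hand-written templates catches the tags you control, while auto-generated editor output can be left to a clean-up tool.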
Implementing XHTML in HTML web applications now also helps ensure that the output will be XHTML compatible
later. Designing with an eye to the future is important whether that future be 6 months or 10 years from
now. Changing a web page is easy, but updating the components takes more thought and time.
How do you implement a migration plan? Begin to write code which is XHTML compliant but don't require the
end pages to be completely compliant at this stage. You may find you need to make significant changes in
how your dynamic server pages are written. If you use code from your library, make sure XHTML rather than
HTML is being produced. When the HTML pages are done with the components integrated, run them through a
conversion tool to update them and check for IDE-generated HTML that doesn't match the XHTML standard.
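One way to make sure library code produces XHTML rather than HTML is to route all tag generation through a single helper. The Python sketch below is hypothetical: the names `element`, `Markup` and `VOID_ELEMENTS` are invented here, and the empty-element list is abbreviated to common HTML 4 cases. It lower-cases names, quotes and escapes attribute values, expands minimized attributes and self-closes empty elements. The same idea could be carried over to a JSP tag helper, an ASP include or a PHP function.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Dict, Optional, Union

# Elements that are always empty in XHTML 1.0 and are written as <name />.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class Markup(str):
    """A string that already contains markup and must not be escaped again."""


def element(name: str, attrs: Optional[Dict[str, Union[str, bool]]] = None, *children: str) -> Markup:
    name = name.lower()
    parts = [name]
    for key, value in (attrs or {}).items():
        key = key.lower()
        if value is False:
            continue  # attribute switched off: leave it out entirely
        if value is True:
            value = key  # minimized attribute: selected -> selected="selected"
        parts.append(f'{key}="{escape(str(value), quote=True)}"')
    open_tag = " ".join(parts)
    if name in VOID_ELEMENTS:
        if children:
            raise ValueError(f"<{name}> is an empty element and cannot have content")
        return Markup(f"<{open_tag} />")
    # Plain strings are escaped as text; Markup children are inserted as-is.
    body = "".join(c if isinstance(c, Markup) else escape(c, quote=False) for c in children)
    return Markup(f"<{open_tag}>{body}</{name}>")


row = element("tr", None, element("td", None, "Tom & Jerry"))
choice = element("option", {"value": "1", "selected": True}, "One")
print(row)            # <tr><td>Tom &amp; Jerry</td></tr>
print(choice)         # <option value="1" selected="selected">One</option>
print(element("BR"))  # <br />
ET.fromstring(row)    # parses cleanly, so the generated fragment is well formed
```

Centralizing tag output this way means a fix to the XHTML syntax is made once, instead of in every scriptlet that builds markup by string concatenation.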
That's it! Just remember, the goal isn't necessarily to be 100% XHTML compliant, but rather to begin
learning and applying XHTML where it makes sense for your projects and web sites. It can be applied in
stages, so take advantage of this flexibility where it benefits you.
Changing HTML to XHTML
Here are some of the particulars you should consider when getting started with your conversion from HTML
to XHTML. This isn't a comprehensive list or discussion, but it covers the major changes under the strict
document type definition.
XHTML is based on XML standards. This means a document must follow "well formed" rules, that is, XML
syntax. The rules of most concern include:
XML is case sensitive. In XHTML this means every element and attribute name must be written in lower case,
so use <table>, not <TABLE>. Current HTML editing tools will fight you here. Don't worry about the case of
auto-generated tags; instead, get used to using lower case when typing HTML tags by hand. Also, when
generating HTML dynamically, make sure to use lower case.
IMPORTANT! While you should get used to writing your tags in lower case, don't worry about the case of
HTML tags that are automatically generated by the current generation of HTML editors. Tools are available
to clean up HTML pages to make them XHTML compliant. These tools, however, will not catch tags with
improper syntax generated by your code! Get in the habit of using the right syntax within your scriptlets,
JavaBeans, COM objects or wherever else you generate your own HTML content.
Non-empty tags must be properly nested. This means tags do not cross over each other. In the invalid
example below, notice that the form and table tags are improperly nested; that is, they cross over one
another. Then see how this is rectified in the correct example.
Invalid Example:
<form action="test.htm">
<table>
<tr><td>hi
</form>
</td></tr>
</table>
Correct Example:
<form action="test.htm">
<table>
<tr><td>hi</td></tr>
</table>
</form>
Attribute values must be quoted. So <form action=test.htm> is not legal but <form action="test.htm"> is
legal.
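All tags must be closed. For tags that have no closing element in HTML, close the tag within itself.
For example, <br> by itself is not legal; use <br/> instead. XML also allows an explicit end tag, as in
<br></br>, but the self-closing form works better with current browsers. The XHTML 1.0 compatibility
guidelines further recommend a space before the slash, as in <br />, for the benefit of older browsers.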
No attribute may appear more than once in the same tag. This shouldn't be a problem.
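Several of the XML rules above are mechanical enough to automate for fragments your own code produces. The following Python sketch (the `XhtmlRewriter` class and `to_xhtml` function are hypothetical names) re-serializes HTML through the standard `html.parser` module, which already lower-cases tag and attribute names. The sketch then quotes and escapes every attribute value, self-closes empty elements and expands minimized attributes such as `selected`. It does not repair improperly nested tags, it drops comments and declarations, and it would escape the contents of `<script>` and `<style>` elements, so it suits plain markup fragments rather than whole pages.

```python
from html import escape
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that are always empty in XHTML 1.0 (HTML 4 list, abbreviated).
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class XhtmlRewriter(HTMLParser):
    """Re-serializes parsed HTML using XHTML's XML syntax rules."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []

    def _open(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> str:
        # HTMLParser has already lower-cased the tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:
                value = name  # minimized attribute, e.g. <option selected>
            parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts)

    def handle_starttag(self, tag, attrs):
        end = " />" if tag in VOID_ELEMENTS else ">"
        self.out.append(self._open(tag, attrs) + end)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._open(tag, attrs) + " />")  # input was already <br/>

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:  # drop stray end tags such as </br>
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))  # re-escape &, < and >


def to_xhtml(fragment: str) -> str:
    rewriter = XhtmlRewriter()
    rewriter.feed(fragment)
    rewriter.close()
    return "".join(rewriter.out)


print(to_xhtml('<P>Hi<BR>there <A HREF=test.htm>link</A></P>'))
# <p>Hi<br />there <a href="test.htm">link</a></p>
```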
In addition to the changes driven by XML, more tag changes are driven by XHTML's own DTD (document
type definition). Here are the highlights.
An XHTML document must begin with a <!DOCTYPE> declaration; only an optional XML declaration may come
before it. The declaration tells the reader (a browser or other parser) which DTD describes the document.
XHTML 1.0 defines three DTDs; choose among them using these rules.
When writing pure XHTML, use the strict DTD:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
When writing for maximum HTML compatibility, use the transitional DTD:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
When using frames, use the frameset DTD:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
The transitional DTD will be used for most pages.
The root element, which follows the <!DOCTYPE>, must be <html>, and its xmlns attribute is mandatory. For
example, <html xmlns="http://www.w3.org/1999/xhtml">
The <title> tag is mandatory in an XHTML document.
Form tags must have an action attribute. For example, <form action="test.htm"></form>
Presentational tags such as <font> and <center> have been removed from the strict DTD; only the transitional
DTD still allows them. Use style sheets for formatting.
Data (which in an HTML page would be text) must be enclosed within a set of valid tags. A partial list of
valid tags for enclosing free-standing data (text) includes "p", "h1", "div" and "pre".
The first example is wrong because the data (text) is not enclosed within a defined tag set:
<body>
HI, this is Wrong.
<br/>
<a href="http://validator.w3.org/check/referrer">validate</a>
</body>
This is the correct way to include data using the div tag.
<body>
<div>
HI, this is Right.
<br/>
<a href="http://validator.w3.org/check/referrer">validate</a>
</div>
</body>
Every <img> tag must have an alt attribute.
Every <style> tag must have a type attribute.
No stand-alone attributes (also known as minimized attributes) are allowed. For example, <option selected>
is no longer valid; write <option selected="selected"> instead.
"Inline" tags cannot contain "block-level" tags. For example, an anchor tag can't enclose a <table>.
Scripting elements pose a problem for XHTML compatibility. An XML parser treats the content of a script
element as markup, so characters such as < and & break the document unless you enclose your script in a
CDATA block. Therefore, a JavaScript element would now look like:
<script type="text/javascript">
<![CDATA[ alert("hello"); ]]>
</script>
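The need for the CDATA wrapper is easy to see with any XML parser. In this Python sketch (the `script_text` helper is illustrative), a script containing a `<` comparison breaks well-formedness when written directly inside the script element, but parses cleanly inside a CDATA section. The `alert("hello")` example contains neither `<` nor `&`, so it would parse either way; the CDATA block matters once a script uses those characters.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# A "<" comparison is ordinary JavaScript but looks like the start of a tag to XML.
PLAIN = '<script type="text/javascript">if (1 < 2) alert("hello");</script>'
WRAPPED = '<script type="text/javascript"><![CDATA[ if (1 < 2) alert("hello"); ]]></script>'


def script_text(markup: str) -> Optional[str]:
    """Return the script's text if the markup is well-formed XML, else None."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


print(script_text(PLAIN))          # None
print(repr(script_text(WRAPPED)))  # ' if (1 < 2) alert("hello"); '
```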
The CDATA block, however, causes a hassle for all the current browsers, which will not accept it. For now,
the most reliable solution is to call the JavaScript from an external file. For example:
<script type="text/javascript" src="main.js"></script>
For the server-side programmer this is a problem when you modify the JavaScript dynamically. Using a
separate source file for your JavaScript prevents you from changing it dynamically, because the browser
fetches that file on the client side, where your server-side page can no longer touch it. When modifying
JavaScript using ASP, JSP or PHP scripting, use the standard HTML
method of script declaration. This is the one place where making JSP or ASP 100% compatible with XHTML
will be most problematic. Remember, however, the goal is not to be 100% compatible with XHTML, but to
begin incorporating XHTML where feasible, allowing a quick and easy transition when the time comes. When
that time arrives, new compatible browsers should be available and you'll be set to make the jump to 100%
compatibility.
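Several of the DTD-driven rules listed earlier can be spot-checked with a namespace-aware XML parser before reaching for a full validator. The following Python sketch is illustrative only; the `dtd_rule_problems` name and the choice of rules are assumptions made for this example. It checks for the XHTML namespace on <html>, a <title> inside <head>, the alt, type and action attributes on <img>, <style> and <form>, and any <font> or <center> elements. It does not perform real DTD validation.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML = "http://www.w3.org/1999/xhtml"
NS = {"x": XHTML}


def dtd_rule_problems(document: str) -> List[str]:
    """Report which of a handful of XHTML 1.0 DTD rules a page breaks."""
    root = ET.fromstring(document)
    if root.tag != f"{{{XHTML}}}html":
        return ["root element is not <html> in the XHTML namespace"]
    problems = []
    if root.find("x:head/x:title", NS) is None:
        problems.append("<head> has no <title>")
    for tag, attr in (("img", "alt"), ("style", "type"), ("form", "action")):
        for el in root.iterfind(f".//x:{tag}", NS):
            if attr not in el.attrib:
                problems.append(f"<{tag}> is missing its {attr} attribute")
    for tag in ("font", "center"):
        if root.find(f".//x:{tag}", NS) is not None:
            problems.append(f"<{tag}> is not allowed under the strict DTD")
    return problems


GOOD = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Test</title><style type="text/css">p { margin: 0 }</style></head>
<body><form action="test.htm"><div><img src="logo.gif" alt="Logo" /></div></form></body>
</html>"""

BAD = """<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body><div><img src="logo.gif" /><font>big</font></div></body>
</html>"""

print(dtd_rule_problems(GOOD))  # []
print(dtd_rule_problems(BAD))
```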
Conclusion
In this article we've explored some advantages of XHTML and how to start using it right now with very
little hassle. XHTML is far more than a replacement for HTML. Thinking of it as HTML 5.0 unnecessarily
limits its power and the possibilities it will introduce. XHTML is meant to be expanded by the user
community. It creates XML documents which contain, define and manipulate data, going far beyond the
capabilities of HTML-based documents. It makes XML easy to use. To fully realize the potential XHTML
presents will require a new way of thinking about future applications. It creates fresh possibilities.
XHTML really is a new thing (not merely an upgrade) and the challenge ahead of us is to experiment and
discover where it can take us.