tommuir

Sending XHTML as text/html Considered Harmful
=============================================

Author: Ian Hickson <ian@hixie.ch>

Abstract
--------

A number of problems resulting from the use of the text/html MIME type
in conjunction with XHTML content are discussed. It is suggested that
XHTML delivered as text/html is broken and XHTML delivered as text/xml
is risky, so authors intending their work for public consumption
should stick to HTML 4.01, and authors who wish to use XHTML should
deliver their markup as application/xhtml+xml.

Context
-------

This was originally written in September 2002 in the context of this
Web log entry:

http://ln.hixie.ch/?start=1031465247&count=1

It has since been regularly updated to correct errors that have been
brought up in various mailing lists and other discussion forums.

Note that this document compares XHTML 1.0 that follows Appendix C
with HTML 4.01, because that is the only variant of XHTML that may be
sent as text/html.

Executive Summary
-----------------

If you use XHTML, you should deliver it with the application/xhtml+xml
MIME type. If you do not do so, you should use HTML4 instead of XHTML.
The alternative, using XHTML but delivering it as text/html, causes
numerous problems that are outlined below.

Unfortunately, IE6 does not support application/xhtml+xml (in fact, it
does not support XHTML at all).

Why using text/html for XHTML is bad
------------------------------------

What usually happens to authors who decide to send XHTML as text/html
is the following:

1. Authors write XHTML that makes assumptions that are only valid for
tag soup or HTML4 UAs, and not XHTML UAs, and send it as
text/html. (The common assumptions are listed below.)

2. Authors find everything works fine.

3. Time passes.

4. Author decides to send the same content as application/xhtml+xml,
because it is, after all, XHTML.

5. Author finds site breaks horribly. (See below for a list of
reasons why.)

6. Author blames XHTML.

Steps 1 to 5 have been seen by every single person I have spoken to
who has switched to using the XHTML MIME type. The only reason step 6
didn't happen in those cases is that they were advanced authors who
understood how to fix their content.

SPECIFIC PROBLEMS

These are the issues that affect documents when they are switched from
text/html to application/xhtml+xml:

* <script> and <style> elements in XHTML sent as text/html have to be
escaped using ridiculously complicated strings.

This is because in XHTML, <script> and <style> elements are #PCDATA
blocks, not CDATA blocks, and therefore <!-- and --> really _are_
comment delimiters, and are not ignored by the XHTML parser. To escape
script in an XHTML document which may be handled as either HTML4 or
XHTML, you have to use:

<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>

To embed CSS in an XHTML document which may be handled as either
HTML4 or XHTML, you have to use:

<style type="text/css"><!--/*--><![CDATA[/*><!--*/
...
/*]]>*/--></style>

Yes, it's pretty ridiculous. If documents _aren't_ escaped like
this, then the contents of <script> and <style> elements get
dropped on the floor when parsed as true XHTML.

(This is all assuming you want your pages to work with older
browsers as well as XHTML browsers. If you only care about XHTML
and HTML4 browsers, you can make it a bit simpler.)

* A CSS stylesheet written for an HTML4 document is interpreted
slightly differently in an XHTML context (e.g. the <body> element
is not magical in XHTML, tag names must be written in lowercase in
XHTML). Thus documents change rendering when parsed as XHTML.

* A DOM-based script written for an HTML4 document has subtly
different semantics in an XHTML context (e.g. element names are
case insensitive and returned in uppercase in HTML4, case sensitive
and always lowercase in XHTML; you have to use the namespace-aware
methods in XHTML, but not in HTML4). BUT, if you send your
documents as text/html, then they will use the HTML4 semantics
DESPITE being XHTML! Thus, scripts are highly likely to break when
the document is parsed as XHTML.

* Scripts that use document.write() will not work in XHTML contexts.
(You have to use DOM Core methods.)

* Current UAs are, for text/html content, HTML4 user agents (at best)
and certainly not XHTML user agents. Therefore if you send them
XHTML you are sending them content in a language which is not
native to them, and instead relying on their error handling. Since
this is not defined in any specification, it may vary from one user
agent to the other.

* XHTML documents that use the "/>" notation, as in "<link />", have
very different semantics when parsed as HTML4. So if there were to
be a fully compliant HTML4 UA, it would be quite correct to show
">" characters all over the page.

For more details on this see the first bullet point in the section
entitled 'The Myth of "HTML-compatible XHTML 1.0 documents"'.

COPY AND PASTE

The worst problem, and the main reason (I suspect) for most of the
REALLY invalid XHTML pages out there, is that authors who have no clue
about XHTML simply copied and pasted their DOCTYPE from another
document. So even if you write valid XHTML, by using XHTML, you are
likely to encourage authors who do not know enough to write valid
XHTML to claim to do so.

Why trying to use XHTML and then sending it as text/html is bad
---------------------------------------------------------------

These are not likely to be problems for authors who regularly validate
their pages, but other authors will run into these problems.

* Documents sent as text/html are handled as tag soup [1] by most UAs.

This is the key. If you send XHTML as text/html, as far as browsers
are concerned, you are just sending them Tag Soup. It doesn't
matter if it validates, they are just going to be treating it the
same way as plain old HTML 3.2 or random HTML garbage.

Since most authors only check their documents using one or two UAs,
rather than using a validator, this means that authors are not
checking for validity, and thus most documents that claim to be
XHTML on the web now are invalid.

See, for example, this study:
http://www.goer.org/Journal/2003/Apr/index.html#results
...but if you don't believe it, feel free to do your own. In any
random sample of documents that appear to claim to be XHTML, the
overwhelming majority of documents are invalid.

Therefore the main advantage of using XHTML, that errors are caught
early because it _has_ to be valid, is lost if the document is then
sent as text/html. (Yes, I said _most_ authors. If you are one of
the few authors who understands how to avoid the issues raised in
this document and does validate all their markup, then this
document probably does not apply to you -- see Appendix B.)

* If you ever switch your documents that claim to be XHTML from
text/html to application/xhtml+xml, then you will in all likelihood
end up with a considerable number of XML errors, meaning your
content won't be readable by users. (See above: most of these
documents do not validate.)

* If a user saves such a text/html document to disk and later
reopens it locally, triggering the content type sniffing code since
filesystems typically do not include file type information, the
document could be reopened as XML, potentially resulting in
validation errors, parsing differences, or styling differences.
(The same differences as if you start sending the file with an XML
MIME type.)

* The only real advantage to using XHTML rather than HTML4 is that it
is then possible to use XML tools with it. However, if tools are
being used, then the same tools might as well produce HTML4 for you.
Alternatively, the tools could take SGML as input instead of XML.
(SGML is over a decade older than XML and the tools have existed
for years.)

* HTML 4.01 contains everything that XHTML 1.0 contains, so there is
little reason to use XHTML in the real world. It appears the main
reason is simply "jumping on the bandwagon" of using the latest and
(perceived) greatest thing.

The Myth of "HTML-compatible XHTML 1.0 documents"
-------------------------------------------------

RFC 2854 refers to "a profile of use of XHTML which is compatible
with HTML 4.01". There is no such thing. Documents that follow the
guidelines in appendix C are not valid HTML 4.01 documents. They just
happen to be close enough that tag soup parsers are able to handle
them just like most of the other pages on the Web.

The simplest examples of this are:

* The "/>" empty tag syntax actually has a totally different meaning in
HTML4. (It's the SHORTTAG minimisation feature known as NET, if I
recall the name correctly.) Specifically, the XHTML

<p> Hello <br /> World </p>

...is, if interpreted as HTML4, exactly equivalent to:

<p> Hello <br>&gt; World </p>

...and should really be rendered as:

Hello
> World

* Script and style elements cannot have their contents hidden from
legacy UAs. The following XHTML:

<style type="text/css">
<!-- /* hide from old browsers */
p { color: red; }
-->
</style>

...is exactly equivalent to the following HTML4:

<style type="text/css">

</style>

...because in XHTML the <!-- and --> inside a <style> block delimit a
real comment, so the rule they enclose is discarded.

* The "xmlns" attribute is invalid HTML4.

* The XHTML DOCTYPEs are not valid HTML4 DOCTYPEs.
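
As a rough illustration of the <style> example above: the sketch below
uses Python's standard xml.etree.ElementTree as a stand-in for an
XML-aware UA. That is an assumption made for illustration only and says
nothing about how any particular browser is implemented. The point is
simply that an XML parser treats the "hidden" rule as a comment, so the
stylesheet text it hands back is empty.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The "hide from old browsers" idiom, embedded in a small XHTML document.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>t</title>
<style type="text/css">
<!-- /* hide from old browsers */
p { color: red; }
-->
</style>
</head><body><p>Hello</p></body></html>"""

root = ET.fromstring(doc)
style = root.find(f".//{{{XHTML_NS}}}style")

# With default settings the parser drops comments, so only the
# whitespace around the comment is left as the element's text.
css_seen_by_xml = (style.text or "").strip()
print(repr(css_seen_by_xml))
```

Under these assumptions the rule "p { color: red; }" never reaches the
CSS layer at all when the document is parsed as XML.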

Using XHTML and sending it as text/html is effectively the same, from
an HTML4 point of view, as writing tag soup (see "Why UAs can't handle
XHTML sent as text/html as XML" below).

Note: This is covered by HTMLWG issue XHTML-1.0/6232:
http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-1.0?id=6232;expression=appendix%20c;user=guest

Why UAs can't handle XHTML sent as text/html as XML
---------------------------------------------------

* Documents sent as text/html are handled as tag soup by most UAs.
This means that authors are not checking for validity, and thus
most XHTML documents on the web now are invalid. A conforming XML
UA would thus be unable to show as many documents as current UAs,
and would therefore never get enough marketshare to be relevant.

* It is impossible to reliably autodetect XHTML when sent as
text/html. This is why UAs could not ever treat text/html documents
as XML, even if they did not care about not being usable (see the
first point in this section).

+ You can't sniff for the five characters "<?xml" because:

- The <?xml ... ?> header is optional per Appendix C, and it is
recommended not to include it as it causes IE6 to trigger
quirks mode.

- SGML can also contain PIs (see the example below).

+ You can't trigger from the DOCTYPE since the W3C might introduce
new XHTML DOCTYPEs in future, so you don't know which DOCTYPEs
to look for. (Not to mention that DOCTYPEs are optional for
well-formed XHTML documents, DOCTYPE parsing is hard, DOCTYPEs
may be hidden in comments, and DOCTYPE sniffing has been called
harmful by many leading figures at the W3C and elsewhere.)

+ You can't trigger off the "<html xmlns" string because it might
be there but hidden in a comment (you'd need a complete XML
parser to step past comments, PIs, internal subsets, etc).

e.g. what language is this text/html document in?:

<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
[ <!-- SYSTEM "not XHTML" --> ]>
<!-- -- -->
This is a comment. This document is not XHTML.
<html xmlns="http://www.w3.org/1999/xhtml"/>
Ok, I'm done now. -->
<html>
<title> Need a title in HTML4! </title>
<p> This is a valid HTML4 document.
</html>

* Even if you could detect XHTML, what do you do with a document that
is not well formed (such as the example above)? If you fall back on
HTML4, then there is no advantage to using an XML processor, and you
might as well always treat it as HTML4.

* The HTML working group said that UAs should not do this:
http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html
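
To make the sniffing problem concrete, here is a small sketch. The
function naive_looks_like_xhtml is hypothetical and does not correspond
to any real UA's code; it just applies the three string checks discussed
above. It flags the valid HTML4 example as XHTML, and it misses a
well-formed XHTML document (also made up for this sketch) whose root
element happens to list another attribute before xmlns.

```python
def naive_looks_like_xhtml(text: str) -> bool:
    """Hypothetical sniffer: guess 'XHTML' from three string hints."""
    head = text.lstrip()
    return (
        head.startswith("<?xml")          # XML declaration or any PI named xml...
        or "-//W3C//DTD XHTML" in text    # a known XHTML public identifier
        or "<html xmlns" in text          # namespace declaration on the root
    )

# The valid HTML4 document from the example above.
tricky_html4 = """<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
[ <!-- SYSTEM "not XHTML" --> ]>
<!-- -- -->
This is a comment. This document is not XHTML.
<html xmlns="http://www.w3.org/1999/xhtml"/>
Ok, I'm done now. -->
<html>
<title> Need a title in HTML4! </title>
<p> This is a valid HTML4 document.
</html>
"""

# Well-formed XHTML with no XML declaration, no DOCTYPE, and the
# xmlns attribute written second (all of which XML allows).
reordered_xhtml = (
    '<html lang="en" xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body/></html>"
)

false_positive = naive_looks_like_xhtml(tricky_html4)
false_negative = not naive_looks_like_xhtml(reordered_xhtml)
print(false_positive, false_negative)
```

Patching the string checks only moves the problem around; getting it
right means stepping past comments, PIs and internal subsets, which is
the job of a real parser.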

The advantages of XHTML
-----------------------

When sent as application/xhtml+xml, XHTML has several advantages:

1. UAs will immediately catch well-formedness errors

2. XHTML content will be able to be mixed-and-matched with content
from other well-known namespaces (in particular, MathML).

3. Tools interacting with XHTML documents are guaranteed a
well-formed document.

4. (with XHTML2) A broader vocabulary.

5. XHTML content can be parsed with a simpler parser than tag soup
can, and a _much_ simpler parser than SGML can.

However, none of these apply when an XHTML document is sent as
text/html, and since authors feel their pages should be readable on
the most popular Web browser, which does not support
application/xhtml+xml, there is basically no point in using XHTML at
the moment.
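
Advantage 1 above can be sketched as follows. This uses Python's
standard XML parser as a stand-in for a UA's XML processor, which is an
assumption for illustration; is_well_formed is a hypothetical helper
name, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# XHTML-style markup: the empty element is explicitly closed.
xhtml_fragment = '<p xmlns="http://www.w3.org/1999/xhtml">Hello <br /> World</p>'

# HTML4-style markup: <br> has no end tag, which XML does not allow.
html_fragment = "<p>Hello <br> World</p>"

print(is_well_formed(xhtml_fragment))
print(is_well_formed(html_fragment))
```

A tag soup parser would accept both fragments without complaint, which
is why this check only means something when the content really is
handed to an XML processor.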

Conclusion
----------

There are few advantages to using XHTML if you are sending the content
as text/html, and many disadvantages.

In addition, currently, the majority (over 90% by most counts) of the
UA market is unable to correctly render real XHTML content sent as
text/xml (or other XML MIME types). For example, point IE at:

http://www.mozillaquestquest.com/

Only Mozilla, Mozilla-based browsers such as Netscape 6 and 7, recent
versions of Opera, and Safari, are able to correctly render that site.
(IE6 shows a DOM tree!)

Authors who are not willing to use one of the XML MIME types should
stick to writing valid HTML 4.01 for the time being. Once user agents
that support XML and XHTML sent as one of the XML MIME types are
widespread, then authors may reconsider learning and using XHTML.

(Advanced authors should also see appendix B.)

Further Reading
---------------

I wrote another document on a related matter: people wanting UAs to
treat XHTML documents sent as text/html as XML and not tag soup.

http://www.damowmow.com/playground/xhtml-in-uas.xhtml

Henri Sivonen wrote a similar document asking what is the point of
XHTML:

http://www.hut.fi/u/hsivonen/xhtml-the-point

There are also many mailing list posts on this matter, e.g. on
www-talk. The following post summarises some issues relating to using
text/html for XHTML content containing XML extensions:

http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0046.html

Some people have run into the problems this document mentions, for
example:

http://flrant.com/index.php?id=P21

There are also some interesting points made in other posts, for
example:

| > But does Mozilla call its xml parser for http://www.w3.org/ ?
|
| Nope. If it did, it would render the page without any expanded
| character entity references, since Mozilla is not a validating
| parser and thus skips parsing the DTD and thus doesn't know what
| &nbsp;, &middot; and &copy; are. Not to mention that it would end up
| ignoring the print-media specific section of the stylesheet, which
| uses uppercase element names and thus wouldn't match any of the
| lower case elements (line 138 of the first stylesheet), and it would
| use an unexpected background colour for the page because the
| stylesheet sets the background on <body> and not <html>, which in
| XHTML will result in a different rendering to the equivalent in
| HTML4 (same sheet, line 5).
-- http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0004.html

Or this post, near the end of the thread:

| I'm still looking for a good reason to write websites in XHTML _at
| the moment_, given that the majority of web browsers don't grok
| XHTML. The only reason I was given (by Dan Connolly [1]) is that it
| makes managing the content using XML tools easier... but it would be
| just as easy to convert the XML to tag soup or HTML before
| publishing it, so I'm not sure I understand that. And even then,
| having the content as XML for content management is one thing, but
| why does that require a minority of web browsers to have to treat
| the document as XML instead of tag soup? What's the advantage of
| doing that? And even _then_, if the person in control of the content
| is using XML tools and so on, they are almost certainly in control
| of the website as well, so why not do the content type munging on
| the server side instead of campaigning for UA authors to spend their
| already restricted resources on implementing content type sniffing?
|
| [1] http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0031.html
-- http://lists.w3.org/Archives/Public/www-talk/2001JulAug/0005.html

Appendix A: application/xhtml+xml
---------------------------------

See: http://ln.hixie.ch/?start=1036767231&count=1

Appendix B: Advanced Authors
----------------------------

Some advanced authors are able to send back XHTML as
application/xhtml+xml to UAs that support it, and as text/html to
legacy UAs.

Assuming you are using XHTML 1.0 compliant to Appendix C (or have
otherwise checked that the XHTML 1.0 you send is compatible with Tag
Soup processors), then that's fine. All I am saying in this document
is that sending XHTML as text/html ONLY is harmful.

Note: Sending XHTML 1.1 as text/html is NEVER fine. There is no spec
that allows this. Sending XHTML 2.0 as anything in a production
(non-testing) context is NEVER fine either, since that spec has not
reached CR yet.

Also note that I would personally suggest that even advanced authors
not use XHTML sent as text/html, since many authors copy and paste
markup from others and thus may easily end up copying the valid XHTML
markup but using it as HTML4.

Appendix C: Acknowledgements
----------------------------

Thanks to Nick Boalch for the abstract. Thanks to Dan Connolly for
pedantry that has improved the quality of this document. Thanks to Ted
Shaneyfelt and many others for suggesting improvements to the text.

Appendix D: Footnotes
---------------------

[1] The term "handled as tag soup" refers to the fact that UAs
typically are very lenient in their error handling, and do not support
any of the "advanced" SGML features. For example, browsers treat the
string "<br/>" as "<br>" and not "<br>&gt;", the latter being what
SGML says they should do. Similarly, real world UAs have no problem
dealing with content such as "<b> foo <i> bar </b> baz </i>" even
though according to the HTML4 spec that is meaningless.

i am not here

Hugo

XHTML as "text/html" considered harmful.

Long post but interesting, and to my mind indicative of the state of things at the moment. Are there any conclusions to draw from all that? Are we supposed to infer that we are all doing it wrong and should switch back to HTML 4 and not use XHTML, or are we meant to reset our MIME types? There seem to be problems either way we play it! It just makes me feel that once again I'm doing everything wrong and don't have a clue.
Having depressed me with that post, tommuir, what are your views on the subject? Let's hear some thoughts on the topic!

Hugo.

Before you make your first post it is vital that you READ THE POSTING GUIDELINES!
----------------------------------------------------------------
Please post ALL your code - both CSS & HTML - in [code] tags
Please validate and ensure you have included a full Doctype before posting.
Why validate? Read Me

crazybat

XHTML as "text/html" considered harmful.

Here are my thoughts, for what they are worth.

I don't think we are doing things wrong by creating a document that conforms to XHTML. In fact, I think we are embracing a semantic web that will someday (and I stress the word 'someday') become the norm.

However, we should be aware of the current state of things. Certain popular user agents are accepting of the 'text/html' content-type (*cough cough Internet Explorer! cough*) no matter how you design your page. So, even if you do create the perfect valid XHTML document, it's just tag soup.

So what I do is content negotiation: I look at which content type the user agent gives the highest 'q' value in its Accept header. If the user agent prefers 'application/xhtml+xml', it gets XHTML 1.1. If the browser prefers 'text/html', it gets HTML 4.01 Strict.
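
A minimal sketch of that kind of negotiation, assuming a server-side script that can read the request's Accept header. The function names parse_accept and choose_content_type are made up for illustration, and the sample header strings are illustrative rather than captured from real browsers. Note that a bare "*/*" is deliberately not taken as support for application/xhtml+xml, since a browser can send it without being able to render XHTML.

```python
def parse_accept(header: str) -> dict:
    """Map each media type in an Accept header to its q value (default 1.0)."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header: str) -> str:
    """Pick application/xhtml+xml only when it is listed explicitly and
    rated at least as high as text/html; otherwise fall back to text/html."""
    prefs = parse_accept(accept_header)
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", prefs.get("text/*", prefs.get("*/*", 0.0)))
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


xml_aware = "application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"
wildcard_only = "image/gif, image/jpeg, */*"
print(choose_content_type(xml_aware))
print(choose_content_type(wildcard_only))
```

Whatever the function returns would then be sent as the Content-Type HTTP header, along with the matching version of the markup.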

On another note of interest, if you do happen to have a valid HTML 4.01 Strict document and you are currently using CSS for presentation, there wouldn't be that much work in changing over to valid XHTML (e.g. <br> becomes <br />, <img src="x.jpg"> becomes <img src="x.jpg" />, etc.).

I think as long as you are aware of the current user agent situations and have your head around the direction of the W3C, then you are far ahead of the game.

My $0.02.

Crazy Bat Designs
Home of the phpBB WASO
Helping You Reach The MOST People Possible.

tommuir

XHTML as "text/html" considered harmful.

Hugo wrote:
Long post but interesting, and to my mind indicative of the state of things at the moment. Are there any conclusions to draw from all that? Are we supposed to infer that we are all doing it wrong and should switch back to HTML 4 and not use XHTML, or are we meant to reset our MIME types? There seem to be problems either way we play it! It just makes me feel that once again I'm doing everything wrong and don't have a clue.
Having depressed me with that post, tommuir, what are your views on the subject? Let's hear some thoughts on the topic!

Hugo.

Well, it's something I always suspected, tbh. I think, by using XHTML, we're all a bit ahead of ourselves, or more precisely, ahead of the infrastructure needed to write the language. It's barely existent. I dunno... I'm happy to write HTML 4 until I know a little more about XML itself. I don't think there's anything I've ever used XHTML for that HTML 4 can't do. Having read it, I believe I need to buy several huge books on XHTML and XML... I feel ignorant, tbh.
Thought it'd be a good article for you guys though.

i am not here

seb

XHTML as "text/html" considered harmful.

Well, I code as XHTML 1.1 for no real reason other than I get a warm feeling from knowing my docs are to the latest standard and validate correctly. I serve them as text/html I guess, never delved into IIS to change anything.

All works fine at the moment, I'll try not to worry myself unduly!

Fruitcake

XHTML as "text/html" considered harmful.

Well.

I'm kind of confused about what to do now.

As far as I know I write in XHTML 1.0 and I use the content-type automatically given to me by an HTML editor (making sure it relates to XHTML in some way, of course).

Heh. I can design webpages, but I'm still a n00b in most respects.
I wouldn't mind if anyone could point me in the right direction to learn the theoretical side of things.

thanks,
Dan.

I am Dan, Dan I am.

MaxJ

XHTML

I totally disagree with the logic of this argument. I don't see many real world problems with "well written" XHTML documents.

A web browser should do the best job possible of presenting the content. If the content is "tag soup", as the author puts it, the browser should do the best it can. (And let's face it, the current browsers do a pretty good job.)

Imagine a browser that gave the message:

Quote:
"Sorry, but the page you requested does not validate to W3C standards. No attempt will be made to display the content. I suggest you contact the author of the page and tell them to correct all the parsing errors."

It would be laughable. Nobody in their right mind would use such a browser, because only a tiny minority of pages in the real world have zero parse errors. Anyone who wrote a "standards only" browser that couldn't deal with tag soup (and it's not difficult to write a tag soup parser) would be a complete idiot.

Max

capmexbiz

XHTML as "text/html" considered harmful.

We do our best to write compliant pages. It's a first step. As problems arise we will be able to upgrade our pages with only minor changes.

So far I haven't seen any problems.

For me, going back to HTML 4.01 would be totally out of place.

Webmaster Resources for Business Websites

Stu

XHTML as "text/html" considered harmful.

1. Write xhtml and get it validated.
2. Serve it up as ~
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
so that browsers which understand will get it right (IE will catch up one day).
3. Don't mess around with php trying to browser sniff and modify the doctype to suit.

That's the way I do it.

It's not what you do it's the way that you do it.
So do it with STYLE
http://www.s7u.co.uk

MaxJ

That's the way to do it (NOT)

Stu,

Do you have an example of a commercial site that serves its pages as application/xhtml+xml? Such a site would not work in any version of Internet Explorer according to the above article.

I think that if you check your own site you will find that the server is sending out the HTTP header as text/html, so the application/xhtml+xml in your meta gets ignored. (If in doubt, why not set the HTTP header using PHP (or whatever) to application/xhtml+xml and see what happens????)

Max

klaus

XHTML as "text/html" considered harmful.

From w3.org

Quote:

XHTML 1.0 can also be served as XML, and XHTML 1.1 is always served as XML. To serve XHTML as XML you use one of the MIME types application/xhtml+xml, application/xml or text/xml. The W3C recommends that you serve XHTML as XML using only the first of these MIME types - ie. application/xhtml+xml.

http://www.w3.org/International/articles/serving-xhtml/

- however on w3.org they use:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> :roll:

Based on valid xhtml1.0 strict and valid css there is no difference in appearance across any of the below browsers:

Macintosh OSX 10.3
Mozilla 1.6
Netscape 7.0
Safari 1.2
Explorer 5.2

Red Hat Linux 8.0
Mozilla 1.6
Netscape 7.0
Konqueror 3.0.5

Windows 2000 and XP
AOL 9.0
Firefox 1.0
FireFox 1.0PR
Mozilla 1.7.3
Netscape 6.2
Netscape 7.0
Netscape 7.1
Opera 6.0
Opera 7.0
Opera 7.1
IE 5
IE 5.5
IE 6.0

The server that I tested on is serving content as text/html however I have added the Meta tag:

<meta http-equiv="content-type" content="application/xhtml+xml; charset=iso-8859-1" />

If I set the server to serve application/xhtml+xml then Internet Explorer refuses to show the pages; instead it attempts to download the page, which also fails.

other browsers react as:

Macintosh OSX 10.3
Mozilla 1.6 OK
Netscape 7.0 OK
Safari 1.2 OK
Explorer 5.2 ERROR
Opera 5.2 ERROR

Red Hat Linux 8.0
Mozilla 1.6 OK
Netscape 7.0 OK
Konqueror 3.0.5 ERROR

Windows 2000 and XP
AOL 9.0 ERROR
Firefox 1.0 OK - some minor design issues
Mozilla 1.7.3 OK
Netscape 6.2 OK- some minor design issues
Netscape 7.0 OK
Netscape 7.1 OK
Opera 6.0 ERROR
Opera 7.0 ERROR
Opera 7.1 ERROR
Opera 7.5 OK
IE 5 ERROR
IE 5.5 ERROR
IE 6.0 ERROR

However, as stated above, by serving normal text/html from the server it works perfectly on all the above browsers.

For those interested...

Cheers
Klaus

Hugo

XHTML as "text/html" considered harmful.

Yes, they are serving up text/html as they should be; after all, the content is HTML even if you add the X at the front. If you read further through the recommendations, they state that XHTML 1.0 may be served up as either text/html or application/xhtml+xml.
Unless for some reason you need the added functionality that XML brings, you would serve up as text/html. If serving up as XML you have to include the XML prolog, which causes IE and others to switch into Quirks Mode. Also, using the meta tag with application/xhtml+xml should have no effect, as XML ignores meta tags.

Hugo.


Shaneyfelt

Views and opinions expressed are not necessarily those...

The credit in the original post is a bit misleading,
as many of my suggestions for improvement
were not followed.

Anonymous

XHTML as "text/html" considered harmful.

crazybat wrote:
I don't think we are doing things wrong by creating a document that conforms to XHTML. In fact, I think we are embracing a semantic web that will someday (and I stress the word 'someday') become the norm.
Agreed. Adhering to standards will allow us to progress. The only other option is to just give up and that's no option at all.

Shaneyfelt

XHTML as "text/html" considered harmful.

I don't want to come across as fully supporting
views in a document with credits to my name
as a contributor, so here's some clarification

> Abstract
> --------
>
> A number of problems resulting from the
> use of the text/html MIME type
> in conjunction with XHTML content are
> discussed. It is suggested that
> XHTML delivered as text/html is broken

"It's little and broken, but still good... yeah - still good" - Stitch

There are workarounds for the problems, so
"don't throw the baby out with the bathwater."

An early version of this very article was quoted to me when I
tried to get Composer (Mozilla's editor) developers to add some
support for XHTML or at least well-formed documents. It may
be partly responsible for the incredible delay. Will the standard
be a decade old before XML is finally implemented?

>, so authors intending their work for public consumption
> should stick to HTML 4.01

I disagree. Advice like this is probably counterproductive to
public adoption of standards.

> Context
> -------
>
> It has since been regularly updated to
> correct errors that have been
> brought up in various mailing lists and other
> discussion forums.

My main suggestion is to tell how we can
adapt our XHTML to work around the
problems rather than throwing out XHTML
for public consumption.

Also, don't assume web page authors
have the ability to change how their pages are
served. In some cases they can only
create static documents without control over the
server.

> Note that this document compares
> XHTML 1.0 compliant to appendix C to
> HTML 4.01, because that is the only
> variant of XHTML that may be sent
> as text/html.

Yes, Appendix C should be followed.

> Executive Summary
> -----------------
>
> If you use XHTML, you should deliver
> it with the application/xhtml+xml
> MIME type. If you do not do so, you should
> use HTML4 instead of XHTML.
> The alternative, using XHTML but delivering
> it as text/html, causes
> numerous problems that are outlined below.

Workarounds are outlined below as well.

> Unfortunately, IE6 does not support
> application/xhtml+xml (in fact, it
> does not support XHTML at all).

So we should use text/html instead.

> Why using text/html for XHTML is bad
> ------------------------------------

Or not!

> What usually happens to authors
> who decide to send XHTML as text/html
> is the following

Irrelevant. They should be educated to
write correct XHTML using whatever
browser workarounds are necessary.
If you never jump to XHTML, many tools that
require well-formedness will fail. (Take XSLT,
for example, or parsers for many
programming languages that could take
data hot off the web if it were only in some
XML format. They shouldn't be forced to
tidy the page before parsing it or to use
lax parsers.)
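
As a minimal sketch of what "hot off the web" could look like, here is a
strict XML parser from the Python standard library pulling links out of a
well-formed XHTML string. The page content and the function name are
invented for the example, and a real page with a DOCTYPE and named
entities would need more care.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page content, for illustration only.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Price list</title></head>
  <body>
    <p>See <a href="/widgets">widgets</a> and <a href="/gadgets">gadgets</a>.</p>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return (href, link text) pairs from a well-formed XHTML string."""
    root = ET.fromstring(xhtml_text)
    anchors = root.iter("{%s}a" % XHTML_NS)
    return [(a.get("href"), "".join(a.itertext())) for a in anchors]


links = extract_links(PAGE)

# The same strict parser rejects tag soup such as an unclosed <br>.
try:
    ET.fromstring("<p>Hello <br> World</p>")
    soup_parsed = True
except ET.ParseError:
    soup_parsed = False
```

No tidy step and no lax parser is involved; the tag-soup fragment is
simply refused.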

> 1. Authors write XHTML that makes
> assumptions that are only valid for
> tag soup

If the authors make bad assumptions,
you should clarify those assumptions rather
than advising authors to stick with obsolete standards.

> 2. Authors find everything works fine.
>
> 3. Time passes.
>
> 4. Author decides to send the same
> content as application/xhtml+xml,
> because it is, after all, XHTML.
>
> 5. Author finds site breaks horribly.
> (See below for a list of
> reasons why.)
>
> 6. Author blames XHTML.

Then a page is needed telling the correct
way to write XHTML that can be served as
either content-type.

> […]
>
> SPECIFIC PROBLEMS
>
> These are the issues that affect
> documents when they are switched from
> text/html to application/xhtml+xml
>
> * <script> and <style> elements in XHTML
> sent as text/html have to be
> escaped using ridiculously complicated strings.

Two choices:
--either--
Use external stylesheets.
--or--
One day (maybe today) phase out the
escape sequences and target your page at the
maybe 96.5+% of browsers
(NN4+, Moz, FF, Opera, IE5+) that know
about the style and script tags, and simply
don't escape. (Maybe someone with Mosaic 1.0 will have
to ignore some text, but then they won't have
table or PNG support, either.)

> This is because in XHTML, <script>
> and <style> elements are #PCDATA
> blocks, not #CDATA blocks, and
> therefore <!-- and --> really _are_
> comments tags, and are not ignored
> by the XHTML parser. To escape
> script in an XHTML document which
> may be handled as either HTML4 or
> XHTML, you have to use
>
> <script type="text/javascript"><!--
> //--><![CDATA[//><!--
>...
> //--><!]]></script>
>
> To embed CSS in an XHTML document
> which may be handled as either
> HTML4 or XHTML, you have to use
>
> <style type="text/css"><!--/*-->
> <![CDATA[/*><!--*/
>...
> /*]]>*/--></style>
>
> Yes, it's pretty ridiculous. If
> documents _aren't_ escaped like
> this, then the contents of
> <script> and <style> elements get
> dropped on the floor when
> parsed as true XHTML.

External stylesheets and scripts avoid the whole issue.
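
The "dropped on the floor" behaviour for inline scripts can be seen with
any XML parser. This sketch uses Python's standard library parser as a
stand-in for an XHTML UA, which is an assumption made for illustration;
the two script snippets are invented.

```python
import xml.etree.ElementTree as ET

# Old-style comment hiding: to an XML parser the whole body is a comment.
COMMENT_HIDDEN = '<script type="text/javascript"><!--\nalert("hi");\n//--></script>'

# CDATA wrapping: the body survives as ordinary character data.
CDATA_WRAPPED = '<script type="text/javascript">//<![CDATA[\nalert("hi");\n//]]></script>'


def script_text(markup):
    """Return the character data an XML parser sees inside the element."""
    return ET.fromstring(markup).text or ""


hidden = script_text(COMMENT_HIDDEN)
wrapped = script_text(CDATA_WRAPPED)
```

The comment-hidden script leaves no code for the parser to hand on, while
the CDATA version keeps it. An external file sidesteps both forms.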

> (This is all assuming you want
> your pages to work with older
> browsers as well as XHTML
> browsers. If you only care about XHTML
> and HTML4 browsers, you can make it a bit simpler.)
>
> * A CSS stylesheet written for
> an HTML4 document is interpreted
> slightly differently in an XHTML
> context (e.g. the <body> element
> is not magical in XHTML, tag names
> must be written in lowercase in
> XHTML). Thus documents change
> rendering when parsed as XHTML.

Workaround: add appropriate html, body
rules to the stylesheet. Not a serious enough
issue to stop writing XHTML.

> * A DOM-based script written for an
> HTML4 document has subtly
> different semantics in an XHTML
> context (e.g. element names are
> case insensitive and returned in
> uppercase in HTML4, case sensitive
> and always lowercase in XHTML;
> you have to use the namespace-aware
> methods in XHTML, but not in HTML4).
> BUT, if you send your
> documents as text/html, then they
> will use the HTML4 semantics
> DESPITE being XHTML! Thus, scripts
> are highly likely to break when
> the document is parsed as XHTML.

That depends. Do it correctly (always
use lowercase text as required by XHTML,
etc.) to minimize problems. And what is
new about having to test your code for
portability?

> * Scripts that use document.write()
> will not work in XHTML contexts.
> (You have to use DOM Core methods.)

This is browser dependent.

> * Current UAs are, for text/html content,
> HTML4 user agents (at best)
> and certainly not XHTML user agents.
> Therefore if you send them
> XHTML you are sending them content
> in a language which is not
> native to them, and instead
> relying on their error handling. Since
> this is not defined in any specification,
> it may vary from one user
> agent to the other.

You're giving the W3C a hard time here.
Appendix C helps make code work with
existing UAs even though the standards
those UAs were built to comply with didn't
define what happens, for example,
when "/>" is encountered.

> * XHTML documents that use
> the "/>" notation, as in "<link />" have
> very different semantics when parsed
> as HTML4. So if there was to
> be a fully compliant HTML4 UA, it would
> be quite correct to show
> ">" characters all over the page.

This is more of a theoretical argument
than a practical one. Anyone who makes a
new UA to the old standards knows the
situation and should take this into
consideration.

> For more details on this see the
> third bullet point in the section
> entitled "The Myth of "HTML-compatible
> XHTML 1.0 documents".

Or more clearly, "The Practicalities of
Serving XHTML Documents that are
Compatible with Existing User Agents
that were Designed for HTML." The goal
that the W3C met was to be backwards
compatible with the tools, even if that means
working with an undefined area of the
old standard. In this way, older software
unknowingly sometimes gets it right for
newer web pages.

> COPY AND PASTE
>
> The worst problem, and the main
> reason (I suspect) for most of the
> REALLY invalid XHTML pages out
> there, is that authors who have no clue
> about XHTML simply copy and pasted
> their DOCTYPE from another
> document. […]

So this is the reason for delaying
XHTML until MS decides to change their
browser? You're probably unwittingly
giving them more power to stifle adoption of
new standards than you realize.

> Why trying to use XHTML and then sending it as text/html is bad
> ---------------------------------------------------------------
>
> These are not likely to be problems
> for authors who regularly validate
> their pages, but other authors will
> run into these problems.
[…]
Then ask people to validate; don't give up on XHTML.

> The Myth of "HTML-compatible XHTML 1.0 documents"
> -------------------------------------------------
>
> RFC 2854 spec refers to "a profile
> of use of XHTML which is compatible
> with HTML 4.01". There is no such
> thing. Documents that follow the
> guidelines in appendix C are not
> valid HTML 4.01 documents. They just
> happen to be close enough that
> tag soup parsers are able to handle
> them just like most of the other pages
> on the Web.
>
> The simplest examples of this are
>
> * The "/>" empty tag syntax actually
> has totally different meaning in
> HTML4. (It's the SHORTTAG minimisation
> feature known as NET, if I
> recall the name correctly.) Specifically,
> the XHTML
>
> <p> Hello <br /> World </p>
>
>...is, if interpreted as HTML4, exactly equivalent to
>
> <p> Hello <br>&gt; World </p>
>
>...and should really be rendered as
>
> Hello
> > World

Not so common in practice.

> * Script and style elements cannot
> have their contents hidden from
> legacy UAs. The following XHTML

If this is important to you, use external stylesheets.

> * The "xmlns" attribute is invalid HTML4.

So it won't validate as HTML4. Big deal.
It will validate as XHTML instead.

> * The XHTML DOCTYPEs are
> not valid HTML4 DOCTYPEs.

So it won't validate as HTML4.
It will validate as XHTML instead.

> Why UAs can't handle XHTML sent as text/html as XML
> ---------------------------------------------------
>
> * Documents sent as text/html are
> handled as tag soup by most UAs.
> This means that authors are not checking for validity,

Logic error. Maybe they're lazy or unaware,
or (most likely) don't have time to
mess with it, that's all.

> and thus
> most XHTML documents on the web
> now are invalid. A conforming XML
> UA would thus be unable to show as
> many documents as current UAs,
> and would therefore never get enough
> marketshare to be relevant.

That's speculation, and a possibly invalid
assumption that such a UA is what is needed.

> * It is impossible to reliably
> autodetect XHTML when sent as
> text/html. This is why UAs could
> not ever treat text/html documents
> as XML, even if they did not care
> about not being usable (see the
> first point in this section).

Insufficient reason for not updating
pages to good XHTML.

> * Even if you could detect XHTML,
> what do you do with a document that
> is not well formed (such as the
> example above)? If you fall back on
> HTML4, then there is no advantage
> to using an XML processor, and you
> might as well always treat it as HTML4.

This reasoning won't help pages become
well-formed; it keeps people using outdated
HTML 4.0, thinking there's something
wrong with making their documents
well-formed. The W3C perhaps should have
defined well-formed tags in a well-formed
HTML, but they chose to allow older UAs
to go about processing a truly valid
XHTML document as if it were an HTML
document and, in many cases, format it
reasonably well.

> * The HTML working group said
> that UAs should not do this
> http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html

Their point seems to be that the browser shouldn't
sniff, not that people shouldn't write pages
that will work well in either case without sniffing.

There's nothing wrong with writing
XHTML, even if you know a browser
unfamiliar with the site will treat it as
well-formed HTML. Other tools that may be
programmed in advance for that site
could still make use of the well-formed XML
structure of the document if you do it right.
Don't preclude this possibility.

> The advantages of XHTML
> -----------------------
>
> When sent as application/xhtml+xml,
> XHTML has several advantages
>
> 1. UAs will immediately catch
> well-formedness errors
>
> 2. XHTML content will be able to be
> mixed-and-matched with content
> from other well-known namespaces
> (in particular, MathML).
>
> 3. Tools interacting with XHTML documents
> are guarenteed a
> well-formed document.
>
> 4. (with XHTML2) A broader vocabulary.
>
> 5. XHTML content can be parsed with a
> simpler parser than tag soup
> can, and a _much_ simpler parser than SGML can.
>
> However, none of these apply when
> an XHTML document is sent as
> text/html, and since authors feel their
> pages should be readable on
> the most popular Web browser,
> which does not support
> application/xhtml+xml, there is basically
> no point in using XHTML at
> the moment.

Very bad reasoning. If you follow this
to its logical conclusion, we will wait
indefinitely for the non-conforming
heavyweight web browser, and they will have no
reason to deliver, thus ensuring people
continue depending on their quirks. Let's
get on with the new standard now that
half a decade has already passed since the
standard was approved.

>
> Conclusion
> ----------
>
> There are few advantages to using XHTML
> if you are sending the content
> as text/html, and many disadvantages.

I disagree.

> In addition, currently, the majority (over 90%
> by most counts) of the UA market is unable
> to correctly render real XHTML content sent as
> text/xml (or other XML MIME types). For example,
> point IE at

That figure is declining fast. Firefox alone now has
maybe 20% of the market. Writing pages in
XHTML will help force competition among the user agents.
That's competition to meet standards. And that's good
competition.

> http://www.mozillaquestquest.com/
>
> Only Mozilla, Mozilla-based browsers
> such as Netscape 6 and 7, recent
> versions of Opera, and Safari, are
> able to correctly render that site.
> (IE6 shows a DOM tree!)

You could write an XSL Transform to make
IE6 display it, but what's the point?
After all, its intended audience seems
to be strictly Mozilla users. (Perhaps that
IE bug was exploited on purpose).

> Authors who are not willing to use
> one of the XML MIME types should
> stick to writing valid HTML 4.01 for
> the time being. Once user agents
> that support XML and XHTML sent
> as one of the XML MIME types are
> widespread, then authors may
> reconsider learning and using XHTML.

Don't mix up authors and webmasters.
The author on a large system may have no
authority whatsoever to dictate what the
webmaster does. Learn to write good
XHTML that also works for older user agents.
That's all.

> (Advanced authors should also see appendix B.)
>
>
> Further Reading
> ---------------
> […]
>
>
> | I'm still looking for a good reason
> | to write websites in XHTML _at
> | the moment_, given that the majority
> | of web browsers don't grok
> | XHTML. The only reason I was given
> | (by Dan Connolly [1]) is that it
> | makes managing the content using
> | XML tools easier... but it would be
> | just as easy to convert the XML to
> | tag soup or HTML before
> | publishing it, so I'm not sure I
> | understand that.

Here's a reason: if you publish information,
it would be nice if I could set up a web
page that could use that information
without having to look through the entire
document each time (something like transclusion).
XML formats like XHTML are part of the solution.
Unfortunately, for the sake of security (keeping sites from
mimicking others), this kind of functionality is
largely disabled in XSLT. Perhaps one day people
will not be so naive with the security problems
and these tools can be made to work correctly
directly by the browser (without tidy in the middle)
instead of giving a false sense of security with
silly bandaids. (IMHO ecommerce sites are in general
irresponsibly luring naive customers by failing to alert
them of risks.)
[…]

> Appendix B: Advanced Authors
> ----------------------------
>
> Some advanced authors are able to send back XHTML as
> application/xhtml+xml to UAs that support it, and as text/html to
> legacy UAs.
>
> Assuming you are using XHTML 1.0
> compliant to Appendix C (or have
> otherwise checked that the
> XHTML 1.0 you send is compatible with Tag
> Soup processors), then that's fine.
> All I am saying in this document
> is that sending XHTML as text/html ONLY is harmful.

That last sentence is confusing
because of its ambiguity. After a moment of
reflection, I think it means you need to
be more careful than just sending any XHTML
document as-is as text/html.
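
For authors who do control the server, the arrangement Appendix B
describes might be sketched like this: look at the request's Accept
header and only send application/xhtml+xml when the UA lists it
explicitly. The function name is hypothetical, and ignoring wildcards
such as */* is a deliberate simplification on my part, not something the
article specifies.

```python
def choose_content_type(accept_header):
    """Pick a MIME type for an Appendix C XHTML 1.0 page from an Accept header."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue  # wildcards like */* are intentionally not treated as support
        q = 1.0  # a media type without a q parameter has quality 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"


# A header that lists XHTML explicitly, and one that only offers wildcards.
explicit = choose_content_type(
    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
)
wildcard_only = choose_content_type("image/gif, image/jpeg, */*")
```

The chosen value would go into the Content-Type response header. A UA
that sends only */* gets text/html, which is the safe fallback for
legacy browsers.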

> Appendix C: Acknowledgements
> ----------------------------
>
> Thanks to Nick Boalch for the abstract.
> Thanks to Dan Connolly for
> pedancy that has improved the quality
> of this document. Thanks to Ted
> Shaneyfelt and many others for
> suggesting improvements to the text.

Now you have the rest of the story
about improvements suggested for the text.

_-T

Hugo
Moderator
London
Joined: 2004-06-06
Posts: 15650
Points: 2788

XHTML as "text/html" considered harmfull.

Shaneyfelt,
Many thanks for taking the time to clarify the original article. What a shame your 'improvements' were not included in the original, which I had many problems with; I have since spent some time clarifying the real state of play for myself.
There seems to be much confusion on this subject, and I worry when articles try to put people off using XHTML. In particular, "you shouldn't serve XHTML as text/html" is too sweeping a statement, and it is contrary to the W3C view that you may do so for XHTML 1.0. I would much rather people write in the disciplined way of XHTML and help to push standards adoption forward.
All the perceived problems are minor and can be dealt with.

Again thanks for conveying that which should have been in the original article.

Hugo.

Before you make your first post it is vital that you READ THE POSTING GUIDELINES!
----------------------------------------------------------------
Please post ALL your code - both CSS & HTML - in [code] tags
Please validate and ensure you have included a full Doctype before posting.
Why validate? Read Me

Anonymous
Guru

XHTML as "text/html" considered harmfull.

So, serving XHTML 1.1 Strict as text/html is a bad thing? Am I misunderstanding this?

I've done it and tested in every browser at my disposal...I need to read up on this a bit.

Hugo

XHTML as "text/html" considered harmfull.

triumph, you must not serve XHTML 1.1 as text/html. It has to have the MIME type application/xhtml+xml and requires the XML prolog line before the DOCTYPE. Certain browsers just don't understand application/xhtml+xml and won't parse it, and the prolog switches IE into quirks mode (a bad thing). Unless you specifically require the XML features, it is best to stick to XHTML 1.0, which can be served as text/html.

This link may help in understanding ( read all the sub links as well)

http://www.w3.org/International/articles/serving-xhtml/

Hugo


Anonymous
Guru

XHTML as "text/html" considered harmfull.

Hugo wrote:
> triumph you must not serve up XHTML 1.1 as text/html it has to have
> the mime type application/xhtml+xml and requires the use of the xml
> prolog line before the DTD, certain browsers just don't understand
> and won't parse xhtml+xml and the use of the prolog switches IE into
> quirks mode (bad thing) unless you specifically require the xml features
> it is best to stick to XHTML 1.0 which can be served up as text/html .

I will stick to 1.0 now that I know that. Even though I've run into zero problems serving XHTML 1.1 as text/html, I don't want to tempt fate. :)