What HTML is Not

These tutorials are about XHTML, the Extensible Hypertext Markup Language.

HTML is a very forgiving language. This is because most current browsers are also forgiving and are usually able to take a best guess at what you meant to say, even if you didn't write it out entirely correctly. This helped the growth of HTML in its early days because it made it easy to learn just by copying someone else's code and experimenting. It also meant it was easy to learn bad habits.

well-formed: a well-formed document adheres to the rules of syntax for the language it is written in

In contrast, XHTML requires that documents be well-formed, which is to say that they have to be coded with the exacting precision of a normal programming language. Therefore, the first good coding habit is to not take advantage of the leniency of HTML. One of the reasons for learning XHTML is that it helps to prevent sloppy coding.
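As a rough illustration of what "well-formed" means in practice, the following Python sketch hands a fragment of markup to a strict XML parser and reports whether it was accepted. The helper name `is_well_formed` is made up for this example, and the check covers well-formedness only; it does not validate a document against the XHTML rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Properly nested and closed elements are accepted.
print(is_well_formed("<p>This is <b>important!</b></p>"))
# <b> is closed by </i>, so the nesting is broken and the parser refuses it.
print(is_well_formed("<p>This is <b>important!</i></p>"))
# The paragraph is never closed, so this is refused as well.
print(is_well_formed("<p>This is important!"))
```

A strict parser of this kind does not guess: a single mismatched or missing end tag is enough for the whole fragment to be rejected.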

Although people have come up with very innovative ways to get HTML to do some very interesting things, HTML is, by definition, a document markup language. It is a tool for structuring documents that are primarily text and primarily meant for online viewing. The advanced formatting features of high-end word processors and desktop publishing packages are not meant to be part of HTML, although people have found ways to mimic many of these special features in HTML.

HTML is meant to structure content not style documents

Content should always be the primary concern in an HTML document, appearance secondary. This is an important point not only because of what HTML was designed to do, but also because no two browsers will show a Web page in quite the same way. Trying to format a document to exacting standards in HTML is bound to fail. Instead you have to aim for a general structure that can easily adapt to whatever browser the viewer is using.

HTML is not a magic bullet; it should be used in conjunction with other tools to create documents, not do all the work itself

The HTML 4.0 and 4.01 standards have actively taken a step away from using HTML to do specialized document formatting by deprecating many HTML formatting commands in favor of Cascading Style Sheets. Cascading Style Sheets can be combined with a client-side scripting language such as JavaScript to allow you to specify browser-specific document formatting. This helps to keep HTML focused on being a document structuring language.

The XHTML 1.0 standard has taken this a step further by deprecating almost all of those elements of HTML that are used for styling but have CSS equivalents.

Part of the purpose of HTML 4.01, and subsequently XHTML 1.0, is to take a step toward converting the Web standard from HTML to XML, or Extensible Markup Language. XML is a simplified subset of SGML, the Standard Generalized Markup Language, designed for use on the Internet.

XHTML is meant as an intermediary step between HTML and XML

SGML was one of the original document creation languages. It still exists and is a very powerful, and even more complex, document generation tool. HTML 4.01 is also a step back from older HTML standards toward the original SGML standards. The coding standards that made SGML a powerful document processing language did not follow along with the portions of it that were borrowed to create the original version of HTML.

The lack of standards in the old HTML standards (as it were) is the primary motivation for the development of a new set of standards.

Stick to the Standards

To ensure that a Web page will be viewable on multiple browsers, there are some simple steps you can take:

  • Do not use non-standard extensions that are specific to one browser or another.
  • Do not exploit known bugs in the browsers to do special tricks with the pages. This works great until the bug is fixed.
  • Do not use tags for things they aren't intended to do.
  • Just because HTML allows sloppy code doesn't mean you should write sloppy code. Instead, think of this convenience as a known bug and avoid it.
  • In other words, stick to the standards.

The Limits of HTML

HTML is, simply put, a sloppy language, and the people who make Web browsers include a great deal of code that allows the browser to try to guess what the author really meant when they wrote what they did.

This is not a problem for a desktop PC, which normally has plenty of computing power left over for extra processing. It is a problem for systems with limited memory and processing power, such as Web TV, Personal Digital Assistants, or even Web-aware cell phones.

In other words, HTML is going to run into problems as technology advances and more user agents are Web-ready.

So what are the problems with HTML?

Too Much Room for Mistakes

HTML is very forgiving of mistakes. If you forget to include an end tag delimiting an element, the browser will try to second guess you based on context. This looseness got its start all the way back with Mosaic, which didn't even know what a paragraph end tag was, and would cheerfully accept something like:

<p>
This is <b>important!</i>
</p>
<cubic>So pay attention!</cubic>

And display it as:

This is important! So pay attention!

What this piece of code actually says is:

<p> == begin a new paragraph
  This is == write out this text
  <b> == begin to boldface the following element
    important! == write out this text
  </i> == stop italicizing the preceding element (we aren't italicizing anything, but some older browsers will accept any command to end styling as a command to end all styling)
</p> == end the paragraph
<cubic> == begin a new cubic (there is no such HTML tag, but HTML browsers just ignore what they don't understand.)
  So pay attention! == write out this text
</cubic> == end the cubic

Confused? Come back to this example later and it should make more sense then.

HTML browsers engage in two actions which allow for sloppy code. The first is trying to second guess mistakes, such as missing end tags to delimit elements. The second is simply ignoring things they don't understand.
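Both behaviors can be observed with a lenient parser. The sketch below uses `HTMLParser` from Python's standard library as a stand-in for a forgiving browser; the class name `LenientReader` is invented for this example, and a real browser does far more recovery work than this.

```python
from html.parser import HTMLParser


class LenientReader(HTMLParser):
    """Records tags and text without checking that they make sense."""

    def __init__(self):
        super().__init__()
        self.opened = []   # start tags, in the order they were seen
        self.closed = []   # end tags, in the order they were seen
        self.text = []     # text content between the tags

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)

    def handle_data(self, data):
        self.text.append(data)


sloppy = """<p>
This is <b>important!</i>
</p>
<cubic>So pay attention!</cubic>"""

reader = LenientReader()
reader.feed(sloppy)
reader.close()

# Collapse line breaks and extra spaces, much as a browser would when rendering.
displayed = " ".join("".join(reader.text).split())
print(displayed)
print("opened:", reader.opened)
print("closed:", reader.closed)
```

The parser raises no error for the mismatched `</i>` or the unknown `<cubic>` element. The lists of opened and closed tags do not match up, yet the text still comes through as if nothing were wrong.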

Having a browser ignore things it doesn't understand has its benefits. It helps to keep older browsers from choking on newer commands that came into being after the browser was developed. However, there is no reason to force a browser to do your work for you. If an HTML document is written according to standards, there is no reason for a browser to have to second guess the author.

This looseness on the part of Web browsers has led to a situation where most Web pages out there do not even meet the most basic of HTML standards. This is a problem.

Imagine that you have a car where, each time you remove the key from the ignition it randomly rescrambles the ignition lock to one of one hundred different possible combinations, requiring that you always carry all one hundred keys with you. That is what a Web browser goes through opening up new pages that aren't written according to standards. That is also why browsers take up far more resources on your computer than they need to.

Non-Standard Extensions

With the development of the World Wide Web, two players pretty much controlled the browser field. These were, and are, Netscape and Microsoft. The problem with this was that with only two players in the game, there wasn't much incentive to adhere to standards. Instead the two companies tried to repeatedly trump each other by adding non-standard extensions to HTML.

extension: a vendor specific addition to the standards. Extensions are usually proprietary and should be avoided if possible

An extension is a proprietary addition to the standards. Since they are proprietary, other companies cannot make use of them.

This meant that the Web began to develop with two incompatible sets of technologies, both meant to do the same thing. To use extensions, you had to either write for Microsoft Internet Explorer or Netscape Navigator, but not for both.

A major use of Internet scripting languages currently is testing for the type of browser being used and serving up or generating different documents depending on the software the client is using.
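The kind of browser test described here can be sketched as follows. This is only an illustration of the workaround, not a recommended practice: the function name `pick_document` and the three file names are hypothetical, and real detection scripts of the period were considerably more involved.

```python
def pick_document(user_agent: str) -> str:
    """Choose a (hypothetical) page variant based on the User-Agent string."""
    # Internet Explorer also identifies itself as "Mozilla", so it must be
    # tested for first.
    if "MSIE" in user_agent:
        return "page-explorer.html"
    if user_agent.startswith("Mozilla/"):
        return "page-netscape.html"
    # Anything else gets a plain version with no vendor extensions.
    return "page-plain.html"


print(pick_document("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
print(pick_document("Mozilla/4.7 [en] (Win98; I)"))
print(pick_document("Lynx/2.8.4rel.1"))
```

Every variant has to be written and maintained separately, which is exactly the duplication of effort that standards-compliant pages avoid.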

With some people writing for Explorer, others for Netscape, and still others trying to code complicated workarounds, the Web is a mess of non-standard code. This has been resolved to some extent by both companies agreeing to adhere to new standards. Netscape has even stopped supporting some of the extensions it developed.

However, this is not a complete answer. Sticking to standards is a good thing. But being able to move beyond current standards is also a good thing. In other words, there has to be room to grow.

HTML is a static language. If you want to change it, you have to rewrite all the programs that read it to be able to handle the changes. This leads either to proprietary extensions or lack of change.

The benefit of XHTML over HTML in this respect, as we will discover later, is that the standards include rules on how to extend the language in a uniform way that will be compatible across platforms.

HTML is a Structural Language

There are three basic types of tags in any markup language.

  • structural
  • stylistic
  • descriptive

One problem with HTML is that it is meant to be a structural markup language, which is good for what it does, but not ideal.

Structural Markup

structural markup: markup that defines the physical structure of a document

HTML, when it was first developed, was developed as a structural markup language. The markup tags were designed to specify the structure of the document and its contents. For instance, tags would be used to indicate heading levels, or whether something was a paragraph or a menu list.

A well written HTML document still starts with the structure. The page may look great, but is it usable if you first put in all the lists, followed by all the paragraphs, followed by the headings, etc.? Probably not. Try reading a book by first going through and only reading the words that begin with an "A", and then with a "B", and so on. You will quickly discover just how important structure is to a text document.

The first rule of creating any type of content is that structure is more important than anything else you do to your document

Text documents work because they have structure.

The first rule of creating any type of content is that structure is more important than anything else you do to your document. The larger the project, the more important this becomes. I will repeat this often.

Stylistic Markup

stylistic markup: markup that defines the appearance of content within a document

Stylistic markup is markup that is specifically meant to change the appearance of some element in the document. As HTML grew, there were many stylistic tags added to the HTML definition, from basics like commands to bold-face or italicize something, to commands that create blinking or moving text.

The use of stylistic markup is discouraged in favor of CSS, which is a Web development language entirely dedicated to stylistic markup of HTML documents.

Descriptive Markup

descriptive markup: markup that describes the nature of the content within a document

Descriptive markup is markup that describes the nature of the content. It is also called semantic markup. In HTML, there are descriptive equivalents to stylistic markup commands that are preferred over those stylistic commands.

For instance, the <i> tag is used to create italicized elements. But what does the author of the page mean by italicizing something?

  • Could the italics be there because they wanted to emphasize (<em>) that point?
  • Or perhaps because it is a term they are defining (<dfn>)?
  • Or maybe because it is a citation (<cite>) for a source being used?

They may all look like italics on the screen, but they all have different meanings. Descriptive markup allows you to specify what those meanings are.

But if they all look the same on the screen, then what is the point of using different tags for each?

They may all look the same on the screen, but imagine that you are not a user, but rather a program trying to catalog keywords on the page. How would you go about it?

Well, if there are items on the page that are marked as definition terms, then it is a fair bet that they are useful keywords. This is not true of text that is just italicized.

How about another program that is compiling a bibliography of sources for a page?

Stylistic markup in action:

<p>
It is <i>important</i> that
you enter the following code:
</p>

<pre>
if (x == "<i>your name</i>") { qfunc(x); }
</pre>

<p>
This is as per the directions in <i>A Helpful Text</i>
by Russ Russelson.  Remember that <tt>qfunc</tt> 
is a <i>spurious</i> function, which means I 
just made it up.
</p>

Descriptive markup in action:

<p>
It is <em>important</em> that 
you enter the following code:
</p>

<p>
<code>
if (x == "<var>your name</var>") { qfunc(x); }
</code>
</p>

<p>
This is as per the directions in <cite>A Helpful
Text</cite> by Russ Russelson.  Remember that 
<code>qfunc</code> is a <dfn>spurious</dfn>
function, which means I just made it up.
</p>

In either case it would display as:

It is important that you enter the following code:

if (x == "your name") { qfunc(x); }

This is as per the directions in A Helpful Text by Russ Russelson. Remember that qfunc is a spurious function, which means I just made it up.
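To a program, however, the two versions are very different. The following Python sketch plays the part of the cataloging program mentioned earlier: it reads the last paragraph of the descriptive example and collects the defined terms and the cited sources. The class name `SemanticCollector` is invented for this illustration, and the sketch assumes the elements of interest are not nested inside one another.

```python
from html.parser import HTMLParser


class SemanticCollector(HTMLParser):
    """Collects the text found inside selected descriptive elements."""

    def __init__(self, wanted=("dfn", "cite")):
        super().__init__()
        self.found = {tag: [] for tag in wanted}
        self._current = None   # the wanted element we are inside, if any
        self._buffer = []      # text seen so far inside that element

    def handle_starttag(self, tag, attrs):
        if tag in self.found and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse line breaks and repeated spaces inside the element.
            text = " ".join("".join(self._buffer).split())
            self.found[tag].append(text)
            self._current = None


page = """<p>
This is as per the directions in <cite>A Helpful
Text</cite> by Russ Russelson.  Remember that
<code>qfunc</code> is a <dfn>spurious</dfn>
function, which means I just made it up.
</p>"""

collector = SemanticCollector()
collector.feed(page)
collector.close()

print("terms:", collector.found["dfn"])
print("sources:", collector.found["cite"])
```

Run against the stylistic version of the same paragraph, the collector would find nothing, because every one of those phrases is marked only as `<i>` and the program has no way to tell a citation from a definition from simple emphasis.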

Modern Web pages need to address more than just how the page looks on the screen; they need to address what the page means. There are an increasing number of user agents out there that make use of things other than visual cues to determine the nature of the content. They need to be addressed.

user agent: software that allows the user to interact with the computer to perform some task, such as surf the Web

HTML was created to markup the structure of a document for online viewing on a desktop computer screen. In other words, it was designed to take print documents and format them for viewing on electronic visual media of a certain size.

Since this is how we are used to thinking about computerized text, this seems perfectly straight-forward, but in the world of computing, we are not talking about text, we are talking about information. The question is not how we format text, but how we structure information.

While HTML is a structural markup language, XML, which we will discuss shortly, is a descriptive markup language. It is designed to create documents where the computer processing the document can understand what the parts of the document are and assemble it appropriately.

Content is better organized if it is organized descriptively and the structure is implied in the nature of that description. It makes more sense to organize the content of a bibliography based on what the citations are rather than which of the words are italicized. Visually these usually amount to the same thing on the printed page, but they are entirely different concepts.
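A small sketch may make the bibliography point concrete. The element names below (`bibliography`, `citation`, `title`, `author`) are a made-up vocabulary chosen only for this illustration; they are not part of XHTML or of any standard.

```python
import xml.etree.ElementTree as ET

# A bibliography described by what each piece of content is,
# not by how it should look. The vocabulary is hypothetical.
bibliography_xml = """<bibliography>
  <citation>
    <title>A Helpful Text</title>
    <author>Russ Russelson</author>
  </citation>
</bibliography>"""

root = ET.fromstring(bibliography_xml)

# Because the markup says which part is the title and which is the author,
# a program can pick them out without guessing from italics.
entries = [
    (citation.findtext("title"), citation.findtext("author"))
    for citation in root.iter("citation")
]
print(entries)
```

Whether the title ends up italicized, underlined, or read aloud is left to whatever presents the document; the description of the content stays the same.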

The benefits of descriptive markup come from the fact that well structured descriptive markup yields documents that are self-describing. In other words, they not only contain the content, but also contain a description of the nature of the content.

This ability to be self-describing means that the documents are easier to search electronically, and are more easily adaptable across multiple media and diverse user agents. This increase in platform independence greatly increases the accessibility of the document.

XML is a descriptive language, and many of its features are implied in XHTML.

A Good Web Language

So what goes into a good Web language? An ideal Web development language would have the following features:

  • strictness
  • standards driven
  • extensible
  • descriptive markup
  • powerful
  • simple
  • humanly comprehensible

A good Web language is a language that requires strict adherence to standards and good coding practices.

It should be easy to confirm that a document is well-written, accurate, and complete. Web browsers should not have to second guess mistakes, but rather should be allowed to assume there are none and to ignore what they don't understand.

This means that the language needs to be standards driven, not market driven. In order to achieve this, standards need to be flexible, or account for the need for flexibility. There must be a built in mechanism for modifying the language.

The language should be descriptive.

Structural markup languages assume humans reading a document in a visual medium. This does not account for people using other media, or computers trying to parse the document for a purpose other than screen display. Descriptive markup overcomes this by actually stating what the content is in the code, not just delimiting it based on its mode of presentation.

The language should be powerful but simple.

It should be easily comprehensible by both computers and people. SGML is very powerful, but very complex. HTML is very simple but not very powerful. Something in between is needed.

The solution to this set of goals was the development of XML, or Extensible Markup Language. It meets all of the above criteria except one. It is still a very complex language. On the other hand, since it is a meta-language, it can be used to define other markup languages. One of these is XHTML.

XHTML is not meant to address all of the above criteria. Rather it is meant to be an interim improvement on HTML that also serves as a transitional language to XML. It is also meant to be used as a tool when XML is overkill, such as in developing simple Web pages.
