Note that all external links will open up in a separate window. This is a stripped down version of these pages for older browsers. These pages are really meant to be viewed in a standards compliant browser.
What is HTML?

These tutorials are about XHTML, the Extensible Hypertext Markup Language.

HTML is a Markup Language

markup: an encoding system embedded directly in a document to indicate how that document should be formatted

HTML stands for hypertext markup language. In other words, it is a markup language for writing hypertext documents. We'll talk about hypertext a little later, but a markup language is an encoding system used to indicate how a document should be formatted. Even before computers, editors had markup languages that were pretty much standard across the publishing industry. They would mark whether to bold-face or italicize some text, how the text should be aligned on the page, how the page was to be laid out.

With the advent of computers, this methodology was a natural way to adapt computers to the task of printing formatted documents. A document marked up with HTML contains commands in the body of the text that tell the computer to format the text and other features of the document in a certain way.

HTML is a Set of Standards

HTML is a very forgiving language, but it is a good idea to be exacting in writing an HTML document. Well written documents are easier to maintain and contain fewer possible errors.

How HTML is supposed to work is determined by a set of standards. In computing terms, a standard is how someone should implement programs, or in the case of Web pages, how someone should use the mark-up language in question.

HTML is a mixture of standards and extensions. Extensions are special features added to the language by Web browser companies for use specifically in their own Web browsers. The intent of extensions is to make a particular Web browser do more interesting things than the competition so that people will want to write Web pages for one browser at the expense of another, thereby forcing viewers to get the software if they want to view the page.

extension: a vendor specific addition to the standards.
Extensions are usually proprietary and should be avoided if possible.

Extensions have been great for advancing the technology, but not so good for promoting usability and compatibility. What makes for a good networked technology is broad platform compatibility, but extensions to date have been proprietary. Good networked technologies are not about the toys that come with them, no matter how nice they are. This is why many companies stick with their old, clunky computers far longer than might seem prudent from an outside perspective. It took enough work already to get everything standardized and talking to everything else. Why waste money by trying to change it?

The point of standards is to make something as universally accessible and usable as possible. They also help to ensure that things work correctly.

W3C: the World Wide Web Consortium is an organization formed of interested parties who review and set standards for protocols and programming languages for use on the World Wide Web

HTML standards are set by the [World Wide Web Consortium], or W3C. The W3C is a governing body composed of dues-paying members, most of whom represent large corporations with a vested interest in the development of HTML and the World Wide Web.

HTML has gone through numerous generations in terms of the standards which define the language. Roughly every two years, the standards for the language have been completely rethought. Here is a short history of the key points.

HTML 2.0

The first definitive HTML standard for general consumption was HTML 2.0 (which was HTML 1.0 with all the bugs worked out). It was a simple, straightforward standard with few of the advanced features we find being implemented on the Web these days. It stayed true to the notion of the World Wide Web as a collection of interlinked documents and of HTML as a means of formatting documents for online reading.

HTML 3.2

While W3C began work on HTML 3.0, a new player began to dominate the scene.
Netscape tried to take the bull by the horns and develop their own set of standards. Equipped with lots of neat add-ons so that everyone would want what they were offering, they succeeded. The additional features added to the Netscape implementation of HTML were enough to make Netscape the browser of choice in the mid-90s. Rather than side with one browser, the W3C tabled the new HTML 3.0 guidelines. Eventually, the W3C released the HTML 3.2 standard, which was an acknowledgment of the Netscape browser extensions as the de facto standard.

HTML 4.0

It was about this time that Bill Gates realized that Microsoft needed to take a more active interest in the World Wide Web. On top of beginning to invest a great deal of effort in improving Microsoft Internet Explorer, the company also set about trying to prove what a large corporation could achieve when it applied the same sort of leverage to the W3C as Netscape had previously. The result was HTML 4.0.

deprecated: no longer recommended for use. Not cancelled or removed from existence, just no longer officially supported

In spite of being a coup for Microsoft, HTML 4.0 is much more of a proper standard than HTML 3.2. HTML 4.0 rolled back some of the extensions that had been part of the HTML 3.2 standard. Many features of the HTML 3.2 standard are now deprecated. This means that those features should still be supported in new browsers, but their use is not recommended since they are being phased out.

HTML 4.01

HTML 4.01 is the most recent HTML standard. It includes some subtle but significant changes from HTML 4.0. While HTML 4.0 supported Cascading Style Sheets (CSS), HTML 4.01 recommends that they be used to the exclusion of other formatting methods whenever possible. The reason it is only a recommendation is that requiring it would break backwards compatibility for new pages: they would not be properly viewable on older browsers. These pages are an excellent example.
If you look at them in Netscape 4.x or some other browser not up to the current standards, much of the formatting will disappear. In fact, in order to use stylesheets as extensively as I do, the server writes out a different page for older browsers, with older formatting methods those browsers will understand.

More and more Web developers are using the HTML 4.01 standard, but it is by no means universal. Even many of the currently available HTML editors that write the code for you write HTML 3.2 code, not 4.01 code. Perhaps by the time you read this, this will have changed. Unlike previous standards, HTML 4.01 serves more to recommend a direction of development than to set a standard of current procedure. That direction is toward the implementation of an XML-based Web content architecture. We will talk about XML later.

Microsoft has put a great deal of effort into ensuring that Explorer 5.x is HTML 4.01 compliant. Netscape Navigator 4.x, on the other hand, is stubbornly non-compliant, and many of the interesting things one can do with the 4.01 standard will not work in Netscape 4.x. The newer version of the Netscape browser, Netscape 6, is compliant with all new standards.

My browser of choice for compliance with current standards is either [Opera] or [Mozilla]. Opera has a fair amount of staffing overlap with W3C and makes a point of striving for full compliance. Mozilla is an open source project upon which the new version of Netscape is based.

XHTML 1.0

If you look at the HTML 4.01 standard online, it is a couple hundred pages long. The XHTML 1.0 standard, on the other hand, is very brief. The reason for this is that XHTML is the redefinition of HTML 4.01 in XML. The only things specified in this standard are the changes that need to be made to the HTML 4.01 standard to transform it into XHTML. Unlike the other standards, which are comprehensive, this is more an addendum to the most recent HTML standard. Most newer browsers are fully XHTML compliant.
Most newer browsers are actually XML 1.0 compliant. We talk about XML in a different tutorial.

Currently, the standards for Web development include:
XHTML 1.1

There is also a new version of the XHTML standard in the works which seeks to modularize XHTML. Currently XHTML is an all or nothing deal. By modularizing it, you can just use the pieces you need. This is to bring it in line with the way in which XML is designed to work. Using XHTML 1.1 requires a solid understanding of how XML works, and it is really an issue of how the software that processes the code is implemented, so it is not something you need to worry about yet.

HTML is SGML

SGML: the Standard Generalized Markup Language is the definitive markup language for computing

The Standard Generalized Markup Language, or SGML, is a standard document markup language that defines nearly every known way of marking up text for the computer. It allows documents to be marked up for display in most any language and most any medium. In the 1980s, it was the way in which documents were written in most any large scale situation where documents needed to be shared across multiple platforms and multiple media. In the late 1980s, a man by the name of Tim Berners-Lee came up with the idea of combining it with a program he had created called Enquire, to adapt it for use as a means of marking up scientific documents for sharing over networks.

Enquire: a program created by Tim Berners-Lee that allowed him to track relationships between documents. It was similar to [Hypercard], a language developed by [Apple] for creating interactive help menus

SGML is a very complicated language. It is difficult to learn. It also requires a fair amount of processing power and expensive programs meant for multi-user networks in order to use it. Tim Berners-Lee realized that he only needed a small piece of the SGML standard to do what he wanted. In fact, he wanted to create something as easy to use as possible so that he and his scientist buddies could focus on the content of the documents, not on writing the code to make them display properly.

The result of his work was HTML. HTML is a subset of SGML.
It is a portion developed for creating electronic documents for online viewing. By using only a small subset of SGML in creating HTML, there is less to learn, so it is easier for people to use. It also takes less time, memory, and disk space for a computer to process HTML.

meta-language: a language that is used to describe or define other languages

To understand the relationship between SGML and HTML, it helps to understand a little about SGML. As noted, SGML is a markup language. This means that it is used to mark up documents for viewing in some medium. SGML is also a meta-language. Not only can it be used to mark up documents, but it can also be used to define other markup languages. Meta, in this context, means something that is self-referential, or self-defining.

Meta-languages are easy to understand. If I say, "the cat is on the chair", then I am using language. If I say, "in the sentence 'the cat is on the chair', 'cat' and 'chair' are nouns", then I am using meta-language. I am using a set of terminology that describes the language I am using. In this case, "sentence" and "noun" are words in our language that describe something about the language itself. They are part of a meta-language we refer to as English grammar.

One of the languages thus defined through SGML is HTML. When a markup language is defined through SGML, it is said to be an application of SGML. HTML is one application of SGML.

Document Type Definition (DTD): a document or block of code that tells a browser or other piece of software how to process a document marked up with a given SGML-related markup language

The way in which SGML is used to define another language is through the creation of a Document Type Definition, or DTD. A Document Type Definition can be seen as a giant look-up table that tells a browser or other user agent how a document can be structured, what can be in it, and what order things have to occur in.
In the case of HTML, the DTD specifies what the valid HTML tags are, what order, if any, they have to occur in, and what contents they can and cannot have.

user agent: software that allows the user to interact with the computer to perform some task, such as surf the Web

A Web browser is a type of user agent, in this case an on-screen viewer, with an SGML DTD for HTML included with the program. It is because of this built-in DTD that browsers know how to interpret and present a document. Every SGML document requires a DTD for the user agent to interpret it. For HTML, you have software with that DTD built right in. Later on, we will find that XML also requires a DTD, but since one is not built into the Web browser for XML, you have to provide your own.
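As a rough illustration of the points above, here is a minimal sketch of a document that names the DTD it is written against. The DOCTYPE line refers to the HTML 4.01 Strict DTD published by the W3C. The title and paragraph text are made up for this example. The comments summarize a few of the rules that DTD lays down. Treat them as a summary, not a substitute for reading the DTD itself.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <!-- Order: the DTD puts head before body. -->
    <!-- Contents: head must contain exactly one title. -->
    <title>A minimal document</title>
  </head>
  <body>
    <!-- Contents: in the Strict DTD, body holds block-level elements
         such as p. Bare text directly inside body is not valid. -->
    <p>The cat is on the chair.</p>
  </body>
</html>
```

A validator that reads the DTD named in the DOCTYPE would reject this document if, for example, the title were left out or the body came before the head.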
If you want to look at the DTD for HTML 4.0, you can find the definitive version at:
These pages can be found at:
[http://academ.hvcc.edu/~kantopet/]