19 SGML reference information for HTML
The following sections contain the formal SGML definition of HTML 4. They
include the SGML declaration, the Document Type Definition (DTD), and the
character entity references, as well as a sample SGML catalog.
These files are also available in ASCII format as listed below:
- Default DTD: strict.dtd
- Transitional DTD: loose.dtd
- Frameset DTD: frameset.dtd
- SGML declaration: HTML4.decl
- Entity definition files: HTMLspecial.ent, HTMLsymbol.ent, HTMLlat1.ent
- A sample catalog: HTML4.cat
Many authors rely on a limited set of browsers to check the documents they
produce, assuming that if the browsers can render their documents, the
documents are valid. Unfortunately, this is an ineffective way to verify a
document's validity, precisely because browsers are designed to cope with
invalid documents by rendering them as well as they can to avoid frustrating
users.
For better validation, check your document with an SGML parser such as nsgmls
(see [SP]) to verify that it conforms to the HTML 4 DTD. If the document type
declaration of your document includes a URI and your SGML parser supports this
type of system identifier, the parser will retrieve the DTD directly. Otherwise
you can use the following sample SGML catalog. It assumes that the DTD has been
saved as the file "strict.dtd" and that the entities are in the files
"HTMLlat1.ent", "HTMLsymbol.ent" and "HTMLspecial.ent". In any case, make sure
your SGML parser is capable of handling [ISO10646]. See your validation tool
documentation for further details.
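As a rough illustration of the validation step described above, the following Python sketch builds and runs an nsgmls command line. It assumes that nsgmls is installed and on the PATH, that the catalog has been saved as "HTML4.cat", and that a nonzero exit status signals validation errors. The function names `nsgmls_command` and `validate` are hypothetical helpers introduced for this example only. The `-s` option suppresses the parser's normal output and `-c` names the catalog file.

```python
import shutil
import subprocess


def nsgmls_command(document, catalog="HTML4.cat", declaration=None):
    """Build an nsgmls command line that checks `document` (hypothetical helper)."""
    # -s: suppress normal output, so only error messages are reported
    # -c: resolve public identifiers through the given catalog file
    command = ["nsgmls", "-s", "-c", catalog]
    if declaration is not None:
        # An SGML declaration file, if given, is read before the document.
        command.append(declaration)
    command.append(document)
    return command


def validate(document, **options):
    """Run nsgmls and return (ok, error_text). Requires nsgmls on the PATH."""
    if shutil.which("nsgmls") is None:
        raise FileNotFoundError("nsgmls not found on PATH")
    result = subprocess.run(
        nsgmls_command(document, **options),
        capture_output=True,
        text=True,
    )
    # Assumption: a nonzero exit status means the parser reported errors.
    return result.returncode == 0, result.stderr
```

For example, `validate("index.html", declaration="HTML4.decl")` would check a file named "index.html" (a placeholder name) against whichever DTD its document type declaration selects through the catalog.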
Beware that such validation, although useful and highly recommended, does
not guarantee that a document fully conforms to the HTML 4 specification. This
is because an SGML parser relies solely on the given SGML DTD which does not
express all aspects of a valid HTML 4 document. Specifically, an SGML parser
ensures that the syntax, the structure, the list of elements, and their
attributes are valid. It cannot, however, catch errors such as setting the
width attribute of an IMG element to an invalid value (e.g., "foo" or "12.5").
Although the specification restricts the value for this attribute to an
"integer representing a length in pixels," the DTD only defines it to be CDATA,
which allows any value. Only a specialized program could capture the complete
specification of HTML 4.
Nevertheless, this type of validation is still highly recommended since it
permits the detection of a large set of errors that make documents invalid.
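To make the limitation concrete, here is a minimal Python sketch of the kind of extra check a specialized program might perform for the IMG width example above. It is not a validator: it uses the lenient parser in Python's standard `html.parser` module, and it follows the wording quoted above by treating only plain digit strings as valid, so the pattern would need widening if other length forms should be accepted. The names `ImgWidthChecker` and `find_bad_img_widths` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption for this sketch: a valid width is a run of ASCII digits only.
PIXELS = re.compile(r"[0-9]+")


class ImgWidthChecker(HTMLParser):
    """Collect IMG width values that are not plain integers."""

    def __init__(self):
        super().__init__()
        self.bad_values = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "width" and not PIXELS.fullmatch(value or ""):
                self.bad_values.append(value)


def find_bad_img_widths(markup):
    """Return the offending width values in document order."""
    checker = ImgWidthChecker()
    checker.feed(markup)
    checker.close()
    return checker.bad_values
```

A DTD-based parser accepts all three widths in `<IMG width="foo"><IMG width="120"><IMG width="12.5">` because each is legal CDATA, whereas this check flags the first and the third.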
The sample catalog below includes the OVERRIDE directive to ensure that
processing software such as nsgmls uses public identifiers in preference to
system identifiers. As a result, users do not have to be connected to the Web
to resolve entities that are also referenced by URI-based system identifiers.
OVERRIDE YES
PUBLIC "-//W3C//DTD HTML 4.01//EN" strict.dtd
PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" loose.dtd
PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" frameset.dtd
PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML" HTMLlat1.ent
PUBLIC "-//W3C//ENTITIES Special//EN//HTML" HTMLspecial.ent
PUBLIC "-//W3C//ENTITIES Symbols//EN//HTML" HTMLsymbol.ent
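The mapping that the catalog expresses can be sketched in a few lines of Python. This illustrative reader handles only the two entry types used in the sample above (OVERRIDE and PUBLIC), each on its own line. Real catalogs may contain other entry types and comments, which this sketch ignores. The function name `parse_catalog` is hypothetical.

```python
import shlex


def parse_catalog(text):
    """Return (override, public) for a catalog in the style of the sample.

    `override` is True when an "OVERRIDE YES" entry is present, and `public`
    maps each public identifier to the local file that holds the entity.
    """
    override = False
    public = {}
    for line in text.splitlines():
        # shlex.split keeps a quoted public identifier together as one field.
        fields = shlex.split(line)
        if not fields:
            continue
        keyword = fields[0].upper()
        if keyword == "OVERRIDE" and len(fields) == 2:
            override = fields[1].upper() == "YES"
        elif keyword == "PUBLIC" and len(fields) == 3:
            public[fields[1]] = fields[2]
    return override, public
```

Given the sample catalog, looking up "-//W3C//DTD HTML 4.01//EN" in the returned mapping yields "strict.dtd", which is the local file a parser would read instead of fetching the DTD from the Web.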