Kaye and Geoff's web page documentation
Introduction
This is the page which collects together bits of HTML and a few miscellaneous subjects which have not been mentioned or covered in enough detail elsewhere. Many of the HTML tags described here are used to achieve formatting effects and so may be officially frowned on in favour of the use of style sheets, but we have already outlined our reservations about CSS. Not only are style sheets often implemented inconsistently and incompletely, but in many cases in-line tags are just easier to use and interpret. This motley crew of tags begins with one which has been mentioned in passing but not discussed in any detail...
The <font> tag
Text enclosed within <font>...</font> tags is modified as specified by the tag's attributes. These may be any combination of the size, color and face attributes.
The <font> tag must be used with care. Setting the text size to 1 can make it so small that some fonts on some computers "break up" because not enough pixels are allocated to completely draw every character. Using the <small>...</small> tag is much safer. When setting text colours make sure that the background does not "hide" the text because it is a similar shade or brightness. And avoid red text on green backgrounds or vice versa - nearly 10% of the world's males suffer from red-green colour blindness (and it usually looks awful anyway). Remember that you cannot know what fonts are available on any computer displaying your page, so you must not rely on the "face" attribute for any important effect. Browsers typically allow the viewer to override "face" and other font attributes, and if the browser cannot find any of the specified fonts then it uses its default. Here are some examples showing the effect of the <font> tag with your current browser settings:
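As a sketch of the kind of markup involved, the fragment below uses each attribute on its own and then in combination. The particular sizes, colours and font names are illustrative assumptions, and the result depends on the fonts and settings of the browser displaying it.

```html
<!-- Illustrative values only; the colours and font names are assumptions. -->
<p><font size="2">Size 2 text</font></p>
<p><font size="+1">One step larger than the default size</font></p>
<p><font color="#cc0000">Dark red text</font></p>
<p><font face="Verdana, Arial, sans-serif">Text in the first listed font that is installed</font></p>
<p><font size="4" color="navy" face="Georgia, serif">Several attributes combined</font></p>
```

Listing several fonts in the face attribute, ending with a generic family, gives the browser a fallback when the first choice is not installed.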
There is also a <basefont> tag, which is used without a terminating tag. It sets the default font size of the text on the page and from which relative font sizes (using the size="+n" or size="-n" attribute in a <font> tag) are calculated. For example:
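A minimal sketch of how the base size and relative sizes combine, assuming a browser that still honours <basefont>; the size values are arbitrary.

```html
<basefont size="4">
<p>This text is shown at size 4.</p>
<p><font size="+1">This text is shown at size 5 (4 + 1).</font></p>
<p><font size="-2">This text is shown at size 2 (4 - 2).</font></p>
```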
This tag should not be used. It is preferable to leave viewers to fix the basic size of text they prefer in their browser settings; your page design should be flexible enough to accommodate the resultant variations in text size. For this reason <basefont> is not in modern versions of standard HTML and some browsers ignore it (which provides an even better reason for not using it).
More text characteristics
There is a series of tags which (like bold and underline, which we have already discussed) change their enclosed text in a defined way. They do not need any discussion; the effect can be appreciated from the following examples:
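As a sketch, assuming the series includes common phrase-level tags such as <em>, <strong>, <code>, <kbd>, <sub>, <sup>, <big> and <small>, markup of this kind might look as follows. How each tag is drawn is up to the browser.

```html
<p><em>Emphasised text</em> is usually shown in italics.</p>
<p><strong>Strong text</strong> is usually shown in bold.</p>
<p><code>Code text</code> and <kbd>keyboard input</kbd> are usually shown in a monospaced font.</p>
<p>Water is H<sub>2</sub>O and a square of side x has area x<sup>2</sup>.</p>
<p><big>Bigger</big> and <small>smaller</small> text.</p>
```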
<address>...</address> and <cite>...</cite> are designed to display addresses and citations, respectively. For example:
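A minimal sketch of both tags in use; the address and the title are made-up placeholders.

```html
<!-- Placeholder address and book title, for illustration only. -->
<address>
Kaye and Geoff<br>
1 Example Street<br>
Exampletown
</address>
<p>This point is covered in <cite>An Example Book Title</cite>.</p>
```

Many browsers show both in italics, but that is only a default.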
You may feel that you can format addresses and citations more appropriately than relying on the default behaviour of whichever browsers are used to display your web pages.
The <br> and <p> tags revisited
The break and paragraph tags appear to be quite similar; the only difference in their behaviour might seem to be that the <p> inserts a blank line, and the <br> does not. They are so simple that they are among the first tags to be described in our documentation. But there is a subtle difference in how they behave. The <br> tag always forces a line break, so that if you place a sequence of them in your HTML then you will get a sequence of blank lines (just as you would expect). However, if you enter an unbroken sequence of <p> tags, you do not get a sequence of blank lines; you get just one normal paragraph break. In other words, the browser treats a series of <p> tags as though there is only one, echoing the way that white space is handled. The same rule is applied wherever the browser acts as though a paragraph break exists (for example immediately before and after <blockquote>...</blockquote> and <form>...</form> tags - although not all browsers treat these "block-level" elements in an identical way). So a run of several consecutive <p> tags between two pieces of text is displayed exactly the same as a single <p> tag.
The <br> tag normally applies just to the text which precedes and follows it, but it can be extended with the clear attribute to consider adjacent images. This attribute can have a value of "left", "right" or "all". For example, take an image with text beside it, where a <br> tag carrying the clear attribute separates these two lines:
The quick brown fox
jumped over the lazy dog
The effect is for the text after the break ("jumped over the lazy dog") to start below whichever is the lowest of the "The quick brown fox" text or the image.
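The markup for this kind of example might look like the sketch below. The image file name and dimensions are placeholders, and a left-aligned image with clear="left" is an assumption made for illustration.

```html
<!-- "fox.gif" is a placeholder; align="left" places the image to the left of the text. -->
<img src="fox.gif" alt="A fox" width="100" height="100" align="left">
The quick brown fox
<br clear="left">
jumped over the lazy dog
```

With a plain <br> the second line would continue beside the image; clear="left" pushes it down until the left margin is free.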
Block definition tags
<span>...</span> and <div>...</div> are tags which, on their own, do nothing. The difference between them is that <span> does nothing in-line whereas <div> does nothing for a block (i.e. between new lines). You might wonder at the usefulness of such tags, but in fact they have two uses. The first is to define an arbitrary block of text so that it can be used with DOM (document object model) functions and CSS. Manipulating the DOM generally requires a good knowledge of Javascript, so we will not consider it here. The second use depends on the inclusion of attributes with the tags. The most useful is the align attribute with the <div>...</div> tag, which can take the same values ("left", "right" and "center") as we have seen in table cells and elsewhere; for example, the following HTML has the same effect as the <center> tag:
<div align="center">Block of text</div>
Inclusions
We have already seen that images are held in separate files which are included in web pages with the <img> tag. Most browsers, usually with the help of plug-ins, can run programs written in the Java language. These programs are also held in external files, and can be invoked with the <applet>...</applet> tag. Music or other sounds can also be included in web pages, although different browsers use different ways to achieve this. Even other HTML documents can be included with yet another specialist tag: the <iframe>...</iframe> tag specifies an in-line frame to hold a web page in a window which is inserted into the current page in a very similar way to images.
It became obvious to those responsible for setting HTML standards that in the future web pages might be required to handle even more forms of multimedia (maybe some not even invented yet), and that a uniform way of handling all inclusions from external files would be a Very Good Idea. So they came up with the generic <object>...</object> tag. Universal support for this tag for all multimedia has been slow in coming, but will presumably eventually be a reality.
The use of the <object> tag can be illustrated by showing how it can be used as an alternative for the <img> tag. A combination of a data and a type attribute informs the browser where to find the external file and what type of file to expect, and thereby what to do with it. HTML already defines quite a few "types", all with the two-words-separated-by-a-slash format (image/gif, for example), and to handle a new multimedia format in the future we just need to give it a new unique type. These types, also called "mime types" or "content types", are also used in other contexts within HTML, giving this approach even more universality. The text between the tags is a description of the object, the equivalent of the alt value in an <img> tag. See if your browser can successfully deal with an image defined in this way.
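A minimal sketch of an image included with <object>, followed by the roughly equivalent <img> tag. The file name and dimensions are placeholders chosen for illustration.

```html
<!-- "smiley.gif" is a placeholder file name. -->
<object data="smiley.gif" type="image/gif" width="32" height="32">
A green smiley (this text appears only if the image cannot be displayed)
</object>

<!-- The roughly equivalent <img> tag -->
<img src="smiley.gif" alt="A green smiley" width="32" height="32">
```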
Do you see a green smiley?
Cascading style sheets
Cascading style sheets (CSS) are used to provide an overall template or set of rules for how page content can appear on the screen. They are used to specify fonts, colours, spacing and placing of text on the page, margins, and layered artwork. Style sheets are implemented somewhat differently in different browsers, and between versions of the same browser (of course), providing another level of complication. In addition, they are not supported by older versions of the browsers. Style sheets are part of the more recent W3C standard versions of HTML, so hopefully the incompatibilities will eventually be resolved.
For what it is worth, our opinion is that style sheets, even if they were consistently interpreted by all browsers, are not as wonderful as some of their proponents suggest, particularly for smaller web sites and those with idiosyncratic design. They do not necessarily separate content and appearance since content can be included in CSS and CSS can be included inside tags in the body section of HTML. The overhead imposed by large amounts of CSS can add significantly to download times, and the resultant HTML can get very dense and difficult to follow. Browsers generally display nothing until all the CSS information is available, so there can be a long wait during loading until anything appears on the screen. Style sheets seem to us to be more useful for large corporate sites which involve an imposed fixed style throughout, and where a large number of people (or a database) may have to make contributions and changes to the site. Extensive use of CSS seems to be much less advantageous for a personal web site which might only comprise a few pages, and is maintained by one person.
In their favour, style sheets allow you to create a single set of styles that specify detailed layout instructions for multiple HTML documents - you can set up an entire suite of related documents in the same style. By making a change to a single style document you would then automatically make the change to all documents that used that style sheet. Therefore, style sheets are particularly useful for web page developers who are setting up a large site with numerous documents using the same format, rather than for free-form page designers. In addition, there are many useful things which can be done only with CSS. So even while we generally avoid using them because we cannot predict what they will do with all browsers, we sometimes use CSS features which work on all the browsers we have access to, which cannot be achieved without CSS, and which will not render the page unreadable if they fail to work as expected.
If you do use style sheets, you must be aware that any person viewing your page can have their own personal style sheets. If these clash with those you have included in your web pages, there can be only one winner, and by default it is the web page author. If this occurs then the viewer will not be well disposed towards you and your pages; alternatively, if the viewer's styles prevail, your formatting will be overridden, which may make a mess of your careful design. So you should be wary of setting out a page in such a way that viewing and understanding the content is dependent on the page layout.
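The idea of one style sheet shared by several documents can be sketched as follows. The file name site.css and the rules themselves are assumptions made for illustration; each page that should share the style carries the same <link> tag in its head section.

```html
<!-- In the <head> of every page that shares the style; "site.css" is a placeholder name. -->
<link rel="stylesheet" type="text/css" href="site.css">

<!-- The same kind of rules written directly into a single page instead -->
<style type="text/css">
  body { font-family: Georgia, serif; color: #333333; background-color: #ffffff; }
  h1   { color: navy; margin-bottom: 0.5em; }
</style>
```

Changing a rule in the shared file changes every page that links to it, whereas rules in a <style> block affect only the page that contains them.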
There is also software that will help in the creation of CSS: TopStyle - an HTML/CSS editor for Windows
Controlling robots - the robots.txt file
The name 'robots' here refers to 'spiders' or 'crawlers' or other similar programs which automatically trawl the web by downloading pages and following the links they find on them. The most obvious examples are search engines looking for pages to add to their indexes, but crawlers can also be used to collect email addresses to be used by spammers. Sometimes you would prefer that some of your pages were not indexed; they might be under development or temporary or not aimed at a general audience. You can ask web trawling robots to ignore these files by placing an appropriate entry in a robots.txt file. This is a simple text file, normally placed in the same directory as your index.html or equivalent file. If the files you did not want robots to see were all in a sub-directory called 'private' (it helps to collect them all into one or a limited number of directories) then the contents of the robots.txt file would look like this:
User-agent: *
Disallow: /private/
A user-agent of '*' means all robots and the second line is clear enough - it says that I do not allow you into the directory called 'private'. Of course, just like the related robots metatag (see The Head), you cannot force robots to obey this directive, but most legitimate search engines probably do (and spammers don't). If you want serious control and security then you need (on Unix servers) to investigate the .htaccess and .htpasswd files, but we will not deal with them here. Robots.txt files do not allow a great deal more sophistication than illustrated in the example above.
Dynamic HTML
Dynamic HTML (DHTML) is a blanket term that covers HTML-aware technologies designed to give web pages more dynamic behaviour, improving interaction with the viewer. These enhancements are achieved by combining techniques and concepts such as Javascript and the document object model (DOM). This area presents a challenge. It provides the opportunity to create interesting and dynamic web pages but sometimes at the expense of universal accessibility. The effect of many DHTML techniques differs from machine to machine, depending on the browser and host combination used. Some techniques are yet to be accepted by the official standards authorities, and others are restricted to one browser or platform. If you decide to use dynamic HTML techniques on your pages, you need to make sure that your target audience can take advantage of the enhancements. As time progresses and standards evolve DHTML should become more mainstream. In general using dynamic HTML involves programming and so is considerably more difficult to implement than straightforward HTML.
Here are some concepts and technologies which contribute to DHTML:
The document object model (DOM) describes a branching data structure which contains every single element in the web page. After the page has been rendered it allows new elements to be added, and existing elements to be deleted or even swapped around within the structure. The effects are seen immediately on the page. This manipulation is carried out by routines which are part of an updated Javascript. There are several levels of DOM. Level zero is the ad-hoc set of routines and properties which existed before the first W3C standard was proposed. DOM level 0 was implemented differently in different browsers, with only a limited subset of routines in common. To address this problem, around 1998 the W3C proposed the level 1 standard, which has been adopted reasonably uniformly by most of the common browsers, although one major area of difference is in how events are processed. The latest standard (as of 2007) is level 3, but it will take a while before we can expect most browsers to support this version. For reference information on the DOM try Javascript Kit or W3C's Document Object Model (Core) Level 1.
Javascript is a programming language whose code can be (in fact almost always is) embedded within the HTML of a web page. It has no particular relationship with Java, which is a separate language. Javascript is interpreted by the browser, so it is run on the "client" computer rather than the server. Despite this it generates no serious security concerns since it is incapable of reading or writing to the local disk (with the exception of cookies, which is a very controlled process), or accessing any operating system parameters other than a limited set provided by the browser.
AJAX (Asynchronous JavaScript And XML) is a series of Javascript routines which are designed to pass information between the web page and the web server. They use hypertext transfer protocol (HTTP), which is how the server normally communicates with the browser. Returned information can be used to update part of the page without requiring the whole page to be reloaded and re-rendered. Most modern browsers support these routines but older versions do not.
Java is a highly portable object-oriented language similar to C++. It was devised by Sun Microsystems, and allows programmers to produce compiled (i.e. efficient) code that will run on any computer with Java support. The major web browsers all support Java applets, allowing programs to be downloaded by and run under the control of the web browser. Unlike Javascript, Java has the ability to read and write local files, which in the past led to security problems with Java applets on the web. Later versions of Java seem to have fixed these bugs, but browsers have the ability to disable Java, and many people worried about security take advantage of this feature. As a result they cannot view any page which relies on Java applets.
A JSP (Java Server Page) is a web page that contains embedded Java code which has to be run by the server delivering the page, before it is downloaded to the browser. In other words, it contains Java, but that code is used by the server, not the browser. This is potentially safer than allowing the code to be run by the browser.
ASP (Active Server Pages) is Microsoft's version of JSP, where VBScript code (i.e. VisualBasic rather than Java) is embedded in the web page. The code is interpreted by Microsoft's script interpreter on the server before the result is delivered to the browser. This makes the pages dependent on having a Microsoft-compatible server, but despite this ASP are quite widely used. More recent (ASP.NET) implementations allow the code to be separated from the HTML, which means it can be compiled and so will run much faster. This reflects the way that CGIs are set up on Unix servers.
ActiveX only works on PC versions of Internet Explorer - not Mac versions and not with other browsers. In our opinion this is enough reason not to use it.
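The document object model entry above describes elements being added and deleted after the page has been rendered. A minimal sketch of that idea follows, using standard DOM calls from Javascript; the id "items" and the function names addItem and removeItem are made up for illustration.

```html
<!DOCTYPE html>
<html>
<head>
<title>DOM sketch</title>
<script type="text/javascript">
// Append a new <li> element to the list after the page has been rendered.
function addItem() {
  var list = document.getElementById("items");
  var count = list.getElementsByTagName("li").length;
  var item = document.createElement("li");
  item.appendChild(document.createTextNode("Item " + (count + 1)));
  list.appendChild(item);
}
// Delete the last <li> element, if there is one.
function removeItem() {
  var list = document.getElementById("items");
  var entries = list.getElementsByTagName("li");
  if (entries.length > 0) {
    list.removeChild(entries[entries.length - 1]);
  }
}
</script>
</head>
<body>
<ul id="items"><li>Item 1</li></ul>
<button type="button" onclick="addItem()">Add an item</button>
<button type="button" onclick="removeItem()">Remove the last item</button>
</body>
</html>
```

Each button press changes the list straight away, without the page being reloaded.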