Kaye and Geoff's web page documentation
Introduction
Web pages written using only HTML have limited scope for dynamic behaviour. They can include various types of animated pictures (and sounds) and they allow links to other pages, email and ftp, but that is about all. A much wider range of interaction is possible with the inclusion of Javascript, but since it only runs within the browser environment and does not have any general way of reading data or any ability to write to a file, it cannot create or use information which needs to persist between browsing sessions (except for the trivial case of cookies) and which is shared between all people who access the web page.
To achieve this sort of interaction requires the data, and the program which uses it, to be stored on a computer which is always on and always connected to the internet. In addition, for security reasons the program needs to be accessible to web pages via a controlled interface which only allows the desired information to be passed in either direction. So the obvious place for such programs is on a computer on the internet running a web server (or a similar computer to which it is networked). Programs designed to work in this way are commonly called CGIs, after the Common Gateway Interface (CGI), the standard which defines how a web server passes data to them and receives their output. A CGI is a script or program which runs under the direction of the web server, and typically adds dynamic behaviour to web pages by accessing databases, doing calculations from inputs, selecting files and so on.
Users normally provide input data to a CGI using a form written in HTML. The web server is responsible for passing this data to the CGI, which processes it and then passes information back to the browser telling it what page to display next (often the information is the actual HTML for a web page containing the results of the processing). Note that all CGIs must return something - the browser sends the data to the web server which invokes the CGI, and the server expects something in return which it can pass back to the browser.
This document explains the way in which a web page passes data to a CGI and what the CGI might do with that data. There are actual examples of code, but they may not be suitable (without some modification) to use in real web pages since they have generally been kept simple by leaving out error checking and any sophisticated behaviours.
CGIs have to be located in a special area on the server (traditionally the cgi-bin directory on Unix systems); they cannot just be in your normal HTML page area. If you are an author on someone else's server (for example an ISP) and you want to write CGIs, you should talk to the webmaster about getting approval to write and test your CGI. It also helps to ask about access to telnet capabilities, access to the web server error log, and if there are any restrictions (for example limiting access to operating system features) which the ISP applies.
It is normally important to know what operating system the web server runs on, since this will limit the choice of languages which are available for you to use for your CGI, and also define the system interactions which your CGI can take advantage of. All our examples here presume a Unix server and are written in Perl. Versions of Perl are available for Mac and PC as well as Unix.
Interaction between a form and the CGI
What is sent from a form to the server?
The browser sends the form's contents to the server as a single string, with each field separated by an ampersand (&). Each field is of the form name=data. The name is the value of the name attribute in the HTML which defines the form. This can be made clear in the following example, where we have created a form as follows:
It is up to the CGI to unpack the information in the string, and handle the information as required. In fact, the string may not be exactly as shown, since most of the "special characters" (such as &$() or space) are translated by the browser - they appear as different characters (a space, for example, is sent as a plus sign), or as hexadecimal numbers preceded by a percent sign. Have a look at the translation table, to see how special characters are translated.
The easiest way to illustrate this is by example: complete the fields in the form and it will send back a copy of the data received on the server. You can try the form a number of times, putting in "special" characters (for example -+&%()[]) to see what happens to them (our tests suggest that the only non-alphanumeric characters less than ASCII code 127 which are not escaped are asterisk, hyphen, period and underscore; the rules say that all other characters should be escaped).
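The unpacking described above can be sketched in a few lines of Perl. This is only a minimal illustration, not the script or package used on this site: the helper name parse_form_string and the sample string are assumptions made for the example, and a field name which appears more than once simply overwrites the earlier value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split an encoded form string into name => value pairs.
sub parse_form_string {
    my ($raw) = @_;
    my %field;
    foreach my $pair (split /&/, $raw) {
        next unless length $pair;                  # skip empty segments such as "a=1&&b=2"
        # Split on the first "=" only; the data part may be empty.
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        foreach ($name, $value) {
            tr/+/ /;                               # "+" stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX is a hexadecimal character code
        }
        $field{$name} = $value;                    # a repeated name overwrites the earlier value
    }
    return %field;
}

# Sample string as a browser might send it (made up for this example).
my %field = parse_form_string('name=Kaye+%26+Geoff&comment=50%25+done');
print "$_ = $field{$_}\n" foreach sort keys %field;
```

The plus signs are converted before the %XX codes so that an escaped plus sign (%2B) is not mistaken for a space.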
You can view the Perl script used to pass this information back to the client, and a Perl package which will substitute original characters for the escaped versions.
The preceding example has the method attribute in the form tag set to "post", in which case the CGI reads the string from standard input. If the method attribute is "get" then the information is passed to the CGI not via standard input, but as an environment variable called "QUERY_STRING". For example with this form:
You can view the Perl script which is invoked by this form. The environment variable which carries the passed information is called "QUERY_STRING" because there is an alternative way of doing the same thing - by appending a query string to the URL in the action attribute of the form tag, for example:
There is yet another way of passing information into an environment variable. If the CGI is invoked with an action such as:
It can be a bit "kludgy" transferring large amounts of information via an environment variable, so generally using method="post" is the preferred approach for passing information to a CGI. However the example below illustrates one simple application where the query string can be very useful - when a CGI is invoked without using a form. Here we want a CGI to be invoked from a pair of menu items, where each item produces a variant of the CGI's output possibilities. View the Perl script and see how it works or try it out:
Example menu
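As a rough sketch of the two routes described above, a CGI can fetch the raw string from standard input for "post" and from QUERY_STRING otherwise. The helper name read_form_string is an assumption made for this example. REQUEST_METHOD and CONTENT_LENGTH are standard CGI environment variables which are not discussed in the text above; the second tells the script how many bytes to read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still-encoded form string whichever
# method was used. $env is a reference to a hash of environment variables
# (normally \%ENV) and $in is the input filehandle (normally \*STDIN).
sub read_form_string {
    my ($env, $in) = @_;
    my $method = uc($env->{REQUEST_METHOD} || 'GET');
    if ($method eq 'POST') {
        # With "post" the data arrives on standard input.
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $raw = '';
        read($in, $raw, $length) if $length > 0;
        return $raw;
    }
    # With "get", or a query string appended to a link, it is in QUERY_STRING.
    my $query = $env->{QUERY_STRING};
    return defined $query ? $query : '';
}

my $raw = read_form_string(\%ENV, \*STDIN);
print "Content-type: text/plain\n\n";
print "Raw form data: $raw\n";
```

A menu link with a query string on the end of its URL reaches the script through the same QUERY_STRING branch, which is why no form is needed in that case.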
Environment variables
When a browser 'converses' with a server, it must identify itself, and it may send parameters in the calling string. As a writer of CGIs you have access to the information which the server knows about the browser (for example the browser type and the IP address of its server or proxy). This information is passed in (Unix) environment variables which can be accessed by your CGI. Again, this is easy to illustrate by using a CGI to return the environment variables. You can use the form below (whose only active element is a submit button) to look at the environment variables:
The HTML for the form is straightforward:
and you can view the Perl script which is used to pass the environment variable values back to the browser.
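A script of this kind needs very little code. The following is a minimal sketch rather than the script linked above; the helper names escape_html and environment_page are assumptions made for the example. It lists every variable in alphabetical order and escapes the values so that any markup characters in them are displayed rather than interpreted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: make a string safe to embed in an HTML page.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build an HTML page listing each variable in the given hash, sorted by name.
sub environment_page {
    my ($env) = @_;
    my $html = "<html><head><title>Environment variables</title></head><body>\n<pre>\n";
    foreach my $name (sort keys %$env) {
        $html .= escape_html("$name = $env->{$name}") . "\n";
    }
    $html .= "</pre>\n</body></html>\n";
    return $html;
}

# Every CGI must return something: a header, a blank line, then the page.
print "Content-type: text/html\n\n";
print environment_page(\%ENV);
```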
More complex CGI examples
The examples given above are fairly trivial, but CGIs can perform a wide range of complex tasks. Unlike static HTML and even Javascript, CGIs can read and write files and have access to the full power of the server operating system. They also offer a level of code and data security not available in Javascript, which is always viewable from the browser.
The following examples illustrate some of the power of CGIs. The first is a script to email the contents of a form.
Email processor
Of course it is possible to just invoke the mail system using a "mailto:" value for the action attribute within a form, for example:
Browsers vary in how a "mailto" is handled. Some include the message as text in the mail, and some even manage to format it to some extent, but others just send it "raw" with all the escaped characters (as explained above) included. Others respond to the "mailto" by starting up a local copy of a mail program without sending anything immediately. If you want to know how your browser will act under these circumstances, the easiest way to find out is to try it and see, but remember that others using your web pages may have a different browser.
If you want the behaviour of your web page to be predictable under these circumstances, you can pass the contents of the form to a CGI which then processes the information and emails the result to the desired recipient. This allows full control of the process - for example the input fields can be checked to ensure that all required fields are filled in and the information can be reformatted to make it easy to read. More sophisticated processing can also be carried out, such as redirecting the email depending on the contents of the message, sending it to more than one recipient, saving the contents in a database, and so on.
The CGI will be more useful if it exhibits some "general" behaviour so that it can be used with many different forms. The names of the form elements can be used to indicate required fields and the CGI can reject input which does not have these fields filled in. In the following form which uses the femail CGI the email address is required (indicated by the "req_" prefix to its name). We have also 'hidden' the email address from harvesters which would like to add us to their spam lists:
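The "req_" naming convention lends itself to a very small check. The sketch below is not taken from the femail CGI; the helper name missing_required and the sample field names are assumptions made for the example. It expects the form fields to have been decoded into a hash already.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given decoded form fields (name => value), return the
# names of any "req_" fields which are empty or contain only whitespace.
# A text field left blank is still sent as "name=", so it appears in the
# hash with an empty value.
sub missing_required {
    my (%field) = @_;
    my @missing = grep { /^req_/ && $field{$_} !~ /\S/ } sort keys %field;
    return @missing;
}

# Sample data (made up): the required email address was left blank.
my %field = (req_email => '', name => 'Kaye', comment => 'Hello');
my @missing = missing_required(%field);
if (@missing) {
    print "Please fill in: @missing\n";
} else {
    print "All required fields are present.\n";
}
```

Because the rule lives in the field names rather than in the script, the same CGI can serve any form which follows the convention.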
There are any number of possible improvements which could be made to this simple system. For example the return and error pages are not very pretty; they could be improved, or even combined with the page containing the form (with some query strings and a bit of Javascript). The form fields could be checked before submission with Javascript to save on server-side processing and net traffic. In the same way that some fields are defined as 'required', we could use names beginning with 'num_' to insist that they contain a number (for example the "lucky number" field above), and other similar types of validation could be included.
Guestbook
The web is an excellent way to get feedback on the services and information you offer. One of the features commonly found on web sites is a guestbook, where visitors can register their comments. This is the sort of application where you probably do not want the information emailed to you immediately - it is enough to check out the guestbook from time to time, to see what comments have been made. The example outlined here is rather simple; a more complex (and realistic) version might, for example, check the content for offensive words or attempts to breach security, present a more appealing layout and allow you to archive out-of-date entries. You can try out the example but please do not try to use it to enter active links to your own site - that is not what it is for (and it will not work).
We assume that the guestbook will be made available via a webpage to anyone who wants to look at it, and anyone will be able to contribute comments, so two CGIs are required - one to add an entry, and one to display the existing entries. It is easiest to invoke each one from its own form, but both forms can be on the same web page, for example:
Guestbook
Here is the form as it looks on the web page. Note the extra feature: the option of specifying the number of entries to display. You can try it out.
Example guestbook
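One simple way to organise such a guestbook is to keep its HTML in three files: a fixed header, a file of entries which only ever grows at the end, and a fixed footer. The sketch below illustrates that arrangement under several assumptions: the file names, the helper names add_entry and guestbook_page, and the entry markup are all invented for the example, and it reads the files with ordinary Perl file handling instead of calling a Unix command. It escapes visitors' text so that markup such as links is shown as plain text, and locks the entries file so that two visitors cannot write at once.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Neutralise HTML so that visitors cannot insert markup such as active links.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Writing CGI: append one entry to the end of the entries file.
sub add_entry {
    my ($entries_file, $name, $comment) = @_;
    open(my $out, '>>', $entries_file) or die "Cannot open $entries_file: $!";
    flock($out, LOCK_EX) or die "Cannot lock $entries_file: $!";   # one writer at a time
    print {$out} '<p><b>', escape_html($name), '</b>: ', escape_html($comment), "</p>\n";
    close($out) or die "Cannot close $entries_file: $!";
    return;
}

# Reading CGI: return the contents of the given files joined in order.
sub guestbook_page {
    my (@files) = @_;
    my $page = '';
    foreach my $file (@files) {
        open(my $in, '<', $file) or die "Cannot open $file: $!";
        local $/;                        # read the whole file in one go
        my $content = <$in>;
        $page .= $content if defined $content;
        close($in);
    }
    return $page;
}

# Demonstration in a temporary directory (hypothetical file names).
my $dir = tempdir(CLEANUP => 1);
my ($header, $entries, $footer) = map { "$dir/$_" } qw(header.html entries.html footer.html);
open(my $h, '>', $header) or die "Cannot write $header: $!";
print {$h} "<html><body><h1>Guestbook</h1>\n";
close($h);
open(my $f, '>', $footer) or die "Cannot write $footer: $!";
print {$f} "</body></html>\n";
close($f);

add_entry($entries, 'Visitor', 'Nice <b>site</b>!');
print "Content-type: text/html\n\n";
print guestbook_page($header, $entries, $footer);
```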
As long as we are happy with a straightforward page layout, the CGI to read the guestbook is very simple, since it can take advantage of Perl's access to Unix system calls. The writing CGI is a bit more complex, but we can keep it reasonably simple by keeping the HTML for the guestbook in three files: an unchanging header, a central section containing entries which we add to by appending new entries on the end, and an unchanging footer. In a 'serious' system you might also have one or more 'private' CGIs to delete or archive entries.
Site searching
Many web sites (including ours) allow users to search the site for a keyword or phrase, or a more complex arrangement of words. This facility is a very powerful method of providing information about the site and allowing rapid navigation to the areas of interest. There are a number of ways to implement site searching, but here we will look at the most flexible - writing your own CGI to do the task.
To illustrate the basic requirements, we start with a simple example - to search a single page and display lines containing a given word. The form needs to provide the name of the page to be searched, an input box to accept the word, and a button to submit the form, for example:
Search for a word
The CGI which the form invokes is written in Perl. You can try it here:
This search is not in fact very useful - it only (rather poorly) duplicates a feature found in most browsers. So our second example is more complex - in fact it is very close to the script we use to implement searching on our site. It is more useful in that it returns a page of active links to pages containing the search word. The word can be a phrase (it can contain spaces) and the search can be limited to a subset of pages - this makes sense with a site like ours which consists of a number of more-or-less unrelated sub-sites. There is no need to provide a working example of this search - it is at the top of each of our major pages. The HTML for the form looks like this:
Because our search feature only has a small space at the top of the page, it does not allow sophisticated search rules - whatever is entered is what is searched for (multiple words separated by spaces are treated as a phrase rather than separate key words). Also, it is not suitable as a general searching script - it expects our particular site structure and names, although it could be modified to work with a different structure. You can try it out by going to any of the major pages, for example the home page.
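The core of both searches described above fits in a few lines of Perl. The sketch below is an illustration only, not the script used on this site: the helper names, the sample pages and the choice to ignore upper and lower case are all assumptions. The pages are held in a hash so that the example is self-contained; a real CGI would read them from files. The phrase is passed through quotemeta so that whatever is entered is searched for literally, with no character treated as a pattern symbol.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First example: return the lines of one page which contain the phrase.
sub matching_lines {
    my ($text, $phrase) = @_;
    my $pattern = quotemeta $phrase;     # search for the input literally
    my @lines = grep { /$pattern/i } split /\n/, $text;
    return @lines;
}

# Second example: return the names of the pages which contain the phrase.
# $pages is a reference to a hash of page name => page text.
sub pages_containing {
    my ($pages, $phrase) = @_;
    return () unless defined $phrase && $phrase =~ /\S/;
    my $pattern = quotemeta $phrase;
    my @found = grep { $pages->{$_} =~ /$pattern/i } sort keys %$pages;
    return @found;
}

# Sample pages (made up for this example).
my %pages = (
    'forms.html' => "How a form passes data\nto a CGI script",
    'perl.html'  => "Notes on Perl",
    'index.html' => "Welcome\nSee the CGI script examples",
);

print "Content-type: text/html\n\n";
print qq{<a href="$_">$_</a><br>\n} foreach pages_containing(\%pages, 'cgi script');
```

Because multiple words are matched as a single phrase, the search for "cgi script" above finds only pages where the two words appear together.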
Problems: creating, testing and debugging your code
The complexity of CGIs and the interactions between them and web pages means that getting it all working is much more challenging than writing web pages in HTML.
If you want to create CGI scripts, you should first talk to the webmaster who looks after your server. Your ISP may not allow user-written CGIs, and even if they do, they may want to closely examine anything you do before it is allowed on their server. This is because CGIs run under the control of the web server, which usually has more privileges than normal users are allowed, so there will always be system security concerns with CGIs. One precaution commonly applied is illustrated in our resub package which removes any backquote characters - under some conditions these can be used to invoke Unix commands from the information passed to the CGI. You might also like to investigate Perl's 'taint' mode.
Detailed instruction on programming in Perl or other languages is well beyond the scope of these pages, but here are some hints on writing and debugging your CGI scripts:
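As an illustration of the backquote precaution mentioned above, the following sketch shows two ways of treating incoming data with suspicion. It is not the resub package: the helper names strip_backquotes and safe_page_name, and the rule for what counts as an acceptable page name, are assumptions made for the example. The second helper accepts only values which match a strict pattern; under taint mode (started by adding -T to the first line of the script) extracting a value through a pattern match in this way is also how data is marked as checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: delete every backquote from a value, since backquotes
# can cause a Unix command to be run if the value ever reaches a shell.
sub strip_backquotes {
    my ($value) = @_;
    $value =~ tr/`//d;
    return $value;
}

# Hypothetical helper: accept a page name only if it consists of letters,
# digits, underscores or hyphens followed by ".html"; otherwise return undef.
sub safe_page_name {
    my ($name) = @_;
    return $name =~ /^([A-Za-z0-9_-]+\.html)$/ ? $1 : undef;
}

foreach my $input ('index.html', '../secret.html', 'x.html; ls') {
    my $page = safe_page_name($input);
    print defined $page ? "accepted: $page\n" : "rejected: $input\n";
}
print strip_backquotes('hello `date` world'), "\n";
```

Listing what is allowed, as the second helper does, is generally safer than trying to remove every dangerous character, because anything not anticipated is rejected by default.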