Kaye and Geoff's web page documentation 

Perl script to return a web page with links to web pages containing a phrase

For those who want to follow the nitty-gritty of the code, the comments explain what is happening. $pathn controls whether the whole site or just a sub-section will be searched and $maxnlevels controls the number of directory levels to go to (it is also used as an error flag) - this value could need incrementing if part of the site is expanded. If someone enters a search string containing one or more single quote characters, they will be removed because the ultimate shell command must have the search string enclosed in quotes. There must be a fix for this but we cannot think of a straight-forward and universal one - it seems that back-slashing does not provide a solution.

#! /usr/bin/perl # # package: substitute escaped characters and single quotes # require 'Re_sub.pm'; # # cgi to search web pages # simple html to use this cgi would be something like... # # <form action="/cgi-bin/search2.pl" method="post"> # <input type="text" name="sstring" size="80" value="fred"> # <input type="text" name="pathn" size="1" value="0"> # <input type="submit" value="search"> # </form> # # write the HTML header code # print "Content-type: text/html\n\n"; print '<html> <head> <title>KG search results</title> </head>'; print "\n"; print '<body bgcolor="#ffffff"> <blockquote>'; print "\n\n"; # # set up a few defaults # base path # extension, base URL # maximum number of levels to search, maximum number of matches to display # $path = '/home/kg/www/'; $ext = '.html'; $url = 'http://www.kgweb.org.au/'; $maxnlevels = 5; $maxnmatches = 20; # # get the path code and string from standard input # it has the form name=value&name=value # convert escaped characters back to their originals # split input string into name=value[s] and pick out two required fields # $paramstr = <STDIN>; foreach $item (split(/&/, $paramstr)) { ($name, $value) = split (/=/, $item); $value = &resub ($value); if ($name eq "sstring") {$sstring = $value}; if ($name eq "pathn") {$pathn = $value}; } # # set up the encoded path if required - the default is the whole site # using a code combines flexibility with hard-coding therefore security # if ($pathn == 1) { $path = $path . 'travel/'; $url = $url . 'travel/'; } if ($pathn == 2) { $path = $path . 'food/'; $url = $url . 'food/' } if ($pathn == 3) { $path = $path . 'kandg/'; $url = $url . 'kandg/' } if ($pathn == 4) { $path = $path . 'docs/'; $url = $url . 'docs/' } if ($pathn == 5) { $path = $path . 'froggy/'; $url = $url . 'froggy/' } # # suppress quote characters which would otherwise annoy the shell # check that the search string is substantial # set up a grep command to find files containing the search string # switches: i = ignore case # l = return filename (only if match) # F = fixed string (no special characters) # the 2>&1 assigns standard error to standard output # $sstring =~ s/'//g; unless ((length($sstring) > 0) && ($sstring =~ /\w/)) { $maxnlevels = -1; } $cmd = "grep -liF '$sstring' $path*$ext 2>&1"; # # do some more HTML # print "<br><br><br> \n"; print "<b>Results</b> of search for '$sstring': \n"; print "<p> \n"; # # initialise the number of levels to search counter # initialise the matched files counter # $nlevels = 0; $nmatches = 0; # # do each level in turn # do each matching file in turn # grep or shell error messages start with "grep: or bash:" # ignoring any directory or file containing "private" allows for some security # while ($nlevels < $maxnlevels) { foreach $mfilename (`$cmd`) { next if $nmatches > $maxnmatches; next if ($mfilename =~ /grep:|bash:/); next if ($mfilename =~ /private/i); chop $mfilename; # # get the title from the file if there is one # $line = '1'; $title = 'Untitled'; unless (open (MFILE, "$mfilename")) { $line = '0'; } while ($line) { $line = <MFILE>; if ($line =~ /\<title\>/i) { ($a, $title, $b) = split (/\<.?title\>/i, $line); $line = '0'; } } close (MFILE); # # turn the path into a URL - preserves current search depth # output HTML for a link to the file # $mfilename =~ s/$path/$url/; print '<a href="' . $mfilename . '"> ' . $title . '</a><br>' . "\n"; $nmatches++; } # # set the grep command to work at the next level down # count the level # $cmd =~ s#\*#\*/\*#; $nlevels++; } # # finish off the html # print "<p>\n"; if ($maxnlevels < 0) { print "Invalid search string\n"; } else { print "$nmatches matches found\n"; } print '</blockquote> </body> </html>'; print "\n"; # # open the log file # get the date and encode invalid search string # append the information from this search to the log file # if (open (LOGFILE, ">>/home/kg/docs/search2.log")) { $d = `date`; chop ($d); $s = substr('maintravfoodkangdocsgame', $pathn*4, 4); if ($maxnlevels < 0) { $nmatches = -9; } print LOGFILE "$d $ENV{'REMOTE_HOST'} $ENV{'HTTP_USER_AGENT'} "; print LOGFILE " \[$s\] $sstring $nmatches\n"; close (LOGFILE); } exit;
Top
Close