The ICE Search Engine
NOTE: If you have a Microsoft® FrontPage® account:
FrontPage 2000 has its own built-in search engine. If you have a FrontPage account, you should use the Web Bots included in Microsoft FrontPage instead. Please go to our FrontPage Tips page for more information.
About ICE
The ICE Search Engine allows users to search your web server by keyword. You can easily configure it to search only specific directories instead of the whole server. ICE is an index-based search engine: every time you upload new pages, you'll want to update the index it keeps.
The ICE Search Engine is free software for individuals, schools, and universities. If yours is a commercial server, there may be a small shareware fee. Please see the author's home page for details. Note: As of October 1998, the author's website can no longer be found at this address.
To see it in action, go here.
You can take a sneak peek at the scripts using your web browser.
View ice-form.pl
View ice-idx.pl
Tools needed
How to Install the ICE Search Engine
- Download a copy of the two scripts you will need:
    ice-form.pl
    ice-idx.pl
- Edit the CGI script ("ice-form.pl") with a text editor. There are three things you must change, plus one optional setting:
    $domain="DOMAIN";
    $userid="USERID";
    $websitename="WebSite";
    $bodytag="<BODY>";
- In the $websitename variable, put the name you use for your site. This is descriptive only. For instance, if your domain name is gadgets.com but your company is called Gadgets Limited, you would replace WebSite with Gadgets Limited, so that the line reads $websitename="Gadgets Limited";
- In the $userid variable, replace USERID with your UserID (of the form webxxxxx), e.g. web2011f
- In the $domain variable, replace DOMAIN with your domain name, e.g. domain.com
- Optional: Replace <BODY> with your web site's own BODY tag. For instance, if your web site uses a background image, you would replace <BODY> with <BODY BACKGROUND="imagename.jpg">
  However, there are a few restrictions. You must use full paths, and you must put a backslash ( \ ) before every quote.
  Example:
    <BODY BACKGROUND=\"http://domain.com/images/image.jpg\">
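Put together, the edited lines near the top of ice-form.pl might look something like the sketch below. The values are just the illustrative ones used in the steps above (domain.com, web2011f, Gadgets Limited, and the sample background image), so substitute your own.

```perl
# Illustrative values only; replace each one with your own details.
$domain="domain.com";            # your domain name
$userid="web2011f";              # your UserID (webxxxxx)
$websitename="Gadgets Limited";  # descriptive site name, shown to visitors
# Optional: full URL to the image, with a backslash before every quote
$bodytag="<BODY BACKGROUND=\"http://domain.com/images/image.jpg\">";
```

Because these are double-quoted Perl strings, a literal @ or $ inside a value also needs a backslash in front of it.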
- Edit the CGI script ("ice-idx.pl") with a text editor. There is only one thing you need to change: as above, just replace USERID with your UserID in $userid="USERID";
- FTP to your Virtual Server. Once connected to your web site directory (/usr/local/www/data/UserID), create a subdirectory called "cgi-bin" (without the quotes, of course!).
  You should now have a directory called: /usr/local/www/data/UserID/cgi-bin
- Upload the edited CGI scripts "ice-form.pl" and "ice-idx.pl" to this directory.
  IMPORTANT: 99% of all script problems occur when the files are not uploaded in ASCII/Text format!
  These script files MUST be uploaded in ASCII/Text format. "RAW DATA" transfers or other transfer types will not work! "Auto" mode in WS_FTP will not do this for you. Make sure the "Auto" box is UNCHECKED and the ASCII button is CHECKED.
  More help with FTP is available here.
- Telnet to your Virtual Server and change to your "/usr/local/www/data/UserID/cgi-bin" directory.
- Set the permissions for "ice-form.pl" and "ice-idx.pl" by typing the following commands at the Telnet prompt:
    cd website/cgi-bin
    chmod 755 ice-form.pl
    chmod 700 ice-idx.pl
- THIS IS IMPORTANT: type
    perl ./ice-idx.pl
  This creates the index file. Every time you upload new pages that you want to be searchable, you'll need to Telnet in and type that command again.
  Now type:
    logout
- That's it! Now, in your web pages you'll want to create a link to your new Search Engine page. Put something like this in your web pages:
    <a href="http://DOMAIN/cgi-bin/USERID/ice-form.pl">Search Engine</a>
  NOTE: Replace DOMAIN with your own domain name and USERID with your own UserID!
  To test your Search Engine, upload your completed web page with the new link into your normal Virtual Server web directory, and follow the link to the Search page. Type in a keyword and hit the "Start" button, and it will give you links to all the pages where that word appears.
im1 User Support Page
QuickStart Instructions (for experienced users)
- Shift-click here to download ice-form.pl
  Shift-click here to download ice-idx.pl
- Edit both scripts as plain ASCII text. For ice-form.pl, there are 4 variables to change (the fourth, $bodytag, is optional); for ice-idx.pl, there is one. It is very simple. The documentation in the script will tell you what to do.
- Upload (in ASCII format) to your cgi-bin directory.
- Telnet into your cgi-bin directory.
- Type chmod 755 ice-form.pl and chmod 700 ice-idx.pl to set the permissions.
- Type perl ice-idx.pl to create the index.
- That's it! Now, in your web pages you'll want to create a link to your new Search Engine page. Put something like this in your web pages:
    <a href="http://DOMAIN/cgi-bin/USERID/ice-form.pl">Search Engine</a>
Internet Marketing 1 User Support Page
ICE options and special notes
- Default settings:
  - ICE excludes words of 3 letters or fewer from the search.
  - If a word appears in over 60 percent of your documents, ICE excludes it from being searched.
- To Search Subdirectories Only: you can configure ICE to let users choose whether to search the whole site or a subdirectory. Edit ice-form.pl (in ASCII!) and look at this part:
    # To Search Subdirectories Only
    #
    # To search subdirectories only, change the Directory Name to whatever name you want,
    # and change "subdir" to your subdirectory. Make sure there is a slash on the end, and
    # *not* at the front. Then, uncomment the code block below (delete the '#' from the
    # beginning of the line).
    #
    # local(@directories)=(
    # "Directory Name (subdir/)",
    # );
  It should be self-explanatory.
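For instance, once uncommented and filled in, that block might look like the sketch below. The label "Products" and the subdirectory products/ are made-up names for illustration; use your own.

```perl
# Hypothetical example: adds a "Products" choice that searches only products/
# Note the slash on the end of the subdirectory and none at the front.
local(@directories)=(
    "Products (products/)",
);
```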
- To return no more than a maximum number of hits:
    # Maximum number of hits to return
    # Example:
    # $MAXHITS=100;
    # Delete '#' from next line to use this
    # $MAXHITS=100;
  Just change the $MAXHITS variable to a number of your choice, and delete the '#'.
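As a quick illustration, the edited line could end up looking like this. The value 50 is an arbitrary example, not a recommendation.

```perl
# Maximum number of hits to return (50 is only an example value)
$MAXHITS=50;
```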
- To Exclude Directories from the search:
    # ADVANCED USERS:
    #
    # To exclude directories from the search, put the full server paths of the
    # subdirectories. NOTE: once you exclude a directory, all of *its* subdirectories
    # are excluded. Just change the SUBDIR to your subdirectory and leave the rest alone.
    #
    # You must UNCOMMENT (remove the '#' from each line) for this to work.
    @excludedirs=(
    "/usr/local/www/data/$userid/cgi-bin",
    # "/usr/local/www/data/$userid/SUBDIR",
    # "/usr/local/www/data/$userid/SUBDIR",
    # "/usr/local/www/data/$userid/SUBDIR",
    );
  The cgi-bin directory is already excluded for security reasons. Do not change this. You can exclude directories from being publicly searchable by changing SUBDIR to the path of your subdirectory, and uncommenting the line. Add more lines if you have multiple directories you wish to exclude.
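Here is a rough sketch of how the list might look with two extra directories excluded. The names private and drafts are hypothetical, and the $userid line is repeated here only so the fragment stands on its own (in the script it is already set near the top).

```perl
$userid="web2011f";   # illustrative UserID; already set earlier in the script

@excludedirs=(
    "/usr/local/www/data/$userid/cgi-bin",   # already excluded; leave as is
    "/usr/local/www/data/$userid/private",   # hypothetical directory name
    "/usr/local/www/data/$userid/drafts",    # hypothetical directory name
);
```

Remember that excluding a directory also excludes everything beneath it, so there is no need to list its subdirectories separately.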
- Other Options: in ice-idx.pl, there are options for international characters, word length exclusion, and common word exclusion. These are self-explanatory:
    # The ICE indexer will support full international characters by
    # converting them to their html equivalent if $ISO is set.
    # This has a slightly negative impact on the indexing speed, so
    # set it to "y" only if you index files with 8 bit international
    # characters. OTHERWISE DON'T! iso2html seems to cause a memory
    # leak, causing the indexer to run forever. I'm working on it.
    $ISO="n";
    # Type of system (for figuring out the path delimiting character)
    # that ice-idx.pl runs on. Select one of "UNIX", "MAC", or "PC"
    $TYPE="UNIX";
    # Minimum length of word to be indexed
    $MINLEN=3;
    # Stop indexing a word that appears in over X percent of all files
    $MAXPERCENT=60;
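To give a feel for what the $MAXPERCENT setting means, here is a minimal sketch of a "too common" cutoff. It is not taken from ice-idx.pl: the helper name common_words and the sample pages are made up for illustration, and the real indexer may well work this out differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of ICE: returns the sorted list of words
# that appear in more than $maxpercent percent of the given files.
sub common_words {
    my ($maxpercent, %files) = @_;       # %files maps file name => text
    my $total = scalar keys %files;
    return () unless $total;

    my %doc_count;                       # word => number of files containing it
    for my $text (values %files) {
        my %seen;                        # count each word once per file
        $seen{lc $_} = 1 for $text =~ /(\w+)/g;
        $doc_count{$_}++ for keys %seen;
    }

    my @common = grep { 100 * $doc_count{$_} / $total > $maxpercent }
                 keys %doc_count;
    return sort @common;
}

# Made-up sample pages
my %files = (
    "index.html"    => "Welcome to Gadgets Limited",
    "products.html" => "Gadgets for sale",
    "contact.html"  => "Contact Gadgets Limited",
);

my @dropped = common_words(60, %files);
print "@dropped\n";   # prints "gadgets limited"
```

In this sample, "gadgets" is on all three pages and "limited" is on two of three (about 67 percent), so both are over the 60 percent mark and would be left out of the index. A word on only one page (about 33 percent) stays searchable.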