RiLax Manual

© S. Tarasov

Configuration

      Edit the file riconfig.pm to set the parameters below. Most of them are self-documenting and do not require explanation.

  1.  $DB_NAME[NN] = 'filename';  - name of database file.

  2.  $record_separator[NN] = "\n";  - record delimiter in the database (in most cases a newline character).

  3.  $field_separator[NN] = "::";  - field delimiter within a record.

  4.  site_size => 2,  - this variable controls database size and searching speed.

  5.  compact_index => 1,  - in compact mode the index takes less space, but it is limited to 65535 records.

  6.  numbers => '0-9',  - during indexing the script removes all non-alphabetic characters from the page and indexes what is left. The script treats Latin characters and the characters of the regional alphabet (discussed below) as alphabetic. Here you may add other characters that should be indexed (such as digits, the underscore and so on).

  7.  INDEXING_SCHEME => 2,  - word indexing scheme. With scheme "1" the index is built on whole words. This is the fastest method, but the script will find only words identical to the keyword.

    With scheme "2" the index is based on the beginning of each word, so the script will find all words that begin with the given keyword. For example, the query "port" will also find "portrait" and "portion".

  8.  use_stop_words => "YES",  - use the list of common words (stop words), which should not be indexed.

  9.  min_length => 3,  - minimum word length for indexing.

  10.  max_length => 32,  - maximum word length for indexing (longer words will be truncated).

  11.  CAP_LETTERS => '\xC0-\xDF\xA8',  - Put here the list of capital letters of your language (those that differ from Latin). Do the same for small letters.

  12.  def_search_type => 1,  - Default search type. Possible values: 0 - substring search (can be used only with INDEXING_SCHEME => 2), 1 - exact word search.

  13.  def_search_mode => "AND",  - Default search mode. Possible values: "AND" or "OR".
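
      The effect of "numbers", "min_length", "max_length" and INDEXING_SCHEME can be illustrated with a minimal sketch. This is not the code of the RiLax indexer: the functions tokenize() and matches() and the %cfg hash are hypothetical names used only for illustration, and only Latin letters are handled (no CAP_LETTERS support).

```perl
use strict;
use warnings;

# Hypothetical settings mirroring the options described above.
my %cfg = (
    numbers    => '0-9',
    min_length => 3,
    max_length => 32,
);

# Split text into index terms: keep Latin letters plus the extra
# characters from "numbers", lowercase everything, drop short words
# and truncate long ones.
sub tokenize {
    my ($text) = @_;
    my $keep = "a-zA-Z$cfg{numbers}";
    (my $clean = $text) =~ s/[^${keep}]+/ /g;
    my @words;
    for my $w (split ' ', lc $clean) {
        next if length($w) < $cfg{min_length};
        push @words, substr($w, 0, $cfg{max_length});
    }
    return @words;
}

# Scheme 1: whole-word match. Scheme 2: match on the beginning of the word.
sub matches {
    my ($scheme, $keyword, $word) = @_;
    return $scheme == 1
        ? $word eq $keyword
        : index($word, $keyword) == 0;
}

my @words  = tokenize("A portrait, one portion of port; airport 42!");
my @whole  = grep { matches(1, 'port', $_) } @words;
my @prefix = grep { matches(2, 'port', $_) } @words;

print "scheme 1: @whole\n";
print "scheme 2: @prefix\n";
```

      In this sketch "airport" is not matched by either scheme, because the keyword is compared only with the beginning of each word.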

Multiple database support

      The script can work with several databases simultaneously. For each database you have to set in the configuration file the data file name, the record and field separators, and which fields should be indexed:

# Database filename
$DB_NAME[1] =  'zip_mt.db';
$DB_NAME[2] =  'zip_tn.db';
$DB_NAME[3] =  'zip_wi.db';

# Records separator in your database ("\n" for newline).
$record_separator[1] = "\n";
$record_separator[2] = "\n";
$record_separator[3] = "\n";

# Fields separator.
$field_separator[1] = ";";
$field_separator[2] = ";";
$field_separator[3] = ";";

# List here fields numbers, which should be indexed.
# Fields numbering starts from 0.
$index_fields[1] = [ qw(0 4 5 6 7 8 9)];
$index_fields[2] = [ qw(0 4 5 6 7 8 9)];
$index_fields[3] = [ qw(0 4 5 6 7 8 9)];

      Databases can have a different number of fields, but advanced search will work only when all databases have a similar format. For each database you have to write a separate "results_NN" section in the template file with the design for its search results. By default the script searches all databases, but you can limit the search to one or several databases by adding the "db" parameter to the query: "...&db=1&db=3".
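
      How the separators and the list of indexed fields select the text to index can be shown with a short sketch. It is only an illustration under assumptions: the sample data is made up, and the loop is not taken from index.pl.

```perl
use strict;
use warnings;

# Hypothetical values in the style of the configuration above.
my $record_separator = "\n";
my $field_separator  = ";";
my @index_fields     = (0, 2);    # field numbering starts from 0

# Made-up sample data: two records with three fields each.
my $data = "59001;MT;Absarokee\n59002;MT;Acton\n";

my @indexed;
for my $record (split /\Q$record_separator\E/, $data) {
    next unless length $record;
    # The -1 limit keeps trailing empty fields.
    my @fields = split /\Q$field_separator\E/, $record, -1;
    # Keep only the fields listed for indexing.
    push @indexed, [ @fields[@index_fields] ];
}

print join(' | ', @$_), "\n" for @indexed;
```

      The separators are wrapped in \Q...\E so that characters such as "|" or "." are taken literally and not as regular expression syntax.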

      You can also build a separate index for every database. Run the indexing script with an additional parameter:

perl index.pl -d=2
and the script will index only database number "2"; the index will be stored in the directory "db/db_2".

      To search this index, use the parameter "d" in the search query:

http://www.server.com/cgi-bin/search.pl?q=query&d=2
In this case the script can search only one separate index.
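
      Query strings with the "db" and "d" parameters can be assembled as in the following sketch. The helpers escape() and search_url() are hypothetical and not part of RiLax; the server address is the placeholder used above.

```perl
use strict;
use warnings;

# Percent-encode a value for a query string (simple ASCII-oriented helper).
sub escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build a search URL from a base address and a list of name/value pairs.
# A name may be repeated, as with "db".
sub search_url {
    my ($base, @pairs) = @_;
    my @parts;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @parts, escape($name) . '=' . escape($value);
    }
    return $base . '?' . join('&', @parts);
}

my $base = 'http://www.server.com/cgi-bin/search.pl';

# Search databases 1 and 3 of the common index.
my $url_db = search_url($base, 'q' => 'query', 'db' => 1, 'db' => 3);

# Search the separate index built with "-d=2".
my $url_d = search_url($base, 'q' => 'query', 'd' => 2);

print "$url_db\n$url_d\n";
```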

Results sorting

      If your database contains long text fields, search results can be sorted by relevance, as in a full-text search system. First turn this option on in the configuration file: "allow_sort_by_rating => 1". Then choose the relevance calculation algorithm. Relevance can be the number of words in the given record (recommended for relatively short text fields), or it can be calculated with an algorithm similar to the classic Tf*Idf algorithm (recommended for long texts).

      The script can take word frequency in the index into account, so that rare words get a higher rating ("word_freq => 1").
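
      For reference, the classic Tf*Idf calculation can be sketched as follows. This shows the textbook formula only; the manual says RiLax uses an algorithm "similar to" it, so the exact formula in the script may differ. The sample records and the function tf_idf() are made up for the example.

```perl
use strict;
use warnings;

# Made-up records, already split into lowercase words.
my @records = (
    [qw(red apple pie)],
    [qw(green apple)],
    [qw(red red wine)],
);

# Document frequency: in how many records each word occurs.
my %df;
for my $rec (@records) {
    my %seen;
    $df{$_}++ for grep { !$seen{$_}++ } @$rec;
}

# Classic tf*idf score of one record for a list of query words.
sub tf_idf {
    my ($rec, @query) = @_;
    my %tf;
    $tf{$_}++ for @$rec;
    my $score = 0;
    for my $word (@query) {
        next unless $tf{$word};
        # Words found in fewer records get a larger idf.
        my $idf = log(scalar(@records) / $df{$word});
        $score += ($tf{$word} / scalar(@$rec)) * $idf;
    }
    return $score;
}

my @scores = map { tf_idf($_, 'red', 'pie') } @records;
printf "record %d: %.3f\n", $_, $scores[$_] for 0 .. $#scores;
```

      In this example the first record ranks highest because it contains the rare word "pie", even though the third record contains "red" twice.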

      You can set different weights for fields, so that certain fields get a higher rating:

$attr_weight[0] = 1;
$attr_weight[1] = 5;
$attr_weight[2] = 5;
$attr_weight[3] = 1;
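
      A minimal sketch of how such weights could influence the rating is given below. It assumes a simple model in which every keyword hit in a field is multiplied by the weight of that field; the function weighted_score() and the sample records are hypothetical, and the real calculation in RiLax may be different.

```perl
use strict;
use warnings;

# Weights as in the example above: fields 1 and 2 count five times more.
my @attr_weight = (1, 5, 5, 1);

# Score a record by counting keyword hits per field, scaled by field weight.
sub weighted_score {
    my ($fields, $keyword) = @_;
    my $score = 0;
    for my $i (0 .. $#$fields) {
        # Count whole-word, case-insensitive occurrences in this field.
        my $hits = () = $fields->[$i] =~ /\b\Q$keyword\E\b/gi;
        # Fields without an explicit weight default to 1.
        $score += $hits * ($attr_weight[$i] // 1);
    }
    return $score;
}

# Made-up records: the keyword is in field 1 of @a and in field 3 of @b.
my @a = ('101', 'Port guide', 'Harbours', 'misc');
my @b = ('102', 'City guide', 'Streets', 'near the port');

printf "a: %d, b: %d\n", weighted_score(\@a, 'port'), weighted_score(\@b, 'port');
```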

      Result sorting can be improved by storing word positions within fields in the index (word_dist => 1). This information can be used for phrase search; in addition, words that are close together get a higher rating during relevance calculation (weight_dist => 3).

      To sort results by relevance, add the parameter "s=R" to the query.

Results caching

      Search results can be cached to minimize response time when displaying subsequent pages. Results are cached only if a large number of records was found or the search took a long time. Cache files are stored in a separate directory, "cache". The script can erase old results itself, or you can do it manually.

      If the "check_cache" option is set, the script will use the cache for every query that was asked recently. Otherwise the cache will be used only when displaying subsequent pages of search results.

  1.  enable_cache => "YES",  - turn caching on or off.

  2.  check_cache => "YES",  - use cache for every query.

  3.  min_doc_found => 1000,  - use cache if the number of found documents is larger than specified here.

  4.  min_search_time => 0.2,  - use cache if the search time is longer than specified here.

  5.  delete_cache => "YES",  - delete old results automatically.

  6.  delete_cache_delay => 3600,  - delete old results after NNN seconds.
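
      The caching rules described above can be summarized in a short sketch. It only mirrors the documented conditions; the functions should_cache() and is_stale() and the %cfg hash are hypothetical names, and the search time is assumed to be measured in seconds.

```perl
use strict;
use warnings;

# Hypothetical settings mirroring the options above.
my %cfg = (
    enable_cache       => "YES",
    min_doc_found      => 1000,
    min_search_time    => 0.2,
    delete_cache       => "YES",
    delete_cache_delay => 3600,
);

# Store results only for large result sets or slow searches.
sub should_cache {
    my ($found, $seconds) = @_;
    return 0 unless $cfg{enable_cache} eq "YES";
    return ($found > $cfg{min_doc_found} || $seconds > $cfg{min_search_time}) ? 1 : 0;
}

# A cache file is stale once it is older than delete_cache_delay seconds.
sub is_stale {
    my ($mtime, $now) = @_;
    return 0 unless $cfg{delete_cache} eq "YES";
    return ($now - $mtime > $cfg{delete_cache_delay}) ? 1 : 0;
}

# Small and fast search, large result set, slow search.
print should_cache(50, 0.05), should_cache(5000, 0.05), should_cache(50, 0.5), "\n";
# Cache file written at time 0, checked after 30 minutes and after 2 hours.
print is_stale(0, 1800), is_stale(0, 7200), "\n";
```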



http://risearch.org S.Tarasov, © 2000-2003