RiSearch Pro v.3.2 Manual

© S. Tarasov

Multilanguage support

      The script has several features that can be useful for indexing documents in languages other than English.

Local characters

      If your language uses non-Latin characters, the script will have difficulty converting words to lowercase. You have to tell the script which characters in your language are capital letters and which are small letters. Use the "CAP_LETTERS" and "LOW_LETTERS" configuration parameters for this task.

# Capital letters
CAP_LETTERS => 'ÄÖÜ',

# Lower case letters
LOW_LETTERS => 'äöü',
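      The following is a minimal sketch of how such a pair of parameters could drive lowercasing. It is not the script's actual code. The helper names "build_case_map" and "to_lower" and the "%config" hash are hypothetical, and the sketch assumes that the two lists are paired position by position and that the file is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;    # assumption: this sketch is saved as UTF-8

# Hypothetical stand-in for the script's configuration hash.
my %config = (
    CAP_LETTERS => 'ÄÖÜ',
    LOW_LETTERS => 'äöü',
);

# Pair the two lists position by position: 1st capital => 1st small, etc.
sub build_case_map {
    my ($cap, $low) = @_;
    my @cap = split //, $cap;
    my @low = split //, $low;
    die "CAP_LETTERS and LOW_LETTERS differ in length\n" unless @cap == @low;
    my %map;
    @map{@cap} = @low;
    return \%map;
}

# Lowercase ASCII letters with tr///, then local letters through the map.
sub to_lower {
    my ($text, $map) = @_;
    $text =~ tr/A-Z/a-z/;
    $text =~ s/(.)/exists $map->{$1} ? $map->{$1} : $1/ges;
    return $text;
}

my $case_map = build_case_map(@config{qw(CAP_LETTERS LOW_LETTERS)});
binmode STDOUT, ':encoding(UTF-8)';
print to_lower('ÄGYPTEN Österreich', $case_map), "\n";    # prints "ägypten österreich"
```

      This sketch only handles letters listed one by one; range notation such as "A-Z" inside the parameters is not expanded here.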

      You can also specify character codes in hexadecimal format:

# Capital letters
CAP_LETTERS => '\xC0-\xDF\xA8',

# Lower case letters
LOW_LETTERS => '\xE0-\xFF\xB8',
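      To see which characters these two settings cover, the hex notation can be expanded into explicit lists. The sketch below is an illustration under assumptions, not the script's own parser: "expand_letters" is a hypothetical helper, and it assumes that "\xHH" names a single byte and that "A-B" denotes an inclusive range. In Windows-1251 the bytes C0-DF and E0-FF are the capital and small Cyrillic letters, and A8/B8 are the capital and small "ё".

```perl
use strict;
use warnings;

# Expand a specification such as '\xC0-\xDF\xA8' into a list of single bytes.
sub expand_letters {
    my ($spec) = @_;
    # Turn each \xHH escape into the byte it names.
    (my $s = $spec) =~ s/\\x([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    my @out;
    # "A-B" is an inclusive range; any other character stands for itself.
    while ($s =~ /\G(?:(.)-(.)|(.))/gs) {
        if (defined $3) {
            push @out, $3;
        }
        else {
            push @out, map { chr($_) } ord($1) .. ord($2);
        }
    }
    return @out;
}

my @cap = expand_letters('\xC0-\xDF\xA8');
my @low = expand_letters('\xE0-\xFF\xB8');

# Show each capital/small pair as hex codes, e.g. "C0 => E0".
printf "%02X => %02X\n", ord($cap[$_]), ord($low[$_]) for 0 .. $#cap;
```

      Both lists expand to the same number of entries, so each capital letter lines up with its small counterpart.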

      With the default settings, the script will work with the Russian "Windows-1251" encoding and with many European languages; however, some languages may require minor tuning. For example, German has no separate capital and small variants of the letter "ß", so it is listed in both parameters and your config should look like this:

# Capital letters
CAP_LETTERS => 'ÄÖÜß',

# Lower case letters
LOW_LETTERS => 'äöüß',

Characters translation

      Sometimes local characters on the Web are replaced by combinations of Latin characters (for example, "ae" is used instead of "ä"). As a result, the same word can have two different spellings, such as "Ägypten" and "Aegypten". The character translation function allows the script to find both variants when the user searches for either "Ägypten" or "Aegypten". First, turn this function on with "allow_characters_translation => 1". Then create the translation rules. Below is an example for German; rules for other languages can be defined in a similar way:

translation_rules => {

    'ä' => 'ae',
    'ö' => 'oe',
    'ü' => 'ue',
    'ß' => 'ss',

},
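      One way to picture the effect of such rules is to reduce every word to its Latin spelling, so that both variants compare equal. The sketch below is an assumption about how rules like these could be applied, not a description of the script's internals. The helper "translate_word" is hypothetical, the words are assumed to be already lowercased, and the file is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;    # assumption: this sketch is saved as UTF-8

# The German rules from the manual, held in a plain hash for this sketch.
my %translation_rules = (
    'ä' => 'ae',
    'ö' => 'oe',
    'ü' => 'ue',
    'ß' => 'ss',
);

# Replace every character that has a rule with its Latin combination.
sub translate_word {
    my ($word, $rules) = @_;
    $word =~ s/(.)/exists $rules->{$1} ? $rules->{$1} : $1/ges;
    return $word;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $word ('ägypten', 'aegypten', 'straße') {
    print "$word => ", translate_word($word, \%translation_rules), "\n";
}
```

      Here "ägypten" and "aegypten" both end up as "aegypten", so a search for either spelling can match the other.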


http://risearch.org S.Tarasov, © 2000-2003