RiSearch Pro v.3.2 Manual
© S. Tarasov

Multilanguage support

The script has several features that can be useful for indexing documents in languages other than English.

Local characters

If your language uses non-Latin characters, the script will have difficulty converting words to lowercase. You have to tell the script which characters in your language are capital letters and which are lowercase letters. Use the "CAP_LETTERS" and "LOW_LETTERS" configuration parameters for this task.

# Capital letters
CAP_LETTERS => 'ÄÖÜ',
# Lower case letters
LOW_LETTERS => 'äöü',

You can also use character codes in HEX format:

# Capital letters
CAP_LETTERS => '\xC0-\xDF\xA8',
# Lower case letters
LOW_LETTERS => '\xE0-\xFF\xB8',

With the default settings the script works with the Russian "Windows-1251" encoding and with many European languages; however, some tuning may be required for certain languages. For example, in German there are no "big/small" variants of the letter "ß", so your config should look like:

# Capital letters
CAP_LETTERS => 'ÄÖÜß',
# Lower case letters
LOW_LETTERS => 'äöüß',

Characters translation

Sometimes local characters on the Web are replaced by combinations of Latin characters (for example, "ae" is used instead of "ä"). Therefore the same word can have two different spellings, such as "Ägypten" and "Aegypten". The characters translation function allows the script to find both variants when the user searches for either "Ägypten" or "Aegypten". First you have to turn this function on: "allow_characters_translation => 1". Then create translation rules. Below is an example for German; rules for other languages can be written in a similar way:
translation_rules => {
    'ä' => 'ae',
    'ö' => 'oe',
    'ü' => 'ue',
    'ß' => 'ss',
},
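
The manual does not describe how the script applies these settings internally, so the following is only a minimal sketch of one way the two features could work together. It assumes that capital and lowercase letters are paired by position, that each rule replaces a single character, and that the file is saved as UTF-8 (the real script may well work on a single-byte encoding instead). The helper names to_lower, apply_rules and search_key are hypothetical and are not part of RiSearch Pro.

```perl
use strict;
use warnings;
use utf8;    # assumption: this sketch itself is saved as UTF-8

# Hypothetical copies of the configuration values shown in the manual.
my $CAP_LETTERS = 'ÄÖÜ';
my $LOW_LETTERS = 'äöü';
my %translation_rules = (
    'ä' => 'ae',
    'ö' => 'oe',
    'ü' => 'ue',
    'ß' => 'ss',
);

# Assumption: the n-th capital letter corresponds to the n-th lowercase letter.
my %lower_of;
@lower_of{ split //, $CAP_LETTERS } = split //, $LOW_LETTERS;

# Hypothetical helper: lowercase ASCII letters first,
# then the local letters listed in the configuration.
sub to_lower {
    my ($word) = @_;
    $word =~ tr/A-Z/a-z/;
    $word =~ s{(.)}{ exists $lower_of{$1} ? $lower_of{$1} : $1 }ge;
    return $word;
}

# Hypothetical helper: replace every character that has a translation rule.
sub apply_rules {
    my ($word) = @_;
    $word =~ s{(.)}{ exists $translation_rules{$1} ? $translation_rules{$1} : $1 }ge;
    return $word;
}

# Reduce a word to a single comparable form.
sub search_key {
    my ($word) = @_;
    return apply_rules( to_lower($word) );
}

print search_key('Ägypten'),  "\n";    # prints "aegypten"
print search_key('Aegypten'), "\n";    # prints "aegypten"
```

In this sketch both spellings reduce to the same key, which is one plausible way a query for either variant could match documents containing the other. Note that "Ä" is lowercased only because it is listed in CAP_LETTERS; a capital letter missing from that parameter would pass through unchanged and no translation rule would apply to it.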
http://risearch.org | S.Tarasov, © 2000-2003