Supported languages and code pages
When you first create a text search index, you can specify the language in which the text documents are parsed. You can also specify the language in which query terms are interpreted when you search. In addition, you can specify a code page when you create a text search index on a column with a binary data type.
Language specification
A locale combines language and territory (region or country) information and is represented by a five-character locale code, such as en_US. You define the message locale for a text search administration procedure by passing the locale code to the procedure. Refinements of these locale codes are possible, depending on the locales installed on the DB2® server.
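To make the locale code format concrete, here is a minimal Python sketch that splits a five-character code such as de_CH into its language and territory parts. It assumes the ll_TT shape seen in the supported-locales table (two lowercase letters, an underscore, two uppercase letters). `parse_locale` and `Locale` are hypothetical helpers, not part of DB2, and refined codes that depend on the locales installed on the server are deliberately rejected rather than guessed at.

```python
from typing import NamedTuple


class Locale(NamedTuple):
    """A locale code split into its two parts (hypothetical helper type)."""
    language: str   # two lowercase letters, e.g. "de"
    territory: str  # two uppercase letters, e.g. "CH" (AA = Arabic countries or regions)


def parse_locale(code: str) -> Locale:
    """Split a five-character locale code such as 'de_CH'.

    Assumption: only the ll_TT shape is accepted; longer, refined codes
    that depend on server-installed locales raise ValueError.
    """
    if len(code) != 5 or code[2] != "_":
        raise ValueError(f"not a five-character locale code: {code!r}")
    language, territory = code[:2], code[3:]
    # Check each half against the shape used in the supported-locales table.
    if not (language.isalpha() and language.islower()):
        raise ValueError(f"bad language part in {code!r}")
    if not (territory.isalpha() and territory.isupper()):
        raise ValueError(f"bad territory part in {code!r}")
    return Locale(language, territory)
```

For example, `parse_locale("de_CH")` returns `Locale(language='de', territory='CH')`.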
There is an important difference between specifying a language when you create a text search index and specifying a language when you issue a search query:
- The locale that you specify in your db2ts CREATE INDEX command determines the language used to tokenize or analyze documents for indexing. If you know that all documents in the column to be indexed use a specific language, specify the applicable locale when you create the text search index. If you do not specify a locale, the database territory determines the default setting for LANGUAGE. To have documents scanned automatically to determine their locale, set the LANGUAGE attribute in the SYSIBMTS.TSDEFAULTS view to AUTO. The SYSIBMTS.TSDEFAULTS view describes the database defaults for text search as attribute-value pairs.
- The locale that you specify in a search query is used to perform linguistic processing on the query and to help identify the base form of each query term. After the base form has been identified, the locale plays no further part in the search process itself. Thus, you could issue a query in English and obtain German documents in the search result if the base form of the search term is present in those documents.
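The index-time rules in the first bullet can be modeled as a simple precedence check. This Python sketch is an interpretation, not DB2 code: `resolve_index_language` is a hypothetical name, the TSDEFAULTS LANGUAGE value is passed in as a plain string rather than read from SYSIBMTS.TSDEFAULTS, and `territory_locale` stands in for whatever default the database territory implies.

```python
from typing import Optional

AUTO = "AUTO"  # TSDEFAULTS LANGUAGE value that requests automatic detection


def resolve_index_language(
    create_index_locale: Optional[str],
    tsdefaults_language: Optional[str],
    territory_locale: str,
) -> str:
    """Model which setting governs document analysis at index time.

    create_index_locale: locale given on db2ts CREATE INDEX, or None.
    tsdefaults_language: LANGUAGE attribute set in SYSIBMTS.TSDEFAULTS,
        or None if left at its default (assumption of this model).
    territory_locale: locale implied by the database territory (hypothetical input).
    Returns a locale code, or AUTO if each document is scanned for its locale.
    """
    # 1. An explicit locale on CREATE INDEX takes precedence.
    if create_index_locale is not None:
        return create_index_locale
    # 2. Next, a LANGUAGE value such as AUTO set in the defaults view.
    if tsdefaults_language is not None:
        return tsdefaults_language
    # 3. Otherwise the database territory decides.
    return territory_locale
```

Under these assumptions, a locale on CREATE INDEX always wins, AUTO in the defaults requests per-document detection, and the territory-based default applies only when neither is set. The query-side locale is not modeled here, since it affects only how query terms are reduced to their base forms.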
The following table lists the locales that DB2 Text Search supports for document processing.
| Locale code | Language | Territory |
|---|---|---|
| ar_AA | Arabic | Arabic countries or regions |
| cs_CZ | Czech | Czech Republic |
| da_DK | Danish | Denmark |
| de_CH | German | Switzerland |
| de_DE | German | Germany |
| el_GR | Greek | Greece |
| en_AU | English | Australia |
| en_GB | English | United Kingdom |
| en_US | English | United States |
| es_ES | Spanish | Spain |
| fi_FI | Finnish | Finland |
| fr_CA | French | Canada |
| fr_FR | French | France |
| it_IT | Italian | Italy |
| ja_JP | Japanese | Japan |
| ko_KR | Korean | Korea, Republic of |
| nb_NO | Norwegian Bokmål | Norway |
| nl_NL | Dutch | Netherlands |
| nn_NO | Norwegian Nynorsk | Norway |
| pl_PL | Polish | Poland |
| pt_BR | Portuguese | Brazil |
| pt_PT | Portuguese | Portugal |
| ru_RU | Russian | Russia |
| sv_SE | Swedish | Sweden |
| zh_CN | Chinese | China |
| zh_TW | Chinese | Taiwan |
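For client-side validation, the table can be transcribed into a lookup structure. The following Python sketch copies the 26 rows above into a dictionary. `SUPPORTED_LOCALES`, `describe_locale`, and `locales_for_language` are hypothetical names, and the data reflects only this table, not the locales actually installed on a given server.

```python
from typing import Dict, List, Tuple

# Locale code -> (language, territory), transcribed from the table above.
SUPPORTED_LOCALES: Dict[str, Tuple[str, str]] = {
    "ar_AA": ("Arabic", "Arabic countries or regions"),
    "cs_CZ": ("Czech", "Czech Republic"),
    "da_DK": ("Danish", "Denmark"),
    "de_CH": ("German", "Switzerland"),
    "de_DE": ("German", "Germany"),
    "el_GR": ("Greek", "Greece"),
    "en_AU": ("English", "Australia"),
    "en_GB": ("English", "United Kingdom"),
    "en_US": ("English", "United States"),
    "es_ES": ("Spanish", "Spain"),
    "fi_FI": ("Finnish", "Finland"),
    "fr_CA": ("French", "Canada"),
    "fr_FR": ("French", "France"),
    "it_IT": ("Italian", "Italy"),
    "ja_JP": ("Japanese", "Japan"),
    "ko_KR": ("Korean", "Korea, Republic of"),
    "nb_NO": ("Norwegian Bokmål", "Norway"),
    "nl_NL": ("Dutch", "Netherlands"),
    "nn_NO": ("Norwegian Nynorsk", "Norway"),
    "pl_PL": ("Polish", "Poland"),
    "pt_BR": ("Portuguese", "Brazil"),
    "pt_PT": ("Portuguese", "Portugal"),
    "ru_RU": ("Russian", "Russia"),
    "sv_SE": ("Swedish", "Sweden"),
    "zh_CN": ("Chinese", "China"),
    "zh_TW": ("Chinese", "Taiwan"),
}


def describe_locale(code: str) -> str:
    """Return 'Language (Territory)' for a supported code, else raise ValueError."""
    try:
        language, territory = SUPPORTED_LOCALES[code]
    except KeyError:
        raise ValueError(f"{code!r} is not supported for document processing") from None
    return f"{language} ({territory})"


def locales_for_language(language: str) -> List[str]:
    """List every supported locale code for a language name, sorted."""
    return sorted(code for code, (lang, _) in SUPPORTED_LOCALES.items() if lang == language)
```

For instance, `locales_for_language("Portuguese")` lists both pt_BR and pt_PT, which is a quick way to see that several languages have more than one territory variant.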
Code page specification
You can index documents that use any of the supported DB2 code pages. Specifying the code page when you create a text search index is optional, but doing so helps to identify the character encoding of binary columns. If you do not specify a code page for a binary column, the code page from the column property is used. For the list of supported territory codes and code pages, see the DB2 documentation on that topic.
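The code page fallback for binary columns amounts to a two-step choice. In this Python sketch, `effective_code_page` is a hypothetical helper, code pages are represented as integers, and the set of supported code pages is supplied by the caller, because the actual list is maintained in the DB2 documentation rather than here.

```python
from typing import AbstractSet, Optional


def effective_code_page(
    explicit_code_page: Optional[int],
    column_code_page: int,
    supported_code_pages: AbstractSet[int],
) -> int:
    """Pick the code page used to read a binary column for indexing (model only)."""
    # A code page given when the index is created overrides the column property;
    # otherwise the code page from the column property is used.
    code_page = explicit_code_page if explicit_code_page is not None else column_code_page
    # Documents can only be indexed if they use a supported code page.
    if code_page not in supported_code_pages:
        raise ValueError(f"code page {code_page} is not in the supported set")
    return code_page
```

Keeping the supported set as a parameter avoids hard-coding a list that this page does not state.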