tsearch2, ispell, UTF-8 and German special characters

Markus Wollny

Hi!

Sorry to bother you, but I just don't know how to get tsearch2 configured correctly for my setup. I've got a 7.4.3 database cluster, initdb'ed with de_DE@euro as the locale, and the database uses Unicode encoding.

I built and installed contrib/tsearch2 after applying the dump/reload patch http://www.sai.msu.su/~megera/postgr...e_7.4.patch.gz, as advised by the docs. So far everything looks good: I have generated a Snowball stemmer dictionary and an ispell dictionary as described in the docs, and created a new configuration 'default_german' as described.

This works, more or less:
SELECT to_tsvector('default_german',
'tsearch2 erlernen ist wie zur Schule zu gehen');
-> 'gehen':10 'schulen':8 'erlernen':3 'tsearch2':2

though I don't quite understand why "Schule" is converted to "schulen" and not the other way round, but so be it. My problem lies, as so often, with the non-ASCII characters, namely German umlauts and ß.

SELECT to_tsvector('default_german',
'ich muß tsearch2 begreifen ');

returns NULL. So does any phrase that contains ÄÖÜäüß or any other non-ASCII character.
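A possible explanation, offered as an assumption and not confirmed anywhere in this post: de_DE@euro is an ISO-8859-15 locale, while the database stores text as UTF-8, where every umlaut and ß takes two bytes. If the parser interprets the text with the single-byte locale charset, it sees two unrelated characters instead of one letter. This small Python sketch only illustrates that byte-level mismatch; it does not touch PostgreSQL or tsearch2 at all.

```python
# Illustration only: encode German special characters as UTF-8 (as stored in
# a Unicode database) and decode the bytes as ISO-8859-15, the charset
# assumed here for the de_DE@euro locale.
def misread_as_latin9(text: str) -> str:
    """Return what a single-byte ISO-8859-15 reader would see."""
    return text.encode("utf-8").decode("iso8859_15")

for ch in "ÄÖÜäöüß":
    utf8_bytes = ch.encode("utf-8")
    # Every one of these letters becomes two bytes in UTF-8.
    print(ch, utf8_bytes.hex(" "), repr(misread_as_latin9(ch)))

print(misread_as_latin9("tsearch2"))  # ASCII survives unchanged: tsearch2
```

Plain ASCII words come through untouched, which would be consistent with the English-only phrase working while any phrase with an umlaut fails.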

Another issue is the ISpell functionality: the docs are quite vague about which file(s) to use to create german.med. In ISpell conventions, umlauts seem to be represented as A" a" O" o" U" u", and thus when doing

SELECT lexize('de_ispell', 'Äther');
I receive NULL,

whereas
SELECT lexize('de_ispell', 'A"ther');
gives me {"a\"ther"}
as the result.
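If the ispell dictionary only knows the A" notation, one workaround is to translate between that notation and real Unicode letters, either when preparing the dictionary files or before calling lexize(). A minimal Python sketch, assuming only the two-character sequences listed above; the helper names decode_ispell and encode_ispell are hypothetical, and the "sS" entry for ß is an assumption about igerman98 that should be checked against its affix file.

```python
# Hypothetical helpers: map ispell-style umlaut notation (A" a" O" o" U" u")
# to real Unicode letters and back.
ISPELL_TO_UNICODE = {
    'A"': "Ä", 'O"': "Ö", 'U"': "Ü",
    'a"': "ä", 'o"': "ö", 'u"': "ü",
    "sS": "ß",  # assumption about igerman98's notation, verify before use
}
UNICODE_TO_ISPELL = {v: k for k, v in ISPELL_TO_UNICODE.items()}

def decode_ispell(word: str) -> str:
    """Replace every two-character ispell sequence with its Unicode letter."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ISPELL_TO_UNICODE:
            out.append(ISPELL_TO_UNICODE[pair])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)

def encode_ispell(word: str) -> str:
    """Inverse mapping, e.g. to turn 'Äther' into the form the dictionary knows."""
    return "".join(UNICODE_TO_ISPELL.get(ch, ch) for ch in word)

print(decode_ispell('A"ther'))  # Äther
print(encode_ispell("Äther"))   # A"ther
```

Converting the word list itself with decode_ispell would keep the dictionary in real letters; converting only the lookup arguments with encode_ispell leaves the files alone but pushes the problem into every query.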

I downloaded igerman98-20030222.tar.bz2 from http://j3e.de/ispell/igerman98/dict/ which seems to be the recommended ISpell dictionary distribution for the German language, as noted on http://fmg-www.cs.ucla.edu/fmg-membe...l#German-dicts

Of course, this distribution contains no german.0 or german.1 files, which would be the obvious counterparts to the english.0 and english.1 mentioned in the tsearch2 docs; there is, however, a file all.words built during installation, which seems to be the basis for building the hash file later on. The first few lines of this file are:

A"bte/N
A"btissin/F
a"chten/DIXY
A"chtens
A"chtung/P
a"chzen/DIXY
a"chzt/EGPX
A"cker/N
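Each all.words entry is a word in ispell notation, optionally followed by "/" and its affix flags. A small Python sketch that splits the lines quoted above into word and flags (parse_entry is a hypothetical helper name, not part of igerman98 or tsearch2):

```python
# Sketch: split all.words entries ("word/FLAGS") into the word and its
# ispell affix flags. Entries without "/" (e.g. A"chtens) have no flags.
SAMPLE_LINES = [
    'A"bte/N', 'A"btissin/F', 'a"chten/DIXY', 'A"chtens',
    'A"chtung/P', 'a"chzen/DIXY', 'a"chzt/EGPX', 'A"cker/N',
]

def parse_entry(line: str) -> tuple[str, str]:
    """Return (word, flags); flags is an empty string if there are none."""
    word, _, flags = line.strip().partition("/")
    return word, flags

for line in SAMPLE_LINES:
    word, flags = parse_entry(line)
    print(f"{word:12} flags={flags or '-'}")
```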

To get the .med file I ran:

sort -u -t/ +0f -1 +0 -T /usr/tmp -o german.med all.words
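In the obsolete +POS/-POS syntax, +0f -1 selects the first "/"-separated field compared case-insensitively, and +0 selects the whole line as a tie-breaker; in current GNU sort notation that is roughly sort -u -t/ -k1,1f -k1. The Python sketch below approximates the same ordering and de-duplication (build_med is a hypothetical name). It compares plain character codes, whereas sort(1) applies the locale's collation, so the exact order can differ.

```python
import sys

def build_med(lines: list[str]) -> list[str]:
    """Approximate `sort -u -t/ +0f -1 +0`: unique lines, ordered by the
    upper-cased word before "/" first and by the whole line second."""
    def sort_key(line: str) -> tuple[str, str]:
        word = line.split("/", 1)[0]
        return (word.upper(), line)  # sort's "f" folds lowercase to uppercase

    unique = {line.rstrip("\n") for line in lines if line.strip()}
    return sorted(unique, key=sort_key)

def main(src: str, dst: str) -> None:
    # latin-1 maps every byte to a character, so no byte value is altered.
    with open(src, encoding="latin-1") as f:
        lines = f.readlines()
    with open(dst, "w", encoding="latin-1") as f:
        f.writelines(line + "\n" for line in build_med(lines))

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python build_med.py all.words german.med
    main(sys.argv[1], sys.argv[2])
```

Because the tie-breaker key is the whole line, two lines only count as duplicates when they are identical, which is why a set is enough for the -u part.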

There is an option to generate another word list via make isowordlist, but this didn't resolve the umlaut issue either, neither in the package's default encoding nor after conversion to UTF-8 (I tried both with and without a BOM).
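For reference, the UTF-8 conversion step can be sketched in Python as follows, assuming the isowordlist output is ISO-8859-1 (an assumption, not checked against the package). Python's plain "utf-8" codec never writes a BOM, and strip_bom() removes one if an editor added it. This only shows the conversion itself; it is not a fix for the NULL results.

```python
import codecs

def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8; the "utf-8" codec adds no BOM."""
    return data.decode("iso8859_1").encode("utf-8")

def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte-order mark (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

sample = "Äther\nächzen\n".encode("iso8859_1")
converted = latin1_to_utf8(sample)
print(converted)  # b'\xc3\x84ther\n\xc3\xa4chzen\n'
```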

Has anybody actually managed to get a working configuration with tsearch2 and German language support in a Unicode database? What am I doing wrong? I can't find any more hints in the docs, and there's a thread on the OpenFTS mailing list with somewhat similar issues ( http://sourceforge.net/mailarchive/f...&forum_id=7671 ), but nothing that would actually help resolve it.

Kind regards

Markus

Nov 23 '05 #1


Similar topics

tsearch2 and unexpected exists

by: Nigel J. Andrews | last post by:

This will be a little vague, it was last night and I can't now do the test in that db (see below) so can't give the exact wording. I seem to remember a report a little while ago about tsearch v2...

PostgreSQL Database

backend crashing despite tsearch2 patch

by: psql-mail | last post by:

I have applied the recent tsearch2 patch and recompiled the tsearch2 module but I am still experiencing the same backend crashes as I previously described. Thanks for any help, Mat GDB...

PostgreSQL Database

tsearch2 and aspell

by: Pavel Stehule | last post by:

Hello, can I use tsearch2 with aspell? I didn't find any info about it, and I don't know anything about the difference between ispell and aspell. Thank you, Pavel Stehule ...

PostgreSQL Database

questions about tsearch2 (for czech language)

by: Pavel Stehule | last post by:

Hello, I am trying tsearch2 in a Czech environment. It works fine, but I have two questions. 1. I have the words "se", "ve" in my Czech stop words, but I get these words in the result. Why? Have I...

PostgreSQL Database

making tsearch2 dictionaries

by: Ben | last post by:

I'm trying to make myself a dictionary for tsearch2 that converts numbers to their english word equivalents. This seems to be working great, except that I can't figure out how to make my lexize...

PostgreSQL Database

tsearch2: restoring problem

by: Fischer Ulrich | last post by:

Hi, I have a problem restoring a database that uses tsearch2. I made a backup as described in 'tsearch-v2-intro' on the tsearch2 page. Now I'm trying to restore it into a...

PostgreSQL Database

ispell tsearch2 dictionary

by: Ben | last post by:

I just made myself an ispell dictionary for tsearch2, thinking (very incorrectly, it turns out) that looking up a misspelled word with the ispell dictionary would return possible words that I...

PostgreSQL Database

Installing FullTextSearchTool tsearch2

by: Marcel Boscher | last post by:

Hello everybody, I tried to "J.U.S.T" install the FullTextSearchTool tsearch2 under the guidance of: http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/...

PostgreSQL Database

Rules and locking within a transaction?...

by: Net Virtual Mailing Lists | last post by:

Hello, If I have a rule like this: CREATE OR REPLACE RULE sometable_update AS ON UPDATE TO table2 DO UPDATE cache SET updated_dt=NULL WHERE tablename='sometable'; CREATE OR REPLACE RULE...

PostgreSQL Database

Tsearch2 and Unicode?

by: Dawid Kuroczko | last post by:

I'm trying to use tsearch2 with database which is in 'UNICODE' encoding. It works fine for English text, but as I intend to search Polish texts I did: insert into pg_ts_cfg('default_polish',...

PostgreSQL Database

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

Couldn’t get equations in HTML when converting a Word .docx file to HTML in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

C# / C Sharp

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have an Unload event like VB6 does. Which event acts like it?

Visual Basic .NET

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...

C# / C Sharp