"Peter Münster" <lo**@signature.invalid> wrote in message
news:Pi*****************************@gaston.deltadore.bzh...
> On Wed, 30 Aug 2006, Kimmo Laine wrote:
>> That might be a multibyte-string related problem. If the string is
>> encoded using a multibyte charset, such as utf-8, that could be the
>> reason str_word_count is confused.
> Yes, you're right: I've just tried with fr_FR.iso885915 and it works.
> That's great. :)
>> Once you've installed the multibyte library, you could try writing a
>> regular expression for counting the words and use it with the mb_ereg*
>> functions.
> Thanks for the hint. As a workaround I already use a regular expression to
> get the words, but str_word_count() is still better than my solution:
> str_word_count() detects constructs like "it's" and "week-end" etc.
There were some examples of regexp substitutes for str_word_count in the
user contributions on the php.net manual page. You might want to check them.
For example rcATinterfacesDOTfr suggests that
$word_count = count(preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY));
should work. The advantage of this solution is that there is mb_split
as well, so you could use this with the mb-functions if you wanted to use
utf-8.
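Here is a rough sketch of how that could look for utf-8 input while still
keeping "it's" and "week-end" as single words. The function name
utf8_word_count and the pattern are my own assumptions, not something taken
from the manual notes. It relies on the /u modifier and the \p{L} / \p{N}
Unicode classes of the preg functions, and on a PHP version recent enough for
type declarations.

```php
<?php
// Hypothetical helper (the name is made up for this post): counts words in a
// UTF-8 string, treating inner apostrophes and hyphens as part of the word.
function utf8_word_count(string $text): int
{
    // \p{L} matches any Unicode letter, \p{N} any Unicode digit.
    // The /u modifier makes PCRE read pattern and subject as UTF-8.
    // A word is a run of letters/digits, optionally continued by
    // an apostrophe or hyphen followed by more letters/digits.
    $pattern = "/[\\p{L}\\p{N}]+(?:['’-][\\p{L}\\p{N}]+)*/u";
    $matches = preg_match_all($pattern, $text);

    // preg_match_all returns false on error (e.g. invalid UTF-8 input).
    return $matches === false ? 0 : $matches;
}

$text = "C'était un week-end d'été";

// Matching whole words keeps "C'était", "week-end" and "d'été" intact.
echo utf8_word_count($text), "\n"; // prints 4

// Splitting on non-letters, as in the preg_split idea, breaks them apart.
echo count(preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY)), "\n"; // prints 7
```

So the split approach is shorter, but matching words directly gets closer to
what str_word_count() does with apostrophes and hyphens.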
I try to enforce utf-8 whenever possible, simply because of its
advantages in international, multilingual communication, even though it has
its disadvantages as well.
--
"A programmer is an organism that turns caffeine into code" - lpk
http://outolempi.net/ahdistus/ - A randomly updated webcomic
sp**@outolempi.net || Gedoon-S @ IRCnet || rot13(xv***@bhgbyrzcv.arg)