Welsh language - ISO-8859-1 or Unicode ?

Hello -

I'm working on a team that is planning to add Welsh language support to a
large existing IT system which is partially web-based and
English-language-only so far. I've heard that 2 characters in Welsh
(w-circumflex and y-circumflex) are not supported in our default ISO-8859-1
character set, so a partial move to Unicode for internal storage of text
might be required.

I haven't yet found a Welsh-language website that uses these 2 characters,
so are they actually used much in Welsh? Is not supporting them likely to
cause problems?

Thanks
Jun 27 '08 #1
I've just found a webpage that uses y-circumflex at the end of the third
paragraph, so it can't be that uncommon:
http://news.bbc.co.uk/welsh/hi/newsi...00/7462534.stm

This webpage uses ISO-8859-1 with entities for the y-circumflex. Using
entities would be very messy in my application, so if support for these
characters is needed, I would have to go for Unicode.
I guess my question still is: would not supporting these 2 characters be
considered bad practice for a Welsh-language business application?
Jun 27 '08 #2
Scripsit Simon:
> I'm working on a team that is planning to add Welsh language support
> to a large existing IT system which is partially web-based and
> English-language-only so far.

Do you plan to add other languages later? Is this about names only or
also about prose texts? After all, ISO-8859-1 is insufficient even for
normal English prose; think about dashes and proper quotation marks.

> I've heard that 2 characters in Welsh
> (w-circumflex and y-circumflex) are not supported in our default
> ISO-8859-1 character set,

Right. They are included in ISO-8859-14 (a.k.a. ISO Latin 8, or
"Celtic"), but that's not a feasible option on the WWW (IE does not
recognize that encoding).

> so a partial move to Unicode for internal
> storage of text might be required.

That might be easy, or it might be extremely complicated. But that's
really beyond the scope of these groups. As far as WWW authoring is
concerned, Unicode - specifically UTF-8 - is a good option, but you
could keep using ISO-8859-1 and represent those letters using character
references like &#373; (ŵ) for w with circumflex. But you might have to
deal with the encoding problem of the databases involved, for example,
and with data entry.

> I haven't yet found a Welsh-language website that uses these 2
> characters, so are they actually used much in Welsh?

I don't know Welsh, but I expect those characters to be so rare that
using some clumsy notation like character references for them wouldn't
be a major problem.

> Is not supporting them likely to cause problems?

Some people might say that it is tolerable to omit the circumflex, but
it may be distinctive (i.e. the only difference between otherwise
identical words, though the context usually resolves the issue). And in
2008, I think it is inappropriate to add support for languages to IT
systems without supporting them properly, with all the characters needed
for their correct writing.
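
As a quick check of the character-set point above, here is a minimal Python sketch (an illustration only, not something any poster supplied) that tries to encode the four circumflexed Welsh letters in each encoding mentioned so far. The codec names are Python's standard ones; `encodable` and `WELSH_LETTERS` are hypothetical names chosen for this example.

```python
# The four circumflexed letters discussed in this thread.
WELSH_LETTERS = "ŴŵŶŷ"

def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Report, per encoding, whether all four letters can be represented.
for enc in ("iso-8859-1", "iso-8859-14", "utf-8"):
    print(f"{enc:12} {encodable(WELSH_LETTERS, enc)}")
```

Running it should report that only ISO-8859-1 cannot hold these letters, while the other Welsh circumflexed vowels (â, ê, î, ô, û) are already covered by ISO-8859-1.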

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Jun 27 '08 #3
Thanks for your reply.

Unfortunately multi-lingual support has not really been a priority in the
system design up to now, although it has always been a possible future
requirement. The system is a complex mixture of databases, Windows
applications and web applications. I believe all the databases and
programming languages we use already support Unicode, so I would aim to
use that support, rather than character references which would be clumsy
as you say.
Jun 27 '08 #4
Scripsit Simon:
> I believe all the databases and programming languages we use already
> support Unicode, so I would aim to use that support, rather than
> character references which would be clumsy as you say.

Sounds like a simple way to go then. It is surely simplest to use
Unicode throughout, especially if character data needs to be transferred
between applications as plain text (where no character references or
markup can be used). It's also simplest in data entry if people
immediately see what they have typed, and entering characters with
circumflex should not be a problem; you can e.g. use the keyboard layout
outlined at
http://en.wikipedia.org/wiki/Keyboar...ngdom_extended

Yet, it's always possible that some software component doesn't grok
Unicode. Let's hope such problems are solvable. The web-related
components shouldn't be a problem.
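
To make "Unicode throughout" a little more concrete, here is a small Python sketch: the same string can sit in a 16-bit form internally and go out as UTF-8 on the web without losing anything. The sample word and the use of UTF-16-LE as a stand-in for whatever 16-bit storage a database might use are assumptions for illustration only.

```python
word = "gŵydd"  # sample Welsh word, chosen only for illustration

utf16 = word.encode("utf-16-le")  # assumed stand-in for a 16-bit internal store
utf8 = word.encode("utf-8")       # what a UTF-8 web page would carry

# Character count, then byte counts in each encoding.
print(len(word), len(utf16), len(utf8))

# Decoding either byte form gives back the identical string.
assert utf16.decode("utf-16-le") == utf8.decode("utf-8") == word
```

The point is only that moving text between components is a lossless re-encoding, provided each component is told which encoding it is receiving.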

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Jun 27 '08 #5
Simon wrote:
> Hello -
>
> I'm working on a team that is planning to add Welsh language support to a
> large existing IT system which is partially web-based and
> English-language-only so far. I've heard that 2 characters in Welsh
> (w-circumflex and y-circumflex) are not supported in our default ISO-8859-1
> character set, so a partial move to Unicode for internal storage of text
> might be required.
>
> I haven't yet found a Welsh-language website that uses these 2 characters,
> so are they actually used much in Welsh? Is not supporting them likely to
> cause problems?

It could be a support problem (though I don't know why, given the
availability of UTF-8 as well as the option of numeric character
references): see the note at the bottom of

http://www.menai.ac.uk/clicclic/

As made clear at

http://www.cs.cf.ac.uk/fun/welsh/Lesson01.html

the circumflex really is supposed to appear in these locations. (Note
that even on this page, section 1.2 explains that because of support
issues, they are using their own ugly work-around for accented
characters.) Examples are given: "ty^" = "house", along with the pair
"gw^ydd" = "goose" and "gwy^dd" = "trees", which are pronounced differently.
Jun 27 '08 #6
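
The goose/trees pair in the post above can be turned into a short Python sketch showing why dropping the circumflex loses information. It converts the ASCII caret notation into real letters, then strips the accent again. Both helper names are hypothetical, and treating `^` as "circumflex on the preceding letter" is an assumption based on the examples quoted in that post.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT

def caret_to_unicode(text: str) -> str:
    """Turn the ASCII workaround 'w^' into a real circumflexed letter."""
    # Replace the caret with a combining mark, then compose (NFC).
    return unicodedata.normalize("NFC", text.replace("^", CIRCUMFLEX))

def strip_circumflex(text: str) -> str:
    """Drop circumflexes, as a system without these letters might do."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(CIRCUMFLEX, ""))

goose = caret_to_unicode("gw^ydd")
trees = caret_to_unicode("gwy^dd")

# The two words differ until the circumflex is removed.
print(goose, trees, goose == trees)
print(strip_circumflex(goose) == strip_circumflex(trees))
```

Once the accent is stripped, the two words are stored as the same string, so nothing downstream can tell them apart.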
Message-ID: <48***********************@news.gradwell.net> from Simon
contained the following:

> Unfortunately (for me) that webpage uses character entities to represent the
> characters outside ISO-8859-1. This isn't really a workable approach for me,
> because the text I'm displaying will be stored and processed in various
> databases and applications (web and non-web). I will probably end up storing
> and processing the data using UCS-2 or similar and generating webpages in
> UTF-8.

Surely you can add the character entities using a script when the pages
are generated?
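
A minimal sketch of that idea in Python, assuming the text is held as Unicode internally and only the final page has to be ISO-8859-1. The function name `to_latin1_page` is hypothetical; the `xmlcharrefreplace` error handler is a standard part of Python's codec machinery.

```python
def to_latin1_page(text: str) -> bytes:
    """Encode text as ISO-8859-1, writing any character outside that set
    as a decimal numeric character reference."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(to_latin1_page("tŷ"))     # b't&#375;'
print(to_latin1_page("gŵydd"))  # b'g&#373;ydd'
```

Note that this step only handles characters missing from the target encoding. It does not escape `&` or `<`, so ordinary HTML escaping would still have to happen before it.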
--
Geoff Berrow 011000100110110 0010000000110
001101101011011 001000110111101 100111001011
100110001101101 111001011100111 010101101011
Jun 27 '08 #7
Harlan Messinger wrote:
> Blinky the Shark wrote:
>> Holy crap. I'm looking at two of your posts, and both in the body and in
>> the article's line in the headers pane, your name is not in the font I
>> have configured. And it's a *different* not-configured-by-me font in the
>> body than in the headers pane.
>
> I noticed the same thing, in Thunderbird.

His FROM line reads

From: =?UTF-8?B?77yh772O772E772S772F772B772T44CA77yw772S772J772M772P772Q?=

I don't know what to make of this.
Jun 27 '08 #8
Harlan Messinger <hm*******************@comcast.net> writes:
> Harlan Messinger wrote:
>> Blinky the Shark wrote:
>>> Holy crap. I'm looking at two of your posts, and both in the body and in
>>> the article's line in the headers pane, your name is not in the font I
>>> have configured. And it's a *different* not-configured-by-me font in the
>>> body than in the headers pane.
>>
>> I noticed the same thing, in Thunderbird.
>
> His FROM line reads
>
> From: =?UTF-8?B?77yh772O772E772S772F772B772T44CA77yw772S772J772M772P772Q?=
>
> I don't know what to make of this.

If I cut and paste to my utf-8-dump program:

$ utf-8-dump -f '[%u] %n\n'
Ａｎｄｒｅａｓ　
[U+FF21] FULLWIDTH LATIN CAPITAL LETTER A
[U+FF4E] FULLWIDTH LATIN SMALL LETTER N
[U+FF44] FULLWIDTH LATIN SMALL LETTER D
[U+FF52] FULLWIDTH LATIN SMALL LETTER R
[U+FF45] FULLWIDTH LATIN SMALL LETTER E
[U+FF41] FULLWIDTH LATIN SMALL LETTER A
[U+FF53] FULLWIDTH LATIN SMALL LETTER S
[U+3000] IDEOGRAPHIC SPACE
[U+000A] <control>

Presumably your newsreader thinks it needs a separate font to find
suitable glyphs for these characters (mine does too).
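
For anyone without a utf-8-dump program to hand, roughly the same listing can be produced with Python's standard library. This is a sketch under the assumption that the header value is the one quoted above with the stray spaces removed; it decodes the whole encoded word rather than just the first name.

```python
import unicodedata
from email.header import decode_header

raw = "=?UTF-8?B?77yh772O772E772S772F772B772T44CA77yw772S772J772M772P772Q?="

# decode_header yields (bytes, charset) pairs; this header has a single part.
payload, charset = decode_header(raw)[0]
name = payload.decode(charset)

# One line per character: code point and Unicode character name.
for ch in name:
    print(f"[U+{ord(ch):04X}] {unicodedata.name(ch, '<unnamed>')}")

# Compatibility normalization (NFKC) folds fullwidth forms to plain ASCII.
ascii_name = unicodedata.normalize("NFKC", name)
print(ascii_name)
```

The NFKC step at the end is one way a system could store such a name in an ordinary Latin form, if that is what is wanted.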

--
Ben.
Jun 27 '08 #9
Scripsit Andreas Prilop:
> ISO-8859-1 does not even contain a euro sign (€), which seems to be
> an even stronger argument to move to Unicode asap than the missing
> Ŵ ŵ Ŷ ŷ for Welsh.

Not really, because
a) the UK does not use the euro currency
b) the euro sign can conveniently be written using the entity reference
&euro;
c) the euro sign should not be used in normal text, according to
reputable language authorities; instead, the currency name should be
written, except perhaps in tables and other contexts where saving space
is crucial.

For commercial pages oriented towards countries using the euro, the euro
sign is needed, but it’s not really comparable to the issue of letters
needed for proper writing of a language.
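
Point b) can be illustrated with Python's table of HTML 4.01 entity names, which is the set of named entities an HTML 4 page could rely on. This is a sketch for illustration only; the variable `euro_in_latin1` is a hypothetical name.

```python
from html.entities import name2codepoint  # HTML 4.01 entity names

# The euro sign has a named entity in HTML 4.01; the Welsh letters do not,
# so they need numeric character references instead.
for entity in ("euro", "wcirc", "ycirc"):
    print(entity, entity in name2codepoint)

# The euro sign itself is still outside ISO-8859-1, entity or not.
try:
    "€".encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
print(euro_in_latin1)
```

So the euro sign is the easier case for an ISO-8859-1 page, since it has a mnemonic spelling, whereas ŵ and ŷ can only be written by number.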

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Jun 27 '08 #10
