
String formatting for complex writing systems

Hi guys,

I'm writing a piece of software for a Thai friend. In the end it
is supposed to print a report on paper, with tables of text and
numbers. When I test it in English, the columns are aligned nicely,
but when he tests it with Thai data, the columns are all crooked.

The problem is that in the Thai writing system, two or more
characters together sometimes take up a single space, for example งิ
(u"\u0E07\u0E34"). This is why something like u"%10s" % ...
doesn't work as expected.

Is anybody aware of an alternative string format function that can
deal with this kind of writing properly?

Any suggestion is highly appreciated. Thanks!
Andy

Jun 27 '07 #1
The same thing happens even in English if you print using a
proportional-width font: a "W" is usually wider than an "i" or an "l".
You could use a reporting library or program (like ReportLab, generating
PDF files), but perhaps the simplest approach is to generate an HTML page
containing a table, and display and print it using your favorite browser.
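
A rough sketch of the HTML approach, assuming Python 3. The helper name `html_table` and the sample rows are made up for illustration and are not from the thread. The idea is to let the browser's rendering engine do the column alignment instead of counting characters:

```python
from html import escape

def html_table(rows):
    """Build a minimal HTML page containing one table (hypothetical helper)."""
    lines = []
    for row in rows:
        # Escape each cell so characters like "<" or "&" cannot break the markup.
        cells = ''.join('<td>%s</td>' % escape(str(cell)) for cell in row)
        lines.append('<tr>%s</tr>' % cells)
    return ('<!DOCTYPE html>\n'
            '<html><head><meta charset="utf-8"></head>\n'
            '<body><table>\n%s\n</table></body></html>' % '\n'.join(lines))

# Made-up sample data: one Thai cell and one English cell.
rows = [('\u0E07\u0E34', 120), ('Total', 120)]
page = html_table(rows)
print(page)
```

The resulting string could be written to a file opened with `encoding='utf-8'` and then printed from a browser.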

--
Gabriel Genellina
Jun 27 '07 #2
In the general case it's impossible to write such a function for many
Unicode characters without feedback from the rendering library.
Assuming you use a *fixed*-width font for both English and Thai, the
following function will return how many columns your text will use:

from unicodedata import category
def columns(self, s):
    return sum(1 for c in s if category(c) != 'Mn')
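
As a hedged illustration of how such a column count might replace u"%10s", here is a minimal sketch assuming Python 3 and a fixed-width font. It defines `columns` as a plain one-argument function, and `pad_columns` is a hypothetical helper name that does not appear in the thread:

```python
from unicodedata import category

def columns(s):
    """Count display columns, skipping nonspacing marks (category 'Mn')."""
    return sum(1 for c in s if category(c) != 'Mn')

def pad_columns(s, width):
    """Right-align s in `width` columns (hypothetical helper, like '%10s')."""
    return ' ' * max(0, width - columns(s)) + s

thai = '\u0E07\u0E34'  # the two-code-point example from the question
padded = pad_columns(thai, 10)
naive = '%10s' % thai

# The naive version pads by code points, so it ends up one column short.
print(columns(padded), columns(naive))
```

This only accounts for nonspacing marks. Characters that occupy two columns, or fonts that are not truly fixed-width, would still need feedback from the rendering library.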

-- Leo

Jun 27 '07 #3
The function in my previous post should of course be written as
def columns(s), without the self parameter. Need to learn to
proofread before posting :)

-- Leo

Jun 27 '07 #4
Thanks guys!

I've used the HTML and the unicodedata suggestions, each on a
different report. These worked nicely!

Andy

Jul 2 '07 #5
