universal unicode font for reportlab

I need to create multilingual invoices with reportlab. I think it is
possible to use UTF-8 strings but there is a problem with the font. I
could not find any free TTF font that can do latin1, latin2, arabic,
chinese and other languages at the same time. Is there a single font
that is able to handle these languages? (Most of our invoices will be
for EN, FR, DE, HU, SK, CZ, RO but some of them need to be in Chinese.)

Thanks,

Laszlo

Sep 8 '08 #1
Laszlo Nagy <ga*****@shopzeus.com> writes:
> I could not find any free TTF font that can do latin1, latin2,
> arabic, chinese and other languages at the same time. Is there a
> single font that is able to handle these languages?

The GNU Unifont <URL:http://en.wikipedia.org/wiki/GNU_Unifont>
<URL:http://unifoundry.com/unifont.html> covers an impressive range of
the Unicode Basic Multilingual Plane.

Unifont is originally a bitmap font, but was recently made available
in TrueType format
<URL:http://www.lgm.cl/trabajos/unifont/index.en.html>.

Both are available in Debian 'lenny'; the 'unifont' and 'ttf-unifont'
packages, respectively.

--
\ “Science doesn't work by vote and it doesn't work by |
`\ authority.” —Richard Dawkins, _Big Mistake_ (The Guardian, |
_o__) 2006-12-27) |
Ben Finney
Sep 8 '08 #2
> Laszlo Nagy <ga*****@shopzeus.com> writes:
>
>> I could not find any free TTF font that can do latin1, latin2,
>> arabic, chinese and other languages at the same time. Is there a
>> single font that is able to handle these languages?
>
> The GNU Unifont <URL:http://en.wikipedia.org/wiki/GNU_Unifont>
> <URL:http://unifoundry.com/unifont.html> covers an impressive range of
> the Unicode Basic Multilingual Plane.
>
> Unifont is originally a bitmap font, but was recently made available
> in TrueType format
> <URL:http://www.lgm.cl/trabajos/unifont/index.en.html>.
>
> Both are available in Debian 'lenny'; the 'unifont' and 'ttf-unifont'
> packages, respectively.

I found out that DejaVu is what I need. It covers the languages I need
and more:

http://dejavu.svn.sourceforge.net/vi.../langcover.txt

Thanks for your help!

L

Sep 8 '08 #3
>> The GNU Unifont <URL:http://en.wikipedia.org/wiki/GNU_Unifont>
>> <URL:http://unifoundry.com/unifont.html> covers an impressive range of
>> the Unicode Basic Multilingual Plane.
>>
>> Unifont is originally a bitmap font, but was recently made available
>> in TrueType format
>> <URL:http://www.lgm.cl/trabajos/unifont/index.en.html>.
>>
>> Both are available in Debian 'lenny'; the 'unifont' and 'ttf-unifont'
>> packages, respectively.
>
> I found out that DejaVu is what I need. It covers the languages I need
> and more:
>
> http://dejavu.svn.sourceforge.net/vi.../langcover.txt

Sorry, this did not work either. DejaVu does support Cyrillic and Greek
characters but I have to load a different ttf for that. They are not
unified. :-( The only one that worked so far was "unifont.ttf" but it is
very ugly above point size=10.

Can you tell me what kind of font Geany is using on my Ubuntu system?
The preferences say that it is "monospace" but when I load
VeraMono.ttf in reportlab, it will not even display latin2 characters.
In contrast, please look at this example that shows my test program in Geany:

http://www.shopzeus.com/geany.jpg

It is a real scalable truetype font, displaying latin 1, latin2,
chinese, russian and japanese characters. Is it the same font? Does this
mean that reportlab is buggy? If I could load the same font that geany
uses, it would probably solve my problem forever.

Thanks,

Laszlo
Sep 8 '08 #4
Iain Dalton wrote:
> Why don't you want to use multiple typefaces? Many programs that deal
> with multilingual strings use multiple fonts (cf. any Web browser and
> Emacs).

You are right, but these PDF documents will show mixed strings. The end
user can enter arbitrary strings into the database, and they must be
presented. For example, the name of a product can be Arabic or German.
It might be possible to guess the language used from the unicode string,
and then select a different font. But I don't want to go to that trouble.

It would be a great idea to use pango. Apparently pango is able to
change fonts on the fly and render the requested glyph. However, if I
use pango then I lose the much higher level of abstraction that comes
with reportlab and platypus: I need automatic page headers and footers,
I need to be able to repeat table headers on each page automatically
(when the table doesn't fit on one page) etc. Developing my own
"platypus"-like engine for pango and PDF rendering is a nightmare.

Better than that, I can develop my own flowable object for platypus: a
special paragraph that changes the used true type font on the fly.
(Split input string into parts, determine language for the parts and
display each part with its own font.) But of course this is a lot of
extra programming.
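
The splitting step described above could look roughly like the sketch
below. It is only an illustration: the code point ranges are coarse,
the font names (LatinFont, ArabicFont, CJKFont, FallbackFont) are
hypothetical placeholders for whatever fonts would actually be
registered, and it ignores right-to-left shaping entirely.

```python
from itertools import groupby
from xml.sax.saxutils import escape

# Hypothetical mapping from code point ranges to registered font names.
FONT_RANGES = [
    (0x0000, 0x024F, "LatinFont"),   # Basic Latin through Latin Extended-B
    (0x0370, 0x052F, "LatinFont"),   # Greek and Cyrillic
    (0x0600, 0x06FF, "ArabicFont"),  # Arabic
    (0x3000, 0x30FF, "CJKFont"),     # CJK punctuation and kana
    (0x4E00, 0x9FFF, "CJKFont"),     # CJK Unified Ideographs
]
FALLBACK_FONT = "FallbackFont"       # used for anything not listed above


def font_for(ch):
    """Return the font name whose range contains this character."""
    cp = ord(ch)
    for lo, hi, name in FONT_RANGES:
        if lo <= cp <= hi:
            return name
    return FALLBACK_FONT


def split_runs(text):
    """Split text into (font name, substring) runs of consecutive characters."""
    return [(font, "".join(chars)) for font, chars in groupby(text, key=font_for)]


def to_markup(text):
    """Wrap each run in <font name="..."> markup, escaping &, < and >."""
    return "".join(
        '<font name="%s">%s</font>' % (font, escape(run))
        for font, run in split_runs(text)
    )
```

The string returned by to_markup() is meant to be handed to a platypus
Paragraph, which accepts <font name="..."> tags inside its text. Whether
that is good enough for Arabic in practice is something I have not
checked.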

The simplest solution would be to use a font that is able to handle all
encodings that I need.

Thanks,

Laszlo

Sep 8 '08 #5
> The simplest solution would be to use a font that is able to handle all
> encodings that I need.

My OpenOffice on WinXP uses a unicode font, I believe Lucida Sans
Unicode, that seems to cover the entire BMP. I don't know whether it
was already installed or installed by OO or how one would get to it to
extract it.

Sep 8 '08 #6
Terry Reedy <tj*****@udel.edu> wrote:
> My OpenOffice on WinXP uses a unicode font, I believe Lucida Sans
> Unicode, that seems to cover the entire BMP.

Lucida Sans Unicode only covers a small subset of Unicode. It may seem
to cover a wider range because Windows (and possibly OpenOffice) will
automatically substitute characters from other fonts, if necessary.

> I don't know whether it was already installed or installed by OO or
> how one would get to it to extract it.

It's a standard Windows font.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //
Sep 9 '08 #7


Ross Ridge wrote:
> Terry Reedy <tj*****@udel.edu> wrote:
>> My OpenOffice on WinXP uses a unicode font, I believe Lucida Sans
>> Unicode, that seems to cover the entire BMP.
>
> Lucida Sans Unicode only covers a small subset of Unicode. It may seem
> to cover a wider range because Windows (and possibly OpenOffice) will
> automatically substitute characters from other fonts, if necessary.

Sorry, I posted the wrong name.
Arial Unicode MS is the one that seems pretty complete.

>> I don't know whether it was already installed or installed by OO or
>> how one would get to it to extract it.
>
> It's a standard Windows font.

From the MS, I would guess that is a Windows font too ;-).

Sep 9 '08 #8
Terry Reedy <tj*****@udel.edu> wrote:
> Sorry, I posted the wrong name.
> Arial Unicode MS is the one that seems pretty complete.
...
> From the MS, I would guess that is a Windows font too ;-).

It's made by Microsoft, but it's not a standard Windows font. I think
it comes with Microsoft Office.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //
Sep 9 '08 #9


Jeroen Ruigrok van der Werven wrote:
> -On [20080909 05:23], Terry Reedy (tj*****@udel.edu) wrote:
>> Arial Unicode MS is the one that seems pretty complete.
>
> Not really. It misses a lot of characters.

Well, it has Latin, Greek, Cyrillic, Hebrew, Arabic, several south
Asian, Tibetan, CJK, Japanese, Korean, and numerous symbols and special
forms. I don't know what it misses, but I think that covers what the OP
asked for.

Sep 9 '08 #10
Ross Ridge wrote:
> Terry Reedy <tj*****@udel.edu> wrote:
>> Sorry, I posted the wrong name.
>> Arial Unicode MS is the one that seems pretty complete.
> ...
>> From the MS, I would guess that is a Windows font too ;-).
>
> It's made by Microsoft, but it's not a standard Windows font. I think
> it comes with Microsoft Office.

I need to use HTML anyway. I realized that universal unicode fonts are
above 5MB in size. The report would be a 10KB PDF, but I need to embed
the font before I can send it to anyone. Since some reports need to be
sent in emails, I need to use something else. I cannot be sending 10MB
emails for "one page" reports.

I ended up implementing the reports in HTML. I'm assuming that the
user's browser is capable of displaying any characters needed. Now there
is another problem: how to print an HTML page from a browser without the
page header/footer information? But that is another problem and probably
has nothing to do with Python.

Thanks for your help anyway.

Best,

Laszlo

Sep 10 '08 #11
Laszlo Nagy <ga*****@shopzeus.com> wrote:
> I need to use HTML anyway. I realized that universal unicode fonts are
> above 5MB in size. The report would be a 10KB PDF, but I need to embed
> the font before I can send it to anyone. Since some reports need to be
> sent in emails, I need to use something else. I cannot be sending 10MB
> emails for "one page" reports.

I thought that usually when you embed a font in a PDF only the glyphs which
are actually used in the document get embedded. Unfortunately a quick test
with reportlab seems to show that it doesn't do that optimisation: it looks
as though it just embeds the entire font.

--
Duncan Booth http://kupuguy.blogspot.com
Sep 10 '08 #12
Duncan Booth <du**********@suttoncourtenay.org.uk> wrote:
> I thought that usually when you embed a font in a PDF only the glyphs which
> are actually used in the document get embedded. Unfortunately a quick test
> with reportlab seems to show that it doesn't do that optimisation: it looks
> as though it just embeds the entire font.

Yah, PDF files normally only contain an embedded subset of the fonts used.
It might be possible to use Ghostscript's ps2pdf command (which can take a
PDF file as input) to strip out the unused glyphs from the embedded fonts.
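
Something along these lines could drive that from Python. It is a
sketch only: ps2pdf takes an input file followed by an output file,
but whether the rewritten PDF really ends up with subsetted fonts is
the guess made above, not something this code verifies. The function
names are made up for the example.

```python
import shutil
import subprocess


def ps2pdf_command(src, dst):
    """Build the command line: ps2pdf <input> <output>."""
    return ["ps2pdf", src, dst]


def rewrite_pdf(src, dst):
    """Run src through ps2pdf, writing dst. Returns False if ps2pdf is missing."""
    if shutil.which("ps2pdf") is None:
        return False
    # check=True raises CalledProcessError if Ghostscript reports a failure.
    subprocess.run(ps2pdf_command(src, dst), check=True)
    return True
```

Comparing the size of the two files afterwards would show whether the
rewrite made any difference.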

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //
Sep 10 '08 #13
Duncan Booth <du**********@invalid.invalid> wrote:
>
> Laszlo Nagy <ga*****@shopzeus.com> wrote:
>> I need to use HTML anyway. I realized that universal unicode fonts are
>> above 5MB in size. The report would be a 10KB PDF, but I need to embed
>> the font before I can send it to anyone. Since some reports need to be
>> sent in emails, I need to use something else. I cannot be sending 10MB
>> emails for "one page" reports.
>
> I thought that usually when you embed a font in a PDF only the glyphs which
> are actually used in the document get embedded. Unfortunately a quick test
> with reportlab seems to show that it doesn't do that optimisation: it looks
> as though it just embeds the entire font.

No, it does subsetting. There was a debate a year or two ago on the
reportlab list about how the font subset should be named in the resulting
PDF file.

Is it possible you have an older release?
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Sep 11 '08 #14
Tim Roberts <ti**@probo.com> wrote:
> Duncan Booth <du**********@invalid.invalid> wrote:
>>
>> Laszlo Nagy <ga*****@shopzeus.com> wrote:
>>> I need to use HTML anyway. I realized that universal unicode fonts
>>> are above 5MB in size. The report would be a 10KB PDF, but I need to
>>> embed the font before I can send it to anyone. Since some reports
>>> need to be sent in emails, I need to use something else. I cannot
>>> be sending 10MB emails for "one page" reports.
>>
>> I thought that usually when you embed a font in a PDF only the glyphs
>> which are actually used in the document get embedded. Unfortunately a
>> quick test with reportlab seems to show that it doesn't do that
>> optimisation: it looks as though it just embeds the entire font.
>
> No, it does subsetting. There was a debate a year or two ago on the
> reportlab list about how the font subset should be named in the
> resulting PDF file.
>
> Is it possible you have an older release?

It was 2.1 downloaded about 30 minutes before my post.

The not too scientific test I did was to copy the font embedding example
from the Reportlab documentation, modify it enough to make it actually
run, and then change the output to have only one glyph. The resulting
PDF is virtually identical. I'm not a reportlab expert though so I may
have made some blindingly obvious beginner's mistake (or maybe it only
subsets fonts over a certain size or glyphs outside the ascii range?).

---------- rlab.py ------------
import os, sys
import reportlab
folder = os.path.dirname(reportlab.__file__) + os.sep + 'fonts'
afmFile = os.path.join(folder, 'LeERC___.AFM')
pfbFile = os.path.join(folder, 'LeERC___.PFB')
from reportlab.pdfbase import pdfmetrics
justFace = pdfmetrics.EmbeddedType1Face(afmFile, pfbFile)
faceName = 'LettErrorRobot-Chrome' # pulled from AFM file
pdfmetrics.registerTypeFace(justFace)
justFont = pdfmetrics.Font('LettErrorRobot-Chrome',faceName,'WinAnsiEncoding')
pdfmetrics.registerFont(justFont)
from reportlab.pdfgen.canvas import Canvas
canvas = Canvas('temp.pdf')
canvas.setFont('LettErrorRobot-Chrome', 32)
if sys.argv:
    canvas.drawString(10, 150, 'TTTT TTTTTT TT TT')
    canvas.drawString(10, 100, 'TTTTTTTTTTTTTTTTTTTTT')
else:
    canvas.drawString(10, 150, 'This should be in')
    canvas.drawString(10, 100, 'LettErrorRobot-Chrome')
canvas.save()
-------------------------------

--
Duncan Booth http://kupuguy.blogspot.com
Sep 11 '08 #15
Duncan Booth <du**********@invalid.invalid> wrote:
> I may have made some blindingly obvious beginner's mistake

I made the blindingly stupid beginner's mistake of cleaning up the code
before posting it and breaking it in the process. The 'if' should of
course say:

if len(sys.argv) > 1:

However my original test was done by toggling commented out lines of
code so my conclusion remains the same: the only differences between the
output are the creation date, the page stream, and the digest.
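
For anyone wanting to repeat that comparison without eyeballing the
files, a rough sketch follows. It assumes the two PDFs have already
been read into byte strings, and it compares them line by line, which
is crude for a partly binary format. differing_lines is a name made up
for this example.

```python
import difflib


def differing_lines(a, b):
    """Return (tag, lines_from_a, lines_from_b) for each region that differs."""
    la, lb = a.splitlines(), b.splitlines()
    matcher = difflib.SequenceMatcher(None, la, lb, autojunk=False)
    return [
        (tag, la[i1:i2], lb[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

If the embedded font data were being subsetted, the font stream itself
would be expected to show up among the differing regions, not just the
date, page stream and digest.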

--
Duncan Booth http://kupuguy.blogspot.com
Sep 11 '08 #16
Duncan Booth <du**********@invalid.invalid> wrote:
>
> The not too scientific test I did was to copy the font embedding example
> from the Reportlab documentation, modify it enough to make it actually
> run, and then change the output to have only one glyph. The resulting
> PDF is virtually identical. I'm not a reportlab expert though so I may
> have made some blindingly obvious beginner's mistake (or maybe it only
> subsets fonts over a certain size or glyphs outside the ascii range?).
>
> ---------- rlab.py ------------
> import os, sys
> import reportlab
> folder = os.path.dirname(reportlab.__file__) + os.sep + 'fonts'
> afmFile = os.path.join(folder, 'LeERC___.AFM')
> pfbFile = os.path.join(folder, 'LeERC___.PFB')
> from reportlab.pdfbase import pdfmetrics
> justFace = pdfmetrics.EmbeddedType1Face(afmFile, pfbFile)
> faceName = 'LettErrorRobot-Chrome' # pulled from AFM file
> pdfmetrics.registerTypeFace(justFace)
> justFont = pdfmetrics.Font('LettErrorRobot-Chrome',faceName,'WinAnsiEncoding')

OK, look the other way while I backpedal furiously. The conversation on
the mailing list last year was focused on TrueType fonts. Those are subsetted.

EmbeddedType1Face, used for Type 1 fonts, does appear to embed the entire
font.
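
For comparison, the TrueType route goes through TTFont instead of
EmbeddedType1Face. A minimal sketch, assuming ReportLab is installed
and that Vera.ttf can be found on its font search path (I believe it
ships in ReportLab's fonts folder); the font name "DemoFont" and the
function name are arbitrary choices for this example.

```python
def write_ttf_pdf(out_path, ttf_path="Vera.ttf", text="Hello"):
    """Register a TrueType font and draw one line of text with it."""
    # Imported here so this module still loads where ReportLab is absent.
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont
    from reportlab.pdfgen.canvas import Canvas

    font_name = "DemoFont"  # arbitrary name chosen for this sketch
    pdfmetrics.registerFont(TTFont(font_name, ttf_path))

    canvas = Canvas(out_path)
    canvas.setFont(font_name, 32)
    canvas.drawString(10, 150, text)
    canvas.save()
    return out_path
```

Running this once with a long string and once with a single repeated
glyph, then comparing file sizes, would be the TrueType equivalent of
the Type 1 test earlier in the thread.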
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Sep 13 '08 #17
