universal unicode font for reportlab

I need to create multilingual invoices from reportlab. I think it is
possible to use UTF-8 strings but there is a problem with the font. I
could not find any free TTF font that can do latin1, latin2, arabic,
chinese and other languages at the same time. Is there a single font
that is able to handle these languages? (Most of our invoices will be
for EN, FR, DE, HU, SK, CZ, RO but some of them need to be in Chinese.)

Thanks,

Laszlo

Sep 8 '08 #1
Laszlo Nagy <ga*****@shopzeus.com> writes:
> I could not find any free TTF font that can do latin1, latin2,
> arabic, chinese and other languages at the same time. Is there a
> single font that is able to handle these languages?

The GNU Unifont <URL:http://en.wikipedia.org/wiki/GNU_Unifont>
<URL:http://unifoundry.com/unifont.html> covers an impressive range of
the Unicode Basic Multilingual Plane.

Unifont is originally a bitmap font, but was recently made available
in TrueType format
<URL:http://www.lgm.cl/trabajos/unifont/index.en.html>.

Both are available in Debian 'lenny'; the 'unifont' and 'ttf-unifont'
packages, respectively.

--
\ “Science doesn't work by vote and it doesn't work by |
`\ authority.” —Richard Dawkins, _Big Mistake_ (The Guardian, |
_o__) 2006-12-27) |
Ben Finney
Sep 8 '08 #2
> Laszlo Nagy <ga*****@shopzeus.com> writes:
>
>> I could not find any free TTF font that can do latin1, latin2,
>> arabic, chinese and other languages at the same time. Is there a
>> single font that is able to handle these languages?
>
> The GNU Unifont <URL:http://en.wikipedia.org/wiki/GNU_Unifont>
> <URL:http://unifoundry.com/unifont.html> covers an impressive range of
> the Unicode Basic Multilingual Plane.
>
> Unifont is originally a bitmap font, but was recently made available
> in TrueType format
> <URL:http://www.lgm.cl/trabajos/unifont/index.en.html>.
>
> Both are available in Debian 'lenny'; the 'unifont' and 'ttf-unifont'
> packages, respectively.

I found out that DejaVu is what I need. It covers the languages I need
and more:

http://dejavu.svn.sourceforge.net/vi.../langcover.txt

Thanks for your help!

L

Sep 8 '08 #3
>> The GNU Unifont <URL:http://en.wikipedia.org/wiki/GNU_Unifont>
>> <URL:http://unifoundry.com/unifont.html> covers an impressive range of
>> the Unicode Basic Multilingual Plane.
>>
>> Unifont is originally a bitmap font, but was recently made available
>> in TrueType format
>> <URL:http://www.lgm.cl/trabajos/unifont/index.en.html>.
>>
>> Both are available in Debian 'lenny'; the 'unifont' and 'ttf-unifont'
>> packages, respectively.
>
> I found out that DejaVu is what I need. It covers the languages I need
> and more:
>
> http://dejavu.svn.sourceforge.net/vi.../langcover.txt

Sorry, this did not work either. DejaVu does support Cyrillic and Greek
characters but I have to load a different ttf for that. They are not
unified. :-( The only one that worked so far was "unifont.ttf" but it is
very ugly above point size=10.

Can you tell me what kind of font Geany is using on my Ubuntu system?
The preferences say that it is "monospace" but when I load
VeraMono.ttf in reportlab, it will not even display latin2 characters.
In contrast, please look at this example that shows my test program in Geany:

http://www.shopzeus.com/geany.jpg

It is a real scalable truetype font, displaying latin 1, latin2,
chinese, russian and japanese characters. Is it the same font? Does this
mean that reportlab is buggy? If I could load the same font that geany
uses, it would probably solve my problem forever.

Thanks,

Laszlo
Sep 8 '08 #4
Iain Dalton wrote:
> Why don't you want to use multiple typefaces? Many programs that deal
> with multilingual strings use multiple fonts (cf. any Web browser and
> Emacs).

You are right, but these PDF documents will show mixed strings. The end
user can enter arbitrary strings into the database, and they must be
presented. For example, the name of a product can be Arabic or German.
It might be possible to guess the language used from the Unicode string,
and then select a different font. But I don't want to go to that trouble.

It would be a great idea to use pango. Apparently pango is able to
change fonts on the fly and render the requested glyph. However, if I
use pango then I lose the much higher level of abstraction that comes
with reportlab and platypus: I need automatic page headers and footers,
I need to be able to repeat table headers on each page automatically
(when the table doesn't fit one page) etc. Developing my own
"platypus"-like engine for pango and PDF rendering is a nightmare.

Better than that, I can develop my own flowable object for platypus: a
special paragraph that changes the used true type font on the fly.
(Split input string into parts, determine language for the parts and
display each part with its own font.) But of course this is a lot of
extra programming.

The simplest solution would be to use a font that is able to handle all
encodings that I need.

Thanks,

Laszlo

Sep 8 '08 #5
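
The font-switching idea from the post above (split the input string into parts, decide which font each part needs, and render each part with its own font) can be sketched in plain Python. This is only a rough illustration, not a tested platypus flowable: every font name in `FONT_FOR_SCRIPT` is a hypothetical placeholder for a font you would have to register yourself, and guessing the script from the first word of the Unicode character name is a simplifying assumption.

```python
import unicodedata
from xml.sax.saxutils import escape

# Hypothetical font names: replace them with fonts you have actually registered.
FONT_FOR_SCRIPT = {
    "LATIN": "MyLatin",
    "GREEK": "MyGreek",
    "CYRILLIC": "MyCyrillic",
    "ARABIC": "MyArabic",
    "CJK": "MyCJK",
    "HIRAGANA": "MyCJK",
    "KATAKANA": "MyCJK",
    "HANGUL": "MyCJK",
}
DEFAULT_FONT = "MyLatin"


def font_for(ch):
    """Return a font name for a letter, or None for neutral characters."""
    if not ch.isalpha():
        return None  # digits, spaces and punctuation follow their neighbours
    # Assumption: the first word of the character name identifies the script.
    script = unicodedata.name(ch, "").split(" ", 1)[0]
    return FONT_FOR_SCRIPT.get(script, DEFAULT_FONT)


def split_runs(text):
    """Split text into a list of (font_name, substring) runs."""
    runs = []  # each entry is [font_or_None, accumulated_text]
    for ch in text:
        font = font_for(ch)
        if runs and (font is None or runs[-1][0] is None or font == runs[-1][0]):
            if runs[-1][0] is None:
                runs[-1][0] = font  # a leading neutral run adopts the first real font
            runs[-1][1] += ch
        else:
            runs.append([font, ch])
    return [(font or DEFAULT_FONT, part) for font, part in runs]


def to_paragraph_markup(text):
    """Wrap each run in a <font name="..."> tag, escaping XML special characters."""
    return "".join(
        '<font name="%s">%s</font>' % (font, escape(part))
        for font, part in split_runs(text)
    )
```

Each run could then be drawn with its own `setFont` call, or the markup string could be handed to a platypus `Paragraph`, which accepts `<font name=...>` tags, provided every font named in the table has been registered first. The sketch ignores right-to-left layout and glyph shaping, which Arabic text also needs.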
> The simplest solution would be to use a font that is able to handle all
> encodings that I need.

My OpenOffice on WinXP uses a Unicode font, I believe Lucida Sans
Unicode, that seems to cover the entire BMP. I don't know whether it
was already installed or installed by OO or how one would get to it to
extract it.

Sep 8 '08 #6
Terry Reedy <tj*****@udel.edu> wrote:
> My OpenOffice on WinXP uses a unicode font, I believe Lucida Sans
> Unicode, that seems to cover the entire BMP.

Lucida Sans Unicode only covers a small subset of Unicode. It may seem
to cover a wider range because Windows (and possibly OpenOffice) will
automatically substitute characters from other fonts, if necessary.

> I don't know whether it was already installed or installed by OO or
> how one would get to it to extract it.

It's a standard Windows font.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //
Sep 9 '08 #7


Ross Ridge wrote:
> Terry Reedy <tj*****@udel.edu> wrote:
>> My OpenOffice on WinXP uses a unicode font, I believe Lucida Sans
>> Unicode, that seems to cover the entire BMP.
>
> Lucida Sans Unicode only covers a small subset of Unicode. It may seem
> to cover a wider range because Windows (and possibly OpenOffice) will
> automatically substitute characters from other fonts, if necessary.

Sorry, I posted the wrong name.
Arial Unicode MS is the one that seems pretty complete.

>> I don't know whether it was already installed or installed by OO or
>> how one would get to it to extract it.
>
> It's a standard Windows font.

From the MS, I would guess that is a Windows font too ;-).

Sep 9 '08 #8
Terry Reedy <tj*****@udel.edu> wrote:
> Sorry, I posted the wrong name.
> Arial Unicode MS is the one that seems pretty complete.
...
> From the MS, I would guess that is a Windows font too ;-).

It's made by Microsoft, but it's not a standard Windows font. I think
it comes with Microsoft Office.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //
Sep 9 '08 #9


Jeroen Ruigrok van der Werven wrote:
> -On [20080909 05:23], Terry Reedy (tj*****@udel.edu) wrote:
>> Arial Unicode MS is the one that seems pretty complete.
>
> Not really. It misses a lot of characters.

Well, it has Latin, Greek, Cyrillic, Hebrew, Arabic, several south
Asian, Tibetan, CJK, Japanese, Korean, and numerous symbols and special
forms. I don't know what it misses, but I think that covers what the OP
asked for.

Sep 9 '08 #10
Ross Ridge wrote:
> Terry Reedy <tj*****@udel.edu> wrote:
>> Sorry, I posted the wrong name.
>> Arial Unicode MS is the one that seems pretty complete.
> ...
>> From the MS, I would guess that is a Windows font too ;-).
>
> It's made by Microsoft, but it's not a standard Windows font. I think
> it comes with Microsoft Office.

I need to use HTML anyway. I realized that universal unicode fonts are
above 5MB in size. The report would be a 10KB PDF, but I need to embed
the font before I can send it to anyone. Since some reports need to be
sent in emails, I need to use something else. I cannot be sending 10MB
emails for "one page" reports.

I ended up implementing the reports in HTML. I'm assuming that the
user's browser is capable of displaying any characters needed. Now there
is another problem: how to print an HTML without page header/footer
information, from a browser? But that is another problem and probably
has nothing to do with Python.

Thanks for your help anyway.

Best,

Laszlo

Sep 10 '08 #11
Laszlo Nagy <ga*****@shopzeus.com> wrote:
> I need to use HTML anyway. I realized that universal unicode fonts are
> above 5MB in size. The report would be a 10KB PDF, but I need to embed
> the font before I can send it to anyone. Since some reports need to be
> sent in emails, I need to use something else. I cannot be sending 10MB
> emails for "one page" reports.

I thought that usually when you embed a font in a PDF only the glyphs which
are actually used in the document get embedded. Unfortunately a quick test
with reportlab seems to show that it doesn't do that optimisation: it looks
as though it just embeds the entire font.

--
Duncan Booth http://kupuguy.blogspot.com
Sep 10 '08 #12
Duncan Booth <du**********@suttoncourtenay.org.uk> wrote:
> I thought that usually when you embed a font in a PDF only the glyphs which
> are actually used in the document get embedded. Unfortunately a quick test
> with reportlab seems to show that it doesn't do that optimisation: it looks
> as though it just embeds the entire font.

Yah, PDF files normally only contain an embedded subset of the fonts used.
It might be possible to use Ghostscript's ps2pdf command (which can take a
PDF file as input) to strip out the unused glyphs from the embedded fonts.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //
Sep 10 '08 #13
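
The Ghostscript suggestion above could be tried from Python roughly like this. It is a sketch under assumptions: it assumes `ps2pdf` is installed and on the PATH, the helper names are made up for this example, and whether the rewritten file actually ends up with subsetted fonts depends on the Ghostscript version and its settings, so the function simply reports both file sizes.

```python
import os
import shutil
import subprocess


def ps2pdf_command(src_pdf, dst_pdf):
    """Build the command line: ps2pdf INPUT OUTPUT."""
    return ["ps2pdf", src_pdf, dst_pdf]


def rewrite_pdf(src_pdf, dst_pdf):
    """Run the PDF through ps2pdf and return (old_size, new_size) in bytes."""
    if shutil.which("ps2pdf") is None:
        raise RuntimeError("ps2pdf (part of Ghostscript) was not found on PATH")
    # check=True raises CalledProcessError if Ghostscript reports a failure.
    subprocess.run(ps2pdf_command(src_pdf, dst_pdf), check=True)
    return os.path.getsize(src_pdf), os.path.getsize(dst_pdf)
```

For example, `rewrite_pdf("temp.pdf", "temp-small.pdf")` (both file names are placeholders) returns the two sizes, which makes it easy to see whether the rewrite helped for a given report.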
Duncan Booth <du**********@invalid.invalid> wrote:
>
> Laszlo Nagy <ga*****@shopzeus.com> wrote:
>> I need to use HTML anyway. I realized that universal unicode fonts are
>> above 5MB in size. The report would be a 10KB PDF, but I need to embed
>> the font before I can send it to anyone. Since some reports need to be
>> sent in emails, I need to use something else. I cannot be sending 10MB
>> emails for "one page" reports.
>
> I thought that usually when you embed a font in a PDF only the glyphs which
> are actually used in the document get embedded. Unfortunately a quick test
> with reportlab seems to show that it doesn't do that optimisation: it looks
> as though it just embeds the entire font.

No, it does subsetting. There was a debate a year or two ago on the
reportlab list about how the font subset should be named in the resulting
PDF file.

Is it possible you have an older release?
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Sep 11 '08 #14
Tim Roberts <ti**@probo.com> wrote:
> Duncan Booth <du**********@invalid.invalid> wrote:
>>
>> Laszlo Nagy <ga*****@shopzeus.com> wrote:
>>> I need to use HTML anyway. I realized that universal unicode fonts
>>> are above 5MB in size. The report would be a 10KB PDF, but I need to
>>> embed the font before I can send it to anyone. Since some reports
>>> need to be sent in emails, I need to use something else. I cannot
>>> be sending 10MB emails for "one page" reports.
>>
>> I thought that usually when you embed a font in a PDF only the glyphs
>> which are actually used in the document get embedded. Unfortunately a
>> quick test with reportlab seems to show that it doesn't do that
>> optimisation: it looks as though it just embeds the entire font.
>
> No, it does subsetting. There was a debate a year or two ago on the
> reportlab list about how the font subset should be named in the
> resulting PDF file.
>
> Is it possible you have an older release?

It was 2.1 downloaded about 30 minutes before my post.

The not too scientific test I did was to copy the font embedding example
from the Reportlab documentation, modify it enough to make it actually
run, and then change the output to have only one glyph. The resulting
PDF is virtually identical. I'm not a reportlab expert though so I may
have made some blindingly obvious beginner's mistake (or maybe it only
subsets fonts over a certain size or glyphs outside the ASCII range?).

---------- rlab.py ------------
import os, sys
import reportlab
folder = os.path.dirname(reportlab.__file__) + os.sep + 'fonts'
afmFile = os.path.join(folder, 'LeERC___.AFM')
pfbFile = os.path.join(folder, 'LeERC___.PFB')
from reportlab.pdfbase import pdfmetrics
justFace = pdfmetrics.EmbeddedType1Face(afmFile, pfbFile)
faceName = 'LettErrorRobot-Chrome' # pulled from AFM file
pdfmetrics.registerTypeFace(justFace)
justFont = pdfmetrics.Font('LettErrorRobot-Chrome',faceName,'WinAnsiEncoding')
pdfmetrics.registerFont(justFont)
from reportlab.pdfgen.canvas import Canvas
canvas = Canvas('temp.pdf')
canvas.setFont('LettErrorRobot-Chrome', 32)
if sys.argv:
    canvas.drawString(10, 150, 'TTTT TTTTTT TT TT')
    canvas.drawString(10, 100, 'TTTTTTTTTTTTTTTTTTTTT')
else:
    canvas.drawString(10, 150, 'This should be in')
    canvas.drawString(10, 100, 'LettErrorRobot-Chrome')
canvas.save()
-------------------------------

--
Duncan Booth http://kupuguy.blogspot.com
Sep 11 '08 #15
Duncan Booth <du**********@invalid.invalid> wrote:
> I may have made some blindingly obvious beginners mistake

I made the blindingly stupid beginner's mistake of cleaning up the code
before posting it and breaking it in the process. The 'if' should of
course say:

if len(sys.argv) > 1:

However my original test was done by toggling commented out lines of
code so my conclusion remains the same: the only differences between the
output are the creation date, the page stream, and the digest.

--
Duncan Booth http://kupuguy.blogspot.com
Sep 11 '08 #16
Duncan Booth <du**********@invalid.invalid> wrote:
>
> The not too scientific test I did was to copy the font embedding example
> from the Reportlab documentation, modify it enough to make it actually
> run, and then change the output to have only one glyph. The resulting
> PDF is virtually identical. I'm not a reportlab expert though so I may
> have made some blindingly obvious beginners mistake (or maybe it only
> subsets fonts over a certain size or glyphs outside the ascii range?).
>
> ---------- rlab.py ------------
> import os, sys
> import reportlab
> folder = os.path.dirname(reportlab.__file__) + os.sep + 'fonts'
> afmFile = os.path.join(folder, 'LeERC___.AFM')
> pfbFile = os.path.join(folder, 'LeERC___.PFB')
> from reportlab.pdfbase import pdfmetrics
> justFace = pdfmetrics.EmbeddedType1Face(afmFile, pfbFile)
> faceName = 'LettErrorRobot-Chrome' # pulled from AFM file
> pdfmetrics.registerTypeFace(justFace)
> justFont = pdfmetrics.Font('LettErrorRobot-Chrome',faceName,'WinAnsiEncoding')

OK, look the other way while I backpedal furiously. The conversation on
the mailing list last year was focused on TrueType fonts. Those are subsetted.

EmbeddedType1Face, used for Type 1 fonts, does appear to embed the entire
font.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Sep 13 '08 #17
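
Given the conclusion above that TrueType fonts are subsetted while `EmbeddedType1Face` embeds the whole font, the earlier test could be repeated with a TrueType font. The following is a minimal sketch, not a verified benchmark: it assumes reportlab is installed and that `Vera.ttf` can be found on reportlab's TrueType search path (it has shipped in reportlab's own `fonts` folder). `render_pdf`, `compare_sizes` and the font name `TestTTF` are names made up for this example.

```python
from io import BytesIO


def render_pdf(text, font_file="Vera.ttf"):
    """Return the bytes of a one-page PDF that draws `text` in a TrueType font."""
    # Imported here so that reportlab is only required when the function runs.
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont
    from reportlab.pdfgen.canvas import Canvas

    # Assumption: font_file can be located on reportlab's TrueType search path.
    pdfmetrics.registerFont(TTFont("TestTTF", font_file))
    buffer = BytesIO()
    canvas = Canvas(buffer)  # Canvas accepts a file-like object as well as a name
    canvas.setFont("TestTTF", 32)
    canvas.drawString(10, 150, text)
    canvas.save()
    return buffer.getvalue()


def compare_sizes(few_glyphs="TTTTTTTTTTTTTTTTTTTTT",
                  many_glyphs="This should be in a TrueType font"):
    """Return the PDF sizes in bytes for a one-glyph and a many-glyph string."""
    return len(render_pdf(few_glyphs)), len(render_pdf(many_glyphs))
```

Printing the result of `compare_sizes()` shows whether the embedded font data shrinks when fewer distinct glyphs are used, which is the behaviour one would expect if subsetting takes place.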
