universal unicode font for reportlab

Laszlo Nagy

I need to create multilingual invoices with reportlab. I think it is
possible to use UTF-8 strings, but there is a problem with the font. I
could not find any free TTF font that can do latin1, latin2, arabic,
chinese and other languages at the same time. Is there a single font
that is able to handle these languages? (Most of our invoices will be
for EN, FR, DE, HU, SK, CZ, RO but some of them need to be in Chinese.)

Thanks,

Laszlo

Sep 8 '08 #1


Ben Finney

Laszlo Nagy <ga*****@shopzeus.com> writes:

I could not find any free TTF font that can do latin1, latin2,
arabic, chinese and other languages at the same time. Is there a
single font that is able to handle these languages?

The GNU Unifont <URL:http://en.wikipedia.org/wiki/GNU_Unifont>
<URL:http://unifoundry.com/unifont.html> covers an impressive range of
the Unicode Basic Multilingual Plane.

Unifont is originally a bitmap font, but was recently made available
in TrueType format
<URL:http://www.lgm.cl/trabajos/unifont/index.en.html>.

Both are available in Debian 'lenny'; the 'unifont' and 'ttf-unifont'
packages, respectively.

--
 \     “Science doesn't work by vote and it doesn't work by |
  `\   authority.” -- Richard Dawkins, _Big Mistake_ (The Guardian, |
_o__)  2006-12-27) |
Ben Finney

Sep 8 '08 #2

Laszlo Nagy

Laszlo Nagy <ga*****@shopzeus.com> writes:

>I could not find any free TTF font that can do latin1, latin2,
arabic, chinese and other languages at the same time. Is there a
single font that is able to handle these languages?

The GNU Unifont <URL:http://en.wikipedia.org/wiki/GNU_Unifont>
<URL:http://unifoundry.com/unifont.html> covers an impressive range of
the Unicode Basic Multilingual Plane.

Unifont is originally a bitmap font, but was recently made available
in TrueType format
<URL:http://www.lgm.cl/trabajos/unifont/index.en.html>.

Both are available in Debian 'lenny'; the 'unifont' and 'ttf-unifont'
packages, respectively.

I found out that dejavu is what I need. It covers the languages I need
and more:

http://dejavu.svn.sourceforge.net/vi.../langcover.txt
Thanks for your help!

L

Sep 8 '08 #3

Laszlo Nagy

>>
The GNU Unifont <URL:http://en.wikipedia.org/wiki/GNU_Unifont>
<URL:http://unifoundry.com/unifont.html> covers an impressive range of
the Unicode Basic Multilingual Plane.

Unifont is originally a bitmap font, but was recently made available
in TrueType format
<URL:http://www.lgm.cl/trabajos/unifont/index.en.html>.

Both are available in Debian 'lenny'; the 'unifont' and 'ttf-unifont'
packages, respectively.
I found out that dejavu is what I need. It covers the languages I need
and more:

http://dejavu.svn.sourceforge.net/vi.../langcover.txt

Sorry, this did not work either. DejaVu does support Cyrillic and Greek
characters, but I have to load a different ttf for that. They are not
unified. :-( The only one that worked so far was "unifont.ttf" but it is
very ugly above point size=10.
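
For readers following along, loading a single TrueType file into reportlab might look roughly like the sketch below. It uses reportlab's `TTFont` and `pdfmetrics.registerFont`. The font name `InvoiceFont`, the helper names and any file paths are made up for illustration and are not from the posts above.

```python
import os


def register_unicode_font(ttf_path, font_name="InvoiceFont"):
    """Register one TrueType file with reportlab under font_name.

    font_name is a hypothetical label; ttf_path must point at a real
    .ttf file such as a local copy of unifont.ttf.
    """
    if not os.path.isfile(ttf_path):
        raise FileNotFoundError(ttf_path)
    # Imported lazily so this module can be loaded without reportlab.
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont

    pdfmetrics.registerFont(TTFont(font_name, ttf_path))
    return font_name


def write_sample(pdf_path, ttf_path, lines, size=10):
    """Draw each string in lines on one page using the registered font."""
    from reportlab.pdfgen.canvas import Canvas

    name = register_unicode_font(ttf_path)
    canvas = Canvas(pdf_path)
    canvas.setFont(name, size)
    y = 800  # start near the top of the default A4 page
    for line in lines:
        canvas.drawString(40, y, line)
        y -= size + 4
    canvas.save()
```

A call such as `write_sample("test.pdf", "unifont.ttf", ["árvíztűrő", "中文"])` would then show which glyphs that one file really contains, since characters missing from the font are not borrowed from any other font.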

Can you tell me what kind of font Geany is using on my Ubuntu system?
The preferences say that it is "monospace", but when I load
VeraMono.ttf in reportlab, it will not even display latin2 characters.
In contrast, please look at this example that shows my test program in Geany:

http://www.shopzeus.com/geany.jpg

It is a real scalable truetype font, displaying latin 1, latin2,
chinese, russian and japanese characters. Is it the same font? Does this
mean that reportlab is buggy? If I could load the same font that geany
uses, it would probably solve my problem forever.

Thanks,

Laszlo

Sep 8 '08 #4

Laszlo Nagy

Iain Dalton wrote:

Why don't you want to use multiple typefaces? Many programs that deal
with multilingual strings use multiple fonts (cf. any Web browser and
Emacs).

You are right, but these PDF documents will show mixed strings. The end
user can enter arbitrary strings into the database, and they must be
presented. For example, the name of a product can be Arabic or German.
It might be possible to guess the language used from the unicode string,
and then select a different font. But I don't want to go to that trouble.

It would be a great idea to use pango. Apparently pango is able to
change fonts on the fly and render the requested glyph. However, if I
use pango then I lose the much higher level of abstraction that comes
with reportlab and platypus: I need automatic page headers and footers,
I need to be able to repeat table headers on each page automatically
(when the table doesn't fit one page) etc. Developing my own "platypus"
like engine for pango and PDF rendering is a nightmare.

Better than that, I can develop my own flowable object for platypus: a
special paragraph that changes the used true type font on the fly.
(Split input string into parts, determine language for the parts and
display each part with its own font.) But of course this is a lot of
extra programming.
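
A rough sketch of that splitting step, under stated assumptions: the script of each character is guessed from the first word of its Unicode name, and the font names in `FONT_FOR_SCRIPT` are placeholders that would each have to be registered with reportlab separately. The output uses the `<font name="...">` markup that platypus `Paragraph` accepts, so a custom flowable may not be needed. Right-to-left layout and Arabic shaping are not handled here.

```python
import unicodedata
from xml.sax.saxutils import escape

# Hypothetical font names, one per script family.
FONT_FOR_SCRIPT = {
    "LATIN": "DejaVuSans",
    "GREEK": "DejaVuSans",
    "CYRILLIC": "DejaVuSans",
    "ARABIC": "ArabicFont",
    "CJK": "CJKFont",
}
DEFAULT_FONT = "DejaVuSans"


def script_of(ch):
    """Return a script label for ch, or None for neutral characters."""
    first_word = unicodedata.name(ch, "").split(" ", 1)[0]
    if first_word in ("CJK", "HIRAGANA", "KATAKANA", "HANGUL"):
        return "CJK"
    if first_word in FONT_FOR_SCRIPT:
        return first_word
    return None  # spaces, digits, punctuation, unknown scripts


def split_runs(text):
    """Split text into (font, substring) runs.

    Neutral characters stay in the run that precedes them.
    """
    runs = []
    for ch in text:
        script = script_of(ch)
        if script is None:
            font = runs[-1][0] if runs else DEFAULT_FONT
        else:
            font = FONT_FOR_SCRIPT[script]
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


def to_paragraph_markup(text):
    """Wrap each run in a <font> tag, escaping XML special characters."""
    return "".join(
        '<font name="%s">%s</font>' % (font, escape(run))
        for font, run in split_runs(text)
    )
```

The resulting string could be passed to `Paragraph(markup, style)`. Guessing from Unicode names is crude; a table of code point ranges per font would be more robust.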

The simplest solution would be to use a font that is able to handle all
encodings that I need.

Thanks,

Laszlo

Sep 8 '08 #5

Terry Reedy

The simplest solution would be to use a font that is able to handle all
encodings that I need.

My OpenOffice on WinXP uses a unicode font, I believe Lucida Sans
Unicode, that seems to cover the entire BMP. I don't know whether it
was already installed or installed by OO or how one would get to it to
extract it.

Sep 8 '08 #6

Ross Ridge

Terry Reedy <tj*****@udel.edu> wrote:

>My OpenOffice on WinXP uses a unicode font, I believe Lucida Sans
Unicode, that seems to cover the entire BMP.

Lucida Sans Unicode only covers a small subset of Unicode. It may seem
to cover a wider range because Windows (and possibly OpenOffice) will
automatically substitute characters from other fonts, if necessary.

>I don't know whether it was already installed or installed by OO or
how one would get to it to extract it.

It's a standard Windows font.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //

Sep 9 '08 #7

Terry Reedy

Ross Ridge wrote:

Terry Reedy <tj*****@udel.edu> wrote:
>My OpenOffice on WinXP uses a unicode font, I believe Lucida Sans
Unicode, that seems to cover the entire BMP.

Lucida Sans Unicode only covers a small subset of Unicode. It may seem
to cover a wider range because Windows (and possibly OpenOffice) will
automatically substitute characters from other fonts, if necessary.

Sorry, I posted the wrong name.
Arial Unicode MS is the one that seems pretty complete.

>I don't know whether it was already installed or installed by OO or
how one would get to it to extract it.

It's a standard Windows font.

From the MS, I would guess that is a Windows font too ;-).

Sep 9 '08 #8

Ross Ridge

Terry Reedy <tj*****@udel.edu> wrote:

>Sorry, I posted the wrong name.
Arial Unicode MS is the one that seems pretty complete.

....

From the MS, I would guess that is a Windows font too ;-).

It's made by Microsoft, but it's not a standard Windows font. I think
it comes with Microsoft Office.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //

Sep 9 '08 #9

Terry Reedy

Jeroen Ruigrok van der Werven wrote:

-On [20080909 05:23], Terry Reedy (tj*****@udel.edu) wrote:
>Arial Unicode MS is the one that seems pretty complete.

Not really. It misses a lot of characters.

Well, it has Latin, Greek, Cyrillic, Hebrew, Arabic, several south
Asian, Tibetan, CJK, Japanese, Korean, and numerous symbols and special
forms. I don't know what it misses, but I think that covers what the OP
asked for.

Sep 9 '08 #10

Laszlo Nagy

Ross Ridge wrote:

Terry Reedy <tj*****@udel.edu> wrote:

>Sorry, I posted the wrong name.
Arial Unicode MS is the one that seems pretty complete.

...

>From the MS, I would guess that is a Windows font too ;-).

It's made by Microsoft, but it's not a standard Windows font. I think
it comes with Microsoft Office.

I need to use HTML anyway. I realized that universal unicode fonts are
above 5MB in size. The report would be a 10KB PDF, but I need to embed
the font before I can send it to anyone. Since some reports need to be
sent in emails, I need to use something else. I cannot be sending 10MB
emails for "one page" reports.

I ended up implementing the reports in HTML. I'm assuming that the
user's browser is capable of displaying any characters needed. Now there
is another problem: how to print an HTML page without page header/footer
information, from a browser? But that is a separate problem and probably
has nothing to do with Python.

Thanks for your help anyway.

Best,

Laszlo

Sep 10 '08 #11

Duncan Booth

Laszlo Nagy <ga*****@shopzeus.com> wrote:

I need to use HTML anyway. I realized that universal unicode fonts are
above 5MB in size. The report would be a 10KB PDF, but I need to embed
the font before I can send it to anyone. Since some reports needs to be
sent in emails, I need to use something else. I cannot be sending 10MB
emails for "one page" reports.

I thought that usually when you embed a font in a PDF only the glyphs which
are actually used in the document get embedded. Unfortunately a quick test
with reportlab seems to show that it doesn't do that optimisation: it looks
as though it just embeds the entire font.

--
Duncan Booth http://kupuguy.blogspot.com

Sep 10 '08 #12

Ross Ridge

Duncan Booth <du**********@suttoncourtenay.org.uk> wrote:

>I thought that usually when you embed a font in a PDF only the glyphs which
are actually used in the document get embedded. Unfortunately a quick test
with reportlab seems to show that it doesn't do that optimisation: it looks
as though it just embeds the entire font.

Yah, PDF files normally only contain an embedded subset of the fonts used.
It might be possible to use Ghostscript's ps2pdf command (which can take a
PDF file as input) to strip out the unused glyphs from the embedded fonts.
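
A minimal sketch of trying that from Python, assuming Ghostscript is installed and `ps2pdf` is on the PATH. The helper names are invented here, and whether the rewritten file really comes out smaller is exactly the untested suggestion above, so the function only reports both sizes.

```python
import os
import shutil
import subprocess


def ps2pdf_command(src, dst):
    """Build the basic 'ps2pdf input output' argument list."""
    return ["ps2pdf", src, dst]


def rewrite_pdf(src, dst):
    """Run ps2pdf on src and return (size_before, size_after) in bytes."""
    if shutil.which("ps2pdf") is None:
        raise RuntimeError("ps2pdf (Ghostscript) not found on PATH")
    subprocess.run(ps2pdf_command(src, dst), check=True)
    return os.path.getsize(src), os.path.getsize(dst)
```

Comparing the two sizes for a one-page report gives a quick indication of whether the round trip dropped unused glyphs.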

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //

Sep 10 '08 #13

Tim Roberts

Duncan Booth <du**********@invalid.invalid> wrote:

>
Laszlo Nagy <ga*****@shopzeus.com> wrote:

>I need to use HTML anyway. I realized that universal unicode fonts are
above 5MB in size. The report would be a 10KB PDF, but I need to embed
the font before I can send it to anyone. Since some reports needs to be
sent in emails, I need to use something else. I cannot be sending 10MB
emails for "one page" reports.

I thought that usually when you embed a font in a PDF only the glyphs which
are actually used in the document get embedded. Unfortunately a quick test
with reportlab seems to show that it doesn't do that optimisation: it looks
as though it just embeds the entire font.

No, it does subsetting. There was a debate a year or two ago on the
reportlab list about how the font subset should be named in the resulting
PDF file.

Is it possible you have an older release?
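
One rough way to check a generated file: the PDF convention is that a subset font's name carries a tag of six uppercase letters and a plus sign, as in `ABCDEF+FontName`. The sketch below only scans the raw bytes for such names. It assumes the font dictionaries are stored uncompressed and that the installed reportlab release follows the convention, neither of which is confirmed in this thread.

```python
import re

# Matches e.g. "/BaseFont /ABCDEF+DejaVuSans" or "/FontName /ABCDEF+DejaVuSans".
SUBSET_NAME = re.compile(
    rb"/(?:BaseFont|FontName)\s*/([A-Z]{6}\+[^\s/<>\[\]()]+)"
)


def subset_font_names(pdf_bytes):
    """Return the sorted, de-duplicated subset-tagged font names found."""
    return sorted(
        {m.group(1).decode("latin-1") for m in SUBSET_NAME.finditer(pdf_bytes)}
    )


def subset_font_names_in_file(path):
    """Convenience wrapper that reads the PDF from disk."""
    with open(path, "rb") as handle:
        return subset_font_names(handle.read())
```

An empty result for a file with an embedded font suggests the whole font went in, though names hidden inside compressed object streams would be missed by this scan.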
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.

Sep 11 '08 #14

Duncan Booth

Tim Roberts <ti**@probo.com> wrote:

Duncan Booth <du**********@invalid.invalid> wrote:
>>
Laszlo Nagy <ga*****@shopzeus.com> wrote:

>>I need to use HTML anyway. I realized that universal unicode fonts
are above 5MB in size. The report would be a 10KB PDF, but I need to
embed the font before I can send it to anyone. Since some reports
needs to be sent in emails, I need to use something else. I cannot
be sending 10MB emails for "one page" reports.

I thought that usually when you embed a font in a PDF only the glyphs
which are actually used in the document get embedded. Unfortunately a
quick test with reportlab seems to show that it doesn't do that
optimisation: it looks as though it just embeds the entire font.

No, it does subsetting. There was a debate a year or two ago on the
reportlab list about how the font subset should be named in the
resulting PDF file.

Is it possible you have an older release?

It was 2.1 downloaded about 30 minutes before my post.

The not too scientific test I did was to copy the font embedding example
from the Reportlab documentation, modify it enough to make it actually
run, and then change the output to have only one glyph. The resulting
PDF is virtually identical. I'm not a reportlab expert though so I may
have made some blindingly obvious beginners mistake (or maybe it only
subsets fonts over a certain size or glyphs outside the ascii range?).
---------- rlab.py ------------
import os, sys
import reportlab
folder = os.path.dirname(reportlab.__file__) + os.sep + 'fonts'
afmFile = os.path.join(folder, 'LeERC___.AFM')
pfbFile = os.path.join(folder, 'LeERC___.PFB')
from reportlab.pdfbase import pdfmetrics
justFace = pdfmetrics.EmbeddedType1Face(afmFile, pfbFile)
faceName = 'LettErrorRobot-Chrome' # pulled from AFM file
pdfmetrics.registerTypeFace(justFace)
justFont = pdfmetrics.Font('LettErrorRobot-Chrome',faceName,'WinAnsiEncoding')
pdfmetrics.registerFont(justFont)
from reportlab.pdfgen.canvas import Canvas
canvas = Canvas('temp.pdf')
canvas.setFont('LettErrorRobot-Chrome', 32)
if sys.argv:
    canvas.drawString(10, 150, 'TTTT TTTTTT TT TT')
    canvas.drawString(10, 100, 'TTTTTTTTTTTTTTTTTTTTT')
else:
    canvas.drawString(10, 150, 'This should be in')
    canvas.drawString(10, 100, 'LettErrorRobot-Chrome')
canvas.save()
-------------------------------

--
Duncan Booth http://kupuguy.blogspot.com

Sep 11 '08 #15

Duncan Booth

Duncan Booth <du**********@invalid.invalid> wrote:

I may have made some blindingly obvious beginners mistake

I made the blindingly stupid beginner's mistake of cleaning up the code
before posting it and breaking it in the process. The 'if' should of
course say:
    if len(sys.argv) > 1:

However my original test was done by toggling commented out lines of
code so my conclusion remains the same: the only differences between the
output are the creation date, the page stream, and the digest.

--
Duncan Booth http://kupuguy.blogspot.com

Sep 11 '08 #16

Tim Roberts

Duncan Booth <du**********@invalid.invalid> wrote:

>
The not too scientific test I did was to copy the font embedding example
from the Reportlab documentation, modify it enough to make it actually
run, and then change the output to have only one glyph. The resulting
PDF is virtually identical. I'm not a reportlab expert though so I may
have made some blindingly obvious beginners mistake (or maybe it only
subsets fonts over a certain size or glyphs outside the ascii range?).

---------- rlab.py ------------
import os, sys
import reportlab
folder = os.path.dirname(reportlab.__file__) + os.sep + 'fonts'
afmFile = os.path.join(folder, 'LeERC___.AFM')
pfbFile = os.path.join(folder, 'LeERC___.PFB')
from reportlab.pdfbase import pdfmetrics
justFace = pdfmetrics.EmbeddedType1Face(afmFile, pfbFile)
faceName = 'LettErrorRobot-Chrome' # pulled from AFM file
pdfmetrics.registerTypeFace(justFace)
justFont = pdfmetrics.Font('LettErrorRobot-Chrome',faceName,'WinAnsiEncoding')

OK, look the other way while I backpedal furiously. The conversation on
the mailing list last year was focused on TrueType fonts. Those are subsetted.

EmbeddedType1Face, used for Type 1 fonts, does appear to embed the entire
font.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.

Sep 13 '08 #17
