UTF-8 characters in doctest

Hello,
I have problems running doctests when I use Czech national
characters in UTF-8 encoding.

I have a Python script that begins with an encoding declaration:

# -*- coding: utf-8 -*-

I have this function with a doctest:

def get_inventary_number(block):
    """
    >>> t = u'''28. České královské insignie
    ... mědirytina, grafika je zcela vyřezána z papíru - max. rozměr
    ... 420×582 neznačeno
    ... text: opis v levém medailonu: CAROL VI IMP.ELIS.CHR. AVG. P.P.'''
    >>> get_inventary_number(t)
    (u'nezna\xc4\x8deno', u'28. \xc4\x8cesk\xc3\xa9 kr\xc3\xa1lovsk\xc3\xa9 insignie\nm\xc4\x9bdirytina, grafika je zcela vy\xc5\x99ez\xc3\xa1na z pap\xc3\xadru \xe2\x80\x93 max. rozm\xc4\x9br\n420\xc3\x97582 \ntext: opis v lev\xc3\xa9m medailonu: CAROL VI IMP.ELIS.CHR. AVG. P.P.')
    """
    m = RE_INVENTARNI_CISLO.search(block)
    if m: return m.group(1), block.replace(m.group(0), '')
    else: return None, block

After running doctest.testmod() I get this error message:

  File "vizovice_03.py", line 417, in ?
    doctest.testmod()
  File "/usr/local/lib/python2.4/doctest.py", line 1841, in testmod
    for test in finder.find(m, name, globs=globs, extraglobs=extraglobs):
  File "/usr/local/lib/python2.4/doctest.py", line 851, in find
    self._find(tests, obj, name, module, source_lines, globs, {})
  File "/usr/local/lib/python2.4/doctest.py", line 910, in _find
    globs, seen)
  File "/usr/local/lib/python2.4/doctest.py", line 895, in _find
    test = self._get_test(obj, name, module, globs, source_lines)
  File "/usr/local/lib/python2.4/doctest.py", line 985, in _get_test
    filename, lineno)
  File "/usr/local/lib/python2.4/doctest.py", line 602, in get_doctest
    return DocTest(self.get_examples(string, name), globs,
  File "/usr/local/lib/python2.4/doctest.py", line 616, in get_examples
    return [x for x in self.parse(string, name)
  File "/usr/local/lib/python2.4/doctest.py", line 577, in parse
    (source, options, want, exc_msg) = \
  File "/usr/local/lib/python2.4/doctest.py", line 648, in _parse_example
    lineno + len(source_lines))
  File "/usr/local/lib/python2.4/doctest.py", line 732, in _check_prefix
    raise ValueError('line %r of the docstring for %s has '
ValueError: line 17 of the docstring for __main__.get_inventary_number has inconsistent leading whitespace: 'm\xc4\x9bdirytina, grafika je zcela vy\xc5\x99ez\xc3\xa1na z pap\xc3\xadru \xe2\x80\x93 max. rozm\xc4\x9br'

I have tried filling in the expected output in the docstring from the
output of the Python shell and from doctest itself (if I leave it out of
the docstring, doctest tells me what it expected and what it got). I have
also tried defining the variable t both as t='some text' and as
t=u'some unicode text'. Everything fails.

So my question is: is it possible to run doctests with UTF-8
characters? If the answer is yes, please tell me how.

Thank you for any advice.
Regards
Michal

Sep 19 '07 #1
Bzyczek wrote:
> So my question is: Is it possible to run doctests with UTF-8
> characters? And if your answer will be YES, tell me please how...

Use raw strings in combination with explicit decoding and a little
trial and error. E.g. this little gem passes ;)

# -*- coding: utf8 -*-
r"""
>>> f("äöü".decode("utf8"))
(u'\xe4\xf6\xfc',)
"""
def f(s):
    return (s,)

if __name__ == "__main__":
    import doctest
    doctest.testmod()
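
A note on why the raw string matters, offered as a sketch rather than a definitive diagnosis: in a normal (non-raw) docstring, Python turns \n into a real line break before doctest parses the text, so the expected output is split across lines and the continuation loses its indentation. That is consistent with the "inconsistent leading whitespace" error in the original post. The example below is written for Python 3 (the thread used Python 2.4), and wrap, NON_RAW and RAW are hypothetical names used only for illustration.

```python
import doctest


def wrap(s):
    """Hypothetical stand-in for a function returning a tuple of strings."""
    return (s,)


# Non-raw string: \n becomes a real newline before doctest sees the text,
# so the expected output continues on an unindented line.
NON_RAW = """
    >>> wrap('a\nb')
    ('a\nb',)
"""

# Raw string: the backslash sequences reach doctest unchanged.
RAW = r"""
    >>> wrap('a\nb')
    ('a\nb',)
"""

parser = doctest.DocTestParser()

# Parsing the non-raw text is expected to fail with a ValueError.
try:
    parser.get_examples(NON_RAW, "non_raw")
    non_raw_error = None
except ValueError as exc:
    non_raw_error = str(exc)

# The raw text parses into a single example, which can then be run.
raw_examples = parser.get_examples(RAW, "raw")
test = parser.get_doctest(RAW, {"wrap": wrap}, "raw", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
```

Here non_raw_error holds the parser's complaint about leading whitespace, while the raw version yields one example that runs without failures.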

Peter
Sep 19 '07 #2
Peter Otten <__*******@web.de> writes:
[...]
> # -*- coding: utf8 -*-
> r"""
> >>> f("äöü".decode("utf8"))
> (u'\xe4\xf6\xfc',)
> """
> def f(s):
>     return (s,)

Forgive me if this is a stupid question, but: What purpose does
function f serve?
John
Sep 20 '07 #3
John J. Lee wrote:
> Peter Otten <__*******@web.de> writes:
> [...]
>> def f(s):
>>     return (s,)
>
> Forgive me if this is a stupid question, but: What purpose does
> function f serve?
> John

Well, it has nothing to do with the unicode bit that came before it. It
just takes an argument, and wraps it in a 1-tuple. Judging by the
parameter name "s", that argument is expected to be a string.

One use I can think of is that sometimes you'll find a function that
returns a string or a list or tuple of strings. If you want to pass that
result on to a for loop, and only loop once on the string (instead of
looping on each letter of the string), you might want to wrap it in a
tuple or a list before passing it to the loop.

Cheers,
Cliff
Sep 21 '07 #4
(replying to my own post)

Sorry. Itchy trigger finger and tired brain. I didn't read the whole
context of the thread. Dunno what it's doing here. Forcing __repr__ to
be called on a print statement? Funny way to do that. Like I said, I
don't know, so I'll leave it to someone else to say.

Cheers,
Cliff
Sep 21 '07 #5
John J. Lee wrote:
> Peter Otten <__*******@web.de> writes:
> [...]
>> # -*- coding: utf8 -*-
>> r"""
>> >>> f("äöü".decode("utf8"))
>> (u'\xe4\xf6\xfc',)
>> """
>> def f(s):
>>     return (s,)
>
> Forgive me if this is a stupid question, but: What purpose does
> function f serve?

Like the OP's get_inventary_number(), it takes a unicode string and
returns a tuple of unicode strings. It's pointless otherwise. I hoped I
had stripped down his code to a point where the analogy was still
recognizable.
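
To illustrate what doctest compares against, here is a small sketch, assuming Python 3 rather than the Python 2.4 used in this thread. doctest matches the expected text against the repr of the returned value. In Python 3, repr leaves printable non-ASCII characters as they are, so the expected output would contain the characters themselves; the built-in ascii() gives the escaped form seen above (without the u prefix).

```python
def f(s):
    # Same shape as the function in the thread: wrap the string in a 1-tuple.
    return (s,)


# Decode the UTF-8 bytes explicitly so the example does not depend on the
# encoding of the source file.
result = f(b"\xc3\xa4\xc3\xb6\xc3\xbc".decode("utf8"))

shown = repr(result)     # the text an interactive session would display
escaped = ascii(result)  # the same value with non-ASCII characters escaped
```

Under these assumptions, a Python 3 doctest for f would list the characters directly in its expected output, and the \x escapes would only appear if the test called ascii() on the result.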

Peter
Sep 21 '07 #6
Peter Otten <__*******@web.de> writes:
[...]
>> Forgive me if this is a stupid question, but: What purpose does
>> function f serve?
>
> Like the OP's get_inventary_number() it takes a unicode string and
> returns a tuple of unicode strings. It's pointless otherwise. I hoped I
> had stripped down his code to a point where the analogy was still
> recognizable.

Ah, right.
John
Sep 22 '07 #7
