UTF-8 characters in doctest

Hello,
I have problems running doctests when I use Czech national
characters in UTF-8 encoding.

My Python script begins with an encoding declaration:

# -*- coding: utf-8 -*-

I have this function with a doctest:

def get_inventary_number(block):
    """
    >>> t = u'''28. České královské insignie
    ... mědirytina, grafika je zcela vyřezána z papíru - max. rozměr
    ... 420×582 neznačeno
    ... text: opis v levém medailonu: CAROL VI IMP.ELIS.CHR. AVG. P.P.'''
    >>> get_inventary_number(t)
    (u'nezna\xc4\x8deno', u'28. \xc4\x8cesk\xc3\xa9 kr\xc3\xa1lovsk\xc3\xa9 insignie\nm\xc4\x9bdirytina, grafika je zcela vy\xc5\x99ez\xc3\xa1na z pap\xc3\xadru \xe2\x80\x93 max. rozm\xc4\x9br\n420\xc3\x97582 \ntext: opis v lev\xc3\xa9m medailonu: CAROL VI IMP.ELIS.CHR. AVG. P.P.')
    """
    m = RE_INVENTARNI_CISLO.search(block)
    if m: return m.group(1), block.replace(m.group(0), '')
    else: return None, block

After running doctest.testmod() I get this error message:

  File "vizovice_03.py", line 417, in ?
    doctest.testmod()
  File "/usr/local/lib/python2.4/doctest.py", line 1841, in testmod
    for test in finder.find(m, name, globs=globs, extraglobs=extraglobs):
  File "/usr/local/lib/python2.4/doctest.py", line 851, in find
    self._find(tests, obj, name, module, source_lines, globs, {})
  File "/usr/local/lib/python2.4/doctest.py", line 910, in _find
    globs, seen)
  File "/usr/local/lib/python2.4/doctest.py", line 895, in _find
    test = self._get_test(obj, name, module, globs, source_lines)
  File "/usr/local/lib/python2.4/doctest.py", line 985, in _get_test
    filename, lineno)
  File "/usr/local/lib/python2.4/doctest.py", line 602, in get_doctest
    return DocTest(self.get_examples(string, name), globs,
  File "/usr/local/lib/python2.4/doctest.py", line 616, in get_examples
    return [x for x in self.parse(string, name)
  File "/usr/local/lib/python2.4/doctest.py", line 577, in parse
    (source, options, want, exc_msg) = \
  File "/usr/local/lib/python2.4/doctest.py", line 648, in _parse_example
    lineno + len(source_lines))
  File "/usr/local/lib/python2.4/doctest.py", line 732, in _check_prefix
    raise ValueError('line %r of the docstring for %s has '
ValueError: line 17 of the docstring for __main__.get_inventary_number has inconsistent leading whitespace: 'm\xc4\x9bdirytina, grafika je zcela vy\xc5\x99ez\xc3\xa1na z pap\xc3\xadru \xe2\x80\x93 max. rozm\xc4\x9br'

I tried filling in the expected output in the docstring from what the
Python shell prints and from what doctest itself reports (if I leave the
expected output out of the docstring, doctest tells me what it expected
and what it got). I also tried defining the variable t both as
t='some text' and as t=u'some unicode text'. Everything fails.

So my question is: is it possible to run doctests with UTF-8
characters? And if the answer is yes, please tell me how.

Thank you for any advice.
Regards
Michal

Sep 19 '07 #1
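
Editorial note: a likely cause of the ValueError above, sketched under assumptions. In a normal (non-raw) docstring, every \n escape in the expected output becomes a real line break when the module is compiled, so the rest of the expected output starts in column 0 and no longer matches the indentation of the >>> line. The sketch below reproduces this with the standard doctest.DocTestParser. It is written for Python 3 (the thread uses Python 2.4), and show, parse_examples, NON_RAW and RAW are made-up names used only for illustration.

```python
import doctest


def parse_examples(docstring):
    """Return the doctest examples that doctest finds in docstring."""
    return doctest.DocTestParser().get_examples(docstring, name="demo")


# Non-raw string: "\n" is a real newline, so "second'" starts in column 0.
NON_RAW = """
    >>> show()
    'first\nsecond'
"""

# Raw string: "\n" stays as backslash + n, so the expected output is one line.
RAW = r"""
    >>> show()
    'first\nsecond'
"""

try:
    parse_examples(NON_RAW)
    non_raw_error = None
except ValueError as exc:
    non_raw_error = str(exc)

raw_examples = parse_examples(RAW)

print("non-raw docstring:", non_raw_error)
print("raw docstring: parsed", len(raw_examples), "example(s)")
```

The examples are only parsed here, never executed, which is why the hypothetical show() does not need to exist.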
Bzyczek wrote:
> So my question is: Is it possible to run doctests with UTF-8
> characters? And if your answer will be YES, tell me please how...

Use raw strings in combination with explicit decoding and a little
trial and error. E.g. this little gem passes ;)

# -*- coding: utf8 -*-
r"""
>>> f("äöü".decode("utf8"))
(u'\xe4\xf6\xfc',)
"""
def f(s):
    return (s,)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Peter
Sep 19 '07 #2
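
Editorial note: for readers on Python 3, a rough counterpart of the example above might look like the sketch below. It assumes Python 3, where every str is already Unicode, so the explicit .decode("utf8") step is not needed and repr() shows non-ASCII characters as they are. The raw docstring is kept so that the escaped form printed via ascii() survives compilation. Running the docstring through DocTestParser and DocTestRunner instead of doctest.testmod() is a choice made here only to keep the sketch independent of how it is launched.

```python
import doctest


def f(s):
    r"""Wrap s in a 1-tuple (Python 3 counterpart of the f() above).

    >>> f("äöü")
    ('äöü',)
    >>> print(ascii(f("äöü")))
    ('\xe4\xf6\xfc',)
    """
    return (s,)


# Build a DocTest from f's docstring only and run it.
test = doctest.DocTestParser().get_doctest(f.__doc__, {"f": f}, "f", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)
```

In an ordinary module, the usual doctest.testmod() call under if __name__ == "__main__": should do the same job.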
Peter Otten <__*******@web.de> writes:
[...]
> # -*- coding: utf8 -*-
> r"""
> >>> f("äöü".decode("utf8"))
> (u'\xe4\xf6\xfc',)
> """
> def f(s):
>     return (s,)

Forgive me if this is a stupid question, but: What purpose does
function f serve?

John
Sep 20 '07 #3
John J. Lee wrote:
> Peter Otten <__*******@web.de> writes:
> [...]
>> def f(s):
>>     return (s,)
>
> Forgive me if this is a stupid question, but: What purpose does
> function f serve?
>
> John

Well, it has nothing to do with the unicode bit that came before it. It
just takes an argument and wraps it in a 1-tuple. Judging by the
argument name "s", that argument is expected to be a string.

One use I can think of is that sometimes you'll find a function that
returns a string or a list or tuple of strings. If you want to pass that
result on to a for loop, and only loop once on the string (instead of
looping on each letter of the string), you might want to wrap it in a
tuple or a list before passing it to the loop.

Cheers,
Cliff
Sep 21 '07 #4
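
Editorial note: the tuple-wrapping use case described in the post above can be sketched as follows. This is only an illustration for Python 3; as_items is a hypothetical helper, not something from the thread or from any library.

```python
def as_items(result):
    """Return result as a tuple, wrapping a lone string in a 1-tuple."""
    if isinstance(result, str):
        return (result,)
    return tuple(result)


# Looping over a bare string visits each character.
letters = [item for item in "abc"]

# Wrapping the string first makes the loop run once, on the whole string.
wrapped = [item for item in as_items("abc")]

# A list of strings passes through item by item.
many = [item for item in as_items(["ab", "cd"])]

print(letters, wrapped, many)
```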
(replying to my own post)

Sorry. Itchy trigger finger and tired brain. I didn't read the whole
context of the thread. Dunno what it's doing here. Forcing __repr__ to
be called on a print statement? Funny way to do that. Like I said, I
don't know, so I'll leave it to someone else to say.

Cheers,
Cliff
Sep 21 '07 #5
John J. Lee wrote:
> Peter Otten <__*******@web.de> writes:
> [...]
>> # -*- coding: utf8 -*-
>> r"""
>> >>> f("äöü".decode("utf8"))
>> (u'\xe4\xf6\xfc',)
>> """
>> def f(s):
>>     return (s,)
>
> Forgive me if this is a stupid question, but: What purpose does
> function f serve?

Like the OP's get_inventary_number() it takes a unicode string and
returns a tuple of unicode strings. It's pointless otherwise. I hoped I
had stripped down his code to a point where the analogy was still
recognizable.

Peter
Sep 21 '07 #6
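
Editorial note: to make the analogy with the original function concrete, here is a hedged Python 3 sketch of get_inventary_number() with a raw docstring. The thread never shows the real RE_INVENTARNI_CISLO, so the pattern below is a hypothetical stand-in that only matches the word "neznačeno", and the test text is shortened from the original post.

```python
import doctest
import re

# Hypothetical stand-in: the real RE_INVENTARNI_CISLO is not shown in the thread.
RE_INVENTARNI_CISLO = re.compile(r"(neznačeno)")


def get_inventary_number(block):
    r"""Split the inventory marking from the rest of the block.

    >>> t = '''28. České královské insignie
    ... 420×582 neznačeno
    ... text: opis v levém medailonu'''
    >>> get_inventary_number(t)
    ('neznačeno', '28. České královské insignie\n420×582 \ntext: opis v levém medailonu')
    """
    m = RE_INVENTARNI_CISLO.search(block)
    if m:
        return m.group(1), block.replace(m.group(0), "")
    return None, block


# Run only this function's docstring examples.
globs = {"get_inventary_number": get_inventary_number}
test = doctest.DocTestParser().get_doctest(
    get_inventary_number.__doc__, globs, "get_inventary_number", None, 0
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)
```

Because the docstring is raw, the \n escapes in the expected output stay on one line and line up with the >>> prompt.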
Peter Otten <__*******@web.de> writes:
[...]
>> Forgive me if this is a stupid question, but: What purpose does
>> function f serve?
>
> Like the OP's get_inventary_number() it takes a unicode string and
> returns a tuple of unicode strings. It's pointless otherwise. I hoped I
> had stripped down his code to a point where the analogy was still
> recognizable.

Ah, right.

John
Sep 22 '07 #7
