Can Python fix vcard files?

KDE's Kontact PIM breaks quoted-printable vcard files because it
linebreaks in the middle of a word. Take this text for example:
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
 A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n

The whole thing should be on one line, and the spaces at the beginning
of each line shouldn't be there at all. I have a directory with 422
files corrupted like this.

Can Python go through a directory of files and replace each instance
of "newline-space" with nothing? The system is Ubuntu 8.04 with KDE if
it matters. Thanks.

--
Dotan Cohen

http://what-is-what.com
http://gibberish.co.il
א-ב-ג-ד-ה-ו-ז-ח-ט-י-ך-כ-ל-ם-מ-ן-נ-ס-ע-ף-פ-ץ-צ-ק-ר-ש-ת

ä-ö-ü-ß-Ä-Ö-Ü
Oct 14 '08 #1
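
The "newline-space" replacement asked about above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names `unfold` and `unfold_directory` and the `*.vcf` file pattern are made up for illustration and do not come from the thread. It works on raw bytes so that the quoted-printable data is never decoded or re-encoded, and it rewrites files in place, so it is worth running on a copy of the directory first.

```python
import re
from pathlib import Path

# A line break (CRLF, LF, or lone CR) followed by exactly one space or tab.
FOLD = re.compile(rb"(?:\r\n|\n|\r)[ \t]")


def unfold(data: bytes) -> bytes:
    """Remove every fold, joining each continuation line onto the line before it."""
    return FOLD.sub(b"", data)


def unfold_directory(directory, pattern="*.vcf"):
    """Rewrite each matching file in place and return the paths that changed.

    `pattern` is an assumption; adjust it to however the files are named.
    """
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        original = path.read_bytes()
        fixed = unfold(original)
        if fixed != original:
            path.write_bytes(fixed)
            changed.append(path)
    return changed
```

Calling `unfold_directory("vcards-copy")` (a hypothetical directory name) would return the list of files it modified, which makes it easy to check that all of the affected files were touched.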

On 14 Oct, 02:31, "Dotan Cohen" <dotanco...@gmail.com> wrote:
> KDE's Kontact PIM breaks quoted-printable vcard files because it
> linebreaks in the middle of a word. Take this text for example:
> NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
>  8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
>  A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
>
> The whole thing should be on one line, and the spaces at the beginning
> of each line shouldn't be there at all. I have a directory with 422
> files corrupted like this.

Although I think it's "rude" to break quoted-printable characters in
the middle (as seen above), isn't it permitted by the specification to
wrap lines to a predetermined length? It's been a while since I looked
at the specification, but this is one of the things that
implementations have to be able to handle.

> Can Python go through a directory of files and replace each instance
> of "newline-space" with nothing? The system is Ubuntu 8.04 with KDE if
> it matters. Thanks.

You should file a bug against Kontact: the KDE developers love fixing
bugs, especially in their old work. ;-)

Paul
Oct 14 '08 #2

2008/10/14 Paul Boddie <pa**@boddie.org.uk>:
>> Can Python go through a directory of files and replace each instance
>> of "newline-space" with nothing? The system is Ubuntu 8.04 with KDE if
>> it matters. Thanks.
>
> You should file a bug against Kontact: the KDE developers love fixing
> bugs, especially in their old work. ;-)

I had to reopen an old bug on this:
https://bugs.kde.org/show_bug.cgi?id=68350

I would really appreciate it if the knowledgeable folks here would
chime in on that bug. Thanks!

--
Dotan Cohen

http://what-is-what.com
http://gibberish.co.il
א-ב-ג-ד-ה-ו-ז-ח-ט-י-ך-כ-ל-ם-מ-ן-*-ס-ע-ף-פ-ץ-צ-ק-ר-ש-ת

ä-ö-ü-ß-Ä-Ö-Ü
Oct 14 '08 #3


On 14 Oct, 13:06, Paul Boddie <p...@boddie.org.uk> wrote:
> On 14 Oct, 02:31, "Dotan Cohen" <dotanco...@gmail.com> wrote:
>> KDE's Kontact PIM breaks quoted-printable vcard files because it
>> linebreaks in the middle of a word. Take this text for example:
>> NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
>>  8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
>>  A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
>
> [...]
>
> Although I think it's "rude" to break quoted-printable characters in
> the middle (as seen above), isn't it permitted by the specification to
> wrap lines to a predetermined length? It's been a while since I looked
> at the specification, but this is one of the things that
> implementations have to be able to handle.

The vCard specification (RFC 2426 [1]) refers to RFC 2425 [2], which
says this in section 5.8.1:

    A logical line MAY be continued on the next physical line anywhere
    between two characters by inserting a CRLF immediately followed by a
    single white space character (space, ASCII decimal 32, or horizontal
    tab, ASCII decimal 9).

This is like the iCalendar specification (RFC 2445 [3]), section 4.1:

    Lines of text SHOULD NOT be longer than 75 octets, excluding the
    line break. Long content lines SHOULD be split into a multiple line
    representations using a line "folding" technique. That is, a long
    line can be split between any two characters by inserting a CRLF
    immediately followed by a single linear white space character (i.e.,
    SPACE, US-ASCII decimal 32 or HTAB, US-ASCII decimal 9).

I didn't find anything which forbids splitting quoted-printable
character values in these specifications.

Paul

[1] http://www.ietf.org/rfc/rfc2426.txt
[2] http://www.ietf.org/rfc/rfc2425.txt
[3] http://www.ietf.org/rfc/rfc2445.txt
Oct 14 '08 #5
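
To make the folding rule quoted above concrete, here is a rough Python sketch of a writer that folds at a fixed width and a reader that unfolds. The names `fold` and `unfold` and the sample line are illustrative assumptions, not taken from any vCard library, and the sketch assumes ASCII-only lines so that characters and octets coincide. It shows that a fold may land in the middle of a quoted-printable escape such as `=D7` and the logical line still round-trips, as long as the reader unfolds before doing anything else.

```python
import re


def fold(line: str, width: int = 75) -> str:
    """Fold one logical line into physical lines of at most `width` characters."""
    chunks = [line[:width]]
    rest = line[width:]
    # Each continuation line starts with one space, so it carries width - 1 characters.
    step = width - 1
    while rest:
        chunks.append(" " + rest[:step])
        rest = rest[step:]
    return "\r\n".join(chunks)


def unfold(text: str) -> str:
    """Undo folding: drop each CRLF that is followed by one space or tab."""
    return re.sub(r"\r\n[ \t]", "", text)


# Illustrative logical line: a property header plus a long run of escapes.
logical = "NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:" + "=D7=A9" * 20
folded = fold(logical)
restored = unfold(folded)
```

Here `restored == logical` holds even though the fold positions were chosen purely by length, which is the behaviour both RFC excerpts describe.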

In message
<1d**********************************@r66g2000hsg.googlegroups.com>, Paul
Boddie wrote:
> The vCard specification (RFC 2426 [1]) refers to RFC 2425 [2], which
> says this in section 5.8.1:
>
>     A logical line MAY be continued on the next physical line anywhere
>     between two characters by inserting a CRLF immediately followed by a
>     single white space character (space, ASCII decimal 32, or horizontal
>     tab, ASCII decimal 9).
>
> I didn't find anything which forbids splitting quoted-printable
> character values in these specifications.

What adds to the confusion is that quoted-printable has its own convention
for soft-wrapping long lines, using an equals sign followed by a newline.
Oct 15 '08 #6
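
For comparison, this is what the quoted-printable soft line break mentioned above looks like when decoded with Python's standard `quopri` module. It is a small sketch with made-up sample bytes (the first word of the note from the original post, split by hand): the trailing `=` before the newline tells the decoder that the line continues, so both the `=` and the newline disappear from the output.

```python
import quopri

# An "=" at the end of a line is a soft line break: the decoder drops the
# "=" and the newline and carries on with the next line.
encoded = b"=D7=A9=D7=95=\n=D7=A8=D7=94"
decoded = quopri.decodestring(encoded)
text = decoded.decode("utf-8")
```

Note that this convention is separate from vCard folding: a vCard fold is a line break followed by a space or tab, with no `=` required before it.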

On 15 Oct, 06:40, Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
> In message <1ddce4a8-e11c-4b06-9859-32d69407e...@r66g2000hsg.googlegroups.com>, Paul Boddie wrote:
>> I didn't find anything which forbids splitting quoted-printable
>> character values in these specifications.
>
> What adds to the confusion is that quoted-printable has its own convention
> for soft-wrapping long lines, using an equals sign followed by a newline.

I think the necessary approach involves interpreting data in the vCard
"content model" before interpreting data in the quoted-printable
"content model". That is, follow the vCard rules around line
formatting to first reconstruct encoded content, then do what you
would normally do with that encoded content. It's a bit like parsing
XML and then attempting to read text from the document's parsed
representation, rather than just matching a particular region with a
regular expression and finding that it yields "&lt;" and "&gt;"
instead of the expected "<" and ">".

Paul
Oct 15 '08 #7
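
The two-stage order described above (vCard line rules first, quoted-printable second) might look roughly like this in Python. It is a sketch under assumptions: `decode_note` is a hypothetical helper, it handles a single property, it takes everything after the first colon as the value, and it assumes the value is UTF-8 as the `CHARSET` parameter in the original example states. The sample input is the first part of the note from the original post.

```python
import quopri
import re


def decode_note(raw: bytes) -> str:
    """Unfold a single folded vCard property, then decode its QP value."""
    # Step 1: vCard content model. Remove each line break + space/tab fold.
    logical = re.sub(rb"\r?\n[ \t]", b"", raw)
    # Step 2: split off the property name and parameters at the first colon.
    _header, _, value = logical.partition(b":")
    # Step 3: only now interpret the value as quoted-printable.
    return quopri.decodestring(value).decode("utf-8")


raw = (
    b"NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A\r\n"
    b" 8=D7=90=D7=A9=D7=95=D7=A0=D7=94"
)
note = decode_note(raw)
```

Because the fold that split `=A8` is removed in step 1, the decoder in step 3 only ever sees complete escapes.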

In message <ma**************************************@python.org>, Dotan
Cohen wrote:
> 2008/10/15 Lawrence D'Oliveiro <ld*@geek-central.gen.new_zealand>:
>> What adds to the confusion is that quoted-printable has its own
>> convention for soft-wrapping long lines, using an equals sign followed by
>> a newline.
>
> My test file has newlines not preceded by an equals sign:

As was mentioned upthread by Paul Boddie, the vCard spec has its own
convention for continuing a value across multiple lines. Provided you stick
to that, you should be OK.
Oct 15 '08 #8

2008/10/15 Lawrence D'Oliveiro <ld*@geek-central.gen.new_zealand>:

Thanks. The RFC pages for vcard (http://www.ietf.org/rfc/rfc2426.txt
and http://www.ietf.org/rfc/rfc2425.txt) are very difficult for me to
read. I'm using the test file to learn, and I will work out the kinks
on other files that I come across. This is for personal use, not
production, so I can be sloppy :)

--
Dotan Cohen

Oct 15 '08 #9
