Can Python fix vcard files?

KDE's Kontact PIM breaks quoted-printable vcard files because it
linebreaks in the middle of a word. Take this text for example:
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
 A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n

The whole thing should be on one line, and the spaces at the beginning
of each line shouldn't be there at all. I have a directory with 422
files corrupted like this.

Can Python go through a directory of files and replace each instance
of "newline-space" with nothing? The system is Ubuntu 8.04 with KDE if
it matters. Thanks.

--
Dotan Cohen

http://what-is-what.com
http://gibberish.co.il
א-ב-ג-ד-ה-ו-ז-ח-ט-י-ך-כ-ל-ם-מ-ן-נ-ס-ע-ף-פ-ץ-צ-ק-ר-ש-ת

ä-ö-ü-ß-Ä-Ö-Ü
Oct 14 '08 #1
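
The directory-wide replacement asked about above can be sketched roughly as follows. This is a minimal sketch rather than a tested fix for Kontact's output: it assumes Python 3 (which the thread predates), assumes the files end in `.vcf`, and the names `unfold_bytes` and `fix_directory` are made up for illustration. It writes cleaned copies to a second directory instead of overwriting the originals.

```python
import re
from pathlib import Path

# A fold is a line break (CRLF or bare LF) followed by one space or tab.
FOLD = re.compile(rb"\r?\n[ \t]")


def unfold_bytes(data: bytes) -> bytes:
    """Remove every 'newline + one whitespace character' fold from raw content."""
    return FOLD.sub(b"", data)


def fix_directory(src: Path, dst: Path, pattern: str = "*.vcf") -> int:
    """Write unfolded copies of the files in src matching pattern into dst.

    Returns the number of files written. The originals are left untouched.
    """
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob(pattern)):
        if path.is_file():
            (dst / path.name).write_bytes(unfold_bytes(path.read_bytes()))
            count += 1
    return count
```

Something like `fix_directory(Path("contacts"), Path("contacts-fixed"))` would then process the whole directory (both directory names are placeholders). Working on bytes avoids guessing the file encoding, and the quoted-printable data itself is left encoded.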
On 14 Oct, 02:31, "Dotan Cohen" <dotanco...@gmail.com> wrote:
> KDE's Kontact PIM breaks quoted-printable vcard files because it
> linebreaks in the middle of a word. Take this text for example:
> NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
>  8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
>  A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
>
> The whole thing should be on one line, and the spaces at the beginning
> of each line shouldn't be there at all. I have a directory with 422
> files corrupted like this.

Although I think it's "rude" to break quoted-printable characters in
the middle (as seen above), isn't it permitted by the specification to
wrap lines to a predetermined length? It's been a while since I looked
at the specification, but this is one of the things that
implementations have to be able to handle.

> Can Python go through a directory of files and replace each instance
> of "newline-space" with nothing? The system is Ubuntu 8.04 with KDE if
> it matters. Thanks.

You should file a bug against Kontact: the KDE developers love fixing
bugs, especially in their old work. ;-)

Paul
Oct 14 '08 #2
2008/10/14 Paul Boddie <pa**@boddie.org.uk>:
>> Can Python go through a directory of files and replace each instance
>> of "newline-space" with nothing? The system is Ubuntu 8.04 with KDE if
>> it matters. Thanks.
>
> You should file a bug against Kontact: the KDE developers love fixing
> bugs, especially in their old work. ;-)

I had to reopen an old bug on this:
https://bugs.kde.org/show_bug.cgi?id=68350

I would really appreciate it if the knowledgeable folks here would
chime in on that bug. Thanks!

--
Dotan Cohen

Oct 14 '08 #3
On 14 Oct, 13:06, Paul Boddie <p...@boddie.org.uk> wrote:
> On 14 Oct, 02:31, "Dotan Cohen" <dotanco...@gmail.com> wrote:
>> KDE's Kontact PIM breaks quoted-printable vcard files because it
>> linebreaks in the middle of a word. Take this text for example:
>> NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
>>  8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
>>  A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
> [...]
> Although I think it's "rude" to break quoted-printable characters in
> the middle (as seen above), isn't it permitted by the specification to
> wrap lines to a predetermined length? It's been a while since I looked
> at the specification, but this is one of the things that
> implementations have to be able to handle.

The vCard specification (RFC 2426 [1]) refers to RFC 2425 [2], which
says this in section 5.8.1:

A logical line MAY be continued on the next physical line anywhere
between two characters by inserting a CRLF immediately followed by a
single white space character (space, ASCII decimal 32, or horizontal
tab, ASCII decimal 9).

This is like the iCalendar specification (RFC 2445 [3]), section 4.1:

Lines of text SHOULD NOT be longer than 75 octets, excluding the line
break. Long content lines SHOULD be split into a multiple line
representations using a line "folding" technique. That is, a long
line can be split between any two characters by inserting a CRLF
immediately followed by a single linear white space character (i.e.,
SPACE, US-ASCII decimal 32 or HTAB, US-ASCII decimal 9).

I didn't find anything which forbids splitting quoted-printable
character values in these specifications.

Paul

[1] http://www.ietf.org/rfc/rfc2426.txt
[2] http://www.ietf.org/rfc/rfc2425.txt
[3] http://www.ietf.org/rfc/rfc2445.txt
Oct 14 '08 #5
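
The folding rule quoted from the RFCs can be illustrated with a small round trip. This is only a sketch under stated assumptions: the helper names `fold` and `unfold` are invented here, the content is assumed to be ASCII (as quoted-printable data is) so that characters and octets coincide, and Python 3 is assumed. It shows that a fold may land in the middle of an `=XX` escape and still be undone exactly.

```python
def fold(line: str, width: int = 75) -> str:
    """Fold one logical line into physical lines of at most `width` octets.

    Each continuation line starts with a single space, which counts
    toward the width. Assumes ASCII content.
    """
    if width < 2:
        raise ValueError("width must be at least 2")
    parts = [line[:width]]
    rest = line[width:]
    while rest:
        parts.append(" " + rest[: width - 1])
        rest = rest[width - 1:]
    return "\r\n".join(parts)


def unfold(text: str) -> str:
    """Undo folding: drop each CRLF that is followed by a space or tab,
    together with that one whitespace character."""
    return text.replace("\r\n ", "").replace("\r\n\t", "")


value = "NOTE;ENCODING=QUOTED-PRINTABLE:" + "=D7=A9" * 20
assert unfold(fold(value)) == value  # folding anywhere is reversible
```

Only one whitespace character is removed per fold, so a value that genuinely begins a continuation with a space keeps it.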
In message <1d**********************************@r66g2000hsg.googlegroups.com>,
Paul Boddie wrote:
> The vCard specification (RFC 2426 [1]) refers to RFC 2425 [2], which
> says this in section 5.8.1:
>
> A logical line MAY be continued on the next physical line anywhere
> between two characters by inserting a CRLF immediately followed by a
> single white space character (space, ASCII decimal 32, or horizontal
> tab, ASCII decimal 9).
>
> I didn't find anything which forbids splitting quoted-printable
> character values in these specifications.

What adds to the confusion is that quoted-printable has its own convention
for soft-wrapping long lines, using an equals sign followed by a newline.
Oct 15 '08 #6
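
For comparison, quoted-printable's own soft line break can be seen with the standard library's `quopri` module. A minimal sketch, assuming Python 3; the input bytes are made up for the example and are not taken from the poster's file.

```python
import quopri

# "=" immediately followed by a line break is a quoted-printable soft
# line break: the decoder removes it and joins the two halves.
encoded = b"=D7=A9=D7=95=\r\n=D7=A8=D7=94"
decoded = quopri.decodestring(encoded)

# The four UTF-8 characters come back as one unbroken word.
print(decoded.decode("utf-8"))
```

Note that this is a different mechanism from vCard folding: here the marker is a trailing `=`, not a leading space on the next line.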
On 15 Oct, 06:40, Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
> In message <1ddce4a8-e11c-4b06-9859-32d69407e...@r66g2000hsg.googlegroups.com>, Paul Boddie wrote:
>> I didn't find anything which forbids splitting quoted-printable
>> character values in these specifications.
>
> What adds to the confusion is that quoted-printable has its own convention
> for soft-wrapping long lines, using an equals sign followed by a newline.

I think the necessary approach involves interpreting data in the vCard
"content model" before interpreting data in the quoted-printable
"content model". That is, follow the vCard rules around line
formatting to first reconstruct encoded content, then do what you
would normally do with that encoded content. It's a bit like parsing
XML and then attempting to read text from the document's parsed
representation, rather than just matching a particular region with a
regular expression and finding that it yields "&lt;" and "&gt;"
instead of the expected "<" and ">".

Paul
Oct 15 '08 #7
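
The two-layer order described above (vCard line rules first, quoted-printable second) might look like this in Python. It is a sketch, not a full vCard parser: it assumes Python 3, handles only a single property value, and `decode_qp_value` is a hypothetical helper name. The sample bytes are the first two words of the NOTE value from the original post, folded in the middle of an escape.

```python
import quopri
import re


def decode_qp_value(folded_value: bytes, charset: str = "utf-8") -> str:
    """Unfold a vCard value, then decode its quoted-printable content."""
    # Step 1 (vCard layer): remove each line break followed by one
    # space or tab, restoring the logical line.
    unfolded = re.sub(rb"\r?\n[ \t]", b"", folded_value)
    # Step 2 (quoted-printable layer): decode the =XX escapes, then
    # interpret the resulting bytes in the declared charset.
    return quopri.decodestring(unfolded).decode(charset)


# The fold splits the escape "=A8" into "=A" and "8".
sample = (b"=D7=A9=D7=95=D7=A8=D7=94 =D7=A\r\n"
          b" 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.")
print(decode_qp_value(sample))
```

Decoding each physical line on its own would hit the incomplete escape `=A` at the end of the first line, which is why the unfolding step has to come first.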
In message <ma**************************************@python.org>, Dotan
Cohen wrote:
> 2008/10/15 Lawrence D'Oliveiro <ld*@geek-central.gen.new_zealand>:
>> What adds to the confusion is that quoted-printable has its own
>> convention for soft-wrapping long lines, using an equals sign followed by
>> a newline.
>
> My test file has newlines not preceded by an equals sign:

As was mentioned upthread by Paul Boddie, the vCard spec has its own
convention for continuing a value across multiple lines. Provided you stick
to that, you should be OK.
Oct 15 '08 #8
2008/10/15 Lawrence D'Oliveiro <ld*@geek-central.gen.new_zealand>:

Thanks. The RFC pages for vcard (http://www.ietf.org/rfc/rfc2426.txt
and http://www.ietf.org/rfc/rfc2425.txt) are very difficult for me to
read. I'm using the test file to learn, and I will work out the kinks
on other files that I come across. This is for personal use, not
production, so I can be sloppy :)

--
Dotan Cohen

Oct 15 '08 #9
