Python for Vcard Parsing in UTF16

R Wood

Greetings -

A recent Perl experiment hasn't turned out so well, which has piqued my
interest in Python. The project is this: take a vCard file exported from
Apple's Address Book and use a language that is good at parsing text to convert
it into a mutt alias file. There are better ways to use mutt with the Mac's
Address Book, but I want to be able to periodically convert my working
address book file into an alias file I can then transfer across all my different
machines: two Macs, two Linux boxes, and one FreeBSD. It's basically a couple of
regexes that look for FN: followed by a name and join all the words of the
name into a single key separated by underscores, followed by the name and
the email address. You would wind up with

alias Linus_Torvalds Linus Torvalds <lt@linux.com>

To me this was a natural task for Perl. Turns out however, there's a catch.
Apple exports the file in UTF-16 to ensure anyone with Chinese characters in
their addressbook gets a legitimate Vcard file. And of course Perl somewhat
chokes on UTF. I've found several ways to do it that involve complicated
downloads and installations of Perl modules, but that defeats the purpose of
making it simple. In an ideal world you should be able to say "try this cool
script" and be done with it. Once you have to say "go to CPAN, download and
compile this module, then ..." it gets less exciting.

I know nothing about Python except that it interests me and has interested me
since I first learned the Rekall database frontend (Linux) runs on it. I just
ordered Learning Python and if that works out satisfactorily I'm going to go
back for Programming Python. In the meantime, I thought I would pose the
question to this newsgroup: would Python be useful for a parsing exercise like
this one?

Apr 21 '07 #1


Alex Martelli

R Wood <rw***@therandymon.com> wrote:
...

alias Linus_Torvalds Linus Torvalds <lt@linux.com>

To me this was a natural task for Perl. Turns out however, there's a catch.
Apple exports the file in UTF-16 to ensure anyone with Chinese characters in
their addressbook gets a legitimate Vcard file. And of course Perl somewhat
chokes on UTF. I've found several ways to do it that involve complicated
downloads and installations of Perl modules, but that defeats the purpose of
making it simple. In an ideal world you should be able to say "try this cool
script" and be done with it. Once you have to say "go to CPAN, download and
compile this module, then ..." it gets less exciting.

I know nothing about Python except that it interests me and has interested me
since I first learned the Rekall database frontend (Linux) runs on it. I just
ordered Learning Python and if that works out satisfactorily I'm going to go
back for Programming Python. In the meantime, I thought I would pose the
question to this newsgroup: would Python be useful for a parsing exercise like
this one?

Sure, Python and Perl (and Ruby) should be equally suitable for the
task, so, if Python appears more suitable by having built-in unicode
capabilities, go for it. I'm a bit uncertain about the UTF-16 export
though; I know some applications do use it (e.g., Microsoft Entourage),
but I thought Apple's Address Book didn't, and, having just tried a
VCard export from mine, it looks quite ASCII to me. Maybe you've set
some kind of preference, or...?
Alex

Apr 22 '07 #2

R Wood

Alex Martelli wrote:

R Wood <rw***@therandymon.com> wrote:
...
>alias Linus_Torvalds Linus Torvalds <lt@linux.com>

To me this was a natural task for Perl. Turns out however, there's a catch.
Apple exports the file in UTF-16 to ensure anyone with Chinese characters in
their addressbook gets a legitimate Vcard file. And of course Perl somewhat
chokes on UTF.

Sure, Python and Perl (and Ruby) should be equally suitable for the
task, so, if Python appears more suitable by having built-in unicode
capabilities, go for it. I'm a bit uncertain about the UTF-16 export
though; I know some applications do use it (e.g., Microsoft Entourage),
but I thought Apple's Address Book didn't, and, having just tried a
VCard export from mine, it looks quite ASCII to me. Maybe you've set
some kind of preference, or...?
Alex

I did the same thing. Apple's clever: if your address book contains nothing
but ASCII, it exports the file in ASCII. But if it contains anything else (in
my case, Spanish, French, and Italian names), it switches to UTF-16. I first
thought it was UTF-8, but figured that since Apple supports all sorts of Asian
languages really well they went with UTF-16 to deal with it, and opening the
exported file in jEdit with the UTF-16 encoding confirmed that's what it is.
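
A quick way to double-check what an editor reports is to look at the first few bytes of the export for a byte-order mark. The sketch below is only an illustration: `sniff_bom` is a made-up helper name, and it assumes the exported file starts with a BOM, which nothing in this thread confirms.

```python
import codecs

def sniff_bom(data):
    """Return a codec name suggested by a leading byte-order mark, or 'unknown'.

    Hypothetical helper: it only inspects the BOM and says nothing about
    files that were written without one.
    """
    # Check UTF-8's three-byte mark first, then the two-byte UTF-16 marks.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM to pick the byte order.
        return "utf-16"
    return "unknown"
```

Calling it on the first bytes of the file, read in binary mode, is enough; plain ASCII exports come back as "unknown".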

Apr 22 '07 #3

Adam Atlas

On Apr 21, 7:28 pm, R Wood <r...@therandymon.com> wrote:

I know nothing about Python except that it interests me and has interested me
since I first learned the Rekall database frontend (Linux) runs on it. I just
ordered Learning Python and if that works out satisfactorily I'm going to go
back for Programming Python. In the meantime, I thought I would pose the
question to this newsgroup: would Python be useful for a parsing exercise like
this one?

Here's a little function that takes some `str`-type data (i.e. what
you'd get from doing open(...).read()) and, assuming it's a Vcard,
detects its encoding and converts it to a canonical `unicode` object.

    def fix_encoding(s):
        m = u'BEGIN:VCARD'
        for c in ('ascii', 'utf_16_be', 'utf_16_le', 'utf_8'):
            try:
                u = unicode(s, c)
            except UnicodeDecodeError:
                continue
            if m in u:
                return u
        return None
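
The function above relies on the `unicode` built-in, which exists only in Python 2. For readers on Python 3, a rough equivalent might look like the sketch below. It is an adaptation, not the poster's code: trying the BOM-aware "utf-16" codec and stripping a leftover U+FEFF are additions made here.

```python
def fix_encoding(data):
    """Guess the encoding of vCard bytes; return the decoded str, or None."""
    marker = "BEGIN:VCARD"
    # "utf-16" honours a byte-order mark if one is present; the explicit
    # big- and little-endian codecs cover files written without one.
    for codec in ("ascii", "utf-16", "utf-16-be", "utf-16-le", "utf-8"):
        try:
            text = data.decode(codec)
        except UnicodeDecodeError:
            continue
        # A wrong codec can decode without error but yields garbage,
        # so only accept a result that contains the vCard marker.
        if marker in text:
            # Drop a BOM that a fixed-endian codec left in as U+FEFF.
            return text.lstrip("\ufeff")
    return None
```

It would be called on bytes read with open(path, "rb").read(), where path is whatever file was exported.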

Apr 24 '07 #4

Adam Atlas

On Apr 21, 7:28 pm, R Wood <r...@therandymon.com> wrote:

To me this was a natural task for Perl. Turns out however, there's a catch.
Apple exports the file in UTF-16 to ensure anyone with Chinese characters in
their addressbook gets a legitimate Vcard file.

Here's a function that, given a `str` containing a vcard in some
encoding, guesses the encoding and returns a canonical representation
as a `unicode` object.

    def fix_encoding(s):
        m = u'BEGIN:VCARD'
        for c in ('ascii', 'utf_16_be', 'utf_16_le', 'utf_8'):
            try:
                u = unicode(s, c)
            except UnicodeDecodeError:
                continue
            if m in u:
                return u
        return None
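
Once the text is decoded, the conversion described in the original post is a short loop over the lines. The following is a minimal Python 3 sketch under several assumptions: `vcard_to_aliases` and both patterns are invented for illustration, EMAIL lines are assumed to look like `EMAIL;type=...:address` (optionally with a group prefix such as `item1.`), and folded (continued) vCard lines are not handled.

```python
import re

# Assumed line shapes: "FN:Full Name" and "EMAIL;type=...:addr",
# the latter optionally prefixed by a group label like "item1.".
FN_RE = re.compile(r"^FN[^:]*:(.+)$")
EMAIL_RE = re.compile(r"^(?:[\w-]+\.)?EMAIL[^:]*:(.+)$", re.IGNORECASE)

def vcard_to_aliases(text):
    """Return mutt-style alias lines for each FN/EMAIL pair found in text."""
    aliases = []
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.upper() == "BEGIN:VCARD":
            name = None  # do not carry a name over from the previous card
        match = FN_RE.match(line)
        if match:
            name = match.group(1).strip()
            continue
        match = EMAIL_RE.match(line)
        if match and name:
            key = "_".join(name.split())  # "Linus Torvalds" -> "Linus_Torvalds"
            aliases.append("alias %s %s <%s>" % (key, name, match.group(1).strip()))
    return aliases
```

A contact with several addresses produces several lines with the same key, and an address that appears before its card's FN line is skipped, so a real script would need to handle both cases.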

Apr 24 '07 #5
