Python for Vcard Parsing in UTF16

Greetings -

A recent Perl experiment hasn't turned out so well, which has piqued my
interest in Python. The project is this: take a Vcard file exported from
Apple's Addressbook and use a language that is good at parsing text to convert
it into a mutt alias file. There are better ways to use Mutt with Mac's
addressbook, but I want to be able to periodically convert my working
addressbook file into an alias file I can then transfer across all my different
machines - two Macs, two Linux, and one FreeBSD. It's basically a couple of
regexes that look for FN: followed by a name and convert all the words of the
name into a single structure separated by underscores, followed by the email
addresses. You would wind up with

alias Linus_Torvalds Linus Torvalds <lt@linux.com>
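
The conversion described above can be sketched in Python 3 roughly as follows. This is only an illustration under stated assumptions: the function name `vcard_to_aliases`, the two regexes, and the assumed line shapes (`FN:Full Name`, `EMAIL;type=...:address`, with an optional group prefix such as `item1.`) are not taken from the thread, and the input is assumed to be text that has already been decoded. Several addresses on one card are joined with commas on a single alias line.

```python
import re

# Assumed line shapes (not confirmed by the thread): "FN:Full Name" and
# "EMAIL;type=INTERNET:addr", optionally prefixed by a group like "item1.".
FN_RE = re.compile(r'^(?:[\w-]+\.)?FN(?:;[^:]*)?:(.+)$', re.IGNORECASE)
EMAIL_RE = re.compile(r'^(?:[\w-]+\.)?EMAIL(?:;[^:]*)?:(.+)$', re.IGNORECASE)


def vcard_to_aliases(text):
    """Return mutt alias lines for already-decoded vCard text (hypothetical helper)."""
    aliases = []
    name, emails = None, []
    for raw in text.splitlines():
        line = raw.strip()
        upper = line.upper()
        if upper == 'BEGIN:VCARD':
            # Start of a new card: forget whatever the previous card held.
            name, emails = None, []
        elif upper == 'END:VCARD':
            # Emit one alias per card, and only if it has a name and an address.
            if name and emails:
                key = '_'.join(name.split())
                addresses = ', '.join('%s <%s>' % (name, e) for e in emails)
                aliases.append('alias %s %s' % (key, addresses))
        else:
            fn = FN_RE.match(line)
            email = EMAIL_RE.match(line)
            if fn:
                name = fn.group(1).strip()
            elif email:
                emails.append(email.group(1).strip())
    return aliases
```

Cards without an FN line or without any email address are skipped rather than guessed at.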

To me this was a natural task for Perl. Turns out however, there's a catch.
Apple exports the file in UTF-16 to ensure anyone with Chinese characters in
their addressbook gets a legitimate Vcard file. And of course Perl somewhat
chokes on UTF. I've found several ways to do it that involve complicated
downloads and installations of Perl modules, but that defeats the purpose of
making it simple. In an ideal world you should be able to say "try this cool
script" and be done with it. Once you have to say "go to CPAN, download and
compile this module, then ..." it gets less exciting.

I know nothing about Python except that it interests me and has interested me
since I first learned the Rekall database frontend (Linux) runs on it. I just
ordered Learning Python and if that works out satisfactorily I'm going to go
back for Programming Python. In the meantime, I thought I would pose the
question to this newsgroup: would Python be useful for a parsing exercise like
this one?
Apr 21 '07 #1
R Wood <rw***@therandymon.com> wrote:
> ...
> alias Linus_Torvalds Linus Torvalds <lt@linux.com>
>
> To me this was a natural task for Perl. Turns out however, there's a catch.
> Apple exports the file in UTF-16 to ensure anyone with Chinese characters in
> their addressbook gets a legitimate Vcard file. And of course Perl somewhat
> chokes on UTF. I've found several ways to do it that involve complicated
> downloads and installations of Perl modules, but that defeats the purpose of
> making it simple. In an ideal world you should be able to say "try this cool
> script" and be done with it. Once you have to say "go to CPAN, download and
> compile this module, then ..." it gets less exciting.
>
> I know nothing about Python except that it interests me and has interested me
> since I first learned the Rekall database frontend (Linux) runs on it. I just
> ordered Learning Python and if that works out satisfactorily I'm going to go
> back for Programming Python. In the meantime, I thought I would pose the
> question to this newsgroup: would Python be useful for a parsing exercise like
> this one?

Sure, Python and Perl (and Ruby) should be equally suitable for the
task, so, if Python appears more suitable by having built-in unicode
capabilities, go for it. I'm a bit uncertain about the UTF-16 export
though; I know some applications do use it (e.g., Microsoft Entourage),
but I thought Apple's Address Book didn't, and, having just tried a
VCard export from mine, it looks quite ASCII to me. Maybe you've set
some kind of preference, or...?
Alex
Apr 22 '07 #2
Alex Martelli wrote:
> R Wood <rw***@therandymon.com> wrote:
> ...
>> alias Linus_Torvalds Linus Torvalds <lt@linux.com>
>>
>> To me this was a natural task for Perl. Turns out however, there's a
>> catch. Apple exports the file in UTF-16 to ensure anyone with Chinese
>> characters in their addressbook gets a legitimate Vcard file. And of
>> course Perl somewhat chokes on UTF.
>
> Sure, Python and Perl (and Ruby) should be equally suitable for the
> task, so, if Python appears more suitable by having built-in unicode
> capabilities, go for it. I'm a bit uncertain about the UTF-16 export
> though; I know some applications do use it (e.g., Microsoft Entourage),
> but I thought Apple's Address Book didn't, and, having just tried a
> VCard export from mine, it looks quite ASCII to me. Maybe you've set
> some kind of preference, or...?
> Alex

I did the same thing. Apple's clever. If your addressbook contains nothing
but ASCII, it will export your addressbook in ASCII. But if you have anything
else (in my case, Spanish, French, and Italian names) it goes for UTF-16. I
first thought it was UTF-8, but figured that since Apple supports all sorts
of Asian languages really well they went with UTF-16, and importing the
exported file into jEdit using UTF-16 encoding confirmed that's what it is.
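
One way to check which of the two cases a given export falls into is to look at the first bytes of the file. The sketch below (Python 3) is an illustration only: the function name `sniff_vcard_encoding` is made up, and it assumes the UTF-16 export starts with a byte-order mark, which the thread does not confirm.

```python
import codecs


def sniff_vcard_encoding(data):
    """Guess a codec name for raw vCard bytes, or None if undecided.

    Hypothetical helper: assumes a UTF-16 export begins with a
    byte-order mark (BOM), which is an assumption, not a documented fact.
    """
    if data.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'  # this codec strips the UTF-8 BOM on decode
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'     # this codec reads the BOM to pick the byte order
    try:
        data.decode('ascii')
    except UnicodeDecodeError:
        return None         # not ASCII and no BOM: leave the decision to the caller
    return 'ascii'
```

The returned name can be passed straight to `bytes.decode`. A file without a BOM that is not plain ASCII returns `None`, so the caller can fall back to trial decoding.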

Apr 22 '07 #3
On Apr 21, 7:28 pm, R Wood <r...@therandymon.com> wrote:
> I know nothing about Python except that it interests me and has interested me
> since I first learned the Rekall database frontend (Linux) runs on it. I just
> ordered Learning Python and if that works out satisfactorily I'm going to go
> back for Programming Python. In the meantime, I thought I would pose the
> question to this newsgroup: would Python be useful for a parsing exercise like
> this one?

Here's a little function that takes some `str`-type data (i.e. what
you'd get from doing open(...).read()) and, assuming it's a Vcard,
detects its encoding and converts it to a canonical `unicode` object.

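def fix_encoding(s):
    # Python 2: s is a byte string (str). Try each codec in turn and keep
    # the first decoding that contains the vCard start marker.
    m = u'BEGIN:VCARD'
    for c in ('ascii', 'utf_16_be', 'utf_16_le', 'utf_8'):
        try:
            u = unicode(s, c)
        except UnicodeDecodeError:
            continue
        if m in u:
            return u
    return None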
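
For readers on Python 3, where `unicode` no longer exists and file contents read in binary mode are `bytes`, the same trial-decoding idea might look like the sketch below. The name `decode_vcard` is hypothetical, and stripping a leading byte-order mark from the result is an addition that the function above does not make.

```python
def decode_vcard(data):
    """Python 3 sketch of the trial-decoding approach (hypothetical name).

    data is bytes, e.g. from open(path, 'rb').read(). Returns str, or None
    if no candidate codec yields text containing the vCard start marker.
    """
    marker = 'BEGIN:VCARD'
    for codec in ('ascii', 'utf_16_be', 'utf_16_le', 'utf_8'):
        try:
            text = data.decode(codec)
        except UnicodeDecodeError:
            continue
        if marker in text:
            # Addition not in the original: drop a leading BOM character,
            # which the explicit-endian UTF-16 codecs leave in place.
            return text.lstrip('\ufeff')
    return None
```

The marker check matters because UTF-16 bytes holding only ASCII-range characters also decode without error as `ascii`, just with NUL characters interleaved, so a successful decode alone does not identify the encoding.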

Apr 24 '07 #4
On Apr 21, 7:28 pm, R Wood <r...@therandymon.com> wrote:
> To me this was a natural task for Perl. Turns out however, there's a catch.
> Apple exports the file in UTF-16 to ensure anyone with Chinese characters in
> their addressbook gets a legitimate Vcard file.

Here's a function that, given a `str` containing a vcard in some
encoding, guesses the encoding and returns a canonical representation
as a `unicode` object.

def fix_encoding(s):
    m = u'BEGIN:VCARD'
    for c in ('ascii', 'utf_16_be', 'utf_16_le', 'utf_8'):
        try:
            u = unicode(s, c)
        except UnicodeDecodeError:
            continue
        if m in u:
            return u
    return None

Apr 24 '07 #5
