473,225 Members | 1,325 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,225 software developers and data experts.

Unicode/utf-8 data in SQL Server

I'm working with a MS SQL Server database created by a program from a
fine US company who seems to have gotten run over by the Unicode truck.
In their infinite wisdom they've decided to store Unicode data directly
in regular varchar fields, utf-8 encoded! (on the bright side, it is
properly utf-8 encoded). One of our customers then wants to use a csv
file created from a report to import in Excel and is getting an
attitude when the text shows up "all garbled" (which I can
understand...)

One method that works is to use Python to pull down the result set from
the database, accumulate the entire result text as a big unicode string
(while decode('utf-8') all text fields in the process) separating each
field with a tab, before encode('utf-16') the result string and writing
it to a file opened in binary mode. This ensures that the file gets a
bom, that it's in a format (utf-16) that Excel can import, and
hopefully tabs are less common than commas in the source data :-( The
csv module doesn't support Unicode.

The customer is of the firm belief that our national characters
() are part of ascii, presumably because they're
single-byte-encoded in iso-8859-1. He has no understanding for the
issues (either by choice or experience) so there is no purpose to
trying to explain the differences... Be that as it may, he might be
satisfied with a csv file in that (iso-8859-1) encoding since the local
version of Excel can import it transparently (with significant
behind-the-scenes magic I believe...?)

The Python script mentioned above has to be run on the server, since it
doesn't accept remote connections, I'm of course the only one with
access, and I'd like to remove myself from the loop. I've looked at
creating a view on the database that would cast or convert the data,
but all I've run into are vague references to StrConv, which seems to
be a VB function. Giving the customer a macro that he could run in
Excel after importing the data would probably be ok as well, so I also
tried creating an Excel VB macro using the StrConv function, but (a) it
isn't entirely clear to me that this function can do this, and (b) the
third argument to the function is an LCID, a Locale ID, which is
numeric and not defined anywhere I can find it...

Anyone have any advice?

tia,
-- bjorn

Aug 8 '06 #1
4 4875
bjorn:
I also
tried creating an Excel VB macro using the StrConv function, but (a) it
isn't entirely clear to me that this function can do this, and (b) the
third argument to the function is an LCID, a Locale ID, which is
numeric and not defined anywhere I can find it...
http://www.microsoft.com/globaldev/r.../lcid-all.mspx

Neil
Aug 8 '06 #2

thebjorn wrote:
I'm working with a MS SQL Server database created by a program from a
fine US company who seems to have gotten run over by the Unicode truck.
In their infinite wisdom they've decided to store Unicode data directly
in regular varchar fields, utf-8 encoded! (on the bright side, it is
properly utf-8 encoded). One of our customers then wants to use a csv
file created from a report to import in Excel and is getting an
attitude when the text shows up "all garbled" (which I can
understand...)

One method that works is to use Python to pull down the result set from
the database, accumulate the entire result text as a big unicode string
(while decode('utf-8') all text fields in the process) separating each
field with a tab, before encode('utf-16') the result string and writing
it to a file opened in binary mode. This ensures that the file gets a
bom, that it's in a format (utf-16) that Excel can import, and
hopefully tabs are less common than commas in the source data :-( The
csv module doesn't support Unicode.
Last time I looked, *Excel* didn't support csv files in utf-N :-(
>
The customer is of the firm belief that our national characters
() are part of ascii, presumably because they're
single-byte-encoded in iso-8859-1. He has no understanding for the
issues (either by choice or experience) so there is no purpose to
trying to explain the differences... Be that as it may, he might be
satisfied with a csv file in that (iso-8859-1) encoding since the local
version of Excel can import it transparently (with significant
behind-the-scenes magic I believe...?)
No magic AFAICT. The bog-standard Windows kit in (north/west/south
Europe + the English-speaking world) uses code page 1252 (Python:
'cp1252') which is an MS-molested iso-885-1.

The customer should be very happy if you do
text.decode('utf-8').encode('cp1252') -- not only should the file
import into Excel OK, he should be able to view it in
Word/Notepad/whatever.

HTH,
John

Aug 9 '06 #3
John Machin a crit :
The customer should be very happy if you do
text.decode('utf-8').encode('cp1252') -- not only should the file
import into Excel OK, he should be able to view it in
Word/Notepad/whatever.
+
text.decode('utf-8').encode('cp1252',errors='replace')

As cp1252 may not cover all utf8 chars.

Laurent.
Aug 9 '06 #4
Laurent Pointal wrote:
John Machin a crit :
The customer should be very happy if you do
text.decode('utf-8').encode('cp1252') -- not only should the file
import into Excel OK, he should be able to view it in
Word/Notepad/whatever.

+
text.decode('utf-8').encode('cp1252',errors='replace')

As cp1252 may not cover all utf8 chars.
In that case, the OP may well want to use 'xmlcharrefreplace' or
'backslashreplace' as they stand out more than '?' *and* the original
Unicode is recoverable if necessary e.g.:

#>>msg = u'\u0124\u0114\u0139\u013B\u0150'
>>print msg
HELLO
#>>msg.encode('cp1252', 'replace')
'?????'
#>>msg.encode('cp1252', 'xmlcharrefreplace')
'ĤĔĹĻŐ'
#>>msg.encode('cp1252', 'backslashreplace')
'\\u0124\\u0114\\u0139\\u013b\\u0150'
#>>

Cheers,
John

Aug 9 '06 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Michael Weir | last post by:
I'm sure this is a very simple thing to do, once you know how to do it, but I am having no fun at all trying to write utf-8 strings to a unicode file. Does anyone have a couple of lines of code...
8
by: Francis Girard | last post by:
Hi, For the first time in my programmer life, I have to take care of character encoding. I have a question about the BOM marks. If I understand well, into the UTF-8 unicode binary...
3
by: Naresh Agarwal | last post by:
Hi XML uses UTF-8 by default. Is that correct? Also, can we use Unicode in XML? thanks, Naresh
27
by: EU citizen | last post by:
Do web pages have to be created in unicode in order to use UTF-8 encoding? If so, can anyone name a free application which I can use under Windows 98 to create web pages?
15
by: John Salerno | last post by:
Forgive my newbieness, but I don't quite understand why Unicode is still something that needs special treatment in Python (and perhaps elsewhere). I'm reading Dive Into Python right now, and it...
40
by: apprentice | last post by:
Hello, I'm writing an class library that I imagine people from different countries might be interested in using, so I'm considering what needs to be provided to support foreign languages,...
12
by: damjan | last post by:
This may look like a silly question to someone, but the more I try to understand Unicode the more lost I feel. To say that I am not a beginner C++ programmer, only had no need to delve into...
2
by: willie | last post by:
John Machin: # I'll go ahead and concede defeat since you appear to be on the # verge of a heart attack :) # I can see that I lack clarity so I don't blame you. # By UTF-8 string, I mean...
17
by: Adam Olsen | last post by:
As was seen in another thread, there's a great deal of confusion with regard to surrogates. Most programmers assume Python's unicode type exposes only complete characters. Even CPython's own...
7
by: 7stud | last post by:
Based on this example and the error: ----- u_str = u"abc\u9999" print u_str UnicodeEncodeError: 'ascii' codec can't encode character u'\u9999' in position 3: ordinal not in range(128) ------
1
isladogs
by: isladogs | last post by:
The next online meeting of the Access Europe User Group will be on Wednesday 6 Dec 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, Mike...
0
by: veera ravala | last post by:
ServiceNow is a powerful cloud-based platform that offers a wide range of services to help organizations manage their workflows, operations, and IT services more efficiently. At its core, ServiceNow...
0
by: VivesProcSPL | last post by:
Obviously, one of the original purposes of SQL is to make data query processing easy. The language uses many English-like terms and syntax in an effort to make it easy to learn, particularly for...
3
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 3 Jan 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). For other local times, please check World Time Buddy In...
0
by: jianzs | last post by:
Introduction Cloud-native applications are conventionally identified as those designed and nurtured on cloud infrastructure. Such applications, rooted in cloud technologies, skillfully benefit from...
0
by: mar23 | last post by:
Here's the situation. I have a form called frmDiceInventory with subform called subfrmDice. The subform's control source is linked to a query called qryDiceInventory. I've been trying to pick up the...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM). In this month's session, the creator of the excellent VBE...
0
by: stefan129 | last post by:
Hey forum members, I'm exploring options for SSL certificates for multiple domains. Has anyone had experience with multi-domain SSL certificates? Any recommendations on reliable providers or specific...
1
by: davi5007 | last post by:
Hi, Basically, I am trying to automate a field named TraceabilityNo into a web page from an access form. I've got the serial held in the variable strSearchString. How can I get this into the...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.