text decoding from dataset, hmm... help appreciated.

Hi all,

Here is my problem:

I have a SQL Server 2000 DB with various NVarChar and NText fields in its
tables. For some stupid reason the data was inserted into these fields in
UTF8 encoding.

However, when you retrieve these values into a DataSet and ToString() them,
some characters come out as garbage.

So I have started writing a throwaway app that will go through all the
relevant tables and fields, decoding the values and then updating them with
Unicode values.

However, I'm a bit confused and stuck. There seem to be lots of classes that
sound exactly like what I want, but they either don't convert to Unicode
or they expect the data to be in a byte array. Trying to convert the DataSet
values from object to a byte array causes an invalid cast exception.

Any ideas, or links to detailed info about the encoding stuff, would be much
appreciated.

Regards,
Peter
Jul 21 '05 #1
Peter Row <pe*******@oxfo rdcc.co.uk> wrote:
> Here is my problem:
>
> I have a SQL Server 2000 DB with various NVarChar, NText fields in its
> tables.
> For some stupid reason the data was inserted into these fields in UTF8
> encoding.

That shouldn't make any odds - the encoding shouldn't matter at all, as
what will end up in the database is Unicode characters.

> However when you retrieve these values into a dataset and ToString() them
> some characters come out as garbage.
>
> So therefore I have started writing a throwaway app that will go through
> all the relevant tables and fields decoding and then updating with Unicode
> values.
>
> However I'm a bit confused and stuck. There seem to be lots of classes that
> sound exactly like what I want however they either don't convert to Unicode
> or they expect the data to be in a byte array. Trying to convert the dataset
> values from object to a byte array causes an invalid cast exception.
>
> Any ideas or links to detailed info about the encoding stuff would be much
> appreciated.


See http://www.pobox.com/~skeet/csharp/unicode.html

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2
Hi Jon,

I told Peter in the language.vb group to ask this in here, because the
expert Jon Skeet is here; I think you can answer this better.

(Now my advice looks like nothing, and maybe I will try it myself, although
I am not that much of an expert, of course.)
Cor
Jul 21 '05 #4
Cor Ligthert <no**********@p lanet.nl> wrote:
> I told Peter in the language.vb group to ask this in here, because the
> expert Jon Skeet is here; I think you can answer this better.
>
> (Now my advice looks like nothing, and maybe I will try it myself, although
> I am not that much of an expert, of course.)


Once Peter has read through the article I referenced, he should
understand things better - so he'll be able to come back with more
detailed questions, hopefully.

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #6
Hi,

Well, regardless of "it shouldn't make any odds", it DOES.

One thing I didn't mention in the original post is that this is in
connection with a port from VB6 to VB.NET.

The original VB6 version did the storing in UTF8; the .NET port comes along
and certain characters cause a problem.

For example, take an apostrophe stored in the database using UTF8 encoding:
when an ADO.NET DataSet ToString()'s the value, instead of an apostrophe you
get 2 or 3 nonsense characters.

Anyhow, after many hours of frustration and trying to understand all the
.NET encoding classes, I discovered that it is not possible to do what I want
with native .NET code.

This is because when using a DataSet the values have already been implicitly
converted, and in addition you have to use ToString() anyway. By this point
the UTF8 value has been corrupted and the decode to Unicode doesn't work.

Using a DataReader and its GetBytes() method doesn't help either, because
that method only works on Text and NText database types and I need it to
work on NVarChar as well.

But there was light at the end of the tunnel (well, in my case anyway).
I used COM Interop to use ADO 2.7 and a custom in-house C++ DLL: the former
to get the data without mangling it behind my back, the latter to decode the
UTF8 to the Unicode that .NET likes. This saved me throwing away
everything I'd already done.

This sounds bad, but this is a one-off util: once all the databases that
the VB6 version used have been "fixed" to proper Unicode and not UTF8, this
util will never be used again.

Regards,
Peter
Jul 21 '05 #8
Peter Row <pe*******@oxfo rdcc.co.uk> wrote:
> Well regardless of "it shouldn't make any odds" it DOES.

Then chances are the data isn't being posted properly to start with.

> One thing to point out here that I didn't mention in the original post
> is this is in connection with a port from VB6 to VB.NET.
>
> The original VB6 version did the storing in UTF8, the .NET port comes along
> and on certain characters causes a problem.

It's just possible that VB6 is converting the string into UTF8, but
then treating each byte in the resulting sequence as a character, and
using the default encoding on the equivalent string. You'd certainly
get bizarre effects then.
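
To make that suspicion concrete, here is a minimal sketch of how such corruption could arise. It is only an illustration: the class and method names are made up, and ISO-8859-1 (code page 28591) stands in for whatever default code page the VB6 application really used, which this thread does not say.

```csharp
using System;
using System.Text;

static class CorruptionSketch
{
    // Hypothetical helper: encodes text as UTF-8, then wrongly treats every
    // resulting byte as one character of a single-byte encoding.
    public static string Corrupt(string original, Encoding singleByte)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
        return singleByte.GetString(utf8Bytes);
    }

    static void Main()
    {
        // Assumption: Latin-1 as the single-byte encoding.
        Encoding latin1 = Encoding.GetEncoding(28591);

        // "It's" with a curly apostrophe (U+2019), which is 3 bytes in UTF-8.
        string stored = Corrupt("It\u2019s", latin1);

        Console.WriteLine(stored.Length); // 6: the apostrophe became 3 characters
    }
}
```

A plain ASCII apostrophe (U+0027) is a single byte in UTF-8 and would survive this unchanged, so only characters outside ASCII would show up as garbage.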

> Example an apostrophe stored in the database using UTF8 encoding when
> an ADO.NET dataset ToString()'s the value instead of an apostrophe you get
> 2 or 3 nonsense characters.

What kind of apostrophe, and what kind of nonsense characters? If you
have an example of which Unicode character you're trying to get, and
which Unicode characters you're actually getting back (as integer
values, preferably) that would be great.

See http://www.pobox.com/~skeet/csharp/d...ngunicode.html for
examples of how to get those character values.
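
For anyone who wants to post such values, a small sketch along these lines prints each UTF-16 code unit of a string as a hexadecimal integer. The class and method names are invented for illustration and are not taken from the linked article.

```csharp
using System;
using System.Text;

static class CharDump
{
    // Formats each UTF-16 code unit of the string as U+XXXX, space separated.
    public static string DumpCodeUnits(string text)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.AppendFormat("U+{0:X4}", (int)c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Three characters of the kind a mis-decoded curly apostrophe might yield.
        Console.WriteLine(DumpCodeUnits("\u00E2\u0080\u0099"));
        // prints: U+00E2 U+0080 U+0099
    }
}
```

Running this over a value straight from the DataSet would show whether the garbage really is one character per UTF-8 byte.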

> Anyhow after many hours of frustration and trying to understand all the
> .NET encoding classes I discovered that it is not possible to do what I want
> with native .NET code.

Well, it might be - but it'll be a bit of a hack.

> This is because when using a dataset the values have already been implicitly
> converted and in addition you have to use ToString() anyway, by this point
> the UTF8 value has been corrupted and the decode to Unicode doesn't work.

No, I believe it's corrupt by the time it's in the database. It reminds
me of a similar problem someone else posted about a while ago.

> Using a datareader and its GetBytes() method doesn't help either
> because that method only works on Text and NText database types and I
> need it to work on NVarChar as well.
>
> But there was light at the end of the tunnel (well in my case anyway).
> I used COM Interop to use ADO 2.7 and a custom in-house C++ DLL.
> The former to get the data without mangling it behind my back the latter
> to decode the UTF8 to Unicode that .NET likes. This saved me throwing away
> everything I'd already done.
>
> This sounds bad, but this is a one-off use util, once all the databases that
> the VB6 version used have been "fixed" to proper Unicode and not UTF8 this util
> will never be used again.


The VB6 code shouldn't have to do any encoding at all - but I'm not
familiar enough with ADO and VB6 to say exactly what needs to be done.
If it *is* the problem I think it is, then you might be able to get the
real data using (assuming you've got the text variable from the
database):

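// Re-encode with the default code page to get back the bytes as stored...
byte[] databaseValueAsBinary = Encoding.Default.GetBytes(text);
// ...then decode those bytes as the UTF-8 they really are.
string realText = Encoding.UTF8.GetString(databaseValueAsBinary);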

That's worth a try...
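
A slightly more defensive version of the same idea, as a sketch only. It assumes the problem really is UTF-8 bytes stored one per character, and it uses ISO-8859-1 (code page 28591) as a stand-in for the original code page; on a Western European Windows machine the real one may well be Windows-1252 instead. Note also that on .NET Core and later, Encoding.Default is UTF-8 rather than the system ANSI code page, so there the encoding would need to be named explicitly. The Utf8Repair class and its Repair method are hypothetical names.

```csharp
using System;
using System.Text;

static class Utf8Repair
{
    // Strict decoder: throws on invalid UTF-8 instead of inserting U+FFFD.
    static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Returns the repaired text, or the input unchanged when it does not
    // look like UTF-8 bytes stored one per character.
    public static string Repair(string stored, Encoding singleByte)
    {
        byte[] raw = singleByte.GetBytes(stored);

        // Characters outside the single-byte encoding are replaced on encoding,
        // so a failed round trip means the bytes cannot be trusted.
        if (singleByte.GetString(raw) != stored)
        {
            return stored;
        }

        try
        {
            return StrictUtf8.GetString(raw);
        }
        catch (ArgumentException)
        {
            // Not valid UTF-8 (DecoderFallbackException derives from
            // ArgumentException), so the value was probably never corrupted.
            return stored;
        }
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding(28591); // assumption, see above
        string repaired = Repair("It\u00E2\u0080\u0099s", latin1);
        Console.WriteLine(repaired == "It\u2019s"); // True
    }
}
```

Values that are plain ASCII, or that fail strict UTF-8 decoding, come back unchanged, which should make it safer to run over a table where only some rows are affected. If the wrong single-byte encoding is chosen, the round-trip check fails and nothing is changed, so it is worth dumping a few stored values as integers first to see which code page fits.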

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #10
