Reading XML Encoding errors

AGP
I am programming an XML reader in VB.NET 2005 and it works fairly well.
Once in a while, though, I encounter an old XML file without the header
<?xml version="1.0" encoding="UTF-8"?>
It craps out on the Load with an error similar to "Invalid character in the
given encoding. Line 3, position 5475070".
After some research, the character in question turns out to be the copyright
character. My question is: how can I force the reader to assume UTF-8?
My newer files do not seem to have this problem, just my older
files. I want to be able to catch this error
and then attempt to load the file again. The older file also does
not have a BOM, so I'm assuming the XML reader has no idea how to interpret
it. I'm hoping I can force a UTF-8 read of the XML file.

As a secondary question, it seems like these older XML files were
originally written out as one or two huge lines. Is there a way to output a
copy that is more readable, in the node-type format with line breaks and all?

Thanks for any help
AGP

Sep 30 '07 #1
"AGP" <si**********@softhome.net> wrote:
> Once in a while, though, I encounter an old XML file without the header
> <?xml version="1.0" encoding="UTF-8"?>
> It craps out on the Load with an error similar to "Invalid character in
> the given encoding. Line 3, position 5475070".
> After some research, the character in question turns out to be the copyright
> character. My question is: how can I force the reader to assume UTF-8?
> My newer files do not seem to have this problem, just my older
> files. I want to be able to catch this error
> and then attempt to load the file again. The older file also does
> not have a BOM, so I'm assuming the XML reader has no idea how to interpret
> it. I'm hoping I can force a UTF-8 read of the XML file.

IIRC, UTF-8 is the default encoding for XML files.

--
M S Herfried K. Wagner
M V P <URL:http://dotnet.mvps.org/>
V B <URL:http://dotnet.mvps.org/dotnet/faqs/>

Sep 30 '07 #2
AGP

"Herfried K. Wagner [MVP]" <hi***************@gmx.at> wrote in message
news:O1**************@TK2MSFTNGP03.phx.gbl...
> "AGP" <si**********@softhome.net> wrote:
>> Once in a while, though, I encounter an old XML file without the header
>> <?xml version="1.0" encoding="UTF-8"?>
>> It craps out on the Load with an error similar to "Invalid character in
>> the given encoding. Line 3, position 5475070".
>> After some research, the character in question turns out to be the copyright
>> character. My question is: how can I force the reader to assume UTF-8?
>> My newer files do not seem to have this problem, just my
>> older files. I want to be able to catch this error
>> and then attempt to load the file again. The older file also
>> does not have a BOM, so I'm assuming the XML reader has no idea how to
>> interpret it. I'm hoping I can force a UTF-8 read of the XML file.
>
> IIRC, UTF-8 is the default encoding for XML files.

OK, so then why do I get the error? I did a test: I loaded the XML into
Notepad and saved that file as text in UTF-8, and that file is read
correctly. So my question is: why does the original not load properly?

AGP
Oct 1 '07 #3
"AGP" <si**********@softhome.net> wrote:
>> IIRC, UTF-8 is the default encoding for XML files.
>
> OK, so then why do I get the error? I did a test: I loaded the XML into
> Notepad and saved that file as text in UTF-8, and that file is read
> correctly. So my question is: why does the original not load properly?

Maybe it's stored in an encoding other than UTF-8, for example
Windows ANSI.

--
M S Herfried K. Wagner
M V P <URL:http://dotnet.mvps.org/>
V B <URL:http://dotnet.mvps.org/dotnet/faqs/>

Oct 1 '07 #4
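
A small sketch of what the reply above describes; it is not code from the thread. It assumes the old file was written in Windows-1252 (the usual "ANSI" code page on Western systems) and that you are on the .NET Framework, where Encoding.GetEncoding(1252) is available without extra setup. The sample bytes are made up for illustration. In Windows-1252 the copyright character is the single byte A9, while UTF-8 stores it as the two bytes C2 A9, so a lone A9 is not valid UTF-8 and the parser complains.

```vb
Imports System
Imports System.Text

Module EncodingDemo
    Sub Main()
        ' Hypothetical sample: copyright sign, space, "2007", with the sign
        ' stored as the single byte &HA9 the way Windows-1252 does it.
        Dim ansiBytes As Byte() = {&HA9, &H20, &H32, &H30, &H30, &H37}

        ' A strict UTF-8 decoder (throwOnInvalidBytes:=True) rejects a lone
        ' &HA9, because in UTF-8 that value is only valid as a continuation byte.
        Dim strictUtf8 As New UTF8Encoding(False, True)
        Try
            Console.WriteLine(strictUtf8.GetString(ansiBytes))
        Catch ex As DecoderFallbackException
            Console.WriteLine("Not valid UTF-8: " & ex.Message)
        End Try

        ' Decoded as Windows-1252, the same bytes give readable text.
        Dim cp1252 As Encoding = Encoding.GetEncoding(1252)
        Console.WriteLine(cp1252.GetString(ansiBytes))

        ' UTF-8 stores the same character as two bytes (C2 and A9).
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(ChrW(&HA9).ToString())
        Console.WriteLine(BitConverter.ToString(utf8Bytes))
    End Sub
End Module
```

This would also explain the Notepad test: saving as UTF-8 rewrites that byte as C2 A9, so the re-saved copy loads, while a plain ANSI save keeps the single A9 byte.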
AGP wrote:
> I will ask, but the source could be from a variety of providers, so I may end
> up with no concrete answer as to what's used for encoding. However, I did open
> the file in Notepad, added the XML declaration and then just did a plain
> old save, and the file still errors out. In my mind, if there is no
> declaration then the functions assume UTF-8, correct? But if that is the
> case, I'm not sure why I still get an error.

Try whether using
New StreamReader("file.xml", Encoding.Default)
works with those files.
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Oct 2 '07 #5
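
The suggestion above could be wired into a catch-and-retry load roughly like this. It is a sketch under assumptions, not a tested fix for the files in question: the module and function names (LegacyXmlLoader, LoadWithFallback, SaveIndented) and the file names are made up, and it assumes the .NET Framework, where Encoding.Default is the system ANSI code page. SaveIndented also covers the secondary question from the first post by writing an indented UTF-8 copy via XmlWriterSettings.Indent.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module LegacyXmlLoader
    ' Loads an XML file; if the parser rejects it, retries with the system
    ' ANSI code page (Encoding.Default), as suggested in the thread.
    Function LoadWithFallback(ByVal fileName As String) As XmlDocument
        Dim doc As New XmlDocument()
        Try
            ' Normal path: the parser picks the encoding from the BOM or the
            ' XML declaration and assumes UTF-8 when both are missing.
            doc.Load(fileName)
        Catch ex As XmlException
            ' Fallback: decode the bytes ourselves and hand the parser a
            ' TextReader, so it works on already-decoded characters.
            ' Note: this retries on any XmlException, not only encoding errors.
            doc = New XmlDocument()
            Using reader As New StreamReader(fileName, Encoding.Default)
                doc.Load(reader)
            End Using
        End Try
        Return doc
    End Function

    ' Writes an indented copy of the document, encoded as UTF-8.
    Sub SaveIndented(ByVal doc As XmlDocument, ByVal fileName As String)
        Dim settings As New XmlWriterSettings()
        settings.Indent = True
        settings.Encoding = Encoding.UTF8
        Using writer As XmlWriter = XmlWriter.Create(fileName, settings)
            doc.Save(writer)
        End Using
    End Sub

    Sub Main()
        ' "old.xml" and "old-readable.xml" are placeholder file names.
        Dim doc As XmlDocument = LoadWithFallback("old.xml")
        SaveIndented(doc, "old-readable.xml")
    End Sub
End Module
```

If the providers turn out to use some other code page, Encoding.Default could be swapped for Encoding.GetEncoding with that code page. Guessing the encoding is only a fallback; a file that decodes without error under the wrong code page will still load, just with wrong characters.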
