Validation of XHTML with Danish characters

I have a problem validating a simple piece of XHTML containing Danish
characters. Trying to validate the following piece of XHTML gives the error
quoted below it. If I remove the first line (the XML declaration) the document
validates fine. Does anyone have an idea how to solve this problem without
changing the characters to &#xxx; or &oslash;? I've tried with UTF-8.
**************************************************
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="da" lang="da">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Document title</title>
</head>

<body>
<p>This is a danish document with the Danish letters æ ø and å</p>
</body>

</html>

The error:
"Sorry, I am unable to validate this document because on line 11 it
contained one or more bytes that I cannot interpret as us-ascii (in other
words, the bytes found are not valid values in the specified Character
Encoding). Please check both the content of the file and the character
encoding indication."
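
The complaint can be reproduced outside the validator. The following is a minimal Python sketch, assuming the file was saved as ISO-8859-1 (the variable names are illustrative only): the three Danish letters on line 11 become single bytes above 0x7F, and no us-ascii decoder accepts those.

```python
# Line 11 of the sample document; \u00e6 \u00f8 \u00e5 are the letters ae-ligature, o-slash, a-ring.
line = "<p>This is a danish document with the Danish letters \u00e6 \u00f8 and \u00e5</p>"

# Assumed on-disk encoding of the uploaded file.
raw = line.encode("iso-8859-1")

# Bytes above 0x7F are outside us-ascii; these are what the validator rejects.
non_ascii = [hex(b) for b in raw if b > 0x7F]
print(non_ascii)  # ['0xe6', '0xf8', '0xe5']

try:
    raw.decode("ascii")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)  # False
```

So the error says nothing about the markup itself; it only says that something made the validator treat the bytes as us-ascii.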
Jul 20 '05 #1
"Nicolai Pedersen" <np@dynamicsystems.dk> wrote:
> I have a problem validating a simple piece of XHTML containing Danish
> characters.


This is a long and sad story, and you would be confused after the
explanation. The short advice is simple: stop playing with XHTML;
upgrade to HTML 4.01. After all, it's just a matter of syntactic
trivialities, but playing by XHTML rules gives you a headache
if you don't know them well (and maybe even if you do).

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #2
I'm currently upgrading from 4.01 to XHTML. Not for the fun of it - but
because I really need to do it.

"Jukka K. Korpela" <jk******@cs.tut.fi> wrote in message
news:Xn*****************************@193.229.0.31...
> "Nicolai Pedersen" <np@dynamicsystems.dk> wrote:
> > I have a problem validating a simple piece of XHTML containing Danish
> > characters.
>
> This is a long and sad story, and you would be confused after the
> explanation. The short advice is simple: stop playing with XHTML;
> upgrade to HTML 4.01. After all, it's just a matter of syntactic
> trivialities, but playing by XHTML rules gives you a headache
> if you don't know them well (and maybe even if you do).
>
> --
> Yucca, http://www.cs.tut.fi/~jkorpela/
> Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #3
On Mon, 27 Oct 2003 12:13:33 +0100, "Nicolai Pedersen"
<np@dynamicsystems.dk> wrote:
> I'm currently upgrading from 4.01 to XHTML. Not for the fun of it - but
> because I really need to do it.


Why?

Jim.
--
comp.lang.javascript FAQ - http://jibbering.com/faq/

Jul 20 '05 #4
On Mon, 27 Oct 2003, Nicolai Pedersen wrote:

[in a usenet posting which appeared to be lacking its MIME header and
content-type, and therefore *ought* to have contained only us-ascii
characters...]
> I have a problem validating a simple piece of XHTML containing Danish
> characters. Trying to validate the following piece of XHTML gives the error
> mentioned beneath.
Then the important detail is likely to be something not included in
your report. Better quote a URL where we can investigate this for
ourselves.
> If I remove the first line (the XML part) the document
> validates fine.
That has me puzzled, but I'm confident that if you gave a URL then
you'd get a prompt explanation, if not from me then from one of the
other contributors.
> Does anyone have an idea how to solve this problem
We don't really know what the "problem" is yet - you've reported some
of the symptoms, but IMHO some important detail is missing.
> without changing the characters to &#xxx; or &oslash;
There should be no necessity for that, even if it does bring some
benefits in terms of document (mis)handling.
> I've tried with UTF-8.
You don't make iso-8859-1-encoded characters magically change to utf-8
merely by declaring them so. If they genuinely were utf-8-encoded,
then that would be different.
> The error:
> "Sorry, I am unable to validate this document because on line 11 it
> contained one or more bytes that I cannot interpret as us-ascii (in other
> words, the bytes found are not valid values in the specified Character
> Encoding).


Something seems to have convinced the processor that your document is
us-ascii-encoded? Maybe the web server?
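
One way to test that suspicion is to look at the charset parameter of whatever Content-Type header accompanies the document. The sketch below parses a header value with Python's standard email package; the helper name declared_charset and the sample header strings are made up for illustration, and fetching the real header from a server is left out.

```python
from email.message import Message

def declared_charset(content_type_value):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()

print(declared_charset("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(declared_charset("text/xml"))                        # None
```

A result of None is the interesting case: the consumer then has to fall back on some default, which may not be the encoding named inside the document.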

good luck
Jul 20 '05 #5
Thank you for your answer - it gave me the hint for the source of the error:

I was uploading the script as a file to the validator service - when
uploading it to my webserver and revalidating using the URL, everything
works fine.

"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote in message
news:Pi*******************************@ppepc56.ph.gla.ac.uk...
> On Mon, 27 Oct 2003, Nicolai Pedersen wrote:
>
> Something seems to have convinced the processor that your document is
> us-ascii-encoded? Maybe the web server?
>
> good luck

Jul 20 '05 #6
On Mon, 27 Oct 2003, Jukka K. Korpela wrote:
> This is a long and sad story, and you would be confused after the
> explanation.


If you've understood the problem based on what the hon Usenaut posted,
then I'm interested to know what it is. Maybe it's already been
posted or FAQed in some form, but if it was, I confess to not being
aware of it.
Jul 20 '05 #7
On Mon, 27 Oct 2003, Nicolai Pedersen wrote:
> X-Newsreader: Microsoft Outlook Express 6.00.2800.1158
>
> I have a problem validating a simple piece of XHTML containing Danish
> characters.
>
> <p>This is a danish document with the Danish letters ? ? and ?</p>


Start with repairing your simulation of a newsreader. Here you go:

Tools > Options > Send
Mail Sending Format > Plain Text Settings > Message format MIME
News Sending Format > Plain Text Settings > Message format MIME
Encode text using: None

Jul 20 '05 #8
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
> If you've understood the problem based on what the hon Usenaut
> posted, then I'm interested to know what it is.


I have tried to actively forget all the confusion that XHTML causes
since, as I wrote, the simple answer is to stay away from it. (Some
day, there might be some actual use for XHTML, but I hope that by then the
worst oddities will have been fixed.)

If you compose a document containing the OP's sample HTML, in
ISO-8859-1 encoding, and submit it to validation via the file upload
facility at http://validator.w3.org/ , then the problem reported
will appear. It's strange that the validator refuses to look at
the document content, which twice specifies ISO-8859-1, but what can we
do? Yes, we _could_ use the extended interface, which lets us specify
the encoding a third time, and then we get

Note: The HTTP Content-Type header sent by your web browser (unknown)
did not contain a "charset" parameter, but the Content-Type was one of
the XML text/* sub-types (text/xml). The relevant specification (RFC
3023) specifies a strong default of "us-ascii" for such documents so we
will use this value regardless of any encoding you may have indicated
elsewhere. If you would like to use a different encoding, you should
arrange to have your browser send this new encoding information.

which looks pretty strange after a _file upload_ submission.
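
The rule cited in that note can be written down as a small decision function. This is only an illustration of the RFC 3023 defaults as the note describes them, not the validator's actual code; effective_encoding and its parameters are hypothetical names, and the fallback for other media types is deliberately simplified.

```python
# The XML text/* media types defined by RFC 3023.
XML_TEXT_TYPES = {"text/xml", "text/xml-external-parsed-entity"}

def effective_encoding(media_type, http_charset=None, xml_declared=None):
    """Pick the encoding a strict RFC 3023 consumer would use (simplified)."""
    if http_charset:
        # An explicit charset parameter on the Content-Type always wins.
        return http_charset.lower()
    if media_type.lower() in XML_TEXT_TYPES:
        # No charset on an XML text/* type: us-ascii, whatever the document says.
        return "us-ascii"
    # Simplification: otherwise trust the in-document declaration; a real XML
    # processor would also look for a BOM before defaulting to UTF-8.
    return (xml_declared or "utf-8").lower()

print(effective_encoding("text/xml", None, "iso-8859-1"))          # us-ascii
print(effective_encoding("text/xml", "ISO-8859-1", "iso-8859-1"))  # iso-8859-1
print(effective_encoding("application/xml", None, "iso-8859-1"))   # iso-8859-1
```

The first call corresponds to the file upload case: the upload arrives labelled text/xml with no charset, so both in-document declarations of ISO-8859-1 are ignored.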

But it's less strange than experiences that I have seen when
the "CSS Validator" has been used on an XHTML document containing 8-bit
characters in ISO-8859-1 encoding and the "CSS Validator" called
the W3C "markup validator", which choked on it. As I wrote, I'm
actively trying to forget the mess.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #9
On Mon, 27 Oct 2003, Jukka K. Korpela wrote:
> I have tried to actively forget all the confusion that XHTML causes
> since, as I wrote, the simple answer is to stay away from it.
;-}

But there's an interesting point here, nevertheless, which has nothing
directly to do with XML/XHTML and a lot to do with i18n form
submission, and, I now realise, relating tangentially to my form-i18n
page. Oh, and by chance Google just presented me with a discussion
thread which is also relevant to the underlying principle, in a way...
http://lists.w3.org/Archives/Public/...3Sep/0025.html etc.
> Yes, we _could_ use the extended interface, which lets us specify
> the encoding a third time, and then we get
>
> Note: The HTTP Content-Type header sent by your web browser (unknown)
> did not contain a "charset" parameter, but the Content-Type was one of
> the XML text/* sub-types (text/xml).
This assertion presumably relates to the file upload "control" of the
multipart/form-data submission, yes? I'm not in the least surprised
by the absence of a "charset" specification, but I'm puzzled by the
fact that it's saying it was content-type "text/xml". Would this have
been sent by your client agent, or are they spoofing it in order to
make their validator accept it?

[quotation continues...]
> The relevant specification (RFC
> 3023) specifies a strong default of "us-ascii" for such documents so we
> will use this value regardless of any encoding you may have indicated
> elsewhere. If you would like to use a different encoding, you should
> arrange to have your browser send this new encoding information.
Hmmm, yes, they have a point, despite its unfriendliness.
> which looks pretty strange after a _file upload_ submission.


Oh, I don't know: the client agent is in a far better position to know
what encoding to assign to this portion of the multipart/form-data
submission, than is any other participant in the proceedings.

What it basically means is: because implementers have been avoiding
implementing the necessary features of the i18n specifications (in
some cases alleging that they couldn't do it because it would upset
other incomplete implementations), this kind of file upload can't do
the job that is needed at this point.

If the validator folk were to start applying heuristics at this point
then they'd defeat their own purpose, presumably. It's a shame about
the users who are caught out by this, though.

As you may recall, my thesis has always been that no text file is
complete without external information about its character encoding,
and that it's an architectural error to smuggle that information into
the content of the file itself. But I've long since lost that battle,
what with the http meta thingy and the <?xml...encoding thingy. I could
almost live with the BOM, but of course the BOM doesn't solve anything
for non-Unicode encodings.
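
To make the BOM remark concrete, here is a sketch that relies only on the constants in Python's codecs module; sniff_bom is a hypothetical helper, not part of any validator. A byte order mark identifies the Unicode encodings, but an ISO-8859-1 file has no such marker to find.

```python
import codecs

# Longer marks first: the UTF-32 LE mark starts with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom(codecs.BOM_UTF8 + b"<p>"))         # utf-8
print(sniff_bom("\u00f8".encode("iso-8859-1")))    # None
```

The second call returns None, which is exactly the limitation described: for 8-bit encodings the information has to come from somewhere else.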

And Mark C made dire threats about the dangers of going anywhere near
ISO-2022 (which I hadn't even mentioned!) when I got involved in a
discussion about character code support in PINE recently.

I think the bottom line here is that the file upload feature of the
validator is of very limited usefulness, given the shortcomings which
have been raised here, and needs Some Big Text to warn users of the
pitfalls, relative to putting the content onto a server and pointing
the validator/checker at its URL.

thanks for the explanation!

all the best
Jul 20 '05 #10
