Forms and encoding

I'd like to implement some sort of search function on my site, so I took
Google sample code and tried it, i.e. basically:

<form method="GET" action="http://www.google.com/search">
<input type="hidden" name="as_sitesearch" value="www.relinquiere.com">
<input type="text" name="q" size="15" value="">
<input type="image" id="submit" value="" src="..." ...>
</form>

It works fine, most of the time: if I type in accented characters, they
get somehow misinterpreted.

My test page is: http://wwww.relinquiere.com/search.html
As you can see for yourself, the charset parameter in Content-Type is
ISO-8859-1 (that's intended), so I expect my client to send the request
(when submitting the form) using the same encoding (even if it is not
required to do so).

Here is the request when I enter "préhistorique" in my search box:
GET /search?as_sitesearch=www.relinquiere.com&q=pr%E9historique&x=8&y=8 HTTP/1.1

where %E9 is the code for "é" in the Latin-1 repertoire. But Google
interprets it as "pr?historique". If I enter some UTF-8 data in the
search field, it works fine (accented characters are correctly passed
to Google). Does this mean that Google expects UTF-8 data, or that
something is wrong with my form?
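
To make the difference concrete, here is a minimal Python sketch (not part of the original exchange) of how the same word is percent-encoded under ISO-8859-1 and under UTF-8, and of what a receiver sees if it reads the ISO-8859-1 form as UTF-8. It uses only the standard `urllib.parse` module. That the receiving side decodes as UTF-8 is an assumption made for illustration, not a confirmed description of Google.

```python
from urllib.parse import quote, unquote

word = "préhistorique"

# Percent-encode the word as a browser would for an ISO-8859-1 page.
latin1_form = quote(word, encoding="iso-8859-1")

# Percent-encode the same word as a browser would for a UTF-8 page.
utf8_form = quote(word, encoding="utf-8")

# Decode the ISO-8859-1 form while wrongly assuming UTF-8: the lone byte
# 0xE9 is not valid UTF-8, so it becomes U+FFFD (the replacement character).
misread = unquote(latin1_form, encoding="utf-8", errors="replace")

# Decode it with the right assumption to recover the original word.
recovered = unquote(latin1_form, encoding="iso-8859-1")

print(latin1_form)
print(utf8_form)
print(misread)
print(recovered)
```

The single byte %E9 and the two-byte sequence %C3%A9 both stand for "é", but only under their own encoding, which would explain a "?" appearing in place of the accented letter.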

Then I added a hidden field to my form:
<input type="hidden" name="ie" value="ISO88591">
as you can see in: http://www.relinquiere.com/search-latin-1.html

(I assume that this "ie" field stands for "input encoding", so that Google
can interpret the received data as Latin-1.)

Now, entering "préhistorique" as before works and returns one page. Here
is the request sent to Google:
GET /search?as_sitesearch=www.relinquiere.com&ie=ISO88591&q=pr%E9historique&x=9&y=3 HTTP/1.1

What I conclude is that Google needs to be told what encoding is used
for the parameters, which is fair, but this raises a big issue: how am I
supposed to know what encoding my visitors use?

Imagine that a French-speaking Japanese visits my site: he will receive
my page encoded in ISO-8859-1, enter some text (let's assume this text
is made of latin characters - is this possible in Japanese encoding?),
submit the form, and now what? Will his input be encoded in ISO-8859-1 too?
--
Want to spend holidays in France ? Check http://www.relinquiere.com/
Jul 23 '05 #1
Vincent Poinot <vi***************************@wanadoo.fr> wrote:

> I'd like to implement some sort of search function on my site, so I
> took Google sample code and tried it, i.e. basically:
> - -
> It works fine, most of the time: if I type in accented characters,
> they get somehow misinterpreted.

Yep, and you're right: it's an encoding problem.

> Does it mean that Google expects UTF-8
> data? or that something is wrong with my form?

Apparently Google expects UTF-8 by default.

> <input type="hidden" name="ie" value="ISO88591">
> - -
> Now, entering "préhistorique" as before works and returns one page.

That's interesting. I don't know whether Google recognizes the misspelled
name of the encoding or just uses ISO-8859-1 when it does not understand
the value of the ie field, but in any case the correct method is to use
an IANA registered name for the encoding, preferably the preferred MIME
name:
<input type="hidden" name="ie" value="ISO-8859-1">
(Hyphens are significant in character encoding names.)
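
As a rough illustration of the request such a form would produce once the registered name is used, the following Python sketch builds the query string with the standard `urllib.parse.urlencode`. The field names and values come from the form in this thread; the x/y coordinates of the image button are left out, and encoding the values with the charset named in "ie" is the assumption being illustrated.

```python
from urllib.parse import urlencode

# Field names and values are taken from the form discussed in the thread;
# the x/y coordinates of the image button are left out for brevity.
fields = {
    "as_sitesearch": "www.relinquiere.com",
    "ie": "ISO-8859-1",
    "q": "préhistorique",
}

# Encode the values with the same charset that the "ie" field announces.
query = urlencode(fields, encoding=fields["ie"])
url = "http://www.google.com/search?" + query
print(url)
```

Keeping the declared name and the encoding actually used in one place avoids the two drifting apart.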

> What I conclude is that Google needs to be told what encoding is used
> for the parameters, which is fair, but this raises a big issue: how
> am I supposed to know what encoding my visitors use?

The browser normally uses, for form submission, the encoding of the page
where the form appears. (In theory, you could specify otherwise by using
the accept-charset attribute in the <form> element, but as far as I know,
no browser supports it.) So you should just check that you have specified
that encoding properly, preferably in HTTP headers, or at least in a
<meta> tag.
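
The rule described above (form data is sent in the encoding of the page that contains the form) can be sketched in Python as follows. This is a simulation under stated assumptions, not browser code: `page_charset` and `form_query` are hypothetical helper names, and the "utf-8" default used when the header has no charset parameter is an arbitrary choice for the sketch, since real browsers pick their own default.

```python
from email.message import Message
from urllib.parse import urlencode


def page_charset(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type header value."""
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() lower-cases the name and returns `default`
    # when the header carries no charset parameter.
    return header.get_content_charset(default)


def form_query(content_type, fields):
    """Encode form fields in the charset declared for the containing page."""
    return urlencode(fields, encoding=page_charset(content_type))


print(form_query("text/html; charset=ISO-8859-1", {"q": "préhistorique"}))
print(form_query("text/html; charset=UTF-8", {"q": "préhistorique"}))
```

The same input yields different bytes on the wire depending only on the declared page encoding, which is why declaring it correctly matters.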

> Imagine that a French-speaking Japanese visits my site: he will
> receive my page encoded in ISO-8859-1, enter some text (let's assume
> this text is made of latin characters - is this possible in Japanese
> encoding?), submit the form, and now what? Will his input be encoded
> in ISO-8859-1 too?

As far as I've understood, his browser should send the characters as
ISO-8859-1 encoded and does so. If you tried something less common like
ISO-8859-15, problems would arise. But ISO-8859-1 should work fine, at
least in all browsing situations where your ISO-8859-1 encoded page is
legible in the first place!
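
The reply does not say which problems ISO-8859-15 would cause. One possible source, sketched here in Python as an illustration only, is that the two encodings agree on most bytes but not all: "é" is the same byte in both, while the euro sign exists only in ISO-8859-15. The helper name `try_encode` is hypothetical.

```python
def try_encode(text, charset):
    """Return the encoded bytes, or None if the charset cannot represent the text."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


# "é" has the same byte in both encodings, so a mismatch goes unnoticed.
e_latin1 = try_encode("é", "iso-8859-1")
e_latin9 = try_encode("é", "iso-8859-15")

# The euro sign exists only in ISO-8859-15, at byte 0xA4.
euro_latin1 = try_encode("€", "iso-8859-1")
euro_latin9 = try_encode("€", "iso-8859-15")

# Reading that byte as ISO-8859-1 yields the generic currency sign instead.
misread = euro_latin9.decode("iso-8859-1")

print(e_latin1, e_latin9, euro_latin1, euro_latin9, misread)
```

A receiver that assumes ISO-8859-1 would therefore handle "préhistorique" correctly either way and go wrong only on the few characters where the two sets differ.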

You might wish to check Alan Flavell's treatise on character encoding
problems in forms:
http://ppewww.ph.gla.ac.uk/%7eflavel...form-i18n.html

(What Google does with accented letters is an interesting story, though
beyond the scope of the group. It gives strangely different results for
préhistorique
prehistorique
+préhistorique
+prehistorique
but it generally treats e.g. e and é as equivalent.)
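
How Google makes "e" and "é" equivalent is not stated in this thread. One common way to get that effect, shown here purely as an assumption, is to decompose each character with Unicode normalization and drop the combining marks. The function name `fold_accents` is hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("préhistorique"))
print(fold_accents("+préhistorique"))
```

With such folding applied to both the query and the indexed text, the accented and unaccented spellings would match the same pages.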

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #2
Jukka K. Korpela wrote:

> Vincent Poinot <vi***************************@wanadoo.fr> wrote:
>
>> <input type="hidden" name="ie" value="ISO88591">
>
> That's interesting. I don't know whether Google recognizes the misspelled
> name of the encoding or just uses ISO-8859-1 when it does not understand
> the value of the ie field, but in any case the correct method is to use
> an IANA registered name for the encoding, preferably the preferred MIME
> name:
> <input type="hidden" name="ie" value="ISO-8859-1">
> (Hyphens are significant in character encoding names.)

Thanks for the tip: I changed that (and it still works, of course). Just
out of curiosity, as you suggested, I also tried to give Google some
garbage instead of a proper encoding name... and it also returns correct
results!
(http://www.google.com/search?as_site...orique&x=0&y=0)
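
This observation fits the second guess quoted above, namely that an unrecognized "ie" value is treated as ISO-8859-1. The following Python sketch shows what such a fallback could look like. It is hypothetical throughout: the names `resolve_encoding` and `decode_query_value` and the choice of fallback are assumptions, nothing here is known about Google's implementation, and Python's own codec lookup is more lenient about spellings than a server need be.

```python
import codecs
from urllib.parse import unquote

# Hypothetical default, chosen to match what the thread observed.
FALLBACK = "iso-8859-1"


def resolve_encoding(name, fallback=FALLBACK):
    """Return `name` if Python knows that codec, otherwise the fallback."""
    try:
        codecs.lookup(name)
        return name
    except LookupError:
        return fallback


def decode_query_value(raw, declared):
    """Percent-decode a query value using the declared or fallback encoding."""
    return unquote(raw, encoding=resolve_encoding(declared), errors="replace")


print(decode_query_value("pr%E9historique", "no-such-charset"))
print(decode_query_value("pr%C3%A9historique", "utf-8"))
```

Under this scheme a garbage value and the correct name give the same result for Latin-1 input, which is consistent with the test described above.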

>> Imagine that a French-speaking Japanese visits my site: he will
>> receive my page encoded in ISO-8859-1, enter some text (let's assume
>> this text is made of latin characters - is this possible in Japanese
>> encoding?), submit the form, and now what? Will his input be encoded
>> in ISO-8859-1 too?
>
> As far as I've understood, his browser should send the characters as
> ISO-8859-1 encoded and does so. If you tried something less common like
> ISO-8859-15, problems would arise. But ISO-8859-1 should work fine, at
> least in all browsing situations where your ISO-8859-1 encoded page is
> legible in the first place!

Yes, I guess this is where I was heading: as long as I stick to
ISO-8859-1, everything should be fine. However, this whole mechanism
looks pretty fragile to me when it comes to more exotic encodings...

> You might wish to check Alan Flavell's treatise on character encoding
> problems in forms:
> http://ppewww.ph.gla.ac.uk/%7eflavel...form-i18n.html

Already read that: excellent and very useful indeed.
Jul 23 '05 #3
