character sets

Hi folks
Here I am writing my first PHP / MySQL site, almost ready, and now this... character sets....

The encoding that I use on my webpage is:

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">

When people enter new data I use

$newvalue = htmlentities($_POST["newvalue"], ENT_QUOTES)

I then SQL this into my table and next I display the value

e.g. <DIV CLASS="content">'.$newvalue.'</DIV>

All of this works fine, BUT funny characters that may have been entered through the form (e.g. Word-style quotation marks,
e-accent-grave, etc.) are taking on a whole new life. I put in an e with an accent and it changed into a Chinese character.

I tried to run

$link = mysql_connect($host, $username, $password);
$charset = mysql_character_set_name($link);
printf ("character set is %s\n", $charset);

but that only gave me an error.

I searched on Google, but many of the notes are in other languages.... ;-)

Does anyone have any hints in English?

TIA

- Nicolaas



Jul 17 '05 #1
NC
WindAndWaves wrote:

> The encoding that I use on my webpage is:
> <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
>
> When people enter new data I use
> $newvalue = htmlentities($_POST["newvalue"], ENT_QUOTES)
>
> I then SQL this into my table and next I display the value
> e.g. <DIV CLASS="content">'.$newvalue.'</DIV>
>
> All of this works fine, BUT, funny characters that may have been
> entered through the form (e.g. Word-Style quotation marks,
> e-accent-grave, etc..) are taking on a whole new life.
> I put in an e with an accent and it changed into a chinese
> character.


You have two options to fix this:

1. Convert your strings from UTF-8 into, say, ISO-8859-1,
before storing them in the database:

$string = iconv('UTF-8', 'ISO-8859-1', $string);

You will need your PHP installation to be compiled
with the iconv extension to do that.

2. Set your MySQL server's character set to UTF-8.

First, check if you currently have UTF-8 support.
Run this query:

SHOW VARIABLES;

Find the `character_sets` variable in the output and
verify that `utf8` is listed among the character sets
currently supported. If there's no support for UTF-8,
install or configure it (see MySQL documentation for
details).

If and when you have UTF-8 support, you can set
UTF-8 as the default character set for your database
(this applies to tables created afterwards; existing
tables keep the character set they already have):

ALTER DATABASE db_name
DEFAULT CHARACTER SET utf8;

Alternatively, you can change the character set
on a per-connection basis by sending this query:

SET NAMES 'utf8';

first thing after establishing a connection to the
database.
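
A minimal sketch of the per-connection approach in PHP, assuming the PHP 5 mysqli extension is available and the server is MySQL 4.1 or later. The helper name connect_utf8 and every connection parameter are hypothetical placeholders, not something from this thread:

```php
<?php
// Hypothetical helper: open a connection and switch it to UTF-8.
// Host, user, password and database name are placeholders.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    if ($link->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }
    // Has the same effect on the session as sending SET NAMES 'utf8',
    // and also tells the client library which charset to assume
    // when it escapes strings.
    if (!$link->set_charset('utf8')) {
        throw new RuntimeException('Cannot switch to utf8: ' . $link->error);
    }
    return $link;
}

// Usage (needs a reachable MySQL server, so it is left commented out):
// $link = connect_utf8('localhost', 'user', 'secret', 'mydb');
// printf("character set is %s\n", $link->character_set_name());
```

On MySQL 5.5.3 and later, 'utf8mb4' is usually the better choice, since MySQL's 'utf8' stores at most three bytes per character.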

Cheers,
NC

Jul 17 '05 #2
On 1 Feb 2005 13:40:47 -0800, "NC" <nc@iname.com> wrote:

> You have two options to fix this:
>
> 1. Convert your strings from UTF-8 into, say, ISO-8859-1,
> before storing them in the database:
>
> $string = iconv('UTF-8', 'ISO-8859-1', $string);

That's a lossy conversion though, so be careful.

> You will need your PHP installation to be compiled
> with iconv extension to do that.
>
> 2. Set your MySQL server's character set to UTF-8.
>
> First, check if you currently have UTF-8 support.
> Run this query:
>
> SHOW VARIABLES;
>
> find the `character_sets` variable in the output and
> verify that `utf8` is listed among the character sets
> currently supported. If there's no support for UTF-8,
> install or configure it (see MySQL documentation for
> details).
>
> If and when you have UTF-8 support, you can set
> UTF-8 as the default character set for your database:
>
> ALTER DATABASE db_name
> DEFAULT CHARACTER SET utf8;
>
> Alternatively, you can change character set setting
> on a per-connection basis by sending this query:
>
> SET NAMES 'utf8';
>
> first thing after establishing a connection to the
> database.


If you don't mind any length functions returning the wrong values (i.e.
returning byte length not character length), you could probably even get away
with storing UTF-8 in MySQL without setting anything - provided it doesn't
attempt to do any character set conversions, just stores strings as-is.

But basically you have to be very careful when working with character set
encodings, since you've got to know what you're dealing with at each step, and
whether any function's going to try and interpret the encoded bytes into a
character, or just pass it on.
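
To illustrate the byte length versus character length point, here is a small sketch that assumes the string really is valid UTF-8. The helper utf8_char_count is hypothetical; it uses mb_strlen() when the optional mbstring extension is present and falls back to a PCRE match with the /u modifier otherwise:

```php
<?php
// "é" encoded as UTF-8 takes two bytes: 0xC3 0xA9.
$word = "caf\xC3\xA9";

// Hypothetical helper: count characters (code points) rather than bytes.
function utf8_char_count(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    // Fallback: with /u each "." matches one UTF-8 code point.
    return (int) preg_match_all('/./su', $s);
}

// strlen() is byte-oriented, so it sees five bytes in a four-letter word.
$byteLength = strlen($word);
$charLength = utf8_char_count($word);

printf("bytes: %d, characters: %d\n", $byteLength, $charLength);
```

Any byte-oriented function (strlen, substr, a database LENGTH() on a column it does not know is UTF-8) will give the first number, which is the mismatch described above.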

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Jul 17 '05 #3
On Tue, 1 Feb 2005 21:02:15 +1300, "WindAndWaves" <ac****@ngaru.com> wrote:

> Here I am writing my first php / mysql site, almost ready, and now this... charactersets....
>
> The encoding that I use on my webpage is:
>
> <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">

Send a proper character set header; using <meta> for content type and
encodings is generally for situations where HTTP headers don't exist, e.g.
reading off a filesystem.

header("Content-type: text/html; charset=utf-8");

http://uk2.php.net/header

> When people enter new data I use
>
> $newvalue = htmlentities($_POST["newvalue"], ENT_QUOTES)

That defaults to ISO-8859-1; if you pass it UTF-8 without setting the third
parameter, you'll corrupt your data.

http://uk2.php.net/htmlentities

> I then SQL this into my table and next I display the value
>
> e.g. <DIV CLASS="content">'.$newvalue.'</DIV>
>
> All of this works fine, BUT, funny characters that may have been entered through the form (e.g. Word-Style quotation marks,
> e-accent-grave, etc..)

These characters all exist in UTF-8 - as does almost every character.

> are taking on a whole new life. I put in an e with an accent and it changed into a chinese character.

Can you give a short self-contained example demonstrating it?

> I tried to run
>
> $link = mysql_connect($host, $username, $password);
> $charset = mysql_character_set_name($link);
> printf ("character set is %s\n", $charset);
>
> but that only gave me an error.

According to the manual there's no such function. There's
mysqli_character_set_name, from the new PHP5 mysqli extension - but not in the
old mysql extension.
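
A minimal sketch of the htmlentities() problem described above, assuming the form data arrives as UTF-8 bytes. The encoding is named explicitly in both calls so the result does not depend on the PHP version (the default became UTF-8 in PHP 5.4):

```php
<?php
// "é" as a browser submits it from a UTF-8 page: bytes 0xC3 0xA9.
$input = "\xC3\xA9";

// Treating the bytes as ISO-8859-1 turns each byte into its own entity.
$wrong = htmlentities($input, ENT_QUOTES, 'ISO-8859-1');

// Naming the real encoding yields a single entity for the character.
$right = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n"; // &Atilde;&copy;
echo $right, "\n"; // &eacute;
```

The first result is the kind of corruption reported in the original question: one accented letter has become two unrelated characters by the time it is stored.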

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Jul 17 '05 #4

Hi Andy and NC

I have since discovered that a lot of functions are not supported by my provider. Namely, UTF-8 is not supported in MySQL, and PHP does
not support, for example, the conversion functions that you mention above.

I think I will have to stick with a pretty plain type of character set and make the Japanese pages by hand.

Thank you for your helpful answers.

- Nicolaas
Jul 17 '05 #5

"WindAndWaves" <ac****@ngaru.com> wrote in message
news:OC*********************@news.xtra.co.nz...

> Hi Andy and NC
>
> I since discovered that a lot of functions are not supported by my
> provider. Namely, UTF-8 is not supported in MySQL

Wrong. MySQL 4.1 supports various character sets. Take a look at
http://dev.mysql.com/doc/mysql/en/charset.html

> and PHP does
> not support, for example, the conversion functions that you mention above.

Wrong again. Take a look at the multi-byte string conversion functions at
http://www.php.net/manual/en/ref.mbstring.php

--
Tony Marston

http://www.tonymarston.net

> I think I will have to stick with a pretty plain type of character set and
> make the Japanese pages by hand.
>
> Thank you for your helpful answers.
>
> - Nicolaas

Jul 17 '05 #6

"Tony Marston" <to**@NOSPAM.demon.co.uk> wrote in message
[...........]
> Wrong. MySQL 4.1 supports various character sets. Take a look at
> http://dev.mysql.com/doc/mysql/en/charset.html

This is what my ISP dude said:

"My understanding is unicode support is only in version 4.1 and above, and we have no servers running 4.1 yet. This may change in the future. All our newer servers are running 4.0x."

>> and PHP does
>> not support, for example, the conversion functions that you mention above.
>
> Wrong again. Take a look at the multi-byte string conversion functions at
> http://www.php.net/manual/en/ref.mbstring.php

And the same sort of thing seems to apply to PHP, although I am running
PHP Version 4.3.4. PHP simply does not recognise the functions.

It seems like I need to have my own dedicated server and this will cost...

Thanks for your answer.

- Nicolaas
Jul 17 '05 #7

"WindAndWaves" <ac****@ngaru.com> wrote in message
news:RI*********************@news.xtra.co.nz...
>>> and PHP does
>>> not support, for example, the conversion functions that you mention
>>> above.
>>
>> Wrong again. Take a look at the multi-byte string conversion functions at
>> http://www.php.net/manual/en/ref.mbstring.php
>
> and the same sort of thing seems to apply to PHP, although I am running
> PHP Version 4.3.4. PHP simply does not recognise the functions.

This is an optional extension, so it must be explicitly enabled when your
version of PHP is built. A proper ISP would be prepared to enable this
option for you.
Also, a proper ISP would be running PHP 4.3.10, not 4.3.4
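
A quick way to see what a given host actually provides is to ask PHP itself. This is only a sketch: the helper name conversion_support is hypothetical, and it is written for a current PHP version rather than the 4.3.x discussed here (extension_loaded() itself has been available since PHP 4):

```php
<?php
// Hypothetical helper: report which conversion extensions this PHP build has.
function conversion_support(): array
{
    return [
        'mbstring' => extension_loaded('mbstring'),
        'iconv'    => extension_loaded('iconv'),
    ];
}

// Print one line per extension.
foreach (conversion_support() as $name => $loaded) {
    printf("%-8s %s\n", $name, $loaded ? 'available' : 'not available');
}
```

If both come back as not available, the conversion functions mentioned earlier in the thread will be undefined on that host until the provider rebuilds or reconfigures PHP.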

--
Tony Marston

http://www.tonymarston.net
> It seems like I need to have my own dedicated server and this will
> cost...
>
> Thanks for your answer.
>
> - Nicolaas

Jul 17 '05 #8
