What is mb_internal_encoding() exactly?


Hi,

[Excuse me for a rather lengthy post. I try to explain as well as I can
what I do and do not understand about multibyte encoding.]

Background: I am working on a multilanguage project now, so I decided to
switch to UTF-8 completely to avoid trouble with Unicode characters.

I hope somebody can review my approach and comment on it.
I am working on:
Server: Apache/2.2.3 (Debian) PHP/5.2.0-8+etch11
I am testing on FF2/FF3/IE7.
What I did so far:
Please point out anything that is wrong/vague/stupid. ;-)

1) Every page contains this header:
Content-Type: text/html; charset=UTF-8
and has the following doctype:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
(All HTML is checked against W3C validator, so far so good.)

2) My Database (Postgres8.1) is created using UTF-8 encoding.
(As I didn't overrule anything for any table or column, all my text-like
fields use UTF-8)

3) I do NOT specify any character encoding in a META-tag.
(The W3C advises against it: they say the header takes precedence over
META-tags, and using the META tag may confuse some clients.)

4) Whenever I need strlen($aString) or something similar, I use the
multibyte variant mb_strlen($aString, 'UTF-8').

5) When I need to display a random string (from the database for
example), I use:
htmlspecialchars($someStrFromDB, ENT_QUOTES, 'UTF-8');
If I must put a value in a text-element or textarea in a form, I use the
same.

6) I use ADODB5 as database abstraction layer. It has a built-in
qstr method that makes the passed string safe for use in SQL.

7) I get my multibyte characters from here for testing:
http://freenet-homepage.de/prilop/multilingual-1.html

So far, so good (as far as I can tell).
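
A minimal sketch of points 4 and 5, assuming the mbstring extension is loaded. The sample strings are made up for illustration, and the non-ASCII character is written as explicit UTF-8 bytes so the result does not depend on how the file itself is saved:

```php
<?php
// "café" with the "é" spelled out as its two UTF-8 bytes.
$aString = "caf\xC3\xA9";

// strlen() counts bytes, mb_strlen() counts characters in the given encoding.
$byteLength = strlen($aString);
$charLength = mb_strlen($aString, 'UTF-8');

// Escape a string for HTML output, telling PHP that it is UTF-8.
$someStrFromDB = "<b>caf\xC3\xA9</b> & \"more\"";
$escaped = htmlspecialchars($someStrFromDB, ENT_QUOTES, 'UTF-8');

echo $byteLength, ' bytes, ', $charLength, " characters\n";
echo $escaped, "\n";
```

The byte count and the character count differ as soon as the string contains anything outside ASCII, which is why the mb_* variant is needed.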
php.net says the following for mb_strlen:
int mb_strlen ( string $str [, string $encoding ] )
Parameters
str: The string being checked for length.
encoding : The encoding parameter is the character encoding. If it is
omitted, the internal character encoding value will be used.
--I do not understand what this 'internal character encoding value' is.

The page points to: mb_internal_encoding()
Which reads:
Set/Get the internal character encoding

Return Values: If encoding is set, then Returns TRUE on success or FALSE
on failure. If encoding is omitted, then the current character encoding
name is returned.
If I echo mb_internal_encoding() it says: ISO-8859-1
I wonder where PHP got that value from.

I tried saving my PHP file in UTF-8, but it stays on ISO-8859-1.

My main questions are:
1) What is this mb_internal_encoding exactly?
Is that something set during compilation?
Should I overwrite it to UTF-8, or is using the extra parameter in all
mb_* functions good enough (and set it to UTF-8)?

2) Should I put accept-charset="UTF-8" in all my forms, or is that set
implicitly by my header (which always contains: Content-Type: text/html;
charset=UTF-8)?

3) Is it wise to save all my PHP files in UTF-8?

I hope somebody can enlighten me a little on these issues. :-)
Thanks for your time!

Regards,
Erwin Moller
--
============================
Erwin Moller
Now dropping all postings from googlegroups.
Why? http://improve-usenet.org/
============================
Sep 17 '08 #1

I was also investigating this the other day. As for your question of
where PHP gets the internal encoding setting: it comes from the
[mbstring] section of the php.ini config. If the directives are
commented out, it seems to default to ISO-8859-1.
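
A small sketch for checking where the value comes from on a given installation, assuming the mbstring extension is loaded. The variable names are only illustrative. Note that PHP 5.6 and later fall back to the default_charset setting when mbstring.internal_encoding is empty, so the ISO-8859-1 default described here applies to older versions such as 5.2:

```php
<?php
// Value from the [mbstring] section of php.ini (may be empty or unset).
$iniValue = ini_get('mbstring.internal_encoding');

// Fallback that newer PHP versions use when the mbstring value is empty.
$defaultCharset = ini_get('default_charset');

// What mbstring actually uses when no encoding argument is passed.
$effective = mb_internal_encoding();

echo 'mbstring.internal_encoding = ', var_export($iniValue, true), "\n";
echo 'default_charset            = ', var_export($defaultCharset, true), "\n";
echo 'mb_internal_encoding()     = ', $effective, "\n";
```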

Other than that, I'm just as curious as you. :-)

--
Curtis
Sep 17 '08 #2
AqD
On Sep 17, 5:58 pm, Erwin Moller
<Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
> 1) Every page contains this header:
> Content-Type: text/html; charset=UTF-8
> and has the following doctype:
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd">
> (All HTML is checked against W3C validator, so far so good.)

Yes.
>
> 2) My Database (Postgres8.1) is created using UTF-8 encoding.
> (As I didn't overrule anything for any table or column, all my text-like
> fields use UTF-8)

If you're using MySQL, be careful: you have to set the client encoding
for the connection. If you don't (a lot of 'unicode' projects don't),
MySQL will treat your UTF-8 SQL statements as latin1 and convert them
wrongly inside the db.

To set the encoding, you need to call functions such as
mysqli_set_charset. It also affects the string escape method.
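
A minimal sketch of setting the connection encoding with mysqli, assuming the mysqli extension is available. The helper name useConnectionCharset is hypothetical and not part of any library:

```php
<?php
// Hypothetical helper: switch an open mysqli connection to the given charset.
function useConnectionCharset(mysqli $db, string $charset = 'utf8'): void
{
    // set_charset() changes the connection encoding, which is also what
    // real_escape_string() takes into account when escaping.
    if (!$db->set_charset($charset)) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }
}
```

It would be called once, right after connecting, for example useConnectionCharset(new mysqli('localhost', 'user', 'secret', 'mydb')); with placeholder credentials. For PostgreSQL, as far as I know, pg_set_client_encoding() plays the same role.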
>
> 3) I do NOT specify any character encoding in a META-tag.
> (The W3C advises against it: they say the header takes precedence over
> META-tags, and using the META tag may confuse some clients.)

Some clients like IE4? ;) Basically all websites here (mis-)use the
meta tag for charset instead of setting the header. As long as the
encoding is latin1-compatible (like utf8), it should be fine.

I stopped listening to their advice or reading their references a
long time ago. If you want something to work, it's better to test it with
real implementations (i.e. the browsers).
>
> 4) Whenever I need strlen($aString) or something similar, I use the
> multibyte variant mb_strlen($aString, 'UTF-8').

Same for sub-string and any other operations on string characters. But
there are performance issues and I hope you'll not run into them ;)
>
> 5) When I need to display a random string (from the database for
> example), I use:
> htmlspecialchars($someStrFromDB, ENT_QUOTES, 'UTF-8');
> If I must put a value in a text-element or textarea in a form, I use the
> same.

Yes.
>
> 6) I use ADODB5 as database abstraction layer. It has a built-in
> qstr method that makes the passed string safe for use in SQL.

Safe only for the correct encoding. You need to set the encoding like
I wrote above. If ADODB doesn't provide a method to change the encoding,
you can run the query "SET NAMES utf8" after connecting - I'm not sure
how this works with the escape function though.
>
> 7) I get my multibyte characters from here for testing:
> http://freenet-homepage.de/prilop/multilingual-1.html
>
> So far, so good (as far as I can tell).
>
> php.net says the following for mb_strlen:
> int mb_strlen ( string $str [, string $encoding ] )
> Parameters
> str: The string being checked for length.
> encoding : The encoding parameter is the character encoding. If it is
> omitted, the internal character encoding value will be used.
>
> --I do not understand what this 'internal character encoding value' is.
>
> The page points to: mb_internal_encoding()
> Which reads:
> Set/Get the internal character encoding

It's the default encoding for certain mbstring functions, not really
"internal". The mbstring extension (except for some regex functions)
can deal with strings in more than one encoding at the same time.
>
> Return Values: If encoding is set, then Returns TRUE on success or FALSE
> on failure. If encoding is omitted, then the current character encoding
> name is returned.
>
> If I echo mb_internal_encoding() it says: ISO-8859-1
> I wonder where PHP got that value from.
>
> I tried saving my PHP file in UTF-8, but it stays on ISO-8859-1.
>
> My main questions are:
> 1) What is this mb_internal_encoding exactly?
> Is that something set during compilation?
> Should I overwrite it to UTF-8, or is using the extra parameter in all
> mb_* functions good enough (and set it to UTF-8)?

It comes from php.ini.

You can also set it at the beginning of your code. Don't use the extra
parameter unless you want to deal with other encodings - as I said, some
regex functions don't have it, because they save state between
calls and the encoding cannot change in between.
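
A minimal sketch of setting the default once at the start of a script, assuming the mbstring extension is loaded. The sample string is made up, and the mb_regex_encoding() call is guarded because regex support in mbstring is optional:

```php
<?php
// Set the default once, before any other mbstring call.
mb_internal_encoding('UTF-8');

// The mb_ereg* functions keep their own encoding setting.
if (function_exists('mb_regex_encoding')) {
    mb_regex_encoding('UTF-8');
}

// "naïve" with the "ï" spelled out as its two UTF-8 bytes.
$aString = "na\xC3\xAFve";

// No encoding argument needed any more: the default set above is used.
$length = mb_strlen($aString);
$upper  = mb_strtoupper($aString);

echo $length, ' ', $upper, "\n";
```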
>
> 2) Should I put accept-charset="UTF-8" in all my forms, or is that set
> implicitly by my header (which always contains: Content-Type: text/html;
> charset=UTF-8)?

No need.

> 3) Is it wise to save all my PHP files in UTF-8?

Yes, and do not save them with a UTF-8 signature (BOM).
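
The UTF-8 signature is the three bytes EF BB BF at the very start of a file. In a PHP file they sit before the opening tag and get sent as output, which can break later header() calls. A rough sketch for detecting and stripping them; both helper names are hypothetical:

```php
<?php
// Hypothetical helper: true if the bytes start with the UTF-8 signature (BOM).
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: return the bytes without a leading BOM.
function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes) ? substr($bytes, 3) : $bytes;
}
```

For a file on disk, something like hasUtf8Bom(file_get_contents($path)) would do the check.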
Sep 18 '08 #3
On Sep 18, 2:08 am, AqD <aquila.d...@gmail.com> wrote:
> On Sep 17, 5:58 pm, Erwin Moller wrote:
>> 3) I do NOT specify any character encoding in a META-tag.
>> (The W3C advises against it: they say the header takes precedence over
>> META-tags, and using the META tag may confuse some clients.)
>
> Some clients like IE4? ;) Basically all websites here (mis-)use the
> meta tag for charset instead of setting the header. As long as the
> encoding is latin1-compatible (like utf8), it should be fine.
>
> I stopped listening to their advice or reading their references a
> long time ago. If you want something to work, it's better to test it with
> real implementations (i.e. the browsers).

I think the meta option is provided because in some environments you
don't have full control of the headers being generated (e.g. hosted
solutions). I could be wrong on this.

I don't know why a client would get confused if it got the character
encoding in both the header and a meta tag... perhaps if they were
different?
>
>> 6) I use ADODB5 as database abstraction layer. It has a built-in
>> qstr method that makes the passed string safe for use in SQL.
>
> Safe only for the correct encoding. You need to set the encoding like
> I wrote above. If ADODB doesn't provide a method to change the encoding,
> you can run the query "SET NAMES utf8" after connecting - I'm not sure
> how this works with the escape function though.

mysql_real_escape_string takes into account the character encoding
the database is expecting... not sure about your DBAL though.

>> --I do not understand what this 'internal character encoding value' is.
>> The page points to: mb_internal_encoding()
>> Which reads:
>> Set/Get the internal character encoding
>
> It's the default encoding for certain mbstring functions, not really
> "internal". The mbstring extension (except for some regex functions)
> can deal with strings in more than one encoding at the same time.

That's what I gathered: 'internal encoding' is a bit misleading, I
tend to think of it more as a 'default' encoding. Many of the mb
functions take a character encoding as an optional parameter; if
you don't supply this parameter, it will assume that the encoding
of the input string is the 'internal' (i.e. default) one.
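
A small sketch of that behaviour, assuming the mbstring extension is loaded. The same two bytes are counted differently depending on which encoding is passed explicitly or picked up as the default:

```php
<?php
// "é" as its two UTF-8 bytes.
$aString = "\xC3\xA9";

// Explicit encoding argument: the bytes are interpreted as told.
$asLatin1 = mb_strlen($aString, 'ISO-8859-1'); // each byte is one character
$asUtf8   = mb_strlen($aString, 'UTF-8');      // both bytes form one character

// No argument: the 'internal' (default) encoding decides.
mb_internal_encoding('ISO-8859-1');
$defaultLatin1 = mb_strlen($aString);

mb_internal_encoding('UTF-8');
$defaultUtf8 = mb_strlen($aString);

echo "$asLatin1 $asUtf8 $defaultLatin1 $defaultUtf8\n";
```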

HTH

Taras
Sep 19 '08 #4
AqD
On Sep 19, 7:41 pm, Taras_96 <taras...@gmail.com> wrote:
> On Sep 18, 2:08 am, AqD <aquila.d...@gmail.com> wrote:
>> On Sep 17, 5:58 pm, Erwin Moller wrote:
>>> 3) I do NOT specify any character encoding in a META-tag.
>>> (The W3C advises against it: they say the header takes precedence over
>>> META-tags, and using the META tag may confuse some clients.)
>>
>> Some clients like IE4? ;) Basically all websites here (mis-)use the
>> meta tag for charset instead of setting the header. As long as the
>> encoding is latin1-compatible (like utf8), it should be fine.
>>
>> I stopped listening to their advice or reading their references a
>> long time ago. If you want something to work, it's better to test it with
>> real implementations (i.e. the browsers).
>
> I think the meta option is provided because in some environments you
> don't have full control of the headers being generated (e.g. hosted
> solutions). I could be wrong on this.
>
> I don't know why a client would get confused if it got the character
> encoding in both the header and a meta tag... perhaps if they were
> different?

If it's different, the browser should use the encoding from the header (I
tested this before). But the meta tag only works with ASCII/iso8859-1
based encodings, not UCS2 or UCS4.
>

>>> 6) I use ADODB5 as database abstraction layer. It has a built-in
>>> qstr method that makes the passed string safe for use in SQL.
>>
>> Safe only for the correct encoding. You need to set the encoding like
>> I wrote above. If ADODB doesn't provide a method to change the encoding,
>> you can run the query "SET NAMES utf8" after connecting - I'm not sure
>> how this works with the escape function though.
>
> mysql_real_escape_string takes into account the character encoding
> the database is expecting... not sure about your DBAL though.

True, but most developers only set the database encoding, not the
connection encoding, which MySQL assumes to be latin1, so they end up
storing data in the wrong encoding in the database even though the text
on the web pages looks correct ;) The problem is still very "popular"
now - you can check the code of some open-source projects such as phpbb
and xoops.
Sep 22 '08 #5
